# DataCrawler

**Repository Path**: jia20220830/DataCrawler

## Basic Information

- **Project Name**: DataCrawler
- **Description**: Coroutine-based asynchronous data crawler
- **Primary Language**: Python
- **License**: AFL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-05-07
- **Last Updated**: 2023-06-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: Data crawler, Coroutine async, Python

## README

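## Project Introduction
- Crawls data efficiently by sending concurrent requests with asyncio coroutines
- Uses Selenium to capture data that cannot be extracted by parsing the raw HTML page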
---
## Crawler Architecture
![Crawler architecture](crawler_frame.png)

---
## Environment
- Windows 10
- Python 3.9
- asyncio~=3.4.3
- aiohttp~=3.8.4
- selenium~=4.8.2
- requests~=2.28.2
- redis~=4.5.1
- beautifulsoup4~=4.12.2
---
## Directory Structure
![Directory structure](Config/dir.png)

---
## Installation
Install the dependencies (`-i` points pip at the Aliyun PyPI mirror):

> `pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/`

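## Third-Party Libraries
- asyncio: coroutine library (part of the Python standard library since 3.4; the `asyncio` package on PyPI is an old backport and is not needed on Python 3.9)
- aiohttp: asynchronous HTTP requests
- xlwings: library for working with Excel
- pymysql: Python connector for MySQL
- cryptography: required by pymysql for the encrypted authentication used when connecting to MySQL (e.g. MySQL 8's `caching_sha2_password`)
- beautifulsoup4: HTML page parsing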
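To illustrate the parsing step, here is a minimal sketch using BeautifulSoup. The page structure (`div.item`, `a.title`, `span.price`) and the `parse_items` function are hypothetical. A real target site needs its own selectors. The sketch uses Python's built-in `html.parser`, so no extra parser such as lxml is required.

```python
from typing import Dict, List, Optional

from bs4 import BeautifulSoup


def parse_items(html: str) -> List[Dict[str, Optional[str]]]:
    """Extract title/url/price records from a hypothetical listing page."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for row in soup.select("div.item"):  # CSS selector for each listing entry
        title = row.select_one("a.title")
        price = row.select_one("span.price")
        if title is None:
            continue  # skip entries without a title link
        items.append({
            "title": title.get_text(strip=True),
            "url": title.get("href"),
            "price": price.get_text(strip=True) if price else None,
        })
    return items
```

In a crawler like the one described above, `parse_items` would be applied to each page body returned by the asynchronous fetch step.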
---
## Preview
![Data crawler preview](Config/data_crawler.png)