# ZhihuSpider

**Repository Path**: lewisliang82/ZhihuSpider

## Basic Information

- **Project Name**: ZhihuSpider
- **Description**: A multi-threaded Zhihu user crawler, based on Python 3
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 195
- **Created**: 2016-12-12
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

A detailed walkthrough of the code is available on my blog: [I crawled the data of one million Zhihu users with Python](http://www.jwlchina.cn/2016/11/04/%E6%88%91%E7%94%A8python%E7%88%AC%E4%BA%86%E7%9F%A5%E4%B9%8E%E4%B8%80%E7%99%BE%E4%B8%87%E7%94%A8%E6%88%B7%E7%9A%84%E6%95%B0%E6%8D%AE/)

# This is a multi-threaded crawler for Zhihu users

# Requirements

Required packages:

`beautifulsoup4` `html5lib` `image` `requests` `redis` `PyMySQL`

Install all dependencies with pip:

```bash
pip install \
    image \
    requests \
    beautifulsoup4 \
    html5lib \
    redis \
    PyMySQL
```

The runtime environment must support Chinese text (UTF-8).

Tested on Python 3.5; other environments are not guaranteed to work flawlessly.

**MySQL and Redis must be installed.**

**Configure `config.ini`: set up the MySQL and Redis connections, and fill in your Zhihu account.**

**Import `init.sql` into the database.**

# Run

Start crawling: `python get_user.py`

Check how many users have been crawled: `python check_redis.py`

# Screenshots

![Screenshot 1](http://www.jwlchina.cn/uploads/%E7%9F%A5%E4%B9%8E%E7%94%A8%E6%88%B7%E7%88%AC%E8%99%AB4.png)

![Screenshot 2](http://www.jwlchina.cn/uploads/%E7%9F%A5%E4%B9%8E%E7%94%A8%E6%88%B7%E7%88%AC%E8%99%AB5.png)

# Docker

If manual setup is too much trouble, you can use Docker to build a basic environment, as I did. MySQL and Redis both use the official images:

```bash
docker run --name mysql -itd mysql:latest
docker run --name redis -itd redis:latest
```

Then run the Python image with docker-compose. My `docker-compose.yml`:

```yaml
python:
  container_name: python
  build: .
  ports:
    - "84:80"
  external_links:
    - memcache:memcache
    - mysql:mysql
    - redis:redis
  volumes:
    - /docker_containers/python/www:/var/www/html
  tty: true
  stdin_open: true
  extra_hosts:
    - "python:192.168.102.140"
  environment:
    PYTHONIOENCODING: utf-8
```

My Dockerfile:

```dockerfile
FROM kong36088/zhihu-spider:latest
```
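The setup steps above have you fill in MySQL, Redis, and Zhihu account details in `config.ini`. As a rough sketch of how such a file is read with Python's standard-library `configparser`, here is a minimal example; note that the section and key names below are hypothetical — check the actual `config.ini` shipped with the repo for the real layout:

```python
import configparser

# Hypothetical config.ini layout -- the real file in this repo may use
# different section and key names.
SAMPLE = """
[mysql]
host = 127.0.0.1
port = 3306
user = root
password = secret
database = zhihu

[redis]
host = 127.0.0.1
port = 6379

[account]
username = you@example.com
password = your-password
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)  # with a real file: config.read("config.ini")

mysql_host = config.get("mysql", "host")
redis_port = config.getint("redis", "port")  # getint parses the value as an integer
print(mysql_host, redis_port)
```

With a real file on disk you would call `config.read("config.ini")` instead of `read_string`.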
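The project's multi-threaded crawl can be sketched with the standard library alone: worker threads share a task queue of user tokens and a "seen" set that prevents crawling the same user twice (in the real project, Redis plays the role of this shared set, which is what `check_redis.py` inspects). The graph and `fetch_followees` below are stand-ins for the real HTTP requests, not the project's actual code:

```python
import queue
import threading

seen = set()
seen_lock = threading.Lock()
tasks = queue.Queue()
results = []

# Stand-in for the real network call that returns the users a profile follows.
FAKE_GRAPH = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
}

def fetch_followees(token):
    return FAKE_GRAPH.get(token, [])

def worker():
    while True:
        token = tasks.get()
        if token is None:          # poison pill: shut this worker down
            tasks.task_done()
            break
        results.append(token)      # in the real project: parse and save to MySQL
        for nxt in fetch_followees(token):
            with seen_lock:        # Redis set membership plays this role in the project
                if nxt in seen:
                    continue
                seen.add(nxt)
            tasks.put(nxt)
        tasks.task_done()

seen.add("seed")
tasks.put("seed")

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
tasks.join()                       # block until every queued user is processed
for _ in threads:
    tasks.put(None)                # one poison pill per worker
for t in threads:
    t.join()

print(sorted(results))             # each discovered user appears exactly once
```

Marking a token as seen *before* enqueueing it is what guarantees each user is crawled at most once, even with several workers pulling from the queue concurrently.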