1. Download the installer and run the installation; see the previous post for the detailed steps.
2. Configure the environment variables.
3. Check the configuration.
4. Install the tools. First upgrade pip, since the bundled version may be out of date:
python -m pip install --upgrade pip
Fixes for common installation problems are covered in the previous post.
5. Tools needed for crawler development.
| Tool | Install command | Purpose |
| --- | --- | --- |
| requests | pip install requests | Python HTTP request library |
| lxml | pip install lxml | HTML/XML structure parsing library |
| pyquery | pip install pyquery | Document parsing with jQuery-like syntax |
| pylint | pip install pylint | Python static code analysis tool |
| selenium | pip3 install selenium | Web automation and testing tool |

All of these are third-party Python libraries.
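As a minimal sketch of how the parsing tools in the table are used, the snippet below runs an XPath query with lxml. To keep it self-contained it parses a hard-coded HTML string; for real crawling you would feed it the response text from requests instead (the HTML content here is purely illustrative).

```python
from lxml import etree

# A stand-in for a downloaded page; in practice this would be
# requests.get(url).text.
html = """
<html><body>
  <ul>
    <li class="item">first</li>
    <li class="item">second</li>
  </ul>
</body></html>
"""

# etree.HTML builds a DOM tree from (possibly messy) HTML.
tree = etree.HTML(html)

# XPath selects the text of every <li class="item"> node.
items = tree.xpath('//li[@class="item"]/text()')
print(items)  # ['first', 'second']
```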
Write the spider's source code to a file:
```python
import urllib.request
import urllib.parse


def spider(url, begin_page, end_page):
    # Crawl pages begin_page..end_page; Tieba paginates with pn = 50 * (page - 1).
    for page in range(begin_page, end_page + 1):
        pn = (page - 1) * 50
        file_name = "page_" + str(page) + ".html"
        full_url = url + "&pn=" + str(pn)
        html = load_page(full_url, file_name)
        write_page(html, file_name)


def load_page(url, filename):
    # Send the request with a browser User-Agent so the server serves the page normally.
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/64.0.3282.204 Safari/537.36"
    }
    request = urllib.request.Request(url, headers=headers)
    return urllib.request.urlopen(request).read()


def write_page(html, filename):
    # Decode the downloaded bytes and save them as a UTF-8 text file.
    print("Saving", filename, "...")
    with open(filename, 'w', encoding='utf-8') as file:
        file.write(html.decode('utf-8'))


if __name__ == "__main__":
    kw = input("Enter the keyword to crawl: ")
    begin_page = int(input("Enter the start page: "))
    end_page = int(input("Enter the end page: "))
    url = 'http://tieba.baidu.com/s?'
    key = urllib.parse.urlencode({"kw": kw})
    url = url + key
    spider(url, begin_page, end_page)
```

Install the Scrapy framework: pip install scrapy
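To see how the spider above builds each page URL, the short walkthrough below reproduces its two key steps: urllib.parse.urlencode percent-escapes the keyword (important for Chinese input), and pn advances by 50 per page. The keyword "python" is just a sample value.

```python
import urllib.parse

# Same base URL as the spider above.
base = 'http://tieba.baidu.com/s?'

# urlencode escapes the query value; non-ASCII keywords become %XX sequences.
key = urllib.parse.urlencode({"kw": "python"})

for page in range(1, 3):
    pn = (page - 1) * 50          # page 1 -> pn=0, page 2 -> pn=50
    full_url = base + key + "&pn=" + str(pn)
    print(full_url)
# http://tieba.baidu.com/s?kw=python&pn=0
# http://tieba.baidu.com/s?kw=python&pn=50
```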
In a directory of your choice, create a new Scrapy project: scrapy startproject mySpider
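The startproject command generates a standard project skeleton. The layout sketched below is the usual scaffold produced by recent Scrapy versions; exact files may vary slightly between releases.

```shell
# Run in any working directory to generate the project skeleton.
scrapy startproject mySpider

# Typical resulting layout:
# mySpider/
#     scrapy.cfg          # deploy configuration
#     mySpider/
#         __init__.py
#         items.py        # data-model (Item) definitions
#         middlewares.py  # request/response middleware hooks
#         pipelines.py    # post-processing of scraped items
#         settings.py     # project settings
#         spiders/        # your spider modules go here
#             __init__.py
```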