Python Web Scraping Basics

mac2025-12-22 7

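Web Scraping Basics, Part 1

1. Making requests with requests

requests is a basic HTTP library: it imitates a browser by sending a request to a server and receiving the server's response.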

import requests

url = 'http://xxxxxxxxxxxxxx'
response = requests.get(url)
print(type(response))  # <class 'requests.models.Response'>

requests.get(url) returns a Response object, an instance of the requests.models.Response class.

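Basic usage:

print(response.text)         # page source as text, i.e. the body decoded to a str
print(response.status_code)  # HTTP status code; 200 means the request succeeded
print(response.url)          # URL of the page that was requested
print(response.headers)      # headers of the response sent back by the server
print(response.cookies)      # cookies set by the server
print(response.content)      # body as a byte string, i.e. the raw binary data
print(type(response))        # a Response object of class requests.models.Response
response.encoding = 'utf-8'  # sets (does not print) the encoding used to decode response.text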
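The attributes above can be combined into a small helper. The following is only a minimal sketch: fetch_html is a hypothetical name, the 5-second timeout is an arbitrary choice, and the code assumes the page is encoded as UTF-8, which will not be true of every site.

```python
from typing import Optional

import requests


def fetch_html(url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the page source as text, or None if the request fails.

    fetch_html is a hypothetical helper name, not part of requests.
    """
    try:
        # requests does not time out by default, so pass a timeout explicitly.
        response = requests.get(url, timeout=timeout)
        # Raise requests.HTTPError for 4xx and 5xx status codes.
        response.raise_for_status()
    except requests.RequestException as exc:
        # RequestException is the base class for connection errors,
        # timeouts, and HTTP errors raised by requests.
        print(f"request failed: {exc}")
        return None

    # Assumption: the page is UTF-8. Set the encoding before reading .text,
    # because .text decodes .content using response.encoding.
    response.encoding = "utf-8"
    return response.text
```

Calling fetch_html('http://httpbin.org/get') should return the response body as a string when the site is reachable, and None otherwise.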

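2. The robots protocol

The robots protocol, formally the Robots Exclusion Standard, tells crawlers which pages they may crawl and which they may not. To read a site's rules, append /robots.txt to the site's root URL and open the resulting address.

The two directives that appear most often in robots.txt are Allow and Disallow:
Allow: the path may be visited and crawled.
Disallow: the path must not be visited or crawled.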
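The Allow and Disallow rules can also be checked from code with urllib.robotparser in the Python standard library. The sketch below parses a made-up robots.txt from a string, so it makes no network request. The rules, the crawler name my-crawler, and the example.com URLs are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Allow: /public/
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the lines of the file, so nothing is downloaded here.
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(useragent, url) reports whether that crawler may fetch the URL.
print(parser.can_fetch("my-crawler", "http://example.com/public/index.html"))   # True
print(parser.can_fetch("my-crawler", "http://example.com/private/data.html"))  # False
```

To check a real site, one would call parser.set_url() with the site's /robots.txt address and then parser.read() instead of parse().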

3. Going further: other request methods

requests.get('http://httpbin.org/get')
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
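One way to see how these methods differ is to build a request without sending it. The sketch below uses requests.Request and its prepare() method to compare a GET with a POST. The parameter names and values (q, name) are made up for illustration, and nothing is sent to httpbin.org.

```python
import requests

# Build two requests without sending them, to inspect what would go on the wire.
# The parameter names and values are made up for illustration.
get_req = requests.Request(
    "GET", "http://httpbin.org/get", params={"q": "python"}
).prepare()
post_req = requests.Request(
    "POST", "http://httpbin.org/post", data={"name": "mac"}
).prepare()

# For GET, params are appended to the URL as a query string.
print(get_req.method, get_req.url)   # GET http://httpbin.org/get?q=python

# For POST, form data goes in the request body and the URL is unchanged.
print(post_req.method, post_req.url)
print(post_req.body)                 # name=mac
print(post_req.headers["Content-Type"])
```

To actually send the same requests, pass the same keyword arguments to the shortcut functions, for example requests.post('http://httpbin.org/post', data={'name': 'mac'}).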

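That is all for now. A closing note to myself: this is my first blog post, and because I have only just finished learning web scraping, my knowledge is still incomplete and loosely organized, so this post may contain mistakes. Corrections from experienced readers passing by are welcome. Thanks for your support; I will keep updating.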
