Web Crawlers and Information Extraction (1. Getting Started with the Requests Library)

mac 2024-04-04

Example 1: Crawling a JD.com product page

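# Interactive session: inspect the response step by step
import requests
kv = {'User-Agent': "Mozilla/5.0"}
url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
r = requests.get(url, headers=kv)
r.status_code
r.encoding
r.text[:1000]

# Complete script
import requests

kv = {'User-Agent': "Mozilla/5.0"}
url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    r = requests.get(url, headers=kv)
    r.raise_for_status()                # raise HTTPError for 4xx/5xx responses
    r.encoding = r.apparent_encoding    # use the encoding guessed from the content
    print(r.text[:1000])
except requests.RequestException:
    print("Crawl failed")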
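The try/except pattern in the script above can be wrapped in a reusable function. The following is a minimal sketch, not part of the original post: the name `get_html_text`, the 30-second timeout, and the choice to return `None` on failure are all assumptions made for illustration.

```python
import requests

def get_html_text(url, headers=None, timeout=30):
    """Fetch a page and return its text, or None if the request fails.

    Hypothetical helper; the timeout value is an arbitrary assumption.
    """
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()                # 4xx/5xx responses become exceptions
        r.encoding = r.apparent_encoding    # guess the encoding from the content
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, HTTP errors and malformed URLs
        return None

# A malformed URL fails before any network traffic is sent
result = get_html_text("not-a-url")
print(result)
```

Catching `requests.RequestException` rather than using a bare `except:` keeps unrelated errors such as `KeyboardInterrupt` from being silently swallowed.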

Example 2: Crawling an Amazon product page

# Interactive session
import requests
kv = {'User-Agent': "Mozilla/5.0"}
url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
r = requests.get(url, headers=kv)
r.status_code
r.text[:1000]

# Complete script
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    kv = {"user-agent": "Mozilla/5.0"}   # present the request as coming from a browser
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Crawl failed")
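The `headers` argument matters because Requests otherwise identifies itself with its own User-Agent string, which a site can use to reject the request. The sketch below prepares a request without sending it, so the header can be inspected offline. The helper name `build_request` is hypothetical.

```python
import requests

def build_request(url, user_agent="Mozilla/5.0"):
    """Prepare (but do not send) a GET request with a custom User-Agent."""
    req = requests.Request("GET", url, headers={"User-Agent": user_agent})
    return req.prepare()

# The User-Agent that Requests sends when none is supplied
default_ua = requests.utils.default_user_agent()

prepared = build_request("https://www.amazon.cn/gp/product/B01M8L5Z3Y")
print(default_ua)                        # starts with "python-requests/"
print(prepared.headers["User-Agent"])    # the overridden value
```

After a real request has been sent, the same information is available through `r.request.headers`.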

Example 3: Submitting search keywords to Baidu and 360 Search

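# Interactive session (Baidu)
import requests
kv = {"wd": "Python"}
r = requests.get("http://www.baidu.com/s", params=kv)
r.status_code
r.request.url
len(r.text)

# Complete script: Baidu, keyword parameter "wd"
import requests

url = "http://www.baidu.com/s"
keyword = "python"
try:
    kv = {"wd": keyword}
    r = requests.get(url, params=kv)
    print(r.request.url)     # the URL that was actually requested
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")

# Complete script: 360 Search, keyword parameter "q"
import requests

url = "http://www.so.com/s"
keyword = "python"
try:
    kv = {"q": keyword}
    r = requests.get(url, params=kv)
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")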
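The only difference between the two search engines is the name of the query parameter (`wd` for Baidu, `q` for 360 Search). How `params` ends up in the URL can be checked without sending anything by preparing the request first. This is an illustrative sketch; the helper name `search_url` is an assumption.

```python
import requests

def search_url(base, key, keyword):
    """Return the URL Requests would use for a keyword search (no request is sent)."""
    prepared = requests.Request("GET", base, params={key: keyword}).prepare()
    return prepared.url

baidu_url = search_url("http://www.baidu.com/s", "wd", "python")
so_url = search_url("http://www.so.com/s", "q", "python")

print(baidu_url)   # http://www.baidu.com/s?wd=python
print(so_url)      # http://www.so.com/s?q=python
```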

Example 4: Downloading an image from the web

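import os
import requests

url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
root = "E://pics//"
path = root + url.split("/")[-1]         # name the file after the last URL segment
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()             # do not save an error page as an image
        with open(path, "wb") as f:      # the with block closes the file automatically
            f.write(r.content)
        print("File saved successfully")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Crawl failed")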
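The file-handling part of this example (derive the file name from the URL, create the directory, skip the download if the file already exists) can be separated from the network call and tried offline. The sketch below uses a temporary directory and placeholder bytes in place of `r.content`. The function names `target_path` and `save_if_missing` are hypothetical.

```python
import os
import tempfile

def target_path(url, root):
    """Join the save directory with the last segment of the URL."""
    return os.path.join(root, url.split("/")[-1])

def save_if_missing(path, data):
    """Write data to path unless the file already exists; return True if written."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if os.path.exists(path):
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True

url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
with tempfile.TemporaryDirectory() as tmp:
    path = target_path(url, os.path.join(tmp, "pics"))
    first = save_if_missing(path, b"placeholder bytes")    # stands in for r.content
    second = save_if_missing(path, b"placeholder bytes")
    print(os.path.basename(path), first, second)
```

Using `os.makedirs(..., exist_ok=True)` also covers the case where the parent directories do not exist yet, which `os.mkdir` alone does not.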

Example 5: Automatic lookup of an IP address's location

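import requests

url = "http://m.ip138.com/ip.asp?ip="
try:
    r = requests.get(url + "202.204.80.112")   # append the IP address to the query URL
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])                       # the result appears near the end of the page
except requests.RequestException:
    print("Crawl failed")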
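Because the IP address is concatenated directly onto the URL, it can be worth checking that the string is a valid address before sending the request. This is an optional sketch that is not in the original post: it uses the standard-library `ipaddress` module, and the helper name `build_query_url` is hypothetical.

```python
import ipaddress

BASE_URL = "http://m.ip138.com/ip.asp?ip="

def build_query_url(ip, base_url=BASE_URL):
    """Return the lookup URL for ip; raise ValueError if ip is not a valid address."""
    ipaddress.ip_address(ip)    # raises ValueError for malformed input
    return base_url + ip

query_url = build_query_url("202.204.80.112")
print(query_url)

try:
    build_query_url("not-an-ip")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```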