A Very Simple Python 3 Web Scraper Example

mac, 2022-06-30

Site entry point: http://wise.xmu.edu.cn/people/faculty
Data scraped: each person's name and homepage URL
Python version: 3.5

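import requests
from bs4 import BeautifulSoup

# Download the faculty listing page
r = requests.get('http://www.wise.xmu.edu.cn/people/faculty')
html = r.content

# Parse the page as HTML. 'lxml' needs the lxml package installed;
# 'html.parser' is the built-in alternative.
soup = BeautifulSoup(html, 'lxml')

# The faculty links sit inside <div class="people_list">
div_people_list = soup.find('div', attrs={'class': 'people_list'})
a_s = div_people_list.find_all('a', attrs={'target': '_blank'})

# Print each person's name and homepage path
for a in a_s:
    url = a['href']
    name = a.get_text()
    print(name, url)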

Output:

敖萌幪 /people/faculty/494d4f1c-0470-4f53-8b7c-d3594241876b.html
Bowers, Roslyn /people/faculty/d01fe119-7980-4238-a3ec-abb9b66ec706.html
Brown, Katherine /people/faculty/36c6b263-2cc2-4682-9975-02b75e6505f7.html
鲍小佳 /people/faculty/bdc3fd77-84de-4020-846d-344e02f110e9.html
Chang, Seong Yeon /people/faculty/0534965d-6393-4e22-a6bb-6ac3b11fe431.html
蔡熙乾 /people/faculty/95d97944-beb6-4a47-af85-a0778e1788b2.html
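
The homepage addresses printed above are relative paths, not full URLs. A minimal sketch of turning them into absolute URLs, assuming each link should be resolved against the page that was requested. It uses urljoin from the standard library; BASE_URL and to_absolute are illustrative names and are not part of the original script.

```python
from urllib.parse import urljoin

# Assumption: links are resolved against the page the script requested.
BASE_URL = 'http://www.wise.xmu.edu.cn/people/faculty'


def to_absolute(href, base=BASE_URL):
    """Resolve a possibly relative href against the base page URL."""
    return urljoin(base, href)


# A relative path taken from the output above
print(to_absolute('/people/faculty/494d4f1c-0470-4f53-8b7c-d3594241876b.html'))
```

Inside the loop, this would amount to calling to_absolute(a['href']) before printing. An href that is already absolute passes through urljoin unchanged.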

Original article: https://zhuanlan.zhihu.com/p/21377121

Reposted from: https://www.cnblogs.com/fanren224/p/8457235.html
