Web Scraping (Part 2): Weibo Hot Search Ranking

mac2024-10-12  45


Target: https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6

Fields to extract: rank, hot-search phrase

Code

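import time

import requests
from lxml import etree

# Browser-like User-Agent, sent instead of the default python-requests one
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
url = 'https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6'

# Download the page and parse it into an element tree
text = requests.get(url=url, headers=headers).text
selector = etree.HTML(text)

# Start at row 2 and read 50 rows (row 1 is skipped)
for i in range(2, 52):
    title = selector.xpath('//tr[@class=""][{}]/td[2]/a/text()'.format(i))
    if not title:  # no such row, e.g. the page layout differs
        continue
    # Assumption: row 1 is an unranked entry, so row i holds rank i - 1
    print(i - 1, ' ', title[0])
    print('______________________', end='\n\n')
    time.sleep(3)  # pause between printed entries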
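The loop above runs one XPath query per row. As a rough alternative, the rows can be selected once and then walked in Python, which also makes the parsing step testable without a network request. This is only a sketch: `parse_hot_search`, `skip_rows`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is a hand-written stand-in shaped to match the XPath used above, not the real Weibo page.

```python
from lxml import etree

# Hand-written stand-in for the page (an assumption, not real Weibo markup).
# The first row plays the role of the entry that the original loop skips.
SAMPLE_HTML = """
<table><tbody>
<tr class=""><td></td><td><a href="#">pinned topic</a></td></tr>
<tr class=""><td>1</td><td><a href="#">first topic</a></td></tr>
<tr class=""><td>2</td><td><a href="#">second topic</a></td></tr>
</tbody></table>
"""


def parse_hot_search(html_text, skip_rows=1):
    """Return a list of (rank, title) pairs parsed from html_text."""
    selector = etree.HTML(html_text)
    # Select every matching row once instead of one query per index
    rows = selector.xpath('//tr[@class=""]')
    results = []
    # Ranks are counted from 1 after the skipped leading rows
    for rank, row in enumerate(rows[skip_rows:], start=1):
        titles = row.xpath('./td[2]/a/text()')
        if titles:  # ignore rows without a link in the second cell
            results.append((rank, titles[0].strip()))
    return results


for rank, title in parse_hot_search(SAMPLE_HTML):
    print(rank, ' ', title)
```

To use it on the live page, pass the `text` downloaded by `requests.get` instead of `SAMPLE_HTML`. Whether `skip_rows=1` is right depends on the page actually starting with one unranked row.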

Summary

Set a User-Agent header so the request is not identified as a crawler.
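To see what the header changes, the snippet below compares the User-Agent that requests sends by default with the one from the `headers` dict. It only prepares the request and never sends it, so no network access is needed. This is a small illustration, not part of the original script; the variable name `prepared` is made up for the example.

```python
import requests

# Without a custom header, requests identifies itself as python-requests/<version>
default_ua = requests.utils.default_user_agent()
print(default_ua)

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
url = 'https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6'

# Build the request without sending it, then inspect the header that would go out
prepared = requests.Request('GET', url, headers=headers).prepare()
print(prepared.headers['User-Agent'])
```

The default value names the library directly, which is an easy signal for a server to filter on; the browser-like string removes that particular signal.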