Web Scraping (2): Weibo Hot Search List
Target: https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6
Elements: rank, hot-search phrase
Code
import requests
import time
from lxml import etree

# Pretend to be a normal browser so the request is not rejected as a crawler.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
url = 'https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6'

text = requests.get(url, headers=headers).text
selector = etree.HTML(text)

# Rows 2-51 of the table hold the 50 ranked entries; row 1 is the pinned item,
# so the rank of row i is i - 1.
for i in range(2, 52):
    title = selector.xpath('//tr[@class=""][{}]/td[2]/a/text()'.format(i))
    if title:  # guard against a missing row so title[0] cannot raise IndexError
        print(i - 1, ' ', title[0])

print('______________________', end='\n\n')
time.sleep(3)  # brief pause; useful when running this script in a polling loop
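Indexing rows one at a time with a positional XPath predicate is fragile and issues 50 separate queries. As a sketch of an alternative, the loop below grabs every matching row in a single query and enumerates the results; it assumes the same tr/td table layout as above, where the first matching row is the pinned entry:

import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
url = 'https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6'

selector = etree.HTML(requests.get(url, headers=headers).text)
# One query for all hot-search links; [1:] skips the pinned first row,
# assuming it also matches tr[@class=""] as in the original XPath.
titles = selector.xpath('//tr[@class=""]/td[2]/a/text()')
for rank, title in enumerate(titles[1:], start=1):
    print(rank, ' ', title)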
Summary
Always send a User-Agent header so the request is not identified as coming from a crawler.
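Note that s.weibo.com may also redirect anonymous requests to a login page; if that happens, sending a logged-in Cookie alongside the User-Agent can help. A minimal sketch, assuming you copy the cookie string from your own browser session (the value below is a placeholder, not a real cookie):

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
    'Cookie': 'SUB=your_cookie_value_here',  # placeholder; copy from your browser
}
resp = requests.get('https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6',
                    headers=headers)
# After redirects, a final URL on passport.weibo.com signals a login wall.
print(resp.status_code, resp.url)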