Scraping second-level (child) comments on Weibo


import requests
from lxml import html


def wb_child_comment(self, req):
    """Fetch the second-level (child) comments under one first-level comment.

    `req` is a requests.Session that already carries logged-in Weibo cookies.
    """
    try:
        # {} is later replaced with the action-data value taken from the
        # "more" link; from=singleWeiBo must be appended, otherwise the
        # endpoint returns first-level comments instead.
        main_url = "https://weibo.com/aj/v6/comment/big?ajwvr=6&{}&from=singleWeiBo"
        # Example of a first request:
        # https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big&root_comment_id=4095052063593913&is_child_comment=ture&id=4095051414397198&from=singleWeiBo
        url = ("https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big"
               "&root_comment_id=4213888171751114&is_child_comment=ture"
               "&id=4095051414397198&from=singleWeiBo")
        jsonstr = req.get(url).json()
        croot = html.fromstring(jsonstr["data"]["html"])
        with open("weibocomment3.html", "w", encoding="utf-8") as fs:
            fs.write(jsonstr["data"]["html"])
        # The "more" link's action-data attribute holds the query string
        # for the next page of child comments.
        hava_more_node = croot.xpath(
            "//div[@class='list_li_v2']/div[@class='WB_text']/a/@action-data")
        while hava_more_node:
            hava_more_url = hava_more_node[0]
            if not hava_more_url:
                print("no more")
                break
            next_c_url = main_url.format(hava_more_url)
            next_jsonstr = req.get(next_c_url).json()
            chtml = next_jsonstr["data"]["html"]
            with open("weibocomment4.html", "w", encoding="utf-8") as fs:
                fs.write(chtml)
            croot2 = html.fromstring(chtml)
            hava_more_node = croot2.xpath(
                "//div[@class='list_li_v2']/div[@class='WB_text']/a/@action-data")
    except Exception as exc:
        print("get child comment error:", exc)

Approach:

1. The first URL to request is:

https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big&root_comment_id=4215074627189144&is_child_comment=ture&id=4095051414397198&from=singleWeiBo

Parameter notes:

https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big& — everything up to here is fixed

root_comment_id: the id of the first-level comment

is_child_comment=ture: fixed

id=4095051414397198: what this id is for is still unknown; if you know, please share

from=singleWeiBo: fixed and must be included; it comes up again below
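The parameters above can be assembled into the first request URL like this. This is a small sketch; the helper name `build_child_comment_url` is mine, and the ids are the example values from the post.

```python
# Sketch of building the first child-comment URL from the parameters
# described above. build_child_comment_url is a hypothetical helper;
# the ids below are the example values shown in the post.

def build_child_comment_url(root_comment_id, weibo_id):
    # The prefix through more_comment=big is fixed; is_child_comment=ture
    # is written exactly as the post gives it; from=singleWeiBo is required.
    return (
        "https://weibo.com/aj/v6/comment/big?ajwvr=6&more_comment=big"
        f"&root_comment_id={root_comment_id}"
        "&is_child_comment=ture"
        f"&id={weibo_id}"
        "&from=singleWeiBo"
    )

print(build_child_comment_url(4215074627189144, 4095051414397198))
```

Fetching this URL with a logged-in session returns JSON whose `data.html` field holds the rendered comment fragment.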

2. Loop to check whether there are more comments.

Get the "more" button via xpath:

hava_more_node = croot.xpath("//div[@class='list_li_v2']/div[@class='WB_text']/a/@action-data")

This yields a query string (the action-data value). Splice it into the full URL and then append the important parameter from=singleWeiBo; without it, the request returns the first-level comment list instead.
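The splicing step can be sketched as follows. The post uses lxml.html on the fragment from `jsonstr["data"]["html"]`; to keep this sketch self-contained it uses the standard library's ElementTree on a stand-in fragment instead (which therefore must be well-formed XML, hence the `&amp;` escapes), with the same element structure the xpath above targets.

```python
# Sketch of the "load more" splice. The fragment below is a stand-in for
# the HTML the API returns; structure assumed from the xpath in the post.
import xml.etree.ElementTree as ET

main_url = "https://weibo.com/aj/v6/comment/big?ajwvr=6&{}&from=singleWeiBo"

fragment = (
    '<div class="list_li_v2">'
    '<div class="WB_text">'
    '<a action-data="more_comment=big&amp;root_comment_id=4215074627189144'
    '&amp;id=4095051414397198">more</a>'
    '</div>'
    '</div>'
)

croot = ET.fromstring(fragment)
node = croot.find(".//div[@class='WB_text']/a")
next_url = None
if node is not None:
    # action-data already holds the query string for the next page;
    # main_url appends the required from=singleWeiBo.
    next_url = main_url.format(node.get("action-data"))
print(next_url)
```

Repeating this on each returned fragment until no "more" link is found walks through every page of child comments, as the function above does.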

Reposted from: https://www.cnblogs.com/c-x-a/p/8526753.html
