Fetching WeChat Official Account articles in Java via Sogou WeChat Search

mac2024-03-15 34

1. Searching requires a SNUID. How to obtain it:

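import org.apache.http.client.CookieStore;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

// Fetch the SNUID. The SNUID is limited by both time and request count, so it is best to
// refresh it each time an automated query runs. Related cookies: PHPSESSID, IPLOC.
public String getSnuid() {
    String url = "https://www.sogou.com/web?query=333&_asf=www.sogou.com&_ast=1488955851&w=01019900&p=40040100&ie=utf8&from=index-nologin";
    int timeout = 30000;
    String snuid = null;
    CookieStore cookieStore = new BasicCookieStore();
    // One config for cookie spec and timeouts; a per-request config would replace the default one entirely
    RequestConfig config = RequestConfig.custom()
            .setCookieSpec(CookieSpecs.STANDARD)
            .setSocketTimeout(timeout)
            .setConnectTimeout(timeout)
            .build();
    // try-with-resources closes the client and the response even when the request fails
    try (CloseableHttpClient httpClient = HttpClients.custom()
            .setDefaultRequestConfig(config)
            .setDefaultCookieStore(cookieStore)
            .build()) {
        HttpGet httpGet = new HttpGet(url);
        httpGet.setHeader("Cookie", "ABTEST=0|1488956269|v17;IPLOC=CN3301;SUID=E9DA81B7290B940A0000000058BFAB6D;PHPSESSID=rfrcqafv5v74hbgpt98ah20vf3;SUIR=1488956269");
        try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // The server sets SNUID through Set-Cookie; read it back from the cookie store
            for (Cookie c : cookieStore.getCookies()) {
                if ("SNUID".equals(c.getName())) {
                    snuid = c.getValue();
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return snuid;
}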
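Because the SNUID expires by time and by request count, one option is to wrap the fetch in a small holder that refreshes the value only when a limit is reached. The following is a minimal sketch under stated assumptions: the class name `SnuidCache`, its parameters, and the limit values a caller passes in are all hypothetical, since the actual limits Sogou applies are not known here. The `fetcher` would be something like the `getSnuid()` method above.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper (not part of any library): caches a SNUID and refreshes it
// after an assumed maximum age or an assumed maximum number of uses.
final class SnuidCache {
    private final Supplier<String> fetcher;   // e.g. this::getSnuid
    private final LongSupplier clockMillis;   // e.g. System::currentTimeMillis
    private final long maxAgeMillis;          // assumed time limit
    private final int maxUses;                // assumed request-count limit

    private String value;
    private long fetchedAt;
    private int uses;

    SnuidCache(Supplier<String> fetcher, LongSupplier clockMillis, long maxAgeMillis, int maxUses) {
        this.fetcher = fetcher;
        this.clockMillis = clockMillis;
        this.maxAgeMillis = maxAgeMillis;
        this.maxUses = maxUses;
    }

    // Returns the cached SNUID, fetching a new one when none is cached,
    // when it is too old, or when it has been handed out maxUses times.
    synchronized String get() {
        long now = clockMillis.getAsLong();
        boolean stale = value == null
                || now - fetchedAt >= maxAgeMillis
                || uses >= maxUses;
        if (stale) {
            value = fetcher.get();
            fetchedAt = now;
            uses = 0;
        }
        uses++;
        return value;
    }
}
```

A caller might create it as `new SnuidCache(this::getSnuid, System::currentTimeMillis, 60_000L, 10)` and call `get()` before each search request; the numbers are placeholders to tune by observation. If the fetch fails and returns null, the next call simply tries again.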

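2. Analyzing the traffic with Fiddler shows which headers the request needs. Note that the Cookie header must contain both SUV and SNUID.

I use HttpClients to simulate a browser request.

The Referer value can be read from Fiddler and filled in.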

httpGet.setHeader("Host", "weixin.sogou.com");
httpGet.setHeader("Connection", "keep-alive");
httpGet.setHeader("Upgrade-Insecure-Requests", "1");
httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36");
httpGet.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
httpGet.setHeader("Referer", refererUrl);
httpGet.setHeader("Accept-Encoding", "gzip, deflate, br");
httpGet.setHeader("Cookie", "SUV=11F4A14A78A4DE3D5DB948F0493C5371;SNUID=" + SNUID + ";");
httpGet.setHeader("Accept-Language", "zh-CN,zh;q=0.9,ja;q=0.8,en;q=0.7");

HttpResponse response = httpClient.execute(httpGet);
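Since the request fails without both SUV and SNUID, it can help to build the header set in one place and reject missing values early. Below is a minimal sketch that uses only the standard library; the class name `SogouSearchHeaders` and its `build` method are hypothetical, and the header values are copied from the snippet above. The resulting map can be applied to whichever HTTP client is in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of any library): assembles the request headers
// and refuses to build them when SUV or SNUID is missing.
final class SogouSearchHeaders {
    private SogouSearchHeaders() {}

    static Map<String, String> build(String suv, String snuid, String referer) {
        if (suv == null || suv.isEmpty()) {
            throw new IllegalArgumentException("SUV cookie value is required");
        }
        if (snuid == null || snuid.isEmpty()) {
            throw new IllegalArgumentException("SNUID cookie value is required");
        }
        // LinkedHashMap keeps the headers in insertion order
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "weixin.sogou.com");
        headers.put("Connection", "keep-alive");
        headers.put("Upgrade-Insecure-Requests", "1");
        headers.put("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36");
        headers.put("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,"
                + "image/webp,image/apng,*/*;q=0.8");
        headers.put("Referer", referer);
        headers.put("Accept-Encoding", "gzip, deflate, br");
        headers.put("Cookie", "SUV=" + suv + ";SNUID=" + snuid + ";");
        headers.put("Accept-Language", "zh-CN,zh;q=0.9,ja;q=0.8,en;q=0.7");
        return headers;
    }
}
```

With Apache HttpClient, the map could then be applied with `headers.forEach(httpGet::setHeader)`.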

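The request above retrieves the part of the page shown in the figure below.

3. Viewing the page source shows that the title links look like this

If you click the link on the right directly at this point, the following appears

Fiddler shows that the Cookie contains neither SUV nor SNUID, and that the original redirect URL has extra content appended to the end, of the form &k=*&h=* (where * stands for an arbitrary value).

By analyzing the JavaScript source and following its rules, I wrote Java code that produces a URL that can be accessed.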

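// Parse the WeChat article URL and dynamically append the k and h parameters
public String createUrl(String url) {
    // k is a random integer in [1, 100]
    int b = (int) Math.floor(100 * Math.random()) + 1;
    int a = url.indexOf("url=");
    int c = url.indexOf("&k=");
    // Only sign links that carry a url= parameter and have not been signed yet
    if (a != -1 && c == -1) {
        // h is the single character located 4 + 21 + b positions after the start of "url="
        int index = a + 4 + 21 + b;
        // Guard against links too short to contain that position
        if (index < url.length()) {
            String h = url.substring(index, index + 1);
            url += "&k=" + b + "&h=" + h;
        }
    }
    return url;
}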
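To make the rule easier to check by hand, the sketch below separates the random choice of k from the string manipulation, so a fixed k gives a predictable result. The class name `SogouLinkSigner`, the method `appendKH`, and the sample link in the note that follows are hypothetical and purely illustrative; the offsets 4 and 21 are taken from the code above. Unlike a direct substring call, this version returns the link unchanged when it is too short.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper (not part of any library): appends the k and h parameters
// following the rule described in the text.
final class SogouLinkSigner {
    private static final int PARAM_NAME_LENGTH = 4; // length of "url="
    private static final int FIXED_OFFSET = 21;     // constant taken from the original code

    private SogouLinkSigner() {}

    // Deterministic variant: the caller supplies k (expected range 1..100).
    static String appendKH(String url, int k) {
        int start = url.indexOf("url=");
        if (start == -1 || url.contains("&k=")) {
            return url; // nothing to sign, or already signed
        }
        int index = start + PARAM_NAME_LENGTH + FIXED_OFFSET + k;
        if (index >= url.length()) {
            return url; // link too short to contain the h character
        }
        return url + "&k=" + k + "&h=" + url.charAt(index);
    }

    // Random variant: picks k uniformly from 1..100, like the original code.
    static String appendKH(String url) {
        return appendKH(url, ThreadLocalRandom.current().nextInt(1, 101));
    }
}
```

For the made-up link `/link?url=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz` and k = 3, `url=` starts at index 6, so h is the character at index 6 + 4 + 21 + 3 = 34, which is `Y`, and the result ends in `&k=3&h=Y`.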

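We can now request the URL generated above. Do not forget to add the headers.

The returned content is:

This URL is the final address we are redirected to.

Note: because the search tools feature of Sogou WeChat is temporarily offline, the articles returned follow Sogou WeChat's default rules.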
