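Introduction
pyquery is a library dedicated to parsing HTML. The name is a nod to jQuery, and rightly so: its API is modeled on jQuery's syntax. If you have used jQuery, pyquery is easy to pick up.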
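Initializing from HTML
pyquery can take a URL and download the content itself, an HTML string that has already been downloaded, or a local HTML file. In practice we usually download the page with requests and then pass the HTML in as a string.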
python
from pyquery import PyQuery
html = '''
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>古明地觉</title>
</head>
<body>
<p id="bili"><a href="http://www.bilibili.com">想进入基佬的大门吗?还等什么,快点击吧</a></p>
<p class="s1">my name is satori</p>
<div>
<p class="s1">古明地恋</p>
</div>
<table >
<tbody>
<tr>
<td>姓名:</td>
<td><input type="text" name="name"></td>
</tr>
<tr class="tr">
<td>密码:</td>
<td><input type="password" name="password"></td>
</tr>
<tr>
<td></td>
<td><input type="submit" value="提交"></td>
</tr>
</tbody>
</table>
<a href="http://www.baidu.com" target="_blank">百度</a>
<a href="http://www.yahoo.com">雅虎</a>
</body>
</html>
'''
p = PyQuery(html)
Using selectors
python
from pyquery import PyQuery
# html is the same sample document defined in the first example
p = PyQuery(html)
filter and find
python
from pyquery import PyQuery
# html is the same sample document defined in the first example
p = PyQuery(html)
print(p("p"))
Getting attributes
python
from pyquery import PyQuery
# html is the same sample document defined in the first example
p = PyQuery(html)
a_tag = p("a")
# attr() reads the attribute from the first matched element only
print(a_tag.attr("href"))
Reposted from: https://www.cnblogs.com/valorchang/p/11395435.html