A Complete Walkthrough of Python's Regular Expression Functions


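Preface: there are plenty of write-ups online about Python's regex functions. This post tries to use the simplest possible demos to make the relationships between all of these functions clear. It is offered for reference.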

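1. The starting point: re.compile()

This function is normally used together with other functions and is of little use on its own. What it does is compile a regular expression into a pattern object, which makes every later use of that expression more convenient. For example, take this test string: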

test_str = "I am 18years old,you are 16Years old,so good!"

The test regular expression:

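pattern = r'(\d+)([a-z]+)'

This matches a run of digits immediately followed by a run of lowercase letters. (Note: the text matched inside each pair of parentheses can be retrieved separately.)

Create the pattern object: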

p = re.compile(pattern, re.I)

re.I means "ignore case".
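To make the effect of re.I concrete, here is a minimal sketch that compiles the same pattern twice, once with the flag and once without, and compares what each pattern object finds. The variable names (case_sensitive, case_insensitive and so on) are only illustrative and do not come from the original post.

```python
import re

test_str = "I am 18years old,you are 16Years old,so good!"
pattern = r'(\d+)([a-z]+)'

# Compile once; each pattern object can then be reused for any number of calls.
case_sensitive = re.compile(pattern)          # no flags passed
case_insensitive = re.compile(pattern, re.I)

# Without re.I, [a-z]+ cannot match the capital "Y" in "16Years".
sensitive_hits = case_sensitive.findall(test_str)
insensitive_hits = case_insensitive.findall(test_str)

print(sensitive_hits)    # [('18', 'years')]
print(insensitive_hits)  # [('18', 'years'), ('16', 'Years')]
```

The only difference between the two results is the flag, which is why the rest of the examples keep re.I.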

def compile(pattern, flags=0):

The flags parameter has a default value, so it can be left out. Here are a few commonly used flags:

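re.I (re.IGNORECASE): ignore case
re.M (re.MULTILINE): multi-line mode; changes the behavior of '^' and '$'
re.S (re.DOTALL): dot-matches-all mode; changes the behavior of '.'
re.L (re.LOCALE): makes predefined classes such as \w \W \b \B depend on the current locale (in Python 3 this flag is only allowed with bytes patterns)
re.U (re.UNICODE): makes the predefined classes \w \W \b \B \s \S \d \D follow Unicode character properties (in Python 3 this is already the default for str patterns)

2. Using match, search and findall together with re.compile

match: matches only at the start position. By default that is the beginning of the string; an integer can be passed as pos to set a different start position. It returns None if nothing matches there; otherwise it returns a match object, and group() retrieves the matched text. For example: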

import re

test_str = "I am 18years old,you are 16Years old,so good!"
pattern = r'(\d+)([a-z]+)'
p = re.compile(pattern, re.I)
# pos=5 is the index of the "1" in "18years"
text_object = p.match(string=test_str, pos=5)
print(text_object.group())

Result:
18years

Most of the time the regex is passed straight to the module-level match function, which saves a little code. For example:

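text_object = re.match(r'(\d+)([a-z]+)', test_str, re.I)

The drawback is that the start position cannot be specified: matching always begins at the start of the string. With this test string the call therefore returns None, because the string begins with "I" rather than a digit.

search: scans the string from start to end by default; an integer can be passed as pos to set a different start position. It returns None if nothing matches; otherwise it returns a match object for the first match found, and group() retrieves the matched text. For example: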

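import re

test_str = "I am 18years old,you are 16Years old,so good!"
pattern = r'(\d+)([a-z]+)'
p = re.compile(pattern, re.I)
# Start scanning at index 20, i.e. after "18years"
text_object = p.search(string=test_str, pos=20)
print(text_object.group())   # the whole match
print(text_object.group(1))  # first parenthesized group
print(text_object.group(2))  # second parenthesized group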

Result:
16Years
16
Years

Or:

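re.search(r'(\d+)([a-z]+)', test_str, re.I)

Because no start position can be given here, this call can only find 18years.

findall: finds every non-overlapping substring that matches and returns them as a list. The items are strings, or tuples of strings when the pattern contains more than one group. With a compiled pattern, pos sets the start position and endpos sets the end position. For example: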

import re

test_str = "I am 18years old,you are 16Years old,so good!"
pattern = r'(\d+)([a-z]+)'
p = re.compile(pattern, re.I)
str_arr = p.findall(test_str)
print(str_arr)

Result:
[('18', 'years'), ('16', 'Years')]

Because the regex contains groups, each returned item is grouped as well.

Or:

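re.findall(pattern=r'(\d+)([a-z]+)', string=test_str, flags=re.I)

Also: there is a find function that sometimes gets confused with findall, but the two are unrelated. find is str.find(), a string method, and it returns the index of the substring within the string (or -1 if the substring is not found).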
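The differences described in this section can be put side by side in one short sketch. It reuses the post's test string and pattern. The variable names are illustrative only, and the slice boundaries (20 and 16) were picked just to separate the two matches.

```python
import re

test_str = "I am 18years old,you are 16Years old,so good!"
p = re.compile(r'(\d+)([a-z]+)', re.I)

# match only succeeds if the pattern matches right at the start position.
at_start = p.match(test_str)                # None: index 0 is "I", not a digit
at_five = p.match(test_str, 5).group()      # '18years'

# search scans forward from the start position until something matches.
first_hit = p.search(test_str).group()      # '18years'
later_span = p.search(test_str, 20).span()  # (25, 32)

# findall returns the matched text; pos/endpos restrict the scanned range.
tail_hits = p.findall(test_str, 20)         # [('16', 'Years')]
head_hits = p.findall(test_str, 0, 16)      # [('18', 'years')]

# str.find is a plain substring search returning an index, or -1 if absent.
found_at = test_str.find("18years")         # 5
missing = test_str.find("17years")          # -1

print(at_start, at_five, first_hit, later_span)
print(tail_hits, head_hits, found_at, missing)
```

Note that str.find takes a literal substring, not a regex, which is the main reason it should not be mistaken for a relative of findall.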

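3. What makes finditer special

finditer returns an iterator of match objects. The advantage of an iterator is lower memory use. For simple, short texts it makes no real difference, but it is very useful when a single call would otherwise return a large number of matches.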
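As a rough illustration of the memory point, the sketch below builds a deliberately large input (a hypothetical big_text made by repeating the test sentence) and takes only the first three matches with itertools.islice. Since finditer produces matches on demand, no list of all matches is ever built. The names big_text and first_three are assumptions made for this example.

```python
import re
from itertools import islice

# Hypothetical large input: the sample sentence (plus a space) repeated many times.
big_text = "I am 18years old,you are 16Years old,so good! " * 100000
p = re.compile(r'(\d+)([a-z]+)', re.I)

# finditer yields match objects one at a time, so stopping after three
# matches means the remainder of big_text is never scanned.
first_three = [(m.group(), m.start()) for m in islice(p.finditer(big_text), 3)]

print(first_three)  # [('18years', 5), ('16Years', 25), ('18years', 51)]
```

Each item is a full match object, so positions (start(), end(), span()) are available in addition to the matched text, which findall does not provide.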

import re

test_str = "I am 18years old,you are 16Years old,so good!"
pattern = r'(\d+)([a-z]+)'
p = re.compile(pattern, re.I)
iter_p = p.finditer(test_str)
# Each item produced by the iterator is a match object
for iter_next in iter_p:
    print(iter_next.group())

Result:
18years
16Years

4. String replacement: sub

import re

test_str = "I am 18years old,you are 16Years old,so good!"
pattern = r'(\d+)([a-z]+)'
p = re.compile(pattern, re.I)
# Replace every match with the fixed string '17years'
new_string = p.sub('17years', test_str)
print(new_string)

Result:
I am 17years old,you are 17years old,so good!

Or:

re.sub(pattern=r'(\d+)([a-z]+)', repl='17years', string=test_str, count=1, flags=re.I)

The count parameter sets the maximum number of replacements.

Result:
I am 17years old,you are 16Years old,so good!

5. String splitting: split

import re

test_str = "I am 18years old,you are 16Years old,so good!"
# Split on every comma and every single digit
arr = re.split(r'[,\d]', test_str)
print(arr)

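Result:
['I am ', '', 'years old', 'you are ', '', 'Years old', 'so good!']

The empty strings appear because each digit is its own separator, so nothing lies between the two adjacent digits of "18" or of "16".

6. More will be added as needed.

----------------
Copyright notice: this is an original article by the blogger 「清泉影月」, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source together with this notice. Original link: https://blog.csdn.net/qingquanyingyue/article/details/94300298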

Reposted from: https://www.cnblogs.com/valorchang/p/11474432.html
