Using Python's re Module for Regular Expressions

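Almost every programming language supports regular expressions. A regular expression is a pattern that describes which strings it should match.
In Python, when you need to match strings against a regular expression, you can use the built-in re module.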

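I. Meaning of commonly used characters in re
The re module supports a great many special characters. The most commonly used ones are listed below:
.    Matches any single character except a newline (\n)
[ ]    Matches any one of the characters listed inside [ ]
\d    Matches a digit, i.e. 0-9 (for str patterns, other Unicode decimal digits also match)
\D    Matches any character that is not a digit
\s    Matches a whitespace character: space, tab (\t), newline (\n), carriage return (\r), form feed (\f) or vertical tab (\v)
\S    Matches any character that is not whitespace
\w    Matches a word character: a-z, A-Z, 0-9 and _ (letters, digits, underscore); for str patterns, Unicode letters such as Chinese characters also match
\W    Matches any character that is not a word character
*    Matches the preceding character 0 or more times, i.e. any number of times, including not at all
+    Matches the preceding character 1 or more times, i.e. at least once
?    Matches the preceding character 0 or 1 time, i.e. it is optional
{m}    Matches the preceding character exactly m times
{m,n}    Matches the preceding character from m to n times (inclusive)
^    Matches at the start of the string
$    Matches at the end of the string (or just before a newline at the very end)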
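To see several of these characters in action, here is a small sketch that is not part of the original article: the helper name first_match and all sample strings are illustrative assumptions. Each pattern is tried with re.search(), and the helper returns the first matched substring, or None when nothing matches.

```python
import re


def first_match(pattern, text):
    """Return the first substring matched by pattern, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m else None


examples = [
    (r"a.c", "abc"),             # . matches any one character ...
    (r"a.c", "a\nc"),            # ... except a newline, so no match here
    (r"[xyz]", "fox"),           # [ ] matches one of the listed characters
    (r"\d\d", "room 42"),        # \d matches a digit
    (r"\D+", "42abc"),           # \D+ matches a run of non-digits
    (r"\s", "a\tb"),             # \s matches whitespace, here a tab
    (r"\w+", "hi_there!"),       # \w+ stops at the non-word character "!"
    (r"\w", "中"),               # str patterns: Unicode letters count as \w
    (r"ab*", "abbbc"),           # * : zero or more b
    (r"ab+", "ac"),              # + : needs at least one b, so no match
    (r"colou?r", "color"),       # ? : the u is optional
    (r"\d{3}", "12345"),         # {m} : exactly three digits
    (r"\d{2,4}", "12345"),       # {m,n} : two to four digits, as many as possible
    (r"^We", "We read"),         # ^ : only at the start of the string
    (r"us\.$", "deceives us."),  # $ : only at the end of the string
]

for pattern, text in examples:
    print(f"{pattern!r:12} on {text!r:16} -> {first_match(pattern, text)!r}")
```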

II. The difference between match and search

import re

match_result = re.match(r"read", "We read the world wrong and say that it deceives us.")
print(match_result)
# match_result is None here, so calling group() on it would raise AttributeError
# print(match_result.group())
search_result = re.search(r"read", "We read the world wrong and say that it deceives us.")
print(search_result)
print(search_result.group())

Output:

None
<re.Match object; span=(3, 7), match='read'>
read

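1. match() only tries to match at the beginning of the string, whereas search() scans through the string and returns the first position where the pattern matches.

2. Both match() and search() return an re.Match object on success. It holds the matched text and its index range (span) in the string. If there is no match, they return None.

3. Once a match is found, calling the group() method on the re.Match object returns the matched text. If there is no match, the result is None, and calling group() on it raises an AttributeError.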
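The three points above can be checked with a minimal sketch that reuses the article's sample sentence. The variable names are illustrative; span(), start() and end() are standard re.Match methods that expose the index range mentioned in point 2.

```python
import re

text = "We read the world wrong and say that it deceives us."

m = re.match(r"read", text)       # match() only tries position 0, so this fails
print(m)                          # None

try:
    m.group()                     # m is None, so this raises AttributeError
except AttributeError as exc:
    print("AttributeError:", exc)

s = re.search(r"read", text)      # search() scans the whole string
if s is not None:                 # check for None before calling group()
    print(s.group())              # read
    print(s.span())               # (3, 7)
    print(s.start(), s.end())     # 3 7

at_start = re.match(r"We", text)  # "We" is at position 0, so match() succeeds
print(at_start.group())           # We
```

Checking the result for None before calling group() avoids the AttributeError described in point 3.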

III. findall: matching all results

import re

findall_result = re.findall(r'\d+', 'We read the world wrong 777 and 2 say that it deceives 007 us.')
print(findall_result)

Output:

['777', '2', '007']

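findall() scans the string from left to right and returns every non-overlapping match as a list of strings. If nothing matches, it returns an empty list. (If the pattern contains a capturing group, the list holds the text captured by the group rather than the whole match.)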
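A short sketch contrasting findall() with search(), and showing the empty-list case, follows. The pattern [A-Z]{3} and the capturing-group pattern are illustrative choices, not from the original article.

```python
import re

text = "We read the world wrong 777 and 2 say that it deceives 007 us."

all_numbers = re.findall(r"\d+", text)      # every non-overlapping match, as strings
first_number = re.search(r"\d+", text)      # only the first match, as a Match object
no_match = re.findall(r"[A-Z]{3}", text)    # no three consecutive capitals here
first_digits = re.findall(r"(\d)\d*", text) # with a group, only the group is returned

print(all_numbers)           # ['777', '2', '007']
print(first_number.group())  # 777
print(no_match)              # []
print(first_digits)          # ['7', '2', '0']
```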

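IV. sub: replacing matches
Syntax: re.sub(pattern, repl, string, count=0, flags=0), where repl is the replacement text; count and flags are optional.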

import re

sub_result = re.sub(r'a', 'A', 'We read the world wrong and say that it deceives us.')
print(sub_result)

Output:

We reAd the world wrong And sAy thAt it deceives us.

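sub() replaces the matched parts of a string: it scans the string from start to end, replaces every match of the regular expression, and returns the resulting string. In this example the result is the same as str.replace('a', 'A'), because the pattern is a plain literal character; unlike str.replace(), however, sub() accepts a full regular expression as the pattern.
If nothing matches, the string is returned unchanged.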
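Where sub() goes beyond str.replace() is with a real pattern. The sketch below, with an illustrative sample sentence and replacement "N", replaces every run of digits, then uses the optional count argument to limit the number of replacements, and finally shows the no-match case.

```python
import re

text = "We read the world wrong 777 and 2 say that it deceives 007 us."

every = re.sub(r"\d+", "N", text)           # replace every run of digits
first = re.sub(r"\d+", "N", text, count=1)  # count=1: replace only the first run
unchanged = re.sub(r"xyz", "!", text)       # nothing matches, text comes back as is

print(every)              # We read the world wrong N and N say that it deceives N us.
print(first)              # We read the world wrong N and 2 say that it deceives 007 us.
print(unchanged == text)  # True
```

str.replace() has no way to express "any run of digits", which is why the regex version is needed here.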

V. Greedy and non-greedy matching

import re

# Greedy: \d+ takes as many digits as possible
result1 = re.search(r'\d+', 'We read the world wrong 7777777 and 2 say that it deceives 007 us.')
print(result1.group())
# Non-greedy: \d+? takes as few digits as possible (just one)
result2 = re.search(r'\d+?', 'We read the world wrong 7777777 and 2 say that it deceives 007 us.')
print(result2.group())

Output:

7777777
7

In the code above, \d+ matches all seven 7s, while \d+? matches only a single 7.

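In Python, re quantifiers are greedy by default: as long as the overall pattern still matches, they try to match as many characters as possible.

Non-greedy quantifiers do the opposite: they try to match as few characters as possible.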

Appending a question mark ? to *, ?, + or {m,n} turns a greedy quantifier into a non-greedy one: *?, ??, +? and {m,n}?.
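The sketch below applies each non-greedy form to made-up sample strings (the tag-like string and "7777777" are illustrative, not from the original article). The tag example is a common way to see the difference: the greedy .* runs to the last >, while .*? stops at the first one.

```python
import re

html = "<b>bold</b>"
print(re.findall(r"<.*>", html))     # greedy +/*: ['<b>bold</b>']
print(re.findall(r"<.*?>", html))    # non-greedy *?: ['<b>', '</b>']

digits = "7777777"
print(re.search(r"\d{2,4}", digits).group())   # greedy {m,n}: 7777
print(re.search(r"\d{2,4}?", digits).group())  # non-greedy {m,n}?: 77
print(re.search(r"77??", digits).group())      # ?? prefers zero, so: 7
print(re.search(r"7*?", digits).group() == "") # *? can match zero characters: True
```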