Python Study Notes: Regular Expressions - 白红宇

Published: 2019-05-10

Metacharacters

. ^ $ * + ? {} [] () \ |

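Regular expressions in Python are provided by the re module.

Define a pattern string s holding the rule 'abc' (the r prefix makes it a raw string), then use findall to search the supplied string for matches:

    >>> import re
    >>> s = 'abc'
    >>> s = r'abc'
    >>> re.findall(s, 'abcdfdsajk')
    ['abc']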

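[ ]

Commonly used to specify a character set: [abc], [a-z]

Most metacharacters lose their special meaning inside a character set: [abc$]

For example, [akm$] matches any one of the characters "a", "k", "m", or "$".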
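As a small illustration of the character-set rules above, the following sketch checks that "$" is treated literally inside brackets and that a range such as [a-c] expands to its members. The sample strings are made up for the example.

```python
import re

# Inside [...], "$" is an ordinary character, not an end-of-string anchor.
found = re.findall(r'[akm$]', 'mask$')
print(found)   # ['m', 'a', 'k', '$']

# A range matches every character between its endpoints.
in_range = re.findall(r'[a-c]', 'abcdef')
print(in_range)   # ['a', 'b', 'c']
```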

[^string]

Matches any single character that is not in the set. For example, [^a] matches any character other than "a".

Matching with the [string] metacharacter:

    >>> st = 'top tip tap tsp tep'
    >>> res = r'top'
    >>> re.findall(res,st)
    ['top']
    >>> res = r't[io]p'
    >>> re.findall(res,st)
    ['top', 'tip']

With [^io], the middle character may be anything except "i" or "o":

    >>> res = r't[^io]p'
    >>> re.findall(res,st)
    ['tap', 'tsp', 'tep']
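Two details of negated sets are easy to miss, so here is a short sketch of them: the caret only negates when it is the first character in the brackets, and a negated set also matches whitespace and newlines. The second and third input strings are invented for the example.

```python
import re

st = 'top tip tap tsp tep'

# "^" as the first character negates the set.
negated = re.findall(r't[^io]p', st)
print(negated)   # ['tap', 'tsp', 'tep']

# Anywhere else in the set, "^" is just a literal caret.
literal_caret = re.findall(r'[a^]', 'a^b')
print(literal_caret)   # ['a', '^']

# A negated set matches anything not listed, including spaces and newlines.
with_whitespace = re.findall(r't[^io]p', 't p t\np')
print(with_whitespace)   # ['t p', 't\np']
```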

^ matches at the start of the string (the start of a line)

$ matches at the end of the string (the end of a line)

    >>> s = "hello world,hello boy"
    >>> r = r"hello"
    >>> re.findall(r,s)
    ['hello', 'hello']
    >>> r = r"^hello"
    >>> re.findall(r,s)
    ['hello']
    >>> r = r"boy$"
    >>> re.findall(r,s)
    ['boy']
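The anchors are often used as a simple "starts with" or "ends with" test. The sketch below shows that usage, plus one subtlety: by default $ also matches just before a trailing newline, whereas \Z matches only at the very end of the string. The variable names are illustrative.

```python
import re

s = "hello world,hello boy"

starts_with_hello = re.search(r'^hello', s) is not None
ends_with_boy = re.search(r'boy$', s) is not None
starts_with_world = re.search(r'^world', s) is not None
print(starts_with_hello, ends_with_boy, starts_with_world)   # True True False

# "$" tolerates one trailing newline; "\Z" does not.
dollar = re.findall(r'boy$', 'hello boy\n')
strict_end = re.findall(r'boy\Z', 'hello boy\n')
print(dollar, strict_end)   # ['boy'] []
```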

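. matches any character except a newline

\ escape character

\d matches any decimal digit; for ASCII text, equivalent to [0-9]

\D matches any non-digit character; for ASCII text, equivalent to [^0-9]

\s matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v]

\S matches any non-whitespace character; for ASCII text, equivalent to [^ \t\n\r\f\v]

\w matches any alphanumeric character or underscore; for ASCII text, equivalent to [a-zA-Z0-9_]

\W matches any character that is not alphanumeric or an underscore; for ASCII text, equivalent to [^a-zA-Z0-9_]

\\ matches a literal "\"

* matches the preceding element 0 or more times; equivalent to {0,}

+ matches the preceding element 1 or more times; equivalent to {1,}

? matches the preceding element 0 or 1 time; equivalent to {0,1}

{n,m} matches the preceding element at least n and at most m times

{m,} matches the preceding element m or more times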
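To make the list above concrete, here is a minimal sketch that applies a few of the character classes and each quantifier to short, made-up strings.

```python
import re

text = 'Room 42, floor 7 \t wing_B'

digits = re.findall(r'\d+', text)      # runs of digits
words = re.findall(r'\w+', text)       # runs of letters, digits, underscore
spaces = re.findall(r'\s', 'a b\tc')   # single whitespace characters
print(digits)   # ['42', '7']
print(words)    # ['Room', '42', 'floor', '7', 'wing_B']
print(spaces)   # [' ', '\t']

sample = 'a ab abb abbb'
star = re.findall(r'ab*', sample)        # zero or more "b"
plus = re.findall(r'ab+', sample)        # one or more "b"
optional = re.findall(r'ab?', sample)    # zero or one "b"
bounded = re.findall(r'ab{1,2}', sample) # one or two "b"
print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab', 'ab', 'ab']
print(bounded)   # ['ab', 'abb', 'abb']
```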

Example: matching a phone number

    >>> import re
    >>> r1 = r"\d{3,4}-?\d{8}"
    >>> re.findall(r1, '020-88776655')
    ['020-88776655']
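findall locates the pattern anywhere inside the text. If the goal is instead to validate that a whole string is a phone number, one option (assuming Python 3.4 or later, where re.fullmatch is available) is sketched below. The helper name is_phone and the sample inputs are hypothetical.

```python
import re

# Same pattern as in the example above, compiled once for reuse.
PHONE = re.compile(r'\d{3,4}-?\d{8}')

def is_phone(text):
    """Hypothetical helper: True only if the entire string is a phone number."""
    return PHONE.fullmatch(text) is not None

samples = ['020-88776655', '075512345678', '020-8877665', 'tel 020-88776655']
checks = {n: is_phone(n) for n in samples}
print(checks)
```

Only the first two samples pass: the third has too few digits and the fourth contains extra text before the number.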

() grouping

Example: matching an email address

    >>> email = r'\w{3}@\w+(\.com|\.net)'
    >>> re.match(email, 'abc@qq.com')
    <_sre.SRE_Match object at 0x7f81fea30828>
    >>> re.match(email, 'bbb@163.net')
    <_sre.SRE_Match object at 0x7f81fea470a8>
    >>> re.match(email, 'ccc@redhat.org')
    >>>
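Parentheses do more than bundle alternatives: each pair also captures the text it matched. The sketch below uses a variation of the email pattern with extra groups added (an assumption for illustration, not the original pattern) and shows how groups change what findall returns.

```python
import re

# Variation of the pattern above with the user and host parts also captured.
email = re.compile(r'(\w{3})@(\w+)(\.com|\.net)')
m = email.match('abc@qq.com')
print(m.group(0))   # whole match: 'abc@qq.com'
print(m.groups())   # ('abc', 'qq', '.com')

text = 'abc@qq.com bbb@163.net'

# With one capturing group, findall returns only the group's text.
only_groups = re.findall(r'\w{3}@\w+(\.com|\.net)', text)
print(only_groups)   # ['.com', '.net']

# A non-capturing group (?:...) keeps the alternation but returns whole matches.
full = re.findall(r'\w{3}@\w+(?:\.com|\.net)', text)
print(full)   # ['abc@qq.com', 'bbb@163.net']
```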

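Compiling regular expressions

A regular expression is compiled into a `RegexObject` instance, which provides methods for operations such as pattern matching, searching, and string substitution.

The re module provides an interface to the regular expression engine: an RE string can be compiled into an object, which is then used for matching. For example:

    >>> import re
    >>> r1 = r"\d{3,4}-?\d{8}"
    >>> p_tel = re.compile(r1)
    >>> p_tel
    <_sre.SRE_Pattern object at 0x7f81fead6ab0>
    >>> p_tel.findall('020-88776655')
    ['020-88776655']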
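Compiling pays off mainly when the same pattern is applied many times. A minimal sketch, assuming a made-up list of input lines:

```python
import re

# Compile once, then reuse the pattern object for every line.
p_tel = re.compile(r'\d{3,4}-?\d{8}')

lines = ['call 020-88776655', 'no number here', 'fax: 0755-12345678']
numbers = []
for line in lines:
    m = p_tel.search(line)
    if m:                      # search() returns None when nothing matches
        numbers.append(m.group())

print(numbers)         # ['020-88776655', '0755-12345678']
print(p_tel.pattern)   # the original pattern string
```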

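Greedy and non-greedy quantifiers

Regular expressions are typically used to find matching strings in text. In Python, quantifiers are greedy by default (in a few languages the default may be non-greedy): they try to match as many characters as possible. Non-greedy quantifiers do the opposite and match as few characters as possible. For example, the regular expression "ab*" applied to "abbbc" finds "abbb", whereas the non-greedy "ab*?" finds "a".

Repetition such as * is "greedy": when repeating an RE, the matching engine tries to repeat it as many times as possible. If the later part of the pattern then fails to match, the engine backs up and tries again with fewer repetitions.

Non-greedy quantifiers: *?, +?, ??, {m,n}?

Greedy quantifiers: *, +, ?, {m,n} (as in .*)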
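The difference is easiest to see side by side. The sketch below reruns the "ab*" example from the text and adds a tag-matching case where greediness commonly causes surprises. The HTML string is invented for the example.

```python
import re

greedy = re.findall(r'ab*', 'abbbc')
lazy = re.findall(r'ab*?', 'abbbc')
print(greedy, lazy)   # ['abbb'] ['a']

html = '<b>bold</b> and <i>italic</i>'

# Greedy: ".*" runs to the last ">" in the string.
greedy_tags = re.findall(r'<.*>', html)
print(greedy_tags)   # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: ".*?" stops at the first ">" it can.
lazy_tags = re.findall(r'<.*?>', html)
print(lazy_tags)   # ['<b>', '</b>', '<i>', '</i>']
```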

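Functions

match() determines whether the RE matches at the beginning of the string

search() scans through the string, looking for a location where the RE matches

findall() finds all substrings that the RE matches and returns them as a list

finditer() finds all substrings that the RE matches and returns them as an iterator

If nothing matches, match() and search() return None. On success, they return a 'MatchObject' instance.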
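The four functions can be compared on the same pattern. This is a small sketch with illustrative variable names; finditer yields match objects, so positions are available as well as text.

```python
import re

pattern = re.compile(r'hello')
text = 'hello world,hello boy'

at_start = pattern.match('say hello')    # None: "hello" is not at position 0
anywhere = pattern.search('say hello')   # a match object
all_matches = pattern.findall(text)
positions = [(m.start(), m.end()) for m in pattern.finditer(text)]

print(at_start)           # None
print(anywhere.group())   # 'hello'
print(all_matches)        # ['hello', 'hello']
print(positions)          # [(0, 5), (12, 17)]
```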

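Here string_re is the compiled pattern re.compile(r'pmghong', re.I) defined in the IGNORECASE example later in these notes:

    >>> string_re = re.compile(r'pmghong', re.I)
    >>> string_re.match('pmghong hello')
    <_sre.SRE_Match object at 0x7f81fea28578>
    >>> string_re.match('hello pmghong ')
    >>>
    >>> string_re.search('pmghong hello')
    <_sre.SRE_Match object at 0x7f81fea285e0>
    >>> string_re.search('hello pmghong')
    <_sre.SRE_Match object at 0x7f81fea28578>

As shown, match only succeeds when the pattern occurs at the start of the string, whereas search finds it anywhere, whether at the start, in the middle, or at the end.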

In real programs, the most common approach is to store the `MatchObject` in a variable and then check whether it is None, typically like this:

    >>> string_re.match('pmghong hello')
    <_sre.SRE_Match object at 0x7f81fea28648>
    >>> x = string_re.match('pmghong hello')
    >>> if x:
    ...     print('OK')
    ...
    OK
    >>> string_re.match('hello pmghong')
    >>> x = string_re.match('hello pmghong')
    >>> if x:
    ...     print('OK')
    ... else:
    ...     print('Not OK')
    ...
    Not OK

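Methods of the match object

group() returns the string matched by the RE

start() returns the starting position of the match

end() returns the ending position of the match

span() returns a tuple containing the (start, end) positions of the match

    >>> s = "hello python"
    >>> r1 = r'hello'
    >>> re.match(r1,s)
    <_sre.SRE_Match object at 0x7f81fea285e0>
    >>>
    >>> x = re.match(r1,s)
    >>> x.group()
    'hello'
    >>> x.start()
    0
    >>> x.end()
    5
    >>> x.span()
    (0, 5)

group() returns the substring matched by the RE. start() and end() return the indexes where the match begins and ends. span() returns both indexes together in a single tuple.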
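The indexes returned by these methods are ordinary slice positions: start is inclusive and end is exclusive. A short sketch using search, so the match does not begin at index 0:

```python
import re

s = "hello python"
m = re.search(r'python', s)

start, end = m.span()
sliced = s[start:end]   # slicing with the span recovers the matched text

print(m.span())              # (6, 12)
print(sliced == m.group())   # True
```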

re.sub(): replacing strings

    >>> s = "hello world"
    >>> s.replace('world', 'boy')
    'hello boy'
    >>> s.replace('w...d', 'boy')
    'hello world'
    >>>
    >>> rs = r'w...d'
    >>> re.sub(rs, 'boy', 'world would woked hello')
    'boy boy boy hello'

replace() can substitute strings, but it does not support regular expressions. To replace text that matches a regular expression, use sub() instead.

re.subn()

    >>> re.subn(rs, 'boy', 'world would woked hello')
    ('boy boy boy hello', 3)

This function also replaces strings. Compared with sub(), it returns one extra item at the end: the number of substitutions made.
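sub() and subn() accept a few options beyond a fixed replacement string. The following sketch shows three that are part of the standard re module: the count argument, backreferences in the replacement, and a function as the replacement. The inputs are reused from the examples above.

```python
import re

rs = r'w...d'
text = 'world would woked hello'

# count limits how many matches are replaced.
first_only = re.sub(rs, 'boy', text, count=1)
print(first_only)   # 'boy would woked hello'

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r'(\d+)-(\d+)', r'\2-\1', '020-88776655')
print(swapped)   # '88776655-020'

# A function receives each match object and returns the replacement text.
shouted = re.sub(rs, lambda m: m.group().upper(), text)
print(shouted)   # 'WORLD WOULD WOKED hello'

# subn returns the new string together with the substitution count.
result, n = re.subn(rs, 'boy', text)
print(result, n)   # 'boy boy boy hello' 3
```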

re.split(): splitting. Unlike str.split, it can split on a regular expression.

    >>> ip = '192.168.10.1'
    >>> ip.split('.')
    ['192', '168', '10', '1']
    >>> s = '111+222-333*444/555'
    >>> re.split(r'[\+\-\*\/]', s)
    ['111', '222', '333', '444', '555']
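Two variations on the split example may be useful. Inside a character set only the hyphen needs escaping here, a capturing group keeps the separators in the result, and maxsplit limits the number of splits. A minimal sketch:

```python
import re

s = '111+222-333*444/555'

# "+", "*" and "/" are literal inside [...]; only "-" is escaped.
parts = re.split(r'[+\-*/]', s)
print(parts)   # ['111', '222', '333', '444', '555']

# Wrapping the set in (...) keeps each operator in the output list.
with_ops = re.split(r'([+\-*/])', s)
print(with_ops)   # ['111', '+', '222', '-', '333', '*', '444', '/', '555']

# maxsplit stops after the given number of splits.
first_split = re.split(r'[+\-*/]', s, maxsplit=1)
print(first_split)   # ['111', '222-333*444/555']
```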

RE flags

re.compile() also accepts an optional flags argument, commonly used to enable special features and syntax variations:

    >>> p = re.compile('ab*', re.IGNORECASE)

IGNORECASE, I: ignore the case of the string

    >>> string_re = re.compile(r'pmghong', re.I)
    >>> string_re.findall('PMGHONG')
    ['PMGHONG']
    >>> string_re.findall('pmghong')
    ['pmghong']
    >>> string_re.findall('Pmghong')
    ['Pmghong']

DOTALL, S: makes "." match every character, including newlines

    >>> r1 = r"baidu.com"
    >>> re.findall(r1, 'baidu.com')
    ['baidu.com']
    >>> re.findall(r1, 'baidu_com')
    ['baidu_com']
    >>> re.findall(r1, 'baidu com')
    ['baidu com']
    >>> re.findall(r1, 'baidu\ncom')
    []
    >>> re.findall(r1, 'baidu\ncom', re.S)
    ['baidu\ncom']
    >>> re.findall(r1, 'baidu\tcom', re.S)
    ['baidu\tcom']

As shown, the "." metacharacter normally does not match a newline such as \n. To make it match, add the S flag.
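DOTALL matters most with patterns like ".*" that are meant to span several lines. The sketch below shows that case and also how flags are combined with the | operator. The sample text is made up.

```python
import re

text = 'start\nmiddle\nend'

# Without DOTALL, ".*" cannot cross the newlines, so nothing matches.
single_line = re.findall(r'start.*end', text)
print(single_line)   # []

# With DOTALL, "." also matches "\n".
across_lines = re.findall(r'start.*end', text, re.DOTALL)
print(across_lines)   # ['start\nmiddle\nend']

# Flags can be combined with "|", here IGNORECASE plus DOTALL.
combined = re.findall(r'START.*END', text, re.I | re.S)
print(combined)   # ['start\nmiddle\nend']
```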

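MULTILINE, M: multi-line matching; affects $ and ^

For example, to match the lines that start with "hello" in a triple-quoted string, the regular expression on its own finds nothing:

    >>> s = '''
    ... hello boy
    ... boys and girls
    ... hello girl
    ... what a nice day
    ... '''
    >>> r1 = r'^hello'
    >>> re.findall(r1,s)
    []

The reason is that the triple-quoted string is stored as shown below, and by default ^ matches only at the very start of the string, which here is the leading \n:

    >>> s
    '\nhello boy\nboys and girls\nhello girl\nwhat a nice day\n'

So the M flag is needed to match line by line:

    >>> re.findall(r1,s,re.M)
    ['hello', 'hello']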
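Because MULTILINE affects $ as well as ^, the two can be combined to pull out whole lines. A minimal sketch on the same string; \A is the standard anchor that always means the start of the string regardless of the flag:

```python
import re

s = '\nhello boy\nboys and girls\nhello girl\nwhat a nice day\n'

# With re.M, "^" and "$" match at the start and end of every line.
hello_lines = re.findall(r'^hello.*$', s, re.M)
print(hello_lines)   # ['hello boy', 'hello girl']

# Without the flag, "^" matches only at index 0, which is "\n" here.
no_flag = re.findall(r'^hello.*$', s)
print(no_flag)   # []

# "\A" is unaffected by re.M and still means the start of the string.
string_start = re.findall(r'\Ahello', s, re.M)
print(string_start)   # []
```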

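VERBOSE, X: enables verbose REs, which can be laid out more clearly and readably

Similarly, when a regular expression gets long, it can be written across several lines to make its structure clearer. Applying such a pattern directly fails, though, for the same reason as in the previous example: the triple-quoted string also stores the \n characters.

    >>> tel = r'''
    ... \d{3,4}
    ... -?
    ... \d{8}
    ... '''
    >>> re.findall(tel, '020-88776655')
    []
    >>> tel
    '\n\\d{3,4}\n-?\n\\d{8}\n'

The fix is to add the re.X flag:

    >>> re.findall(tel, '020-88776655', re.X)
    ['020-88776655']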
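Verbose mode also allows "#" comments inside the pattern, which is its main benefit for long expressions. The sketch below annotates the phone pattern (the comments are an interpretation of what each part is for) and shows the one catch: unescaped whitespace in the pattern is ignored, so a literal space must be escaped.

```python
import re

tel = re.compile(r'''
    \d{3,4}   # area code: 3 or 4 digits
    -?        # optional hyphen
    \d{8}     # 8-digit number
''', re.VERBOSE)

print(tel.findall('020-88776655'))   # ['020-88776655']

sample = 'helloworld hello world'

# In verbose mode the space in the pattern is dropped.
ignored_space = re.findall(r'hello world', sample, re.X)
print(ignored_space)   # ['helloworld']

# Escaping the space makes it literal again.
escaped_space = re.findall(r'hello\ world', sample, re.X)
print(escaped_space)   # ['hello world']
```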

Appendix: a regular expression reference table found online (image not preserved).

Reposted from: http://dmsab.baihongyu.com/
