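Today I was writing a script that needed regular expressions. After looking things up and asking GPT, I finally got it working, so I am writing it down here.
Two regex usages that turned out to be useful: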
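1. Matching a string with a specified context before and after it
Suppose we want to match 'one' in 'On the one hand' but not 'other' in 'On the other hand'. This needs the "special constructs" of regex syntax, written (?...). Here we use two of them: the lookbehind (?<=...) and the lookahead (?=...). They are special because, unlike ordinary parentheses, these assertions are not counted as capturing groups and consume no text; they only act as conditions on the match.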
import re
text = "On the one hand, we have the option to do X. On the other hand, we have the option to do Y."
# Matching is case-sensitive, so the lookbehind must spell "On the " exactly as it appears in text
pattern = "(?<=On the )one(?= hand)"
matches = re.findall(pattern, text)
print(matches) # ['one']
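To make the "condition only, not part of the match" point concrete, here is a small sketch that compares the lookaround pattern with an ordinary capturing group. The variable names m and m2 are mine, used only for illustration.

```python
import re

text = "On the one hand, we have the option to do X. On the other hand, we have the option to do Y."

# Lookbehind and lookahead check the context but are not part of the match.
m = re.search(r"(?<=On the )one(?= hand)", text)
print(m.group())   # one
print(m.span())    # (7, 10)
print(m.groups())  # ()  (no capturing groups were created)

# For comparison, ordinary parentheses capture, and the context is consumed.
m2 = re.search(r"On the (one) hand", text)
print(m2.group())   # On the one hand
print(m2.groups())  # ('one',)
```

With the lookaround version the match is just 'one', which is what makes it convenient for the replacement case in the next section: only the matched word gets replaced and the surrounding words are left alone.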
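2. Replacing many pieces of text according to a replacement rule
Say we now have
text = "On the one hand, ... On the two hand, ...On the other hand, ..."
and we want to change 'one' to '1' and 'two' to '2'. This can be written fairly elegantly as follows: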
import re
project = {
    'one': '1',
    'two': '2'
}
text = "On the one hand, we have the option to do X. On the two hand, we have the option to do Y.On the other hand, we have the option to do Z."
# Wrap the alternatives in a non-capturing group (?:...) so both assertions apply to every key
pattern = "(?<=On the )(?:" + '|'.join(project.keys()) + ")(?= hand)"
res = re.sub(pattern,
             # match is a Match object; match.group() is the matched text, used as the dict key
             lambda match: project[match.group()],
             text)
print(res)
# On the 1 hand, we have the option to do X. On the 2 hand, we have the option to do Y.On the other hand, we have the option to do Z.
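One detail that is easy to get wrong when joining keys with '|': alternation has the lowest precedence, so without a group around the alternatives the lookbehind attaches only to the first key and the lookahead only to the last. The sketch below shows the difference. The sample string and the names ungrouped and grouped are made up for this illustration, and re.escape is added as a precaution in case a key ever contains regex metacharacters.

```python
import re

project = {"one": "1", "two": "2"}

# Escape each key so characters such as "." or "+" would be matched literally.
alternation = "|".join(re.escape(key) for key in project)

# Without a group this reads as "(?<=On the )one" OR "two(?= hand)".
ungrouped = "(?<=On the )" + alternation + "(?= hand)"
# With a non-capturing group both assertions apply to every alternative.
grouped = "(?<=On the )(?:" + alternation + ")(?= hand)"

sample = "On the one way, two hand"
print(re.findall(ungrouped, sample))  # ['one', 'two']
print(re.findall(grouped, sample))    # []
```

Neither word in the sample has the full "On the ... hand" context, so the grouped pattern correctly finds nothing, while the ungrouped one accepts both.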
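Note that this uses re.sub(pattern, repl, string, count=0, flags=0). The thing to watch is the repl parameter: it can be a string or a function. The string case is easy to understand. If it is a function, it is called with one argument, match, which is the Match object produced each time pattern matches in string. The function therefore has to take the matched text with match.group(), look it up in project, and return the mapped string for 'one' or 'two'. For reference, the definition in the re module: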
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)
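As a small sketch of what "passed the Match object" means in practice, the example below uses a named function instead of a lambda and also tries the count parameter. The function name replace_word and the shortened text are my own, chosen for illustration.

```python
import re

project = {"one": "1", "two": "2"}
pattern = r"(?<=On the )(?:one|two)(?= hand)"
text = "On the one hand, X. On the two hand, Y."

def replace_word(match):
    # match is a re.Match object; match.group() is the text that was matched.
    return project[match.group()]

full = re.sub(pattern, replace_word, text)
print(full)   # On the 1 hand, X. On the 2 hand, Y.

# count=1 replaces only the leftmost match.
first = re.sub(pattern, replace_word, text, count=1)
print(first)  # On the 1 hand, X. On the two hand, Y.

# Using the Match object itself as the key fails: the dict keys are strings.
try:
    re.sub(pattern, lambda match: project[match], text)
except KeyError:
    print("KeyError: a Match object is not one of the dict keys")
```

The last part is why the callable needs match.group(): the Match object carries the matched text plus position information, but it is not itself a string.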
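Regex matching has a lot of rules and they are fairly intricate. Getting fluent with them is not easy, so the best approach is to keep practising.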
From: https://www.cnblogs.com/llllrj/p/17276998.html