首页 > 其他分享 >Runaway Regular Expressions: Catastrophic Backtracking

Runaway Regular Expressions: Catastrophic Backtracking

时间:2024-04-09 17:34:59浏览次数:31  
标签:regex group matches Catastrophic second Regular Expressions match first

Runaway Regular Expressions: Catastrophic Backtracking

Consider the regular expression (x+x+)+y. Before you scream in horror and say this contrived example should be written as xx+y or x{2,}y to match exactly the same without those terribly nested quantifiers: just assume that each “x” represents something more complex, with certain strings being matched by both “x”. See the section on HTML files below for a real example.

Let’s see what happens when you apply this regex to xxxxxxxxxxy. The first x+ will match all 10 x characters. The second x+ fails. The first x+ then backtracks to 9 matches, and the second one picks up the remaining x. The group has now matched once. The group repeats, but fails at the first x+. Since one repetition was sufficient, the group matches. y matches y and an overall match is found. The regex is declared functional, the code is shipped to the customer, and his computer explodes. Almost.

The above regex turns ugly when the y is missing from the subject string. When y fails, the regex engine backtracks. The group has one iteration it can backtrack into. The second x+ matched only one x, so it can’t backtrack. But the first x+ can give up one x. The second x+ promptly matches xx. The group again has one iteration, fails the next one, and the y fails. Backtracking again, the second x+ now has one backtracking position, reducing itself to match x. The group tries a second iteration. The first x+ matches but the second is stuck at the end of the string. Backtracking again, the first x+ in the group’s first iteration reduces itself to 7 characters. The second x+ matches xxx. Failing y, the second x+ is reduced to xx and then x. Now, the group can match a second iteration, with one x for each x+. But this (7,1),(1,1) combination fails too. So it goes to (6,4) and then (6,2)(1,1) and then (6,1),(2,1) and then (6,1),(1,2) and then I think you start to get the drift.

If you try this regex on a 10x string in RegexBuddy’s debugger, it’ll take 2558 steps to figure out the final y is missing. For an 11x string, it needs 5118 steps. For 12, it takes 10238 steps. Clearly we have an exponential complexity of O(2^n) here. At 19x the debugger bows out because it won't show more than one million steps.

RegexBuddy is forgiving in that it detects it’s going in circles and aborts the match attempt. Other regex engines (like .NET) will keep going forever, while others will crash with a stack overflow (like Perl, before version 5.10). Stack overflows are particularly nasty on Windows, since they tend to make your application vanish without a trace or explanation. Be very careful if you run a web service that allows users to supply their own regular expressions. People with little regex experience have surprising skill at coming up with exponentially complex regular expressions.

 

标签:regex,group,matches,Catastrophic,second,Regular,Expressions,match,first
From: https://www.cnblogs.com/chucklu/p/18124409

相关文章

  • Do not nest ternary expressions no-nested-ternary 这个报错什么意思
    ESLint规则no-nested-ternary当检测到代码中存在嵌套的三元表达式时,会发出警告或错误。该规则旨在通过禁止使用嵌套的三元表达式来提升代码的可读性和可维护性,因为随着条件复杂度的增加,深度嵌套的三元表达式往往会变得难以理解和推理。三元表达式:三元表达式是JavaScript中一......
  • 由于JavaScript有两种方式两种写法Creating a regular expression,在线测试网站https:/
    constre=/ab+c/;constre=newRegExp("ab+c");如果要使用第二种方式需要改变flavor和delimiters  RegExp比//需要额外的一次转义可以点击CodeGenerator查看   delimiters的不同会影响所需要的转义......
  • AtCoder Regular Contest 173 E Rearrange and Adjacent XOR
    洛谷传送门AtCoder传送门不妨考虑最后的结果可以成为哪些\(a_i\)的组合。为了方便分析,我们令\(a_i=2^{i-1}\)。进行一次操作后,所有\(\text{popcount}(a_i)\)都为偶数。所以一个\(x\in[0,2^n-1]\)能被生成出来的必要条件是\(\text{popcount}(x)\)为偶数。然......
  • Python——Regular Expression(正则表达式)RE
    正则表达式是一种强大的文本处理工具,它使用一种特殊的语法来匹配、查找以及替换字符串中的字符组合。在Python中,正则表达式,"re模块"。英文叫做"RegularExpression"。re模块是Python中用于处理正则表达式的标准库。它提供了多个函数来执行正则表达式的匹配、查找、替换和分割操......
  • 『LeetCode』10. 正则表达式匹配 Regular Expression Matching
    题目描述给你一个字符串s和一个字符规律p,请你来实现一个支持'.'和'*'的正则表达式匹配。'.'匹配任意单个字符'*'匹配零个或多个前面的那一个元素所谓匹配,是要涵盖整个字符串s的,而不是部分字符串。示例1:输入:s="aa",p="a"输出:false解释:"a"无法匹配"aa"整个字......
  • AtCoder Regular Contest 171
    Preface小补一场ARC,D还是很有意思的但想了半天不会,鉴定为纯纯的彩笔A-NoAttacking考虑怎么放rook能使得留下来能放pawn的格子数量最多,手玩一下会发现先按照\((2,2),(4,4),(6,6),\cdots\)的顺序放,放满后再接着放\((1,1),(3,3),(5,5),\cdots\)是最优的手玩一下可以得出在放......
  • AtCoder Regular Contest 171
    \[\large\text{Round13:AtCoderRegularContest171}\]一言:我并不是要失去自由,而是要去收获那无可替代的不自由。——SSSS.电光机王几年没写了,但是我们仍然要捡回来!没啥好写的,T1,T2能力范围之内,T3不会,T4感觉很好,但是没做出来。\(\text{D:RollingHash}\)这题无论......
  • AtCoder Regular Contest 172
    Preface开学了小溜一下之前没打的ARC,结果这场后面没有计数改成数论了又给我创飞了这场的DE都太玄学了,属于是自己想半天一点屌思路没有然后看一眼题解就顿悟的类型总结就是菜得发昏A-Chocolate挺有意思的签到题考虑从大到小依次切,对于一个原来\(H'\timesW'\)的块,为了尽量......
  • Codeforces 1830C Hyperregular Bracket Strings
    考虑到区间的限制\([l,r]\)就是要求\([l,r]\)里的字符会在\([l,r]\)里找到匹配。假设还有个区间\([l',r']\)满足\(l\lel'\ler\ler'\),能够发现限制变成了\([l,l'),[l',r],(r,r']\)这\(3\)个区间内的字符能在对应区间内找到匹配。继续,假设\(l\lel'\le......
  • Python: Star unpacking expressions in for statements
    今天发现在Python3.11版本中一个很不错的新特性,可以在for循环中使用unpacking,这意味着可以更灵活地组合迭代对象。ls=[1,2,34]foriin1,2,3,*ls,789:print(i)"""1231234789"""其实我第一次知道for循环中可以使用x,y,z这样的结构,想想也是......