Tools
RegExrv2.1 提供 Explain
Matching Single Characters
- 纯文本
.Match any single character except \n (unless /s)\Escape next character, such as\/or\(or\)
Matching Sets of Characters
[][abc] Match one out of a set of characters-[a-z] Match one character from range, often [a-zA-Z]^[^abc] Match one character not in set
^ 用于字符集合求非
Using Metacharacters
\dDigit[0-9]\Dpposite of \d[^0-9]\wWord character (alphanumeric, underscore)[a-zA-Z0-9]\WOpposite of\w[^a-zA-Z0-9]\sWhitespace character (space, tab, etc.)[\f\n\r\t\v]\SOpposite of\s[^\f\n\r\t\v][\b]Backspace (any use of \b in a character set)\nNewline\cControl character\fForm feed\rCarriage return\tTab\vVertical tab\xHexadecimal number;\xf0matches hexf0\0Octal number;\021matches octal 21POSIX字符类
Repeating Matches
*Match 0 or more of previous char/subexpression+Match 1 or more of previous char/subexpression?Match 0 or 1 of previous char/subexpression{m,n}Match m to n (inclusive) of previous char/subex.{n,}Match n or more of previous char/subexpression{n}Match exactly n of previous char/subexpression*?,??Lazy version of same (works for any quantifier)*+,?+Possessive version (works for any quantifier)
防止过度匹配:
默认为 贪婪型元字符,例如 * + {n,},匹配结果是多多益善而不是适可而止,
当不需要这种贪婪特性时,使用懒惰型元字符, *? +? {n,}?,特性是匹配尽可能少的字符.
Position Matching
^Start of string (equivalent: $A unless /m is used)$End of string (equivalent: $Z unless /m is used)\bWord boundary, similar to: (\w\W|\W\w)\BAnything but a word boundary(?m)^or$
分行匹配模式:
分行匹配模式(Multiline Mode) (?m) 使得正则表达式引擎将行分隔符当作一个字符串分隔符来对待。
在这种模式下,^不但匹配正常的字符串开头,还将匹配行分隔符(换行符)后面的开始位置;
类似的还有 $.
Using Subexpressions
()Define a subexpression|OR;(ab|cd)matchesaborcd
常用来对重复次数元字符的作用对象做出精确的设定和控制,子表达式允许嵌套
Using Backreferences
\aa subexpression (use inside match, egs/(.)\1/a/)
回溯引用允许正则表达式引用前面的匹配结果,\1 也代表着模式里的第一个子表达式,\2 代表第二个子表达式,\3 代表着第三个,依次轮推。
回溯引用只能引用模式中的子表达式 ( 和 ) 括起来的正则表达式片段
\lMake next character lowercase\uMake next character uppercase\LMake entire string (up to\E) lowercase\UMake entire string (up to\E) uppercase\EEnd\Lor\U(so they only apply before\E)\u\LCapitalize first char, lowercase rest (sentence)
Looking Ahead and Behind
- 正向前查找
(?=)Look-ahead;m/a(?=b)/matchesab, “eats” a - 正向后查找
(?<=)Look-behind;m/(?<=a)b/matchesab, “eats” b - 负向前查找
(?!)Negative look-ahead - 负向后查找
(?<!)Negative look-behind
Embedding Conditions
?(a)bConditional; if a then b?(a)b|cConditional; if a then b else c
RegExrV2.1 CheatSheet
Character classes
. any character except newline
\w \d \s word, digit, whitespace
\W \D \S not word, digit, whitespace
[abc] any of a, b, or c
[^abc] not a, b, or c
[a-g] character between a & gAnchors
^abc$ start / end of the string
\b \B word, not-word boundaryEscaped Character
\. \* \\ escaped special characters
\t \n \r tab, linefeed, carriage return
\u00A9 unicode escaped ©Groups & Lookaround
(abc) capture group
\1 backreference to group #1
(?:abc) non-capturing group
(?=abc) positive lookahead
(?!abc) negative lookaheadQuantifiers & Alternation
a* a+ a? 0 or more, 1 or more, 0 or 1
a{5} a{2,} exactly five, two or more
a{1,3} between one & three
a+? a{2,}? match as few as possible
ab|cd match ab or cd