Tools
RegExrv2.1 提供 Explain
Matching Single Characters
- 纯文本
.
Match any single character except \n (unless /s)\
Escape next character, such as\/
or\(
or\)
Matching Sets of Characters
[]
[abc] Match one out of a set of characters-
[a-z] Match one character from range, often [a-zA-Z]^
[^abc] Match one character not in set
^
用于字符集合求非
Using Metacharacters
\d
Digit[0-9]
\D
pposite of \d[^0-9]
\w
Word character (alphanumeric, underscore)[a-zA-Z0-9]
\W
Opposite of\w
[^a-zA-Z0-9]
\s
Whitespace character (space, tab, etc.)[\f\n\r\t\v]
\S
Opposite of\s
[^\f\n\r\t\v]
[\b]
Backspace (any use of \b in a character set)\n
Newline\c
Control character\f
Form feed\r
Carriage return\t
Tab\v
Vertical tab\x
Hexadecimal number;\xf0
matches hexf0
\0
Octal number;\021
matches octal 21POSIX字符类
Repeating Matches
*
Match 0 or more of previous char/subexpression+
Match 1 or more of previous char/subexpression?
Match 0 or 1 of previous char/subexpression{m,n}
Match m to n (inclusive) of previous char/subex.{n,}
Match n or more of previous char/subexpression{n}
Match exactly n of previous char/subexpression*?
,??
Lazy version of same (works for any quantifier)*+
,?+
Possessive version (works for any quantifier)
防止过度匹配:
默认为 贪婪型元字符,例如 *
+
{n,}
,匹配结果是多多益善而不是适可而止,
当不需要这种贪婪特性时,使用懒惰型元字符, *?
+?
{n,}?
,特性是匹配尽可能少的字符.
Position Matching
^
Start of string (equivalent: $A unless /m is used)$
End of string (equivalent: $Z unless /m is used)\b
Word boundary, similar to: (\w\W|\W\w)\B
Anything but a word boundary(?m)
^
or$
分行匹配模式:
分行匹配模式(Multiline Mode) (?m)
使得正则表达式引擎将行分隔符当作一个字符串分隔符来对待。
在这种模式下,^
不但匹配正常的字符串开头,还将匹配行分隔符(换行符)后面的开始位置;
类似的还有 $
.
Using Subexpressions
()
Define a subexpression|
OR;(ab|cd)
matchesab
orcd
常用来对重复次数元字符的作用对象做出精确的设定和控制,子表达式允许嵌套
Using Backreferences
\a
a subexpression (use inside match, egs/(.)\1/a/
)
回溯引用允许正则表达式引用前面的匹配结果,\1
也代表着模式里的第一个子表达式,\2
代表第二个子表达式,\3
代表着第三个,依次轮推。
回溯引用只能引用模式中的子表达式 ( 和 ) 括起来的正则表达式片段
\l
Make next character lowercase\u
Make next character uppercase\L
Make entire string (up to\E
) lowercase\U
Make entire string (up to\E
) uppercase\E
End\L
or\U
(so they only apply before\E
)\u\L
Capitalize first char, lowercase rest (sentence)
Looking Ahead and Behind
- 正向前查找
(?=)
Look-ahead;m/a(?=b)/
matchesab
, “eats” a - 正向后查找
(?<=)
Look-behind;m/(?<=a)b/
matchesab
, “eats” b - 负向前查找
(?!)
Negative look-ahead - 负向后查找
(?<!)
Negative look-behind
Embedding Conditions
?(a)b
Conditional; if a then b?(a)b|c
Conditional; if a then b else c
RegExrV2.1 CheatSheet
Character classes
. any character except newline
\w \d \s word, digit, whitespace
\W \D \S not word, digit, whitespace
[abc] any of a, b, or c
[^abc] not a, b, or c
[a-g] character between a & g
Anchors
^abc$ start / end of the string
\b \B word, not-word boundary
Escaped Character
\. \* \\ escaped special characters
\t \n \r tab, linefeed, carriage return
\u00A9 unicode escaped ©
Groups & Lookaround
(abc) capture group
\1 backreference to group #1
(?:abc) non-capturing group
(?=abc) positive lookahead
(?!abc) negative lookahead
Quantifiers & Alternation
a* a+ a? 0 or more, 1 or more, 0 or 1
a{5} a{2,} exactly five, two or more
a{1,3} between one & three
a+? a{2,}? match as few as possible
ab|cd match ab or cd