Regular Expressions: Differences Between BRE, ERE and PCRE

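There are several regular expression standards, and they differ from one another in a number of ways. The main ones are: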

  • BRE (Basic Regular Expressions)
  • ERE (Extended Regular Expressions)
  • PCRE (Perl Compatible Regular Expressions)

Common tools and the syntax they use

| What | Syntax | Comments/gotchas |
|---|---|---|
| **Programming languages** | | |
| Perl | PCRE | PCRE is actually a separate implementation from Perl's, with slight differences |
| Python's `re` standard lib | Python's own syntax (Perl-inspired) | |
| Ruby | Ruby's own syntax (Perl-inspired) | |
| Java's `java.util.regex` | Almost PCRE | |
| Boost.Regex | PCRE | |
| **Text editors** | | |
| Eclipse | PCRE | |
| Emacs | ? | |
| Netbeans | PCRE | |
| Notepad++ | PCRE (Boost.Regex) | |
| PyCharm | PCRE | Perl-inspired |
| Sublime Text | ? | |
| UltraEdit | PCRE | |
| ViM | ViM | |
| **Command-line tools** | | |
| awk | ERE | might depend on the implementation |
| grep | BRE; `egrep` for ERE; `grep -P` for PCRE (optional) | |
| less | ERE | usually; man page says "regular expression library supplied by your system" |
| screen | plain text | |
| sed | BRE; `-E` switches to ERE | |
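The table shows `grep` and `sed` defaulting to BRE, while `grep -E`/`egrep` and `sed -E` use ERE. Moving a pattern from one to the other mostly means flipping the backslashes on `( ) { }` and, in GNU BRE, on `+ ? |`. Below is a minimal Python sketch of that rewrite. `bre_to_ere` and `_bracket_end` are hypothetical helpers; the sketch assumes GNU-style BRE (where `\+`, `\?` and `\|` are operators), copies bracket expressions verbatim, and ignores corner cases such as a leading `*` being literal in BRE.

```python
import re

# Characters whose meaning flips between BRE and ERE: escaped, they are
# operators in (GNU) BRE; unescaped, they are operators in ERE.
# Assumption: GNU-style BRE, where \+ \? \| are operators (an extension to
# POSIX). Other BRE corner cases (e.g. a leading "*" being literal) are ignored.
_FLIP = set("(){}+?|")


def _bracket_end(p: str, i: int) -> int:
    """Return the index just past the bracket expression starting at p[i]."""
    j = i + 1
    if j < len(p) and p[j] == "^":
        j += 1
    if j < len(p) and p[j] == "]":  # a leading "]" is a literal member
        j += 1
    while j < len(p):
        if p[j] == "[" and j + 1 < len(p) and p[j + 1] in ":=.":
            # [:class:], [=equiv=] or [.coll.]: skip to the matching ":]" etc.
            close = p.find(p[j + 1] + "]", j + 2)
            if close == -1:
                raise ValueError("unterminated [: :], [= =] or [. .]")
            j = close + 2
        elif p[j] == "]":
            return j + 1
        else:
            j += 1
    raise ValueError("unterminated bracket expression")


def bre_to_ere(p: str) -> str:
    """Hypothetical helper: rewrite a GNU BRE pattern into ERE syntax."""
    out = []
    i = 0
    while i < len(p):
        c = p[i]
        if c == "[":
            # Bracket expressions mean the same in BRE and ERE, and "\" is
            # not special inside them, so they are copied unchanged.
            end = _bracket_end(p, i)
            out.append(p[i:end])
            i = end
        elif c == "\\" and i + 1 < len(p):
            nxt = p[i + 1]
            # "\(" in BRE becomes "(" in ERE; other escapes (\1, \., \*) stay.
            out.append(nxt if nxt in _FLIP else c + nxt)
            i += 2
        elif c in _FLIP:
            # A bare "(" is a literal in BRE, so ERE needs "\(".
            out.append("\\" + c)
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


ere = bre_to_ere(r"\([0-9]\{3\}\)-[0-9]\{4\}")
print(ere)                                    # ([0-9]{3})-[0-9]{4}
print(re.fullmatch(ere, "555-1234") is not None)  # True
```

The final `re.fullmatch` call only works because this simple ERE happens to be valid Python syntax as well; ERE and Python's `re` are not interchangeable in general.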

Syntax differences

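| What | Perl/PCRE | Python's `re` | POSIX (BRE) | POSIX extended (ERE) | ViM |
|---|---|---|---|---|---|
| **Basics** | | | | | |
| Custom character class | `[...]` | `[...]` | `[...]` | `[...]` | `[...]` |
| Negated custom character class | `[^...]` | `[^...]` | `[^...]` | `[^...]` | `[^...]` |
| `\` special in class? | yes | yes | no, `]` literal if it comes first | no, `]` literal if it comes first | yes |
| Ranges | `[a-z]`, `-` literal if it comes last | `[a-z]`, `-` literal if first or last | `[a-z]`, `-` literal if it comes last | `[a-z]`, `-` literal if it comes last | |
| Alternation | `\|` | `\|` | `\\|` | `\|` | `\\|` `\&` (low precedence) |
| Escaped character | `\033` `\x1B` `\x{1234}` `\N{name}` `\N{U+263D}` | `\x12` | | | `\%d123` `\%x2A` `\%u1234` `\%U1234ABCD` |
| **Character classes** | | | | | |
| Any character (except newline) | `.` | `.` | `.` | `.` | `.` |
| Any character (including newline) | | | | | `\_.` |
| Match a "word" character (alphanumeric plus `_`) | `\w` `[[:word:]]` | `\w` | `\w` | `\w` | `\w` |
| Upper-case / lower-case letter | `[[:upper:]]` / `[[:lower:]]` | | `[[:upper:]]` / `[[:lower:]]` | `[[:upper:]]` / `[[:lower:]]` | `\u` `[[:upper:]]` / `\l` `[[:lower:]]` |
| Match a non-"word" character | `\W` | `\W` | | | `\W` |
| Match a whitespace character (except newline) | | | `\s` `[[:space:]]` | `\s` `[[:space:]]` | `\s` `[[:space:]]` |
| Whitespace including newline | `\s` `[[:space:]]` | `\s` | | | `\_s` |
| Match a non-whitespace character | `\S` | `\S` | `[^[:space:]]` | `[^[:space:]]` | `\S` `[^[:space:]]` |
| Match a digit character | `\d` `[[:digit:]]` | `\d` | `[[:digit:]]` | `[[:digit:]]` | `\d` `[[:digit:]]` |
| Match a non-digit character | `\D` | `\D` | `[^[:digit:]]` | `[^[:digit:]]` | `\D` `[^[:digit:]]` |
| Any hexadecimal digit | `[[:xdigit:]]` | | `[[:xdigit:]]` | `[[:xdigit:]]` | `\x` `[[:xdigit:]]` |
| Any octal digit | | | | | `\o` |
| Any graphical character excluding "word" characters | `[[:punct:]]` | | `[[:punct:]]` | `[[:punct:]]` | `[[:punct:]]` |
| Any alphabetical character | `[[:alpha:]]` | | `[[:alpha:]]` | `[[:alpha:]]` | `\a` `[[:alpha:]]` |
| Non-alphabetical character | | | `[^[:alpha:]]` | `[^[:alpha:]]` | `\A` `[^[:alpha:]]` |
| Any alphanumerical character | `[[:alnum:]]` | | `[[:alnum:]]` | `[[:alnum:]]` | `[[:alnum:]]` |
| ASCII | `[[:ascii:]]` | | | | |
| Character equivalents (e = é = è) (as per locale) | | | `[[=e=]]` | `[[=e=]]` | `[[=e=]]` |
| **Zero-width assertions** | | | | | |
| Word boundary | `\b` | `\b` | `\b` | `\b` | `\<` / `\>` |
| Anywhere but word boundary | `\B` | `\B` | `\B` | `\B` | |
| Beginning of line/string | `^` / `\A` | `^` / `\A` | `^` | `^` | `^` (beginning of pattern) `\_^` |
| End of line/string | `$` / `\Z` | `$` / `\Z` | `$` | `$` | `$` (end of pattern) `\_$` |
| **Captures and groups** | | | | | |
| Capturing group | `(...)` `(?<name>...)` | `(...)` `(?P<name>...)` | `\(...\)` | `(...)` | `\(...\)` |
| Non-capturing group | `(?:...)` | `(?:...)` | | | `\%(...\)` |
| Backreference to a specific group | `\1` `\g1` `\g{-1}` | `\1` | `\1` | `\1` (non-official) | `\1` |
| Named backreference | `\g{name}` `\k<name>` | `(?P=name)` | | | |
| **Look-around** | | | | | |
| Positive look-ahead | `(?=...)` | `(?=...)` | | | `\(...\)\@=` |
| Negative look-ahead | `(?!...)` | `(?!...)` | | | `\(...\)\@!` |
| Positive look-behind | `(?<=...)` | `(?<=...)` | | | `\(...\)\@<=` |
| Negative look-behind | `(?<!...)` | `(?<!...)` | | | `\(...\)\@<!` |
| **Multiplicity** | | | | | |
| 0 or 1 | `?` | `?` | `\?` | `?` | `\?` |
| 0 or more | `*` | `*` | `*` | `*` | `*` |
| 1 or more | `+` | `+` | `\+` | `+` | `\+` |
| Specific number | `{n}` `{n,m}` `{n,}` | `{n}` `{n,m}` `{n,}` | `\{n\}` `\{n,m\}` `\{n,\}` | `{n}` `{n,m}` `{n,}` | `\{n}` `\{n,m}` `\{n,}` |
| 0 or 1, non-greedy | `??` | `??` | | | |
| 0 or more, non-greedy | `*?` | `*?` | | | `\{-}` |
| 1 or more, non-greedy | `+?` | `+?` | | | |
| Specific number, non-greedy | `{n,m}?` `{n,}?` | `{n,m}?` `{n,}?` | | | `\{-n,m}` `\{-n,}` |
| 0 or 1, don't give back on backtrack | `?+` | | | | |
| 0 or more, don't give back on backtrack | `*+` | | | | |
| 1 or more, don't give back on backtrack | `++` | | | | |
| Specific number, don't give back on backtrack | `{n,m}+` `{n,}+` | | | | |
| **Other** | | | | | |
| Independent non-backtracking pattern | `(?>...)` | | | | `\(...\)\@>` |
| Make case-insensitive / case-sensitive | `(?i)` / `(?-i)` | `(?i)` / `(?-i)` | | | `\c` / `\C` |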
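To see a few entries from the "Python's `re`" column in action, here is a short Python sketch exercising a named group with a named backreference, a non-capturing group with a bounded quantifier, a look-behind, greedy versus non-greedy matching, and the inline `(?i)` flag. The sample strings and variable names are made up for illustration.

```python
import re

# Named group (?P<name>...) and named backreference (?P=name):
# find a word that is immediately repeated, e.g. "the the".
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")
repeated_word = doubled.search("this is the the end").group("word")  # "the"

# Non-capturing group (?:...) with a bounded quantifier {n,m}.
# findall() returns whole matches because there is no capturing group.
versions = re.findall(r"v(?:\d+\.){1,2}\d+", "v1.2 v1.2.3 v1")  # ["v1.2", "v1.2.3"]

# Positive look-behind (?<=...): digits preceded by "$", without the "$" itself.
prices = re.findall(r"(?<=\$)\d+", "cost: $42 or 7")  # ["42"]

# Greedy vs non-greedy: ".+" runs to the last ">", ".+?" stops at the first.
greedy = re.findall(r"<.+>", "<a><b>")   # ["<a><b>"]
lazy = re.findall(r"<.+?>", "<a><b>")    # ["<a>", "<b>"]

# Inline flag (?i) at the start of the pattern: case-insensitive matching.
errors = re.findall(r"(?i)error", "Error ERROR error")  # ["Error", "ERROR", "error"]
```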

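In BRE and ERE, watch how `()` and `{}` behave: in BRE they are literal characters unless escaped (`\(...\)` for a group, `\{n,m\}` for an interval), while in ERE the unescaped forms are the operators. Also, neither BRE nor ERE supports `\d` or `\D`; use `[[:digit:]]` and `[^[:digit:]]` instead.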
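The reverse trap also exists: the table lists no POSIX classes in the "Python's `re`" column, and Python reads `[[:digit:]]` as an ordinary set of the characters `[`, `:`, `d`, `i`, `g`, `t`, followed by a literal `]` (recent versions also emit a `FutureWarning`). Below is a minimal sketch for porting such patterns to Python, assuming ASCII-only input and the C locale. `posix_classes_to_python` and `POSIX_CLASSES` are hypothetical names, and the helper does not parse bracket expressions, so it expects every `[:name:]` to already sit inside `[...]`.

```python
import re
import string

# ASCII / C-locale equivalents of POSIX bracket classes. Assumption: the
# pattern and input are ASCII-only; locale-dependent behaviour (such as
# [[:alpha:]] matching "é") is not reproduced.
POSIX_CLASSES = {
    "alpha": "a-zA-Z",
    "digit": "0-9",
    "alnum": "a-zA-Z0-9",
    "upper": "A-Z",
    "lower": "a-z",
    "xdigit": "0-9A-Fa-f",
    "space": r" \t\n\r\f\v",
    "blank": r" \t",
    "punct": re.escape(string.punctuation),
}

_CLASS_ITEM = re.compile(r"\[:([a-z]+):\]")


def posix_classes_to_python(pattern: str) -> str:
    """Hypothetical helper: replace [:name:] items with Python-compatible ranges.

    Naive sketch: it does not parse bracket expressions, so it assumes every
    [:name:] already sits inside [...], e.g. "[[:digit:]]" or "[^[:space:]]".
    """
    def expand(m):
        name = m.group(1)
        if name not in POSIX_CLASSES:
            raise ValueError(f"unsupported POSIX class [:{name}:]")
        # Text returned from a replacement function is inserted verbatim.
        return POSIX_CLASSES[name]

    return _CLASS_ITEM.sub(expand, pattern)


digits = posix_classes_to_python("[[:digit:]]+")   # "[0-9]+"
words = posix_classes_to_python("[^[:space:]]+")   # r"[^ \t\n\r\f\v]+"
print(re.findall(digits, "a1b22c333"))    # ['1', '22', '333']
print(re.findall(words, "foo bar\tbaz"))  # ['foo', 'bar', 'baz']
```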

References

  1. A brief overview of regular expression "flavors"
  2. Regular expression
  3. Why "\d" fails to match digits in grep
  4. Regex cheatsheet