Monday, August 2, 2010

regular expression

Regular Expression  Class     Type           Meaning
.                   all       Character Set  A single character (except newline)
^                   all       Anchor         Beginning of line
$                   all       Anchor         End of line
[...]               all       Character Set  Range of characters
*                   all       Modifier       Zero or more duplicates
\<                  Basic     Anchor         Beginning of word
\>                  Basic     Anchor         End of word
\(..\)              Basic     Backreference  Remembers pattern
\1..\9              Basic     Reference      Recalls pattern
\{M,N\}             Basic     Modifier       M to N duplicates (written {M,N} in extended syntax)
+                   Extended  Modifier       One or more duplicates
?                   Extended  Modifier       Zero or one duplicate
(...|...)           Extended  Anchor         Shows alternation
\(...\|...\)        EMACS     Anchor         Shows alternation
\w                  EMACS     Character Set  Matches a letter in a word
\W                  EMACS     Character Set  Opposite of \w
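
As a rough illustration of the Basic and Extended classes in the table, here is a small sketch that runs grep in both modes. The sample words and the variable names are made up for this example, and it assumes a grep that accepts -E for extended syntax.

```shell
# Sample input: one word per line (made-up data for illustration).
words='cat
dog
abab
color
colour
bird'

# Basic syntax: \(..\) remembers a pattern and \1 recalls it.
repeated=$(printf '%s\n' "$words" | grep '\(ab\)\1')

# Extended syntax (grep -E): ? means zero or one duplicate.
optional=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# Extended syntax: (...|...) shows alternation.
either=$(printf '%s\n' "$words" | grep -E '^(cat|dog)$')

printf '%s\n' "$repeated" "$optional" "$either"
```

With this input, the first pattern should pick out only "abab", the second both spellings of "color", and the third "cat" and "dog".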

POSIX character sets


POSIX added newer and more portable ways to search for character sets. Instead of writing 'a-zA-Z' inside the brackets, you can write [:alpha:]; that is, to be complete, replace [a-zA-Z] with [[:alpha:]]. The advantage is that this will also match international character sets. You can mix the old style and the new POSIX style, such as
grep '[1-9[:alpha:]]'
Here is the full list:

Character Group  Meaning
[:alnum:]        Alphanumeric
[:cntrl:]        Control character
[:lower:]        Lower case character
[:space:]        Whitespace
[:alpha:]        Alphabetic
[:digit:]        Digit
[:print:]        Printable character
[:upper:]        Upper case character
[:blank:]        Space and tab
[:graph:]        Printable and visible characters
[:punct:]        Punctuation
[:xdigit:]       Hexadecimal digit
Note that some people write [[:alpha:]] as the notation, but the class name itself is [:alpha:]; the outer '[...]' specifies the character set that contains it.
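
To make the bracket nesting concrete, the sketch below tries a few of these character groups with grep. The sample lines and variable names are invented for illustration, and the results assume a locale in which a-z count as alphabetic.

```shell
# Sample input (made-up data for illustration).
lines='abc
123
a1
0
#!'

# [[:alpha:]]: lines containing at least one alphabetic character.
has_alpha=$(printf '%s\n' "$lines" | grep '[[:alpha:]]')

# One or more digits and nothing else on the line.
only_digits=$(printf '%s\n' "$lines" | grep '^[[:digit:]][[:digit:]]*$')

# Old and new styles mixed inside one set of outer brackets.
mixed=$(printf '%s\n' "$lines" | grep '[1-9[:alpha:]]')

# [[:punct:]]: lines containing punctuation.
has_punct=$(printf '%s\n' "$lines" | grep '[[:punct:]]')

printf '%s\n' "$has_alpha" "$only_digits" "$mixed" "$has_punct"
```

The mixed pattern should skip the line "0", since 0 is neither in the range 1-9 nor alphabetic.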
