Regular Expression	Class	Type	Meaning
.	all	Character Set	A single character (except newline)
^	all	Anchor	Beginning of line
$	all	Anchor	End of line
[...]	all	Character Set	Range of characters
*	all	Modifier	zero or more duplicates
\<	Basic	Anchor	Beginning of word
\>	Basic	Anchor	End of word
\(..\)	Basic	Backreference	Remembers pattern
\1..\9	Basic	Reference	Recalls pattern
+	Extended	Modifier	One or more duplicates
?	Extended	Modifier	Zero or one duplicate
\{M,N\}	Extended	Modifier	M to N Duplicates
(...|...)	Extended	Anchor	Shows alternation
\(...\|...\)	EMACS	Anchor	Shows alternation
\w	EMACS	Character set	Matches a letter in a word
\W	EMACS	Character set	Opposite of \w
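
As a rough illustration of the Basic and Extended rows above, here is a small sketch using grep for the basic syntax and grep -E for the extended syntax. The sample words and the variable names are made up for the example, and it assumes a POSIX-style grep is installed.

```shell
# Basic syntax: \(...\) remembers a pattern and \1 recalls it.
basic=$(printf 'abcabc\nabcdef\n' | grep '\(abc\)\1')
echo "$basic"

# Extended syntax (grep -E): ? means zero or one duplicate.
optional=$(printf 'color\ncolour\ncolr\n' | grep -E 'colou?r')
echo "$optional"

# Extended syntax: (...|...) matches either alternative.
either=$(printf 'cat\ndog\nbird\n' | grep -E '^(cat|dog)$')
echo "$either"
```

The first command should print only the line where "abc" is repeated, the second the two spellings with an optional "u", and the third the lines that are exactly "cat" or "dog".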

POSIX character sets

POSIX added newer and more portable ways to search for character sets. Instead of using [a-zA-Z], you can replace 'a-zA-Z' with [:alpha:], or, to be more complete, replace [a-zA-Z] with [[:alpha:]]. The advantage is that this will match international character sets. You can mix the old style and new POSIX styles, such as
grep '[1-9[:alpha:]]'
Here is the full list

Character Group	Meaning
[:alnum:]	Alphanumeric
[:cntrl:]	Control Character
[:lower:]	Lower case character
[:space:]	Whitespace
[:alpha:]	Alphabetic
[:digit:]	Digit
[:print:]	Printable character
[:upper:]	Upper Case Character
[:blank:]	Space and tab
[:graph:]	Printable and visible characters
[:punct:]	Punctuation
[:xdigit:]	Hexadecimal digit

Note that some people write [[:alpha:]] as if it were a single notation, but the outer '[...]' is the ordinary character set syntax, and [:alpha:] is the class name placed inside it.
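
A minimal sketch of how the classes behave inside the outer brackets, assuming a POSIX-style grep. The input lines and variable names are invented for the example.

```shell
# [[:digit:]] is a character set that contains the class [:digit:].
digits=$(printf 'abc\n123\na1b\n' | grep -c '[[:digit:]]')
echo "$digits"

# A class can be mixed with an old-style range inside one set of brackets.
mixed=$(printf 'x\n0\n5\n-\n' | grep -c '^[1-9[:alpha:]]$')
echo "$mixed"

# Negation works as in any other character set.
no_space=$(printf 'a b\nab\n' | grep '^[^[:space:]]*$')
echo "$no_space"
```

Here grep -c counts matching lines: two lines contain a digit, two single-character lines are either a letter or a digit from 1 to 9, and only "ab" is free of whitespace.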

tiny thoughts

Monday, August 2, 2010

regular expression

