A character class is used to represent a set of characters. The following combinations are allowed in describing a character class:
x: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself.
.: (a dot) represents all characters.
%a: represents all letters.
%c: represents all control characters.
%d: represents all digits.
%l: represents all lowercase letters.
%p: represents all punctuation characters.
%s: represents all space characters.
%u: represents all uppercase letters.
%w: represents all alphanumeric characters.
%x: represents all hexadecimal digits.
%z: represents the character with representation 0.
%x: (where x is any non-alphanumeric character) represents the character x. This is the standard way to escape the magic characters. Any punctuation character (even a non-magic one) can be preceded by a '%' when used to represent itself in a pattern.
[set]: represents the class which is the union of all characters in set. A range of characters may be specified by separating the end characters of the range with a '-'. All classes %x described above may also be used as components in set. All other characters in set represent themselves. For example, [%w_] (or [_%w]) represents all alphanumeric characters plus the underscore, [0-7] represents the octal digits, and [0-7%l%-] represents the octal digits plus the lowercase letters plus the '-' character.
The interaction between ranges and classes is not defined. Therefore, patterns like [%a-z] or [a-%%] have no meaning.
[^set]: represents the complement of set, where set is interpreted as above.
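
The combinations listed above can be tried out with the standard string library. The following is a minimal sketch, not a complete tour: the sample strings and variable names are made up for illustration, and the quantifier '+' (one or more repetitions) is a pattern item rather than part of a character class.

```lua
-- Single-letter classes: %d matches a digit, %a matches a letter.
local digits = string.match("order 4521 shipped", "%d+")
local first_word = string.match("order 4521 shipped", "%a+")

-- An unescaped dot matches any character; "%." matches only a literal dot.
local any_pos = string.find("3.14", ".")
local dot_pos = string.find("3.14", "%.")

-- A set combining a class with a literal character.
local ident = string.match("max_value = 10", "[%w_]+")

-- A range inside a set.
local octal = string.match("mode 0755", "[0-7]+")

-- A range, a class, and an escaped '-' in the same set.
local mixed = string.match("rw-r-5 X", "[0-7%l%-]+")

-- A complemented set: everything up to the first comma.
local field = string.match("alpha,beta", "[^,]+")

print(digits, first_word, any_pos, dot_pos, ident, octal, mixed, field)
```

Escaping with '%' is what distinguishes the two string.find calls: the first reports position 1 because '.' accepts any character, while the second reports the position of the actual dot.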
For all classes represented by single letters (%a, %c, etc.), the corresponding uppercase letter represents the complement of the class. For instance, %S represents all non-space characters.
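
As a short illustration of the uppercase complements, the sketch below uses %S and %D on made-up sample strings. It also shows that, for these inputs, %S behaves the same as the complemented set [^%s].

```lua
local text = "  hello world"

-- %S matches any character that is not a space character.
local word = string.match(text, "%S+")

-- The complemented set [^%s] selects the same characters here.
local same = string.match(text, "[^%s]+")

-- %D matches any non-digit; removing those leaves only the digits.
-- The parentheses discard gsub's second result (the substitution count).
local digits_only = (string.gsub("a1b22c", "%D", ""))

print(word, same, digits_only)
```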
The definitions of letter, space, and other character groups depend on the current locale. In particular, the class [a-z] may not be equivalent to %l.
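
One way to see the locale dependence is to count how many single-byte characters each pattern accepts. The sketch below is an illustration only: the helper name count_matches is hypothetical, and the script first selects the "C" locale, in which the two classes are expected to agree. Under other single-byte locales, %l may accept additional bytes (for example accented letters) that the range [a-z] does not.

```lua
-- Select the standard "C" locale so the result is predictable.
os.setlocale("C")

-- Hypothetical helper: counts the byte values 0..255 that match a pattern.
local function count_matches(pattern)
  local n = 0
  for byte = 0, 255 do
    if string.find(string.char(byte), pattern) then
      n = n + 1
    end
  end
  return n
end

local lower_class = count_matches("%l")
local lower_range = count_matches("[a-z]")

print(lower_class, lower_range)
```

Replacing the argument of os.setlocale with another locale name available on the system is a simple way to check whether the two counts diverge there.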