References

Syntax

Character Classes

Concept: Bracket expression
  Description: matches a single collating element contained in the non-empty set of collating elements represented by the bracket expression
  Syntax: [expression]
  Example: [abc]  [0-9a-zA-Z]  [^0-9]

Concept: Character class expression
  Description: represents the set of characters belonging to a character class
  Syntax: [:name:]
  Example: [:alpha:]  [:digit:]  [:xdigit:]  [:alnum:]  [:punct:]  [:space:]  [:blank:]

Concept: Shorthand character class
  Example: \d  \D  \s  \S  \w  \W

Character class expressions

  • POSIX Bracket Expressions
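    • POSIX bracket expressions are a special kind of character class. A class name such as [:alpha:] is only valid inside a bracket expression, e.g. [[:alpha:]].
    • For ASCII text: [:alnum:] = [a-zA-Z0-9], [:alpha:] = [a-zA-Z], [:digit:] = [0-9], [:lower:] = [a-z]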
POSIX      Description                ASCII                                    Shorthand  Java       Remarks
[:alnum:]  Alphanumeric characters    [a-zA-Z0-9]                              -          \p{Alnum}  -
[:alpha:]  Alphabetic characters      [a-zA-Z]                                 -          \p{Alpha}  -
[:digit:]  Digits                     [0-9]                                    \d         \p{Digit}  -
[:punct:]  Punctuation and symbols    [!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]  -          \p{Punct}  -
[:space:]  All whitespace characters  [ \t\r\n\v\f]                            \s         \p{Space}  -
[:word:]   Word characters            [A-Za-z0-9_]                             \w         -          Not a POSIX class; an extension in some engines (e.g. Perl)
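
An engine without POSIX class support needs the ASCII equivalents spelled out. The following is a minimal Python sketch, assuming ASCII-only input. Python's standard re module does not recognise [:name:], so the sketch rewrites each class name into the ranges from the ASCII column. The names POSIX_ASCII and expand_posix are hypothetical, introduced here only for illustration.

```python
import re

# ASCII equivalents copied from the ASCII column above (a subset, ASCII text only).
POSIX_ASCII = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "digit": "0-9",
    "lower": "a-z",
    "word": "A-Za-z0-9_",
}


def expand_posix(pattern: str) -> str:
    """Replace each [:name:] with its ASCII ranges (hypothetical helper).

    An unknown class name raises KeyError instead of being passed through.
    """
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)


# A POSIX-style pattern for an identifier: a letter or underscore,
# followed by any number of letters, digits or underscores.
expanded = expand_posix("[[:alpha:]_][[:alnum:]_]*")
identifier = re.compile(expanded)

print(expanded)
print(bool(identifier.fullmatch("snake_case1")))
print(bool(identifier.fullmatch("1abc")))
```

The class name is expanded in place, so it can still be combined with other members of the same bracket expression, as with the underscore here.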

Shorthand Character Classes

Shorthand  Description               Bracket Expression  Remarks
\d         digit                     [0-9]               -
\w         word character            [A-Za-z0-9_]        -
\s         whitespace character      [ \t\r\n\f]         Some engines (e.g. Java) also include the vertical tab \v
\D         non-digit                 [^\d]               -
\W         non-word character        [^\w]               -
\S         non-whitespace character  [^\s]               -
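
The shorthand classes can be tried out directly. This is a small Python sketch, not a statement about every engine: in Python 3 the shorthands are Unicode-aware by default, so the re.ASCII flag is assumed here to get the ASCII bracket equivalents listed above. The sample strings are made up for illustration.

```python
import re

text = "id_42 = 3.14"

# With re.ASCII, \d behaves like [0-9] and \w like [A-Za-z0-9_].
digits = re.findall(r"\d+", text, re.ASCII)
same_digits = re.findall(r"[0-9]+", text)
words = re.findall(r"\w+", text, re.ASCII)

# The upper-case forms match the complement of the lower-case ones.
non_space = re.findall(r"\S+", text)
non_digits = re.findall(r"\D+", text, re.ASCII)

# Without re.ASCII, \d also accepts non-ASCII digits such as U+0663.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None
ascii_only_digit = re.fullmatch(r"\d", "\u0663", re.ASCII) is not None

print(digits, words, non_space, non_digits)
print(unicode_digit, ascii_only_digit)
```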

Quantifier

Quantity                                     Greedy  Lazy    Possessive
zero or one occurrences                      ?       ??      ?+
zero or more occurrences                     *       *?      *+
one or more occurrences                      +       +?      ++
exactly n times                              {n}     {n}?    {n}+
n or more times                              {n,}    {n,}?   {n,}+
at least n times, but not more than m times  {n,m}   {n,m}?  {n,m}+
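
The difference between the three columns shows up as soon as a pattern has to choose how much to consume. The following Python sketch illustrates it on made-up sample strings. One assumption to note: Python's re module only accepts possessive quantifiers from version 3.11, so that part is guarded by a version check.

```python
import re
import sys

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as much as it can, so one match spans the whole string.
greedy = re.findall(r"<.+>", html)
# Lazy: .+? stops at the first possible '>', giving one match per tag.
lazy = re.findall(r"<.+?>", html)

numbers = "1 22 333 4444"
greedy_interval = re.findall(r"\d{2,3}", numbers)  # prefers three digits
lazy_interval = re.findall(r"\d{2,3}?", numbers)   # prefers two digits

# Greedy \d+ gives one digit back so the trailing \d can still match.
greedy_backtracks = re.fullmatch(r"\d+\d", "1234") is not None

# Possessive \d++ never gives anything back, so the trailing \d fails.
possessive_fails = None  # stays None on Python versions before 3.11
if sys.version_info >= (3, 11):
    possessive_fails = re.fullmatch(r"\d++\d", "1234") is None

print(greedy, lazy)
print(greedy_interval, lazy_interval)
print(greedy_backtracks, possessive_fails)
```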

Anchor

Symbol  Name               BRE  ERE  Java  Perl  GNU sed
^       Start of Line      O    O    O     O     O
$       End of Line        O    O    O     O     O
\b      Word Boundary      -    -    O     O     O
\B      Non Word Boundary  -    -    O     O     O

(O = supported, - = not defined)
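
The anchors can be sketched in Python as follows. Python is not one of the dialects in the table, so this is only an illustration of the general behaviour. One assumption worth stating: in Python, ^ and $ anchor to the whole string unless re.MULTILINE is set, in which case they also match at line breaks. The sample strings are made up.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the very start of the string.
start_of_string = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ and $ also match at the start and end of each line.
start_of_lines = re.findall(r"^\w+", text, re.MULTILINE)
end_of_lines = re.findall(r"\w+$", text, re.MULTILINE)

words = "cat concat category cat"
# \b requires a word boundary, so only the standalone "cat" words match.
whole_word = re.findall(r"\bcat\b", words)
# \B requires the absence of a boundary, so only the "cat" inside "concat" matches.
inside_word = re.findall(r"\Bcat", words)

print(start_of_string, start_of_lines, end_of_lines)
print(whole_word, inside_word)
```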

Readings

BRE vs ERE

                    BRE          ERE
Special characters  . [ \ * ^ $  . [ \ ( ) * + ? { | ^ $
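
One practical consequence of the two sets is that a literal string needs different escaping depending on the flavour. The sketch below builds an escaper from the special characters in the table above, relying on the rule that a special character preceded by a backslash matches itself. The ERE set here also contains the right-parenthesis, which the ERE section of this page describes as special when matched. The names BRE_SPECIAL, ERE_SPECIAL, escape_bre and escape_ere are hypothetical, and the result is meant for use outside bracket expressions only.

```python
# Special characters per flavour, as listed in the BRE vs ERE table.
BRE_SPECIAL = set(".[\\*^$")
ERE_SPECIAL = set(".[\\()*+?{|^$")


def _escape(literal: str, special: set) -> str:
    """Prefix every special character with a backslash, leave the rest alone."""
    return "".join("\\" + ch if ch in special else ch for ch in literal)


def escape_bre(literal: str) -> str:
    # ( ) + ? { | are ordinary in a BRE and must not be escaped:
    # a backslash in front of some of them would give them a special meaning.
    return _escape(literal, BRE_SPECIAL)


def escape_ere(literal: str) -> str:
    return _escape(literal, ERE_SPECIAL)


sample = "a+b*(c)"
print(escape_bre(sample))
print(escape_ere(sample))
```

The BRE output could be passed to a BRE tool such as sed without -E, and the ERE output to an ERE tool. Neither has been checked against every implementation.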

Regex Dialects

Java

Perl

.NET

sed

Special Topics

Special Characters

ERE special characters

An ERE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash, such a character is an ERE that matches the special character itself. The extended regular expression special characters and the contexts in which they have their special meaning are:

. \ [ (
The period, left-bracket, backslash and left-parenthesis are special except when used in a bracket expression. Outside a bracket expression, a left-parenthesis immediately followed by a right-parenthesis produces undefined results.
)
The right-parenthesis is special when matched with a preceding left-parenthesis, both outside a bracket expression.
* + ? {
The asterisk, plus-sign, question-mark and left-brace are special except when used in a bracket expression (see RE Bracket Expression). Any of the following uses produce undefined results:
  • if these characters appear first in an ERE, or immediately following a vertical-line, circumflex or left-parenthesis.
  • if a left-brace is not part of a valid interval expression.
|
The vertical-line is special except when used in a bracket expression. A vertical-line appearing first or last in an ERE, or immediately following a vertical-line or a left-parenthesis, or immediately preceding a right-parenthesis, produces undefined results.
^
The circumflex is special when used:
  • as an anchor
  • as the first character of a bracket expression
$
The dollar sign is special when used as an anchor.

BRE special characters

A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash, such a character will be a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are:

. [ \
The period, left-bracket and backslash are special except when used in a bracket expression (see RE Bracket Expression). An expression containing a [ that is not preceded by a backslash and is not part of a bracket expression produces undefined results.
*
The asterisk is special except when used:
  • in a bracket expression
  • as the first character of an entire BRE (after an initial ^, if any)
  • as the first character of a subexpression (after an initial ^, if any); see BREs Matching Multiple Characters.
^
The circumflex is special when used:
  • as an anchor (see BRE Expression Anchoring)
  • as the first character of a bracket expression (see RE Bracket Expression).
$
The dollar sign is special when used as an anchor.

Formal rules for bracket expression

Bracket expressions such as [0-9a-zA-Z], [^0-9a-zA-Z], or [0-9a-zA-Z.?*+-] follow different rules from the rest of a regular expression. The most important difference concerns metacharacters: characters such as . ? * + lose their special meaning inside a bracket expression, while ^ (as the first character), - (between two range endpoints) and ] take on a special role. The formal rules are given in the POSIX section RE Bracket Expression.
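
The effect of brackets on metacharacters can be checked with a short Python sketch. Python's re module is not a POSIX engine, so this only illustrates the behaviour the two share for these simple cases. The sample strings are made up.

```python
import re

# Inside the brackets . ? * + are ordinary characters,
# and the trailing - is a literal hyphen, not a range operator.
text = "a.b?c*d+e-f g"
literal_meta = re.findall(r"[0-9a-zA-Z.?*+-]+", text)

# Outside brackets the period matches any character; inside it matches only ".".
dot_outside = re.findall(r".", "a.b")
dot_inside = re.findall(r"[.]", "a.b")

# A leading ^ negates the set: everything except ASCII letters and digits.
negated = re.findall(r"[^0-9a-zA-Z]", "a1 b2!")

print(literal_meta)
print(dot_outside, dot_inside)
print(negated)
```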

Capturing, Grouping and Backreferences

NOT operator in Regex

Nested pairs search

Lookaround : lookahead and lookbehind

Greedy, Reluctant, or Possessive Quantifiers