Escaping Metacharacters
In a regular expression (regex), most ordinary characters match themselves. For example, `ab%` matches anywhere an `a` followed by a `b` followed by a `%` appears in the text. Other characters do not match themselves but are metacharacters. For example, the backslash is a special metacharacter that 'escapes', or changes the meaning of, the character following it. Thus, matching a literal backslash requires two backslashes in sequence in the regular expression. NEdit-ng provides the following escape sequences so that metacharacters used by the regex syntax can be specified as ordinary characters.
\( \) \- \[ \] \< \> \{ \}
\. \| \^ \$ \* \+ \? \& \\
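The escaping rule can be tried out in other regex engines as well. The following is a minimal sketch using Python's `re` module, which is an assumption made for illustration and is not NEdit-ng's own engine; the sample strings are made up. It shows an ordinary pattern, a doubled backslash matching one literal backslash, and an escaped metacharacter.

```python
import re

# Hypothetical sample text containing one literal backslash.
text = r"disk C:\temp is ab% full"

# Ordinary characters match themselves.
plain = re.search(r"ab%", text)

# Two backslashes in the pattern match one literal backslash in the text.
backslashes = re.findall(r"\\", text)

# A backslash before a metacharacter turns it into an ordinary character.
dotted = re.search(r"1\.5", "version 1.5 and 125")

# re.escape is a Python-specific helper that adds the backslashes itself.
escaped = re.escape("a.b*")

print(plain.group(), len(backslashes), dotted.group(), escaped)
```

The exact set of characters that need escaping differs between engines, so the list above remains the reference for NEdit-ng.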
Special Control Characters
There are some special characters that are difficult or impossible to type. Many of these characters can be constructed as a sort of metacharacter or sequence by preceding a literal character with a backslash. NEdit-ng recognizes the following special character sequences:
Sequence | Meaning |
---|---|
`\a` | alert (bell) |
`\b` | backspace |
`\e` | ASCII escape character |
`\f` | form feed (new page) |
`\n` | newline |
`\r` | carriage return |
`\t` | horizontal tab |
`\v` | vertical tab |
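For reference, each of these sequences stands for a single ASCII control character. The sketch below lists the corresponding code points in Python; the names `CONTROL_ESCAPES` and `control_char` are hypothetical and exist only for this illustration.

```python
# ASCII code points for the special character sequences listed above.
CONTROL_ESCAPES = {
    r"\a": 0x07,  # alert (bell)
    r"\b": 0x08,  # backspace
    r"\e": 0x1B,  # ASCII escape character
    r"\f": 0x0C,  # form feed (new page)
    r"\n": 0x0A,  # newline
    r"\r": 0x0D,  # carriage return
    r"\t": 0x09,  # horizontal tab
    r"\v": 0x0B,  # vertical tab
}


def control_char(sequence):
    """Return the character that a two-character sequence such as r"\t" stands for."""
    return chr(CONTROL_ESCAPES[sequence])


print(repr(control_char(r"\t")))
```

Note that Python string literals have no `\e` escape, which is why the table stores numeric code points rather than relying on Python's own escapes.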
Octal and Hex Escape Sequences
Any ASCII character, except `NUL`, can be specified by using either an octal escape or a hexadecimal escape, beginning with `\0` or with `\x` (or `\X`), respectively. For example, `\052` and `\X2A` both specify the `*` character. Escapes for `NUL` (`\00` or `\x0`) are not valid and will generate an error message. Also, any escape that exceeds `\0377` or `\xFF` will either cause an error or have any additional character(s) interpreted literally. For example, `\0777` will be interpreted as `\077` (a `?` character) followed by `7`, since `\0777` is greater than `\0377`.
An invalid digit will also end an octal or hexadecimal escape. For example, `\091` will cause an error since `9` is not within an octal escape's range of allowable digits (0-7), and truncation before the `9` yields `\0`, which is invalid.
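The truncation rules above can be made concrete with a small decoder. This is a sketch of the described behavior in Python, not NEdit-ng's actual code: the function name `decode_numeric_escape` is hypothetical, and the limits of three octal digits after `\0` and two hexadecimal digits after `\x` are assumptions chosen to be consistent with the `\0377` and `\xFF` maxima.

```python
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_numeric_escape(text):
    """Decode one octal or hexadecimal escape at the start of text.

    Returns (character, number of input characters consumed).
    """
    if text[:2] == "\\0":
        digits, base, width = OCTAL_DIGITS, 8, 3   # width is an assumption
    elif text[:2] in ("\\x", "\\X"):
        digits, base, width = HEX_DIGITS, 16, 2    # width is an assumption
    else:
        raise ValueError("not an octal or hexadecimal escape")

    value = 0
    pos = 2
    # Stop at the first invalid digit, or when one more digit would exceed 0xFF.
    while pos < len(text) and pos - 2 < width and text[pos] in digits:
        candidate = value * base + int(text[pos], base)
        if candidate > 0xFF:
            break
        value = candidate
        pos += 1

    if value == 0:
        raise ValueError("an escape for NUL is not valid")
    return chr(value), pos


char, used = decode_numeric_escape("\\0777")
print(char, "\\0777"[used:])
```

With this sketch, `\0777` decodes to `?` and leaves a trailing `7` to be read literally, while `\091`, `\00`, and `\x0` all raise an error because nothing but zero remains after truncation.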
Shortcut Escape Sequences
NEdit-ng defines some escape sequences that are handy shortcuts for commonly used character classes.
Shortcut | Class | Equivalent to |
---|---|---|
`\d` | digits | `[0-9]` |
`\l` | letters | `[a-zA-Z]` plus locale dependent letters |
`\s` | whitespace | `[ \t\r\v\f]` |
`\w` | word characters | `[\d\l_]` |
`\D`, `\L`, `\S`, and `\W` are the same as the lowercase versions except that the resulting character class is negated. For example, `\d` is equivalent to `[0-9]`, while `\D` is equivalent to `[^0-9]`.
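The relationship between a shortcut, its negated form, and the explicit classes can be checked in any engine that supports `\d` and `\D`. The sketch below uses Python's `re` module as a stand-in (an assumption; it is not NEdit-ng's engine), with the `re.ASCII` flag so that `\d` means exactly `[0-9]`. The sample text is made up.

```python
import re

text = "Order 66, aisle 7b"

# The shortcut and its explicit equivalent find the same runs of digits.
digits = re.findall(r"\d+", text, flags=re.ASCII)
digits_explicit = re.findall(r"[0-9]+", text)

# The uppercase shortcut and the negated class find everything else.
others = re.findall(r"\D+", text, flags=re.ASCII)
others_explicit = re.findall(r"[^0-9]+", text)

print(digits, others)
```

Python's `\s` and `\w` are defined slightly differently from the table above, and Python has no `\l`, so only the digit shortcuts are compared here.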
These escape sequences can also be used within a character class. For example, `[\l_]` is the same as `[a-zA-Z_]`, extended with possible locale dependent letters. The escape sequences for special characters, and octal and hexadecimal escapes, are also valid within a class.
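As a rough illustration of escapes inside a character class, the following sketch again uses Python's `re` module as a stand-in for NEdit-ng's engine (an assumption). Because Python has no `\l`, the ASCII part of `[\l_]` is written out explicitly; the sample strings are made up.

```python
import re

# A shortcut inside a class: runs of digits or underscores.
runs = re.findall(r"[\d_]+", "id_42 x_7", flags=re.ASCII)

# A hexadecimal escape and a control-character escape inside a class.
parts = re.split(r"[\x2A\t]", "a*b\tc")

# ASCII-only stand-in for [\l_], since Python lacks \l.
names = re.findall(r"[a-zA-Z_]+", "foo_bar 42 baz")

print(runs, parts, names)
```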
Word Delimiter Tokens
Although they are not strictly character classes, the following escape sequences behave similarly to character classes:
Token | Meaning |
---|---|
`\y` | Word delimiter character |
`\Y` | Not a word delimiter character |
The `\y` token matches any single character that NEdit-ng recognizes as a word delimiter character, while the `\Y` token matches any character that is not a word delimiter character. Word delimiter characters are dynamic: the user can change them through preference settings. For this reason, the regular expression engine must handle them differently, and as a consequence `\y` and `\Y` cannot be used within a character class specification.
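To see why a dynamic delimiter set needs separate handling, consider emulating `\y` and `\Y` in an engine that lacks them. The sketch below builds the two matchers from a delimiter string in Python; it is an emulation under stated assumptions, not how NEdit-ng implements these tokens. The function name `delimiter_patterns` and the delimiter string `".,-/"` are hypothetical, since the real set comes from the user's preferences.

```python
import re


def delimiter_patterns(delimiters):
    """Build matchers that play the roles of \\y and \\Y for a delimiter set."""
    if not delimiters:
        raise ValueError("at least one delimiter character is required")
    # Escape each character so that ], ^, - and \ are taken literally in the class.
    body = "".join(re.escape(ch) for ch in delimiters)
    is_delimiter = re.compile("[" + body + "]")
    not_delimiter = re.compile("[^" + body + "]")
    return is_delimiter, not_delimiter


# Hypothetical delimiter setting for illustration only.
is_delim, not_delim = delimiter_patterns(".,-/")
print(is_delim.findall("a-b,c"), not_delim.findall("a-b"))
```

Because the patterns depend on the current setting, they have to be rebuilt whenever the delimiter string changes, which is the kind of special handling the paragraph above describes.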