|
$string1 = "Hello World\n";
if ($string1 =~ m/\w/) {
    print "There is at least one alphanumeric ";
    print "character in $string1 (A-Z, a-z, 0-9, _)\n";
}
|
\W
|
Matches a character that is neither alphanumeric nor "_";
same as [^A-Za-z0-9_] in ASCII, and
[^\p{Alphabetic}\p{GC=Mark}\p{GC=Decimal_Number}\p{GC=Connector_Punctuation}]
in Unicode.
|
$string1 = "Hello World\n";
if ($string1 =~ m/\W/) {
    print "The space between Hello and ";
    print "World is not alphanumeric\n";
}
|
\s
|
Matches a whitespace character.
In ASCII, these are tab, line feed, form feed, carriage return, and space;
in Unicode, \s also matches no-break spaces, next line, and the variable-width spaces, among others.
|
$string1 = "Hello World\n";
if ($string1 =~ m/\s.*\s/) {
    print "There are TWO whitespace characters, which may";
    print " be separated by other characters, in $string1";
}
|
\S
|
Matches any character that is not a whitespace character.
|
$string1 = "Hello World\n";
if ($string1 =~ m/\S.*\S/) {
    print "There are TWO non-whitespace characters, which";
    print " may be separated by other characters, in $string1";
}
|
\d
|
Matches a digit;
same as [0-9] in ASCII;
in Unicode, same as the \p{Digit} or \p{GC=Decimal_Number} property, which is itself the same as the \p{Numeric_Type=Decimal} property.
|
$string1 = "99 bottles of beer on the wall.";
if ($string1 =~ m/(\d+)/) {
    print "$1 is the first number in '$string1'\n";
}
Output:
99 is the first number in '99 bottles of beer on the wall.'
|
\D
|
Matches a non-digit;
same as [^0-9] in ASCII or \P{Digit} in Unicode.
|
$string1 = "Hello World\n";
if ($string1 =~ m/\D/) {
    print "There is at least one character in $string1";
    print " that is not a digit.\n";
}
|
^
|
Matches the beginning of a string, or the beginning of any line when the /m modifier is used.
|
$string1 = "Hello World\n";
if ($string1 =~ m/^He/) {
    print "$string1 starts with the characters 'He'\n";
}
|
$
|
Matches the end of a string (or the position just before a string-final newline), or the end of any line when the /m modifier is used.
|
$string1 = "Hello World\n";
if ($string1 =~ m/rld$/) {
    print "$string1 is a line or string ";
    print "that ends with 'rld'\n";
}
|
\A
|
Matches the beginning of a string (but not an internal line).
|
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/\AH/) {
    print "$string1 is a string ";
    print "that starts with 'H'\n";
}
|
\z
|
Matches the end of a string (but not an internal line).[37]
|
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/d\n\z/) {
    print "$string1 is a string ";
    print "that ends with 'd\\n'\n";
}
|
[^...]
|
Matches any single character except those listed inside the brackets.
|
$string1 = "Hello World\n";
if ($string1 =~ m/[^abc]/) {
    print "$string1 contains a character other than ";
    print "a, b, and c\n";
}
|
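The ^, $, \A and \z rows above behave differently only when a string contains internal newlines and the /m modifier is in play. The following minimal Perl sketch, assuming Perl 5 with strict and warnings enabled, runs each anchor against the same two-line string with and without /m; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Two lines, "Hello" and "World", each followed by a newline.
my $text = "Hello\nWorld\n";

# Start-of-string vs. start-of-line anchors.
my $caret       = $text =~ /^W/   ? 1 : 0;  # no /m: ^ anchors only at the start of the string
my $caret_m     = $text =~ /^W/m  ? 1 : 0;  # /m: ^ also matches just after each "\n"
my $abs_start_m = $text =~ /\AW/m ? 1 : 0;  # \A ignores /m and still anchors at the start

# End-of-string vs. end-of-line anchors.
my $dollar      = $text =~ /o$/    ? 1 : 0; # no /m: $ matches at the end or before a final "\n"
my $dollar_m    = $text =~ /o$/m   ? 1 : 0; # /m: $ also matches before each internal "\n"
my $abs_end     = $text =~ /d\z/   ? 1 : 0; # \z matches only at the true end of the string,
my $abs_end_nl  = $text =~ /d\n\z/ ? 1 : 0; # so the trailing "\n" must be matched explicitly

print "^ without /m: $caret, ^ with /m: $caret_m, \\A with /m: $abs_start_m\n";
print "\$ without /m: $dollar, \$ with /m: $dollar_m\n";
print "d\\z: $abs_end, d\\n\\z: $abs_end_nl\n";
```

Only the /m variants of ^ and $ see the internal line boundary, while \A and \z always refer to the whole string.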
Induction
Regular expressions can often be created ("induced" or "learned") based on a set of example strings. This is known as the induction of regular languages, and is part of the general problem of grammar induction in computational learning theory. Formally, given examples of strings in a regular language, and perhaps also given examples of strings not in that regular language, it is possible to induce a grammar for the language, i.e., a regular expression that generates that language. Not all regular languages can be induced in this way (see language identification in the limit), but many can. For example, the set of examples {1, 10, 100}, and negative set (of counterexamples) {11, 1001, 101, 0} can be used to induce the regular expression 1⋅0* (1 followed by zero or more 0s).
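The example above can be checked mechanically. The Perl sketch below treats induction as a search over a small, hand-picked list of candidate patterns and keeps those that accept every positive example and reject every negative one. The candidate list and the helper name is_consistent are hypothetical, chosen for illustration; this is not a general induction algorithm. Each candidate is anchored with \A and \z so that it must match the whole string.

```perl
use strict;
use warnings;

# Positive and negative example strings from the text.
my @positive = qw(1 10 100);
my @negative = qw(11 1001 101 0);

# Hypothetical hypothesis space, chosen only for illustration.
my @candidates = ('1+', '[01]*', '1[01]*', '1?0*', '10*');

# Returns 1 if the pattern, anchored to the whole string, accepts every
# positive example and rejects every negative example; otherwise 0.
sub is_consistent {
    my ($pattern, $pos, $neg) = @_;
    my $re = qr/\A(?:$pattern)\z/;
    for my $s (@$pos) { return 0 unless $s =~ $re }
    for my $s (@$neg) { return 0 if $s =~ $re }
    return 1;
}

my @consistent = grep { is_consistent($_, \@positive, \@negative) } @candidates;
print "Consistent with all examples: @consistent\n";
```

Here only 10* (that is, 1⋅0*) survives: 1+ rejects "10", [01]* and 1[01]* accept "11", and 1?0* accepts "0". With a larger candidate set, several patterns may survive, and a learner then needs some additional preference to choose among them.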
See also
Notes
- Wall (2002)
- grep(1) man page
- Hopcroft, Motwani & Ullman (2000)
- Sipser (1998)
- Gelade & Neven (2008)
- Gruber & Holzer (2008)
- Kozen (1991)
- ISO/IEC 9945-2:1993 Information technology – Portable Operating System Interface (POSIX) – Part 2: Shell and Utilities, successively revised as ISO/IEC 9945-2:2002 Information technology – Portable Operating System Interface (POSIX) – Part 2: System Interfaces, ISO/IEC 9945-2:2003, and currently ISO/IEC/IEEE 9945:2009 Information technology – Portable Operating System Interface (POSIX®) Base Specifications, Issue 7
- The Single Unix Specification (Version 2)
- Theorem 3 (p.9)
- Cox (2007)
- Laurikari (2009)
- http://stackoverflow.com/questions/7778034/replacement-for-google-code-search
- The character 'm' is not always required to specify a Perl match operation. For example, m/[^abc]/ could also be rendered as /[^abc]/. The 'm' is only necessary if the user wishes to specify a match operation without using a forward slash as the regex delimiter. Sometimes it is useful to specify an alternate regex delimiter in order to avoid "delimiter collision". See 'perldoc perlre' for more details.
- e.g., see Java in a Nutshell, p. 213; Python Scripting for Computational Science, p. 320; Programming PHP, p. 106
- Note that all the if statements return a TRUE value.
References
External links
- Regular Expressions at DMOZ
- ISO/IEC 9945-2:1993 Information technology – Portable Operating System Interface (POSIX) – Part 2: Shell and Utilities
- ISO/IEC 9945-2:2002 Information technology – Portable Operating System Interface (POSIX) – Part 2: System Interfaces
- ISO/IEC 9945-2:2003 Information technology – Portable Operating System Interface (POSIX) – Part 2: System Interfaces
- ISO/IEC/IEEE 9945:2009 Information technology – Portable Operating System Interface (POSIX®) Base Specifications, Issue 7
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.