Regular Expression Functions
Overview
Regular expressions (often shortened to just regex)
are patterns that describe a set of strings. Regular
expressions are constructed analogously to arithmetic
expressions, by using various operators to combine
smaller expressions. They are a very powerful tool for
matching or replacing text. PHP has a function-oriented
interface to regular expressions, as opposed to Perl,
where regular expressions are implemented at the language
level. PHP supports two kinds of regular expressions out
of the box: POSIX Extended and Perl-compatible. The
functions for both are similar, but Perl-compatible
expressions support many more options and are considered
to be faster and more powerful in general.
POSIX Extended Regular Expression Functions
These functions all take a regular
expression string as their first argument. PHP uses the
POSIX extended regular expressions as defined by POSIX
1003.2. For a full description of POSIX regular
expressions, see the regex man pages included in the
regex directory in the PHP distribution. These man pages
are usually in section 7, so to view them run one of the
following commands (depending on your system):
man 7 regex
man -s 7 regex
A basic lesson on POSIX regular expressions is available
at http://www.delorie.com/gnu/docs/rx/rx_3.html,
and a tutorial-style introduction can be found at http://www.htmlwizard.net/resources/tutorials/regex_intro.html.
The POSIX Extended Regular Expression functions all (save
split() and spliti()) have names beginning
with ereg.
Perl-Compatible Regular Expression Functions
Perl-compatible regular expression (PCRE)
functions bring Perl's parsing power to PHP. The syntax
of the regular expression patterns is almost the same as
Perl's, except for a few custom PHP-specific
modifications. Every pattern must be enclosed in
delimiters. Any character can be used as a delimiter as
long as it's not alphanumeric or a backslash. When the
delimiter character has to be used in the pattern itself,
it needs to be escaped by a backslash. As in Perl, the
ending delimiter may be followed by optional modifiers
that affect how the matching and pattern processing is
done.
The Perl-compatible Regular Expression functions all have
names beginning with preg.
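The delimiter rules above can be illustrated with a minimal sketch. The subject string, the URL in it, and the variable names are invented for illustration; only preg_match() itself comes from the PCRE function set described here.

```php
<?php
// Hypothetical subject string, used only for this illustration.
$subject = 'See http://www.example.com/docs for details.';

// With "/" as the delimiter, every literal slash in the pattern
// has to be escaped with a backslash.
$with_slashes = preg_match('/http:\/\/[a-z.]+\/docs/', $subject);

// With "#" as the delimiter, the slashes need no escaping.
$with_hashes = preg_match('#http://[a-z.]+/docs#', $subject);

// Modifiers follow the closing delimiter; here "i" makes the
// match case-insensitive.
$with_modifier = preg_match('#HTTP://[A-Z.]+/DOCS#i', $subject);

// preg_match() returns 1 when the pattern matches and 0 when it does not.
var_dump($with_slashes, $with_hashes, $with_modifier);
```

Choosing a delimiter that does not occur in the pattern, as in the second call, usually keeps the pattern easier to read.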
Pattern Modifiers
The following
list contains all possible modifiers supported by the
PCRE functions in PHP. Some Perl-specific modifiers are
not supported, and conversely, there are some modifiers
that Perl doesn't have.
-
i
-
When this modifier is used,
the matching of alphabetic characters in the
pattern becomes case-insensitive; for example,
"/sgi/i" matches both "sgi" and "SGI". This is
equivalent to Perl's /i modifier.
-
m
-
By default, PCRE treats the
subject string as consisting of a single "line" of
characters (even if it actually contains several
newlines). The "start of line" metacharacter
(^) matches only at the start of the
string, while the "end of line" metacharacter
($) matches only at the end of the string,
or before a terminating newline (unless the
D modifier is also set). This is the same
as in Perl.
When this modifier is used, the "start of line" and
"end of line" constructs match immediately
following or immediately before any newline in the
subject string, respectively, as well as at the
very start and end. This is equivalent to Perl's
/m modifier. If there are no "\n"
characters in a subject string, or no occurrences
of ^ or $ in a pattern, setting
this modifier has no effect.
-
s
-
When this modifier is used,
a dot metacharacter (.) in the pattern
matches all characters, including newlines. Without
it, newlines are excluded. This modifier is
equivalent to Perl's /s modifier. A
negative class such as [^a] always matches
a newline character, independent of the setting of
this modifier.
-
x
-
When this modifier is used,
whitespace data characters in the pattern are
ignored except when escaped or inside a character
class, and characters between an unescaped
# outside a character class and the next
newline character, inclusive, are also ignored.
This is equivalent to Perl's /x modifier,
and makes it possible to include comments inside
complicated patterns. Note, however, that this
applies only to data characters. Whitespace
characters cannot appear within special character
sequences in a pattern, for example within the
sequence (?(, which introduces a
conditional subpattern.
-
e
-
When this modifier is used,
preg_replace() does normal substitution of
references in the replacement string, evaluates it
as PHP code, and uses the result of the evaluation
for replacing the match found by the pattern.
Only preg_replace() uses this modifier; it's
ignored by other PCRE functions.
-
A
-
When this modifier is used,
the pattern is forced to be "anchored"; that is,
it's constrained to match only at the start of the
string that's being searched (the "subject
string"). This effect can also be achieved by
appropriate constructs in the pattern itself, which
is the only way to do it in Perl.
-
D
-
When this modifier is used,
a dollar metacharacter ($) in the pattern
matches only at the end of the subject string.
Without this modifier, a dollar sign also matches
immediately before the final character if it's a
newline (but not before any other newlines). This
modifier is ignored if the /m modifier is
set. There is no equivalent to this modifier in
Perl.
-
S
-
When a pattern is going to
be used several times, it's worth spending more
time analyzing it in order to speed up the time
taken for matching. When this modifier is used,
this extra analysis is performed. At present,
studying a pattern is useful only for non-anchored
patterns that don't have a single fixed starting
character. This is equivalent to the study()
function in Perl.
-
U
-
This modifier inverts the
"greediness" of the quantifiers so that they're not
greedy by default, but become greedy if followed by
"?". Greedy quantifiers attempt to match as much of
the target string as they legally can. The only
limit on this behavior is that the greediness of
one quantifier cannot cause the rest of the
pattern to fail. This
modifier is not compatible with Perl.
-
X
-
This modifier turns on
additional functionality of PCRE that is
incompatible with Perl. Any backslash in a pattern
that's followed by a letter that has no special
meaning causes an error, thus reserving these
combinations for future expansion. By default, as
in Perl, a backslash followed by a letter with no
special meaning is treated as a literal. At
present, no other features are controlled by this
modifier.
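As a rough illustration of several of the modifiers listed above (i, m, s, x, A, D, and U), the following sketch runs the same kind of pattern with and without each modifier. The sample strings and variable names are assumptions made up for this example; the e, S, and X modifiers are left out.

```php
<?php
// Hypothetical two-line subject string ending in a newline.
$text = "first line\nSecond line\n";

// i: letters match regardless of case.
$i_off = preg_match('/second/', $text);    // no match: "Second" is capitalized
$i_on  = preg_match('/second/i', $text);   // match

// m: ^ and $ also match just after and just before embedded newlines.
$m_off = preg_match('/^Second/', $text);   // no match: ^ is only the string start
$m_on  = preg_match('/^Second/m', $text);  // match at the start of line two

// s: the dot also matches newline characters.
$s_off = preg_match('/first.*Second/', $text);   // no match: dot stops at "\n"
$s_on  = preg_match('/first.*Second/s', $text);  // match across the newline

// x: whitespace and #-comments inside the pattern are ignored.
$x_on = preg_match('/
    (\w+)   # a word
    \s      # one whitespace character
    line    # the literal text "line"
/x', $text, $x_parts);

// A: the match must begin at the start of the subject string.
$a_off = preg_match('/line/', $text);   // match somewhere in the string
$a_on  = preg_match('/line/A', $text);  // no match: the string starts with "first"

// D: $ matches only at the very end, not before a final newline.
$d_off = preg_match('/line$/', $text);   // match before the trailing "\n"
$d_on  = preg_match('/line$/D', $text);  // no match: "\n" follows "line"

// U: quantifiers are ungreedy unless followed by "?".
preg_match('/<.+>/', '<b>bold</b>', $greedy);     // $greedy[0] is "<b>bold</b>"
preg_match('/<.+>/U', '<b>bold</b>', $ungreedy);  // $ungreedy[0] is "<b>"
```

Note that a comment inside an x-mode pattern must not contain the delimiter character, since the delimiter would end the pattern early.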
PHP Functions Essential Reference. Copyright © 2002 by New Riders Publishing
(Authors: Zak Greant, Graeme Merrall, Torben Wilson, Brett Michlitsch).
This material may be distributed only subject to the terms and conditions set forth
in the Open Publication License, v1.0 or later (the latest version is presently available at
http://www.opencontent.org/openpub/).
The authors of this book have elected not to choose any options under the OPL. This online book was obtained
from http://www.fooassociates.com/phpfer/
and is designed to provide information about the PHP programming language, focusing on PHP version 4.0.4
for the most part. The information is provided on an as-is basis, and no warranty or fitness is implied. All
persons and entities shall have neither liability nor responsibility to any person or entity with respect to
any loss or damage arising from the information contained in this book.