About
Regular expression patterns in PHP are handled by the PCRE (Perl Compatible Regular Expressions) library (see Library).
Therefore, the pattern syntax used by the preg_* functions closely resembles Perl's, but is not identical. See Perl Differences.
Syntax
The pattern must be enclosed by delimiters.
Delimiter Pattern Delimiter [Modifiers]
where:
Delimiter
A delimiter:
- is generally a forward slash /.
- can be any non-alphanumeric, non-backslash, non-whitespace character.
- cannot be a backslash (\) or the null byte.
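If the delimiter character has to be used in the pattern itself, it must be escaped with a backslash.
Bracket-style delimiters are also allowed: the pairs (), {}, [] and <> can serve as the opening and closing delimiters. They do not need to be escaped when used as metacharacters inside the pattern, but they must be escaped when used as literal characters.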
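As a quick check of these delimiter rules, the sketch below writes a closing-tag pattern with two different delimiters and then uses brace delimiters. Only the slash version needs the / inside the pattern escaped. The subject strings are made up for illustration.

```php
<?php
$subject = '<div>text</div>';

// "/" delimiter: the "/" inside the pattern must be escaped.
$slash = preg_match('/<\/\w+>/', $subject);

// "#" delimiter: "/" is an ordinary character here, no escape needed.
$hash = preg_match('#</\w+>#', $subject);

// Bracket-style delimiters: "{" opens and "}" closes the pattern.
// The inner {3} quantifier is a metacharacter, so it needs no escape.
$brace = preg_match('{\w{3}}', $subject);

// A literal brace inside a brace-delimited pattern must be escaped.
$literal = preg_match('{\{\}}', 'a {} b');
```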
The preg_quote function escapes the regular expression special characters in a string; its optional second argument also escapes the given delimiter:
Example:
$keywords = '$40 for a g3/400';
$keywords = preg_quote($keywords, '/'); // The delimiter is a forward slash
echo $keywords; // prints \$40 for a g3\/400 (the $ and / characters were escaped)
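A common use of preg_quote is building a pattern from text that should be matched literally, such as user input. The sketch below is a minimal example; the helper name containsLiteral is hypothetical, and it uses # as the delimiter so preg_quote is told to escape # instead of /.

```php
<?php
// Hypothetical helper: does $haystack contain $needle literally, ignoring case?
function containsLiteral(string $haystack, string $needle): bool
{
    // preg_quote escapes regex metacharacters; the second argument
    // also escapes the chosen delimiter ("#" here).
    $pattern = '#' . preg_quote($needle, '#') . '#i';
    return preg_match($pattern, $haystack) === 1;
}

// "*" and "+" are treated as plain characters, not quantifiers.
var_dump(containsLiteral('1+1=2', '1+1')); // bool(true)
var_dump(containsLiteral('a.c', 'a*c'));   // bool(false)
```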
Modifiers
The pattern modifiers are the regular expression flags and are located after the ending delimiter.
Example of case-insensitive matching:
#[a-z]#i
List:
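- g modifier: not supported in PHP. Using it raises an "Unknown modifier 'g'" warning; call preg_match_all to get all matches instead
- m modifier: multi-line. Causes ^ and $ to match at the start/end of each line (not only at the start/end of the string)
- i modifier: insensitive. Case-insensitive match (ignores the case of [a-zA-Z])
- x modifier: extended. Whitespace in the pattern is ignored (except when escaped or inside a character class), and an unescaped # outside a character class starts a comment that runs to the end of the line
- X modifier: eXtra. A backslash followed by a letter with no special meaning causes an error (as of PHP 7.3 this modifier has no effect, because PCRE2 always treats such sequences as errors)
- s modifier: single line (dotall). The dot also matches newline characters
- u modifier: unicode. Pattern and subject strings are treated as UTF-8. Also makes escape sequences match Unicode characters
- U modifier: Ungreedy. Quantifiers become lazy by default, and a ? following a quantifier makes it greedy
- A modifier: Anchored. The pattern can only match at the start of the subject (or at the search offset), as if it began with ^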
- J modifier: Allow duplicate subpattern names
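The sketch below exercises several of these modifiers on small, made-up subjects so their effect can be compared directly. It relies only on preg_match and preg_match_all; the variable names are illustrative.

```php
<?php
$text = "First line\nsecond LINE";

// i: case-insensitive matching.
$caseless = preg_match('/second line/i', $text);

// m: ^ and $ match at the start/end of every line.
preg_match_all('/^\w+/m', $text, $lineStarts);

// s: the dot also matches the newline character.
$dotAll = preg_match('/line.second/s', $text);
$noDotAll = preg_match('/line.second/', $text);

// x: whitespace and "#" comments inside the pattern are ignored.
$extended = preg_match('/
    \d{3}   # prefix
    -       # literal hyphen
    \d{4}   # line number
/x', '555-1234');

// U: quantifiers become lazy, so .+ stops at the first ">".
preg_match('/<.+>/U', '<a><b>', $lazy);

// u: pattern and subject are UTF-8, so "." matches one code point.
$eAcute = "\u{00E9}"; // "é", two bytes in UTF-8
$withU = preg_match('/^.$/u', $eAcute);
$withoutU = preg_match('/^.$/', $eAcute);
```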
Example
File Root Detection
From the dirname documentation: detecting the file system root.
dirname('.'); // Will return '.'.
dirname('/'); // Will return `\` on Windows and '/' on *nix systems.
dirname('\\'); // Will return `\` on Windows and '.' on *nix systems.
dirname('C:\\'); // Will return 'C:\' on Windows and '.' on *nix systems.
A pattern that detects these root values is then:
$isRoot = preg_match('/^(\.|\/|\\\\|[a-z]:\\\\)$/i', $path) === 1; // '.', '/', '\' or a drive root such as 'C:\'
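The root check can be wrapped in a reusable function. The sketch below is one way to do it; the name isRootPath is hypothetical. It anchors the whole alternation with ^ and $ and accepts the values listed above: '.', '/', '\' and drive roots such as 'C:\'.

```php
<?php
// Hypothetical helper: true when $path is one of the "root" values
// that dirname() can return: '.', '/', '\' or a drive root like 'C:\'.
function isRootPath(string $path): bool
{
    // In a single-quoted PHP string '\\\\' is two backslashes,
    // which the regex engine reads as one literal backslash.
    return preg_match('/^(\.|\/|\\\\|[a-z]:\\\\)$/i', $path) === 1;
}

var_dump(isRootPath(dirname('/'))); // bool(true) on both *nix and Windows
```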
Valid patterns
/foo bar/
#^[^0-9]$#
+php+
%[a-zA-Z0-9_-]%
(this [is] a (pattern))
{this [is] a (pattern)}
[this [is] a (pattern)]
<this [is] a (pattern)>
/<\/\w+>/
|(\d{3})-\d+|Sm
/^(?i)php[34]/
{^\s+(\s+)?$}
Invalid patterns
/href='(.*)' // missing ending delimiter
/\w+\s*\w+/J // unknown modifier 'J' before PHP 7.2.0 (J is valid since then)
1-\d3-\d3-\d4| // missing starting delimiter
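When a pattern fails to compile, the preg_* functions emit a warning and return false. The sketch below uses that to test patterns like the ones above; the helper name isValidPattern is hypothetical, and the @ operator only suppresses the warning so the check stays quiet.

```php
<?php
// Hypothetical helper: true if PCRE accepts $pattern.
// preg_match returns false (with an E_WARNING) when compilation fails.
function isValidPattern(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

var_dump(isValidPattern('{^\s+(\s+)?$}')); // bool(true)
var_dump(isValidPattern("/href='(.*)'"));  // bool(false): missing ending delimiter
```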
Functions
See book.pcre
preg_match
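- returns 1 if the pattern matches, 0 if it does not, and false on failure
- fills the optional third argument ($matches) with the result: $matches[0] holds the full match and $matches[1], $matches[2], ... hold the captured groups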
Example:
$pattern = "carbon|eva";
$pathString = "/home/Eva/notes.txt"; // sample subject
if (preg_match("/$pattern/i", $pathString) === 1) {
    echo "We have a match";
}
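The sketch below shows how preg_match fills $matches with capture groups, and how preg_match_all returns every match, which is what the Perl g flag does. The subject string is made up for illustration.

```php
<?php
$subject = 'Order 123-4567, order 890-12';

// preg_match stops at the first match; $m[1] and $m[2] are the groups.
$count = preg_match('/(\d{3})-(\d+)/', $subject, $m);
// $m is ['123-4567', '123', '4567']

// preg_match_all finds every match. With the default PREG_PATTERN_ORDER,
// $ms[0] holds all full matches and $ms[1] all values of group 1.
$all = preg_match_all('/(\d{3})-(\d+)/', $subject, $ms);
// $ms[0] is ['123-4567', '890-12'], $ms[1] is ['123', '890']
```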
Management
Configuration
Library
By default, this extension is compiled using the bundled PCRE library. Alternatively, an external PCRE library can be used by passing the --with-pcre-regex=DIR configuration option, where DIR is the location of PCRE's include and library files.
Runtime
Escape
The escape character is the backslash (\).
Example: split an HTML page on the closing p tag '</p>'
$localCount = count(preg_split("/<\/p>/", $section['content'])); // '/' is escaped because it is the delimiter
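$section['content'] is not defined in this excerpt, so the sketch below uses a made-up HTML string instead. It shows the escaped and unescaped delimiter choices, and that a trailing '</p>' leaves an empty last piece, which count() includes unless PREG_SPLIT_NO_EMPTY is passed.

```php
<?php
// Hypothetical HTML fragment standing in for $section['content'].
$html = '<p>One</p><p>Two</p><p>Three</p>';

// With "/" as the delimiter, the "/" inside "</p>" must be escaped.
$parts = preg_split('/<\/p>/', $html);

// Choosing "#" as the delimiter avoids the escape entirely.
$same = preg_split('#</p>#', $html);

// PREG_SPLIT_NO_EMPTY drops the empty string left after the final "</p>".
$paragraphs = preg_split('#</p>#', $html, -1, PREG_SPLIT_NO_EMPTY);
```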