Matching

Subpatterns do a bit more than let you define parts of a pattern to which alternation or repetition apply. When a match is found, it’s possible to obtain not only the substring from the original string that matched the entire pattern, but also substrings that were matched by subpatterns.

<?php
// $string holds the text being searched.
if (preg_match('/foo(bar)?(baz)?/', $string, $match) == 1) {
    print_r($match);
}
?>

The third parameter to preg_match(), $match, will be set to an array of match data if a match is found. That array will contain at least one element: the entire substring that matched the pattern. Any elements that follow will be subpattern matches with an index matching that subpattern’s position within the pattern. That is, the first subpattern will have the index 1, the second subpattern will have the index 2, and so on.
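To make the indexing concrete, here is a minimal sketch that runs the same pattern against a sample input. The value of $string and the variable names $whole, $first and $second are assumptions chosen only for illustration.

```php
<?php
// Hypothetical input, chosen so that both optional subpatterns match.
$string = 'foobarbaz';

if (preg_match('/foo(bar)?(baz)?/', $string, $match) == 1) {
    $whole  = $match[0]; // 'foobarbaz': the substring matching the entire pattern
    $first  = $match[1]; // 'bar': matched by the first subpattern
    $second = $match[2]; // 'baz': matched by the second subpattern
}
```

Index 0 always holds the full match, so subpattern numbering effectively starts at 1.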

If a subpattern is optional (i.e. followed by ?) and does not take part in the match, it will either have an empty string as its element value in the array or no array element at all.

<?php
if (preg_match('/foo(bar)?/', 'foo', $match) == 1) {
    // $match == array('foo');
}
if (preg_match('/foo(bar)?(baz)?/', 'foobaz', $match) == 1) {
    // $match == array('foo', '', 'baz');
}
?>
  • In the first example, the (bar)? subpattern is the last subpattern in the pattern and is not matched. Thus, it has no entry in $match.
  • In the second example, the (bar)? subpattern is not matched, but it is followed by a subpattern that is. Thus, it has an empty entry in $match so that the later subpattern keeps its index.

Subpatterns can also contain other subpatterns.

<?php
if (preg_match('/foo(ba(r|z))?/', 'foobar', $match) == 1) {
    // $match == array('foobar', 'bar', 'r');
}
?>

Aside from passing $match to print_r() or a similar function, an easy way to tell what a subpattern’s position will be in $match is to count the number of opening parentheses in the pattern from left to right until you reach the desired subpattern.
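The counting rule can be sketched with a small example. The pattern and input below are made up purely to show how opening parentheses map to indices.

```php
<?php
// Hypothetical pattern; the opening parentheses are numbered left to right:
//   (a(b))  -> 1st opening parenthesis -> $match[1]
//     (b)   -> 2nd opening parenthesis -> $match[2]
//   (c)     -> 3rd opening parenthesis -> $match[3]
if (preg_match('/(a(b))(c)/', 'abc', $match) == 1) {
    // $match == array('abc', 'ab', 'b', 'c');
}
```

Note that the outer subpattern receives its index before the subpattern nested inside it, because its opening parenthesis comes first.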

Using the syntax shown above, every subpattern will be captured (i.e. have its own element in the array of matches). Older versions of PCRE limit captured subpatterns to 99 and total subpatterns, captured or not, to 200. While this realistically shouldn’t become an issue, it’s best to denote subpatterns that do not require capture by beginning them with (?: instead of (.
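As a brief sketch of the difference, the example below groups an alternation with (?: so that it takes no slot in the array of matches. The pattern and input string are assumptions made for illustration.

```php
<?php
// Hypothetical pattern: (?:bar|baz) groups the alternation without capturing it,
// so (qux) is the first and only captured subpattern.
if (preg_match('/foo(?:bar|baz)(qux)/', 'foobazqux', $match) == 1) {
    // $match == array('foobazqux', 'qux');
}
```

Had the first group been written as (bar|baz), 'baz' would occupy index 1 and 'qux' would move to index 2.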

Additionally, since PHP 4.3.3, subpatterns may be assigned meaningful names to be used as their indices in the array of matches when they are captured. To assign a name to a subpattern, begin it with the syntax (?P<name> instead of (, where name is the name you want to assign to that subpattern. This makes code more expressive and therefore easier to maintain.
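A minimal sketch of named subpatterns follows. The names year and month and the input string are hypothetical, picked only to show how the resulting array is indexed.

```php
<?php
// Hypothetical pattern with two named subpatterns.
if (preg_match('/(?P<year>\d{4})-(?P<month>\d{2})/', '2014-09', $match) == 1) {
    $year  = $match['year'];  // '2014'
    $month = $match['month']; // '09'
    // Named subpatterns remain available by position as well:
    // $match[1] == '2014' and $match[2] == '09'
}
```

Because each named subpattern appears under both its name and its numeric index, existing code that relies on positions should keep working after names are added.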


© PCRE Extension — Web Scraping

Category: Article | Added by: Marsipan (03.09.2014)