Matching
Subpatterns do more than mark off the parts of a pattern to which alternation or repetition applies. When a match is found, it’s possible to obtain not only the substring that matched the entire pattern, but also the substrings that were matched by individual subpatterns.

<?php
if (preg_match('/foo(bar)?(baz)?/', $string, $match) == 1) {
    print_r($match);
}
?>

The third parameter to preg_match(), $match, will be set to an array of match data if a match is found. That array will contain at least one element, at index 0: the entire substring that matched the pattern. Any elements that follow are subpattern matches, each with an index matching that subpattern’s position within the pattern. That is, the first subpattern will have the index 1, the second subpattern will have the index 2, and so on.

If a subpattern is optional (i.e. followed by ?) and does not take part in the match, it will have either an empty string as its element value or no array element at all. An unmatched subpattern that precedes a matched one gets an empty string, while unmatched subpatterns at the end of the pattern are omitted from the array.

<?php
if (preg_match('/foo(bar)?/', 'foo', $match) == 1) {
    // $match == array('foo');
}
if (preg_match('/foo(bar)?(baz)?/', 'foobaz', $match) == 1) {
    // $match == array('foobaz', '', 'baz');
}
?>
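Because an unmatched optional subpattern can show up either as an empty string or as a missing element, code that reads $match usually has to allow for both cases. The following is a minimal sketch of one way to do that; subpattern_matched() is a hypothetical helper written for this illustration, not a function provided by the PCRE extension.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether the subpattern
// at $index took part in the match with a non-empty value.
function subpattern_matched(array $match, $index)
{
    // A missing element means a trailing subpattern did not match.
    // An empty string means an unmatched subpattern was followed by
    // one that did match.
    return isset($match[$index]) && $match[$index] !== '';
}

preg_match('/foo(bar)?(baz)?/', 'foobaz', $match);
var_dump(subpattern_matched($match, 1)); // bool(false)
var_dump(subpattern_matched($match, 2)); // bool(true)

preg_match('/foo(bar)?/', 'foo', $match);
var_dump(subpattern_matched($match, 1)); // bool(false)
```

Note that this check cannot tell an unmatched subpattern apart from one that legitimately matched an empty string, such as (b*), so it only suits patterns where an empty capture carries no meaning.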
Subpatterns can also contain other subpatterns.

<?php
if (preg_match('/foo(ba(r|z))?/', 'foobar', $match) == 1) {
    // $match == array('foobar', 'bar', 'r');
}
?>

Aside from passing $match to print_r() or a similar function, an easy way to tell what a subpattern’s position will be in $match is to count the opening parentheses in the pattern from left to right until you reach the desired subpattern.

Using the syntax shown above, every subpattern will be captured (i.e. have its own element in the array of matches). Captured subpatterns are limited to 99, and total subpatterns, captured or not, are limited to 200. While this realistically shouldn’t become an issue, it’s best to denote subpatterns that do not require capture by beginning them with (?: instead of (.

Additionally, since PHP 4.3.3, subpatterns may be assigned meaningful names to be used as their indices in the array of matches when they are captured. To assign a name to a subpattern, begin it with the syntax (?P<name> instead of (, where name is the name you want to assign to that subpattern. This makes code more expressive and, as a result, easier to maintain.
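The non-capturing and named forms described above can be sketched as follows. The patterns, the sample strings, and the names year and month are illustrative assumptions chosen for this example rather than anything taken from the text.

```php
<?php
// (?:...) groups without capturing, so only (r|z) receives a numeric index.
preg_match('/foo(?:ba(r|z))?/', 'foobar', $match);
print_r($match);
// elements: 0 => 'foobar', 1 => 'r'

// (?P<name>...) captures under the given name and also keeps the
// numeric index the subpattern would otherwise have had.
preg_match('/(?P<year>\d{4})-(?P<month>\d{2})/', 'Posted 2009-04', $named);
print_r($named);
// elements: 0 => '2009-04', 'year' => '2009', 1 => '2009',
//           'month' => '04', 2 => '04'
```

Since a non-capturing group is skipped when positions are assigned, counting opening parentheses to find an index only works if those beginning with (?: are left out of the count.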