Modifiers

Pattern delimiters mark the start and end of a pattern because modifiers follow the closing delimiter. These modifiers change how the pattern, including its metacharacters, is matched. Here are a few modifiers that may prove useful in web scraping applications.

  • i: Any letters in the pattern will match both uppercase and lowercase regardless of the case of the letter used in the pattern.
  • m: ^ and $ will match the beginning and end of each line within the string (delimited by line feed characters) rather than only the beginning and end of the entire string.
  • s (lowercase): The . metacharacter will match line feeds, which it does not by default.
  • S (uppercase): Additional time will be spent to analyze the pattern in order to speed up subsequent matches with that pattern. Useful for patterns used multiple times.
  • U: By default, the quantifiers * and + behave in a manner referred to as “greedy.” That is, they match as many characters as possible rather than as few as possible. This modifier forces the latter behavior.
  • u: Forces pattern strings to be treated as UTF-8 encoded strings.
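
As a minimal sketch of the m, s, and U modifiers described above, the snippet below runs the same pattern with and without each modifier. The subject strings and variable names are made up for illustration and are not part of the original text.

<?php
// Hypothetical two-line subject string.
$text = "first line\nsecond line";

// m: ^ also matches at the start of each line, not only at the start of the string.
$withoutM = preg_match('/^second/', $text);
$withM    = preg_match('/^second/m', $text);

// s: . also matches line feeds, so the pattern can span both lines.
$withoutS = preg_match('/first.*second/', $text);
$withS    = preg_match('/first.*second/s', $text);

// U: quantifiers stop at the shortest possible match instead of the longest.
$html = '<b>one</b><b>two</b>';
preg_match('/<b>.*<\/b>/', $html, $greedy);
preg_match('/<b>.*<\/b>/U', $html, $lazy);
?>

Here $withoutM and $withoutS are 0 while $withM and $withS are 1. $greedy[0] holds the whole string, whereas $lazy[0] holds only '<b>one</b>', which is usually what is wanted when pulling individual elements out of markup.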

The example below matches because the i modifier is used, which means that the pattern matches both 'a' and 'A'.

<?php
$matches = (preg_match('/a/i', 'A') == 1);
?>
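
The u modifier from the list above can be sketched in the same style. This example assumes the subject is UTF-8 encoded, and the byte sequence "\xC3\xA9" (the letter é) is written out explicitly so the result does not depend on how the file is saved. The strings and variable names are illustrative only.

<?php
// "\xC3\xA9" is the two-byte UTF-8 encoding of the letter é.
$subject = "\xC3\xA9";

// Without u, . matches a single byte, so ^.$ cannot cover the two-byte character.
$asBytes = preg_match('/^.$/', $subject);
// With u, . matches one whole UTF-8 character.
$asChars = preg_match('/^.$/u', $subject);

// Several modifiers can be placed together after the closing delimiter.
$combined = preg_match('/^caf.$/iu', "CAF\xC3\xA9");
?>

Here $asBytes is 0, while $asChars and $combined are both 1. Scraped pages often contain multibyte characters, so treating the subject as UTF-8 can be the difference between a match and a silent miss.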

© PCRE Extension — Web Scraping

Category: Article | Added by: Marsipan (03.09.2014)