Pattern delimiters mark the start and end of a pattern so that modifiers, which change how the metacharacters in the pattern match, can follow the closing delimiter. Here are a few modifiers that may prove useful in web scraping applications.
- i: Letters in the pattern match both uppercase and lowercase characters, regardless of the case used in the pattern.
- m: ^ and $ match the beginning and end of each line within the string (lines are delimited by line feed characters) rather than only the beginning and end of the entire string.
- s (lowercase): The . metacharacter also matches line feeds, which it does not do by default.
- S (uppercase): Additional time is spent analyzing the pattern in order to speed up subsequent matches with it. This is useful for patterns used multiple times. As of PHP 7.3, this modifier has no effect.
- U: By default, the quantifiers * and + are “greedy”: they match as many characters as possible rather than as few as possible. This modifier inverts that behavior so they match as few characters as possible.
- u: The pattern and subject strings are treated as UTF-8 encoded strings.
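The following is a minimal sketch contrasting several of these modifiers with their default behavior. The subject strings are made up for illustration; each `preg_match()` call returns 1 on a match and 0 otherwise.

```php
<?php
// Hypothetical subject: two lines separated by a line feed.
$text = "first line\nsecond line";

// m: ^ matches at the start of every line, so "second" is found on line 2.
$multiline = preg_match('/^second/m', $text); // 1
$whole     = preg_match('/^second/', $text);  // 0: ^ only matches at the start of the string

// s: . also matches the line feed, so the pattern can span both lines.
$dotall   = preg_match('/first.+second/s', $text); // 1
$noDotall = preg_match('/first.+second/', $text);  // 0: . stops at the line feed

// U: quantifiers become non-greedy, so .+ stops at the first closing tag.
$html = '<b>one</b> and <b>two</b>';
preg_match('/<b>.+<\/b>/', $html, $greedy); // $greedy[0] is '<b>one</b> and <b>two</b>'
preg_match('/<b>.+<\/b>/U', $html, $lazy);  // $lazy[0] is '<b>one</b>'

// u: "\u{e9}" (é) is two bytes in UTF-8; with u, . matches the whole character.
$utf8  = preg_match('/^.$/u', "\u{e9}"); // 1
$bytes = preg_match('/^.$/', "\u{e9}");  // 0: . matches only a single byte
```

In scraping, U is handy for markup where several elements share one line, while s matters whenever the content you want spans line breaks.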
The example below matches because the i modifier is used, which means that the pattern matches both 'a' and 'A'.
<?php
// preg_match() returns 1 on a match, 0 on no match, or false on error
$matches = (preg_match('/a/i', 'A') === 1); // true
?>
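Modifiers can also be combined by listing them together after the closing delimiter. The sketch below applies this to a small, hypothetical HTML fragment in which tag case varies and one element's content spans a line feed; the markup is invented for illustration.

```php
<?php
// Hypothetical markup: mixed-case tags, and the title spans a line feed.
$html = "<TITLE>Web\nScraping</TITLE><p>one</p><P>two</P>";

// i matches <title> and <TITLE> alike; s lets . cross the line feed.
preg_match('/<title>(.+)<\/title>/is', $html, $title);
// $title[1] is "Web\nScraping"

// iU: case-insensitive tags plus non-greedy quantifiers, so each
// paragraph is captured separately instead of as one long match.
preg_match_all('/<p>(.+)<\/p>/iU', $html, $paragraphs);
// $paragraphs[1] is ['one', 'two']
```

Without U, the second pattern would produce a single match whose capture runs from the first <p> to the last </P>.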