Locating Nodes

Two methods of the DOMDocument class let you jump directly to the nodes you are interested in, which reduces the number of nodes you have to traverse to find the data you want.

getElementById attempts to locate a single element that meets two criteria: 1) it is a descendant of the document’s root element; 2) it has a given id attribute value. If such an element is found, it is returned as a DOMElement instance; if not, null is returned.
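As a minimal sketch of both outcomes, the snippet below looks up one id that exists and one that does not. The markup is a hypothetical stand-in for the list used in this chapter, and it is loaded with loadHTML on the assumption that the document is HTML, where id attributes are recognized as IDs.

```php
<?php
// Hypothetical sample markup, assumed for illustration only.
$html = '<html><body><ul id="thelist"><li>Foo</li><li>Bar</li></ul></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// An element with this id exists, so a DOMElement instance is returned.
$list = $doc->getElementById('thelist');
if ($list !== null) {
    echo $list->nodeName, PHP_EOL;
}

// No element carries this id, so null is returned.
$missing = $doc->getElementById('nosuchid');
if ($missing === null) {
    echo 'No element with that id', PHP_EOL;
}
```

Checking the return value against null before calling methods on it avoids a fatal error when the id is absent from a scraped page.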

getElementsByTagName attempts to locate all elements that meet two criteria: 1) they are descendants of the document’s root element; 2) they have a given element name (such as ul). This method always returns a DOMNodeList, even when no elements are found; in that case the list's length property is equal to 0. The list is also iterable, so it can be used as the subject of a foreach loop.
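The following sketch shows both cases described above: a lookup that matches elements and one that matches nothing. The markup is again a hypothetical sample, not taken from any real page.

```php
<?php
// Hypothetical sample markup, assumed for illustration only.
$html = '<html><body><ul><li>Foo</li><li>Bar</li></ul></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Matching elements are returned in document order.
$items = $doc->getElementsByTagName('li');
foreach ($items as $item) {
    echo $item->nodeValue, PHP_EOL;
}

// There are no <table> elements, yet the result is still a DOMNodeList.
$tables = $doc->getElementsByTagName('table');
if ($tables->length === 0) {
    echo 'No tables found', PHP_EOL;
}
```

Because the return value is never null, testing length is enough to detect an empty result.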

The DOMElement class also has a getElementsByTagName method, which functions the same way with the exception that located elements will be descendants of that element instead of the document’s root element.
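To illustrate the difference in scope, here is a small sketch that runs the same tag-name search from the document and from a single element. The two-list markup and the ids first and second are assumptions made up for this example.

```php
<?php
// Hypothetical markup with two lists, assumed for illustration only.
$html = '<html><body>'
      . '<ul id="first"><li>Foo</li><li>Bar</li></ul>'
      . '<ul id="second"><li>Baz</li></ul>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Searching from the document finds the items of both lists.
$allItems = $doc->getElementsByTagName('li');

// Searching from one element finds only that element's descendants.
$second = $doc->getElementById('second');
$secondItems = $second->getElementsByTagName('li');

echo $allItems->length, PHP_EOL;    // 3
echo $secondItems->length, PHP_EOL; // 1
```

In real scraping code the result of getElementById should be checked for null before it is used as a starting point.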

<?php
// One way to get the list items in the last example
$listItems = $doc->getElementsByTagName('li');

// A slightly more specific way (better if there are multiple lists)
if ($list = $doc->getElementById('thelist')) {
    $listItems = $list->getElementsByTagName('li');
}

// Yet another way if the list doesn't have an id
$lists = $doc->getElementsByTagName('ul');
if ($lists->length) {
    $list = $lists->item(0);
    $listItems = $list->getElementsByTagName('li');
}

// Outputs "thelist" (without quotes)
echo $list->getAttribute('id');

// Outputs "Foo" on one line, then "Bar" on another
foreach ($listItems as $listItem) {
    echo $listItem->nodeValue, PHP_EOL;
}

// Outputs text content inside <ul id="thelist"> and </ul>
echo $list->nodeValue;
?>

© DOM Extension — Web Scraping

Category: Article | Added by: Marsipan (01.09.2014)