Loading Documents

Use of the DOM extension begins with the DOMDocument class: instantiate it, then feed it the validated markup. The DOM extension emits warnings when a document is loaded if that document is not valid or well-formed. To avoid this, see the previous chapter on using the tidy extension. If tidy does not eliminate the issue, errors can be controlled as shown in the example below. Errors are buffered until manually cleared, so if they are not needed, clear them after each load operation to avoid wasting memory.

<?php
// Buffer DOM errors rather than emitting them as warnings
$oldSetting = libxml_use_internal_errors(true);

// Instantiate a container for the document
$doc = new DOMDocument;

// Load markup already contained within a string
$doc->loadHTML($htmlString);

// Load markup saved to an external file
$doc->loadHTMLFile($htmlFilePath);

// Get all errors if needed
$errors = libxml_get_errors();

// Get only the last error
$error = libxml_get_last_error();

// Clear any existing errors from previous operations
libxml_clear_errors();

// Revert error buffering to its previous setting
libxml_use_internal_errors($oldSetting);
?>
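
The steps above can also be wrapped in a single function so that the buffer is always cleared and the previous setting always restored. The following is only a minimal sketch: the function name loadHtmlQuietly and its return shape are assumptions made for illustration and are not part of the DOM extension, and the sample markup is arbitrary.

```php
<?php
// Hypothetical helper (not part of the DOM extension): loads an HTML string
// and returns the document together with any libxml errors that were buffered.
function loadHtmlQuietly(string $html): array
{
    // Buffer libxml errors instead of emitting warnings; keep the old setting
    $oldSetting = libxml_use_internal_errors(true);

    // Discard errors left over from earlier operations
    libxml_clear_errors();

    $doc = new DOMDocument;
    $loaded = $doc->loadHTML($html);

    // Copy the buffered errors, then free the buffer
    $errors = libxml_get_errors();
    libxml_clear_errors();

    // Revert error buffering to its previous setting
    libxml_use_internal_errors($oldSetting);

    return [$loaded ? $doc : null, $errors];
}

// Usage: the sample markup contains a stray closing tag
[$doc, $errors] = loadHtmlQuietly('<p>Some text</span></p>');

foreach ($errors as $error) {
    // Each entry is a LibXMLError object with line, code and message fields
    printf("Line %d: %s\n", $error->line, trim($error->message));
}
```

Returning the errors rather than leaving them in the buffer lets the caller decide whether to log or ignore them, while the buffer itself never grows between load operations.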

© DOM Extension — Web Scraping

Category: Article | Added by: Marsipan (01.09.2014)