Parallel Processing

Two of the HTTP client libraries previously covered, cURL and pecl_http, support running requests in parallel using a single connection. While the same feature cannot be replicated exactly using other libraries, it is possible to run multiple requests on separate connections using processes that are executed in parallel.
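As a rough illustration of the built-in parallel support mentioned above, the following sketch uses the multi interface of PHP's cURL extension. The function name `fetchAllWithCurlMulti` and the 30-second timeout are assumptions made for this example, not something prescribed by the text.

```php
<?php
// Hypothetical helper: fetch several URLs at once from a single process
// using the cURL extension's multi interface.
function fetchAllWithCurlMulti(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($handle, CURLOPT_TIMEOUT, 30);          // assumed timeout, adjust as needed
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive all transfers until none are still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait for socket activity (up to 1 second) rather than busy-looping.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect each response body under the same key as its URL.
    $bodies = [];
    foreach ($handles as $key => $handle) {
        $bodies[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }
    curl_multi_close($multi);

    return $bodies;
}
```

Calling `fetchAllWithCurlMulti(['a' => 'http://example.com/', 'b' => 'http://example.org/'])` would return an array with the same keys. Error handling (for example, checking `curl_multi_info_read()` for failed transfers) is left out to keep the sketch short.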

Even if you are using a library that supports connection pooling, this technique is useful when multiple hosts are being scraped, since each host requires a separate connection anyway. By contrast, sending every request from a single process means that requests sent earlier to a host that is slow to respond can block those sent later to another, more responsive host.
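The idea of running requests in separate, parallel processes can be sketched as follows. This is a minimal example under several assumptions: it runs from the PHP command line (so that `PHP_BINARY` points at a CLI executable), it requires PHP 7.4 or later for the array form of `proc_open()`, it needs `allow_url_fopen` enabled for remote URLs, and the function name `fetchInParallelProcesses` is hypothetical.

```php
<?php
// Hypothetical helper: start one child PHP process per URL (for example,
// one URL per host) and collect whatever each child prints.
function fetchInParallelProcesses(array $urls): array
{
    // Code run by each child: fetch the location given as the first
    // argument and print the body. Errors are suppressed for brevity.
    $childCode = 'echo @file_get_contents($argv[1]);';

    // Start every child before reading from any of them, so that they
    // all run at the same time.
    $children = [];
    foreach ($urls as $key => $url) {
        $command = [PHP_BINARY, '-r', $childCode, '--', $url];
        $pipes = [];
        $process = proc_open($command, [1 => ['pipe', 'w']], $pipes);
        if (is_resource($process)) {
            $children[$key] = [$process, $pipes[1]];
        }
    }

    // Read each child's output in turn. A slow child delays only the
    // collection step; the other children keep working in the meantime.
    $results = [];
    foreach ($children as $key => [$process, $stdout]) {
        $results[$key] = stream_get_contents($stdout);
        fclose($stdout);
        proc_close($process);
    }

    return $results;
}
```

Because every child is started before any output is read, a slow host holds up only its own process. A fuller implementation would likely cap the number of concurrent children and check each child's exit status.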

See Appendix B for a more detailed example of this.


© Tips and Tricks — Web Scraping

Category: Article | Added by: Marsipan (03.09.2014)