Availability

Regardless of whether a web scraping application takes a real-time or batch approach, it should treat the remote service as a potential point of failure and account for cases where it does not return a usable response. Once a tested web scraping application goes into production, the most common causes are service downtime and changes to the service. Symptoms include connection timeouts and responses with a status code outside the 2xx range.
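As a rough illustration of the symptoms described above, the following Python sketch turns timeouts, connection errors, and non-2xx status codes into a single error the rest of the application can handle. The names `fetch` and `ServiceUnavailable` and the `opener` parameter are assumptions made for this example; only the standard library's `urllib` is used.

```python
import socket
import urllib.error
import urllib.request


class ServiceUnavailable(Exception):
    """Hypothetical error: the remote service gave no usable response."""


def fetch(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the response body as bytes, or raise ServiceUnavailable.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a network connection.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            if not 200 <= status < 300:
                raise ServiceUnavailable(f"unexpected status {status}")
            return response.read()
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        raise ServiceUnavailable(f"HTTP status {exc.code}") from exc
    except (urllib.error.URLError, socket.timeout, TimeoutError) as exc:
        # DNS failures, refused connections, and timeouts end up here.
        raise ServiceUnavailable(f"no response: {exc}") from exc
```

Callers can then catch `ServiceUnavailable` in one place and decide whether to retry, fall back to stored data, or report the problem.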

An advantage of the batch approach in this situation is that the web scraping application’s front-facing interface can remain unaffected. The application can continue to serve cached data and store updates locally, then initiate synchronization once the service becomes available again or the application has been fixed to account for changes in the remote service.
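The batch behavior described above might be sketched in Python as follows. This is a minimal illustration, not a complete design: `BatchStore`, `RemoteUnavailable`, and the `fetch_all` and `push` callables are hypothetical names standing in for whatever the application uses to talk to the remote service.

```python
class RemoteUnavailable(Exception):
    """Hypothetical error: the remote service is down or has changed."""


class BatchStore:
    def __init__(self):
        self.cache = {}    # last successfully scraped data
        self.pending = []  # local updates not yet sent to the service

    def read(self, key):
        # The front end reads only from the cache, never from the service.
        return self.cache.get(key)

    def queue_update(self, update):
        # Updates are stored locally until the next successful batch run.
        self.pending.append(update)

    def synchronize(self, fetch_all, push):
        """Run one batch; return True on success, False if unavailable."""
        try:
            while self.pending:
                push(self.pending[0])
                self.pending.pop(0)  # drop an update only after it was sent
            self.cache = dict(fetch_all())
        except RemoteUnavailable:
            # Keep the old cache and the remaining updates for the next run.
            return False
        return True
```

Because `read` never touches the network, a failed `synchronize` call leaves the front end serving the previous data, and queued updates are retried on the next run.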


© Tips and Tricks — Web Scraping

Category: Article | Added by: Marsipan (03.09.2014)