Cookies

<?php
$cookiejar = '/path/to/file';
$ch = curl_init();
// First request: cookies set by the server are destined for $cookiejar.
curl_setopt($ch, CURLOPT_URL, 'http://localhost.example');
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiejar);
curl_exec($ch);
// Second request: stored cookies are sent back to the server.
curl_setopt($ch, CURLOPT_URL, 'http://localhost.example/path/to/form');
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiejar);
curl_exec($ch);
curl_close($ch);
?>

Here is a quick list of pertinent points.

  • After the first curl_exec call, cURL holds the cookies from the Set-Cookie response headers returned by the server. As per the CURLOPT_COOKIEJAR setting, it writes them to '/path/to/file' on the local filesystem when the handle is closed. This setting persists through the second curl_exec call.
  • When the second curl_exec call takes place, the CURLOPT_COOKIEFILE setting also points to '/path/to/file'. This causes cURL to read any cookies in that file, combine them with the cookies it already holds for the handle, and send the matching ones in the Cookie request header when the request is constructed.
  • If $cookiejar is set to an empty string, cookie data is kept in memory for the life of the handle rather than in a local file. This improves performance (memory access is faster than disk) and security (depending on the server environment, file storage may be more open to access by other users and processes than memory).
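
The in-memory variant from the last point can be sketched as follows. This is a minimal illustration, not a definitive implementation: fetchSequence is a hypothetical helper name, and the sketch relies on the libcurl behavior that an empty CURLOPT_COOKIEFILE value turns on cookie handling without loading a file.

```php
<?php
// Hypothetical helper (not from the original text): fetch several URLs on one
// handle so that cookies set by earlier responses are sent with later requests.
function fetchSequence(array $urls): array
{
    $ch = curl_init();
    // Assumption: an empty CURLOPT_COOKIEFILE enables cookie handling without
    // reading a file. No CURLOPT_COOKIEJAR is set, so nothing is written to disk.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $bodies = [];
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        // curl_exec returns the response body as a string, or false on failure.
        $bodies[$url] = curl_exec($ch);
    }
    curl_close($ch);

    return $bodies;
}
```

Calling fetchSequence(['http://localhost.example', 'http://localhost.example/path/to/form']) would mirror the two requests in the example at the start of this section, with the cookies discarded once the handle is closed.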

In some instances, such as debugging, it may be desirable to give CURLOPT_COOKIEJAR a different value for each request. In most cases, however, CURLOPT_COOKIEJAR is set for the first request to receive the initial cookie data, and its value persists for subsequent requests. CURLOPT_COOKIEFILE is then usually assigned the same value as CURLOPT_COOKIEJAR after the first request. As a result, cookie data is read and included in each request, and cookie data from the response is written back (overwriting any existing data at that location) for use in subsequent requests.

On a related note, if you want cURL to begin a new session and discard session cookies (i.e. cookies without an expiration date), set the CURLOPT_COOKIESESSION setting to true.
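
One way to keep these three settings consistent is to build them in a single place. The sketch below assumes a hypothetical helper named cookieOptions, which is not part of the cURL extension, and only returns an options array. It sends no request itself.

```php
<?php
// Hypothetical helper (an illustration, not part of the cURL extension):
// build the cookie-related options for a request that reuses one cookie file.
function cookieOptions(string $cookieFile, bool $newSession = false): array
{
    return [
        // Read previously stored cookies before the request is sent.
        CURLOPT_COOKIEFILE    => $cookieFile,
        // Write the handle's cookies back to the same file when it is closed.
        CURLOPT_COOKIEJAR     => $cookieFile,
        // When true, stored session cookies (no expiration date) are ignored.
        CURLOPT_COOKIESESSION => $newSession,
    ];
}
```

The returned array can be applied with curl_setopt_array($ch, cookieOptions($cookiejar, true)) before calling curl_exec.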

If you want to handle cookie data manually for any reason, you can set the value of the Cookie request header via the CURLOPT_COOKIE setting. To get access to the response headers, set the CURLOPT_HEADER and CURLOPT_RETURNTRANSFER settings to true. This causes the curl_exec call to return the entire response, including both the headers and the body. Recall that a single blank line separates the headers from the body and that a colon separates each header name from its corresponding value. This information, combined with the basic string handling functions in PHP, should be all you need. You'll also need to set CURLOPT_FOLLOWLOCATION to false to prevent cURL from processing redirections automatically. Otherwise, any cookies set by responses that result in redirections would be lost.
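
The manual approach can be sketched with basic string functions. The helper names parseResponseCookies and buildCookieHeader are hypothetical, and the sketch assumes the response contains a single header block with CRLF line endings. It keeps only each cookie's name and value and ignores attributes such as Path or Expires.

```php
<?php
// Hypothetical helper: split a raw response (as returned by curl_exec with
// CURLOPT_HEADER and CURLOPT_RETURNTRANSFER enabled) into cookies and body.
function parseResponseCookies(string $response): array
{
    // The headers end at the first blank line; the rest is the body.
    $parts = explode("\r\n\r\n", $response, 2);
    $body = $parts[1] ?? '';

    $cookies = [];
    foreach (explode("\r\n", $parts[0]) as $line) {
        // A colon separates each header name from its value.
        if (strpos($line, ':') === false) {
            continue; // status line such as "HTTP/1.1 200 OK"
        }
        [$name, $value] = explode(':', $line, 2);
        if (strcasecmp(trim($name), 'Set-Cookie') !== 0) {
            continue;
        }
        // Keep the leading "name=value" pair and drop the cookie attributes.
        $pair = trim(explode(';', $value, 2)[0]);
        if (strpos($pair, '=') === false) {
            continue;
        }
        [$cookieName, $cookieValue] = explode('=', $pair, 2);
        $cookies[trim($cookieName)] = trim($cookieValue);
    }

    return ['cookies' => $cookies, 'body' => $body];
}

// Hypothetical helper: build a value suitable for the CURLOPT_COOKIE setting.
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}
```

The string returned by buildCookieHeader can then be passed to the next request with curl_setopt($ch, CURLOPT_COOKIE, $header).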


© cURL Extension — Web Scraping

Category: Article | Added by: Marsipan (30.08.2014)