Making cURL and Guzzle Follow 301 Redirects

When scraping pages you will inevitably run into 301/302 redirects. Here is the source of a typical redirect page:

<html><head><title>302 Moved Temporarily</title><script type="text/javascript" src="/store/dtagent628_ICA23r_1019.js" data-dtconfig="rid=RID_-647728319|rpid=207841282|domain=***.com|lab=1|async=1|tp=1000,50,3,1|reportUrl=dynaTraceMonitor"></script></head>
<body bgcolor="#FFFFFF">
<p>This document you requested has moved temporarily.</p>
<p>It's now at <a href="http://www.***.com/home/more-countries?currPageURL=http://www.***.com/en-us/products">http://www.***.com/home/more-countries?currPageURL=http://www.***.com/en-us/products</a>.</p>
</body></html>

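In this case you get no useful data at all. The response is only a stub that points to the new address, and unless the client requests that address itself, the page you actually wanted is never fetched.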
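A 3xx response carries its real target in the `Location` response header; the link in the HTML body above is only there for human readers. Below is a minimal sketch for recognizing such a response. It assumes you already have the raw header block as a string, for example from a cURL request with `CURLOPT_HEADER` enabled. `parse_redirect` is a hypothetical helper name and is not part of cURL:

```php
<?php
/**
 * Hypothetical helper: read the status code and Location target from a raw
 * HTTP header block. Only the first status line and its headers are examined.
 */
function parse_redirect(string $rawHeaders): array
{
    $lines = preg_split('/\r?\n/', trim($rawHeaders));

    // Status line looks like "HTTP/1.1 302 Moved Temporarily"
    $status = 0;
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0], $m)) {
        $status = (int) $m[1];
    }

    $location = null;
    foreach (array_slice($lines, 1) as $line) {
        // Header names are case-insensitive, so compare without case
        if (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
            break;
        }
    }

    return [
        'status'      => $status,
        'location'    => $location,
        // A redirect is a 3xx status that also tells us where to go
        'is_redirect' => $status >= 300 && $status < 400 && $location !== null,
    ];
}
```

If `is_redirect` is true, the body is just the stub page, and the data you want lives at `location`.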

Solutions

1. Using cURL

All you need is one extra option:

// Make curl_exec() follow Location: headers on 3xx responses
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

Reference: http://stackoverflow.com/questions/3519939/make-curl-follow-redirects
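When scraping, it also helps to cap the number of hops and to record where the request finally landed. Below is a fuller sketch that relies only on the PHP cURL extension: `CURLOPT_MAXREDIRS`, `CURLINFO_EFFECTIVE_URL` and `CURLINFO_REDIRECT_COUNT` are standard cURL constants. The helper names and the 5-hop limit are illustrative assumptions, not taken from the original post:

```php
<?php
/**
 * cURL options for following redirects. The default 5-hop limit is an
 * illustrative assumption, not a value from the original post.
 */
function redirect_curl_options(int $maxRedirects = 5): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers on 3xx responses
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up instead of looping forever
    ];
}

/**
 * Hypothetical helper: fetch $url, follow redirects, and report where we landed.
 */
function fetch_following_redirects(string $url, int $maxRedirects = 5): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, redirect_curl_options($maxRedirects));

    $body = curl_exec($ch);
    if ($body === false) {
        // Network failures and exceeding CURLOPT_MAXREDIRS both end up here
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope
    return [
        'body'           => $body,
        'status'         => curl_getinfo($ch, CURLINFO_HTTP_CODE),      // status of the final response
        'final_url'      => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),  // URL after all redirects
        'redirect_count' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT), // number of hops followed
    ];
}
```

For example, `$result = fetch_following_redirects('http://github.com');` returns the final page body in `$result['body']`. `$result['final_url']` shows which URL it actually came from.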

2. Using Guzzle

$client = new \GuzzleHttp\Client();
$response = $client->request('GET', 'http://github.com', [
    'allow_redirects' => true // follow 3xx redirects
]);

Reference: http://docs.guzzlephp.org/en/latest/quickstart.html#redirects
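According to the Guzzle documentation, Guzzle 6 and later already follow redirects by default (up to 5 hops). `'allow_redirects' => true` therefore mainly makes the intent explicit, and `false` turns redirects off. The option also accepts an array for finer control. The sketch below shows such an options array: the keys are documented `allow_redirects` settings, while the chosen values are illustrative assumptions:

```php
<?php
// Request options for GuzzleHttp\Client::request(). The keys are documented
// allow_redirects settings; the values here are only an example.
$options = [
    'allow_redirects' => [
        'max'             => 5,                 // give up after 5 hops
        'strict'          => false,             // browser-like: a redirected POST becomes GET
        'referer'         => true,              // send a Referer header when redirecting
        'protocols'       => ['http', 'https'], // only follow these schemes
        'track_redirects' => true,              // record the followed URLs in a response header
    ],
];
```

Pass it as the third argument, e.g. `$client->request('GET', 'http://github.com', $options)`. With `track_redirects` enabled, `$response->getHeader('X-Guzzle-Redirect-History')` lists the URLs Guzzle followed on the way to the final page.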
