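Making cURL and Guzzle Follow 301 Redirects
When scraping pages you will inevitably run into 301/302 redirects. If the client does not follow them, the response body is only the redirect page itself, whose source looks like this: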
<html><head><title>302 Moved Temporarily</title><script type="text/javascript" src="/store/dtagent628_ICA23r_1019.js" data-dtconfig="rid=RID_-647728319|rpid=207841282|domain=***.com|lab=1|async=1|tp=1000,50,3,1|reportUrl=dynaTraceMonitor"></script></head> <body bgcolor="#FFFFFF"> <p>This document you requested has moved temporarily.</p> <p>It's now at <a href="http://www.***.com/home/more-countries?currPageURL=http://www.***.com/en-us/products">http://www.***.com/home/more-countries?currPageURL=http://www.***.com/en-us/products</a>.</p> </body></html>
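A response like this contains no useful data to scrape; the real content lives at the URL the redirect points to.
Solution
1. Using curl
You only need to set one extra option: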
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
Reference: http://stackoverflow.com/questions/3519939/make-curl-follow-redirects
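For context, here is a minimal sketch of how that option might fit into a complete request. The helper name fetchFollowingRedirects, the redirect limit, and the timeout values are assumptions made for illustration, not part of the original post. CURLOPT_MAXREDIRS caps the length of the redirect chain so a redirect loop cannot run forever, and curl_getinfo() reports where the request finally ended up.

```php
<?php
// Hypothetical helper: fetch a URL and follow up to $maxRedirects redirects.
function fetchFollowingRedirects(string $url, int $maxRedirects = 5): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers (301/302, etc.)
        CURLOPT_MAXREDIRS      => $maxRedirects, // assumed limit; stops redirect loops
        CURLOPT_CONNECTTIMEOUT => 5,             // assumed timeouts, in seconds
        CURLOPT_TIMEOUT        => 15,
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl request failed: ' . curl_error($ch));
    }

    return [
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),      // status of the final response
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),  // URL after all redirects
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT), // number of hops followed
        'body'      => $body,
    ];
}
```

Calling fetchFollowingRedirects('http://github.com') should return the body of the page the redirect chain ends at, with 'final_url' showing the address that was actually fetched.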
2. Using Guzzle
$response = $client->request('GET', 'http://github.com', [ 'allow_redirects' => true ]);
Reference: http://docs.guzzlephp.org/en/latest/quickstart.html#redirects
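Besides true, Guzzle's allow_redirects option also accepts an array for finer control. The sketch below assumes Guzzle 6 or later installed through Composer; the helper names redirectOptions and fetchWithGuzzle are made up for illustration. With track_redirects enabled, Guzzle records the URLs it passed through in the X-Guzzle-Redirect-History response header.

```php
<?php
// Hypothetical helper: build the request options that control redirect handling.
function redirectOptions(int $max = 5): array
{
    return [
        'allow_redirects' => [
            'max'             => $max,              // maximum number of redirects to follow
            'strict'          => false,             // false: a redirected POST is re-sent as GET, as browsers do
            'referer'         => true,              // add a Referer header when redirecting
            'protocols'       => ['http', 'https'], // only follow redirects to these schemes
            'track_redirects' => true,              // record each hop in X-Guzzle-Redirect-History
        ],
    ];
}

// Hypothetical helper: fetch a URL with an existing Guzzle client.
function fetchWithGuzzle(\GuzzleHttp\Client $client, string $url): array
{
    $response = $client->request('GET', $url, redirectOptions());

    return [
        'status'  => $response->getStatusCode(),
        'history' => $response->getHeader('X-Guzzle-Redirect-History'), // URLs redirected to, in order
        'body'    => (string) $response->getBody(),
    ];
}
```

To try it, load Composer's autoloader with require 'vendor/autoload.php'; and call fetchWithGuzzle(new \GuzzleHttp\Client(), 'http://github.com'). The 'history' entry then lists each URL the request was redirected to.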