PHP cURL web scraping problem: I hear there are many experts on the mainland, looking for help
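<h2 id="回复内容">Replies:</h2>
<p>For work, I need to scrape data from someone else's website using PHP + cURL, but I have hit a problem I cannot solve.</p>
<p>I hear there are many experts on the mainland, so please help me out. I am from Taiwan and have been searching through posts for three days.</p>
<hr>
<p>URL: http://www.cbssports.com/mlb/scoreboard</p>
<p>Pick a game currently in progress from the list lower on that page and click GAMETRACKER to see the live feed.</p>
<p>Here is the problem.</p>
<p>Take this URL as an example (the game may already be over by the time you read this):</p>
<p>http://www.cbssports.com/mlb/gametracker/live/MLB_20140527_TB@TOR</p>
<p>The code I wrote is as follows:</p>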
<div class="code" style="position:relative; padding:0px; margin:0px;"><pre class="brush:php;toolbar:false"><code>$game = array();
// Optional search argument from the query string (unused in this excerpt)
$search1 = isset($_GET['searcharg']) ? $_GET['searcharg'] : null;
$url = "http://www.cbssports.com/mlb/gametracker/live/MLB_20140527_TB@TOR";

// Fetch the page, sending a browser-like User-Agent
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11");
$data = curl_exec($ch);
curl_close($ch);

// Capture the text of every span.teamLocation
preg_match_all('#<span class="teamLocation">(.*?)</span>#is', $data, $teamCity);
</code></pre></div>
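<p>... (string parsing follows)</p>
<p>The problem as I currently understand it:<br>
Whether I use "Save As" or "View Source", some of the HTML that should be there is missing. For example:<br>
On the live site the markup is:</p>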
<div class="code" style="position:relative; padding:0px; margin:0px;"><pre class="brush:php;toolbar:false"><code><tr id="current-pitcher">
<td><img src="/static/imghw/default1.png" data-src="http://sports.cbsimg.net/images/baseball/mlb/players/60x80/1961062.jpg" class="lazy" border="0" alt="PHP cURL web scraping problem" ></td>
<td>
<span class="label">Pitcher:</span><span class="name"><b>M. Mariot</b> | # 48 RP</span>
<br>
<a href="#" class="statOpt" data-playerid="1" data-position="pitcher">Game Stats</a>
<div class="game-stats">0.1 IP</div>
<div class="season-stats">0-0, 5.73 ERA, 11.0 IP, 9 K's, 6 BB</div>
</td>
</tr>
</code></pre></div>
<p>But both the saved page and the cURL result contain only this:</p>
<div class="code" style="position:relative; padding:0px; margin:0px;"><pre class="brush:php;toolbar:false"><code><ul class="nav">
<li class="active ingame" data-filter="current"><a href="#">Current Situation</a></li>
<li data-filter="hitchart"><a href="#">Hitting Charts</a></li>
<li data-filter="pitchchart"><a href="#">Pitching Charts</a></li>
</ul>
<div class="currentSituation ingame">
<div class="batter-pitcher fLeft">
<table>
<tr id="current-pitcher">
<td><img src="/static/imghw/default1.png" data-src="http://sports.cbsimg.net/images/baseball/mlb/players/60x80/no-photo-available.jpg" class="lazy" border="0" alt="PHP cURL web scraping problem" ></td>
<td>
<span class="label">Pitcher:</span><span class="name"> </span>
<br>
<a href="#" class="statOpt" data-playerid="1" data-position="pitcher">Game Stats</a>
<div class="game-stats">
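<p>The parts shown in blue above are the ones that do not appear.</p>
<p>So far I have tried sending cookies and simulating a browser, but neither worked.<br>
Does anyone have a solution? Please point me in the right direction (begging on my knees).</p>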
<p class="answer fmt" data-id="1020000000522290">
</p>
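<p>Hmm, I am not sure what problem you ran into. I took a look and it is just a simple fetch, which works fine for me. Also, please stop parsing HTML with regular expressions. I recommend the phpQuery library, a very handy tool for scraping in PHP. Using the URL you gave as an example:</p>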
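<pre class='brush:php;toolbar:false;'>// Load the phpQuery library (phpQuery.php must be on the include path)
include "phpQuery.php";
// Fetch the page and make it the current document
phpQuery::newDocumentFile("http://www.cbssports.com/mlb/gametracker/live/MLB_20140527_TB@TOR");
// Print the inner HTML of the element with id "current-pitcher"
echo pq("#current-pitcher")->html();
</pre></div>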
<p><img src="/static/imghw/default1.png" data-src="http://segmentfault.com/img/bVcl2b" class="lazy" alt="PHP cURL web scraping problem" ></p>
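<p>As a rough illustration of parsing with a DOM instead of a regular expression, here is a minimal sketch that uses PHP's bundled DOMDocument and DOMXPath classes in place of phpQuery (a substitution, so that it runs without extra libraries). The helper name currentPitcherName is hypothetical, and the two sample strings are trimmed copies of the snippets in the question.</p>

```php
<?php
// Hypothetical helper: returns the trimmed text of span.name inside
// #current-pitcher, or null when the row or the span is missing.
function currentPitcherName(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect libxml warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//tr[@id="current-pitcher"]//span[@class="name"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Markup as seen in the browser's inspector (from the question).
$rendered = '<table><tr id="current-pitcher"><td>'
    . '<span class="label">Pitcher:</span>'
    . '<span class="name"><b>M. Mariot</b> | # 48 RP</span>'
    . '</td></tr></table>';

// Markup as returned by cURL or "View Source" (from the question).
$static = '<table><tr id="current-pitcher"><td>'
    . '<span class="label">Pitcher:</span><span class="name"> </span>'
    . '</td></tr></table>';

var_dump(currentPitcherName($rendered));
var_dump(currentPitcherName($static));
```

<p>With the inspector markup the helper returns the pitcher text; with the static markup it returns an empty string, because the span exists but has nothing in it. An empty result like that suggests the value is being filled in after the page loads rather than that the selector is wrong.</p>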
<p class="answer fmt" data-id="1020000000522314">
</p>
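<p>The problem I ran into really comes down to this:</p>
<p>...the HTML I see when inspecting the page with Firebug or Chrome's developer tools differs from what I see with "View Source" and "Save As":</p>
<p>A game currently in progress: http://www.cbssports.com/mlb/gametracker/live/MLB_20140527_DET@OAK</p>
<p>Looking for help from the experts.</p>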
<p class="answer fmt" data-id="1020000000522593">
</p>
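<p>The HTML you want to scrape is generated by JavaScript, and HTTP fetching tools such as cURL do not parse or execute JavaScript.<br>
The solution is to use PhantomJS, which runs a script-controlled, headless WebKit browser.</p>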
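<p>One way to drive PhantomJS from PHP, sketched under assumptions: the phantomjs binary is on the PATH, and the names phantomCommand and fetchRenderedHtml as well as the fixed 3-second wait are placeholders, not anything given in the answer. The embedded script uses PhantomJS's webpage module to load the page, let its JavaScript run, and print the resulting DOM.</p>

```php
<?php
// PhantomJS script: open the URL, give the page's JavaScript time to run,
// then print the rendered DOM. The 3000 ms delay is an arbitrary guess.
const PHANTOM_SCRIPT = <<<'JS'
var page = require('webpage').create();
var url = require('system').args[1];
page.open(url, function (status) {
    if (status !== 'success') {
        phantom.exit(1);
        return;
    }
    setTimeout(function () {
        console.log(page.content);
        phantom.exit(0);
    }, 3000);
});
JS;

// Build the shell command; both arguments are escaped for the shell.
function phantomCommand(string $scriptPath, string $url): string
{
    return 'phantomjs ' . escapeshellarg($scriptPath) . ' ' . escapeshellarg($url);
}

// Write the script to a temporary file, run PhantomJS, and return the
// rendered HTML, or null if the command produced no output.
function fetchRenderedHtml(string $url): ?string
{
    $scriptPath = sys_get_temp_dir() . '/phantom_' . uniqid() . '.js';
    file_put_contents($scriptPath, PHANTOM_SCRIPT);
    $html = shell_exec(phantomCommand($scriptPath, $url));
    unlink($scriptPath);
    return (is_string($html) && trim($html) !== '') ? $html : null;
}
```

<p>Calling fetchRenderedHtml() with the game URL would then return markup closer to what the browser inspector shows, which can be handed to a DOM parser as usual. A fixed delay is the crudest way to wait for the page's scripts; polling until the wanted element is non-empty would likely be more reliable.</p>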
</div>
</td>
</tr>
</table>
</div>
</div>
