file_get_contents
curl
PHP Simple HTML DOM parser
These are the three methods I used to fetch the HTML. With all of them the images do not display, even though I also set up curl to simulate a browser.
Take the following page as an example:
WeChat image-and-text article page
For example, fetching it with Simple HTML DOM:
<code>require_once 'simple_html_dom.php'; $html = new simple_html_dom(); $html->load_file($article_url); echo $html;</code>
In the HTML that PHP fetches, the first image looks like this:
<code><img data-type="gif" data-ratio="0.29676258992805754" data-w="" width="100%" data-src="http://mmbiz.qpic.cn/mmbiz/zynprs47B4SSmGjHh9gJq59bct0TbDmksGMe4kRiaFTspugicmSwLVVfK13HdQbKIR7gaxxwF6icEVT3tCp33IOtg/0?wx_fmt=gif" style="margin: 0px; padding: 0px; width: 670px; height: auto !important; box-sizing: border-box !important; word-wrap: break-word !important; visibility: visible !important;"/></code>
When the page is opened in a browser, where the image displays normally, the same image looks like this:
<code><img data-type="gif" data-ratio="0.29676258992805754" data-w="" width="100%" data-src="http://mmbiz.qpic.cn/mmbiz/zynprs47B4SSmGjHh9gJq59bct0TbDmksGMe4kRiaFTspugicmSwLVVfK13HdQbKIR7gaxxwF6icEVT3tCp33IOtg/0?wx_fmt=gif" style="width: 670px !important; box-sizing: border-box !important; word-wrap: break-word !important; visibility: visible !important; height: auto !important;" _width="670px" src="http://mmbiz.qpic.cn/mmbiz/zynprs47B4SSmGjHh9gJq59bct0TbDmksGMe4kRiaFTspugicmSwLVVfK13HdQbKIR7gaxxwF6icEVT3tCp33IOtg/0?wx_fmt=gif&wxfrom=5&wx_lazy=1"></code>
What should I do?
Thanks for the answers above. It does not appear to be a hotlink-protection (anti-leeching) problem. The cause seems to be that in the crawled HTML the image URL is only in the data-src attribute and there is no src attribute, so the image never loads. After a lot of research I found that simple_html_dom is indeed useful here, and it should be possible to fix the attributes after fetching the page. Unfortunately, because I am not very familiar with PHP, I kept getting the statements wrong. In the end I used a workaround: I read the contents of the PHP variable with plain JavaScript and fixed the attributes with regular expressions, which solved the problem.
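For anyone who wants to do the same replacement on the PHP side instead of in JavaScript, here is a minimal regex-based sketch of the idea: for every `<img>` tag that has a data-src attribute but no src, copy the data-src value into a new src attribute. The function name `fill_img_src` is hypothetical. It assumes attribute values are quoted and contain no `>` characters, which holds for the tags shown in the question; regular expressions are fragile on arbitrary HTML, so treat this as a sketch, not a general HTML parser.

```php
<?php
// Hypothetical helper: give each <img> that only has data-src a src attribute.
// Assumes quoted attribute values and no '>' inside attribute values.
function fill_img_src(string $html): string
{
    return preg_replace_callback(
        '/<img\b[^>]*>/i',
        function (array $m): string {
            $tag = $m[0];
            // Leave tags that already have a src attribute untouched
            // ("data-src" does not match because it is not preceded by whitespace).
            if (preg_match('/\ssrc\s*=/i', $tag)) {
                return $tag;
            }
            // Capture the quoted data-src value, including its quotes.
            if (preg_match('/\sdata-src\s*=\s*("[^"]*"|\'[^\']*\')/i', $tag, $d)) {
                // Rebuild the tag as "<img src=VALUE" + the rest of the original tag.
                return '<img src=' . $d[1] . substr($tag, 4);
            }
            return $tag;
        },
        $html
    );
}

// Usage:
// echo fill_img_src(file_get_contents($article_url));
```

The string concatenation is used instead of a second preg_replace so that characters such as `$` or `\` in the URL are never interpreted as backreferences.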
Also, when you are done with simple_html_dom, don't forget to call $html->clear() to free its memory.
Try changing the request headers that curl sends.
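A minimal sketch of what changing the curl headers could look like. The function names are hypothetical, and the User-Agent, Accept and Referer values are placeholder assumptions, not values known to work for WeChat pages. Whether headers help at all is untested here; the original poster reported that the real cause was the missing src attribute.

```php
<?php
// Hypothetical helper: browser-like request headers (placeholder values).
function browser_headers(string $referer): array
{
    return [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Accept: text/html,application/xhtml+xml',
        'Referer: ' . $referer,
    ];
}

// Hypothetical helper: fetch a URL with curl using the headers above.
// Returns the response body, or false on failure. Requires the curl extension.
function fetch_with_headers(string $url, string $referer)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_HTTPHEADER     => browser_headers($referer),
    ]);
    return curl_exec($ch);
}
```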
https://segmentfault.com/q/1010000005046169
Your problem looks similar to this one; give it a try.