Fixing garbled characters in PHP DOM: example code

高洛峰
Release: 2023-03-03 13:28:01

Foreword

DOM is a relatively new class in PHP for processing XML and HTML. It lets you work with the DOM tree about as conveniently as JavaScript does. Plenty of articles online already cover how it handles XML, so this one focuses on how to fix garbled characters when loading HTML pages into DOM in PHP. Without further ado, here is the solution.

The solution is as follows

/**
 * Fetch the content of the page at the given URL
 * @param string $url
 * @return string|false page HTML on HTTP 200, false otherwise
 */
function curl_get($url) {
  $curl = curl_init();
  curl_setopt($curl, CURLOPT_URL, $url);
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
  // Follow 302 redirects
  curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0');
  curl_setopt($curl, CURLOPT_REFERER, $url);
  $data = curl_exec($curl);
  $code = curl_getinfo($curl,CURLINFO_HTTP_CODE); // HTTP status code of the response
  curl_close($curl);
  if(200 == $code) {
    // Fix garbled characters: a page declared as GB2312 is converted to UTF-8
    // and its charset declaration is rewritten to match
    if (preg_match('#<meta[^>]*charset="?gb2312"?[^>]*>#is', $data)) {
      $data = iconv("gb2312","utf-8//IGNORE",$data);
      $data = preg_replace('#<meta[^>]*charset="?gb2312"?[^>]*>#is', '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">', $data);
    }

    // No UTF-8 declaration at all: add one right after <head>
    if (!preg_match('#<meta[^>]*charset="?utf-8#is', $data)) {
      $data = str_replace('<head>', '<head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8">', $data);
    }

    // HTML5 short form <meta charset="utf-8">: replace it with the http-equiv form
    if (preg_match('#<meta charset="utf-8"[^>]*>#is', $data)) {
      $data = preg_replace('#<meta charset="utf-8"[^>]*>#is', '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">', $data);
    }
 
    return $data;
  } else {
    return false;
  }
 
}
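
The charset handling inside curl_get() can also be tried on a plain string, without any network request. The following is a minimal sketch of that idea, not the author's code: the function name normalize_charset() and the sample page are hypothetical, and it assumes the iconv extension is available and supports GB2312.

```php
<?php
/**
 * Hypothetical helper: applies the same charset steps as curl_get() to a string.
 */
function normalize_charset(string $html): string
{
    $utf8Meta = '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">';
    $gbPattern = '#<meta[^>]*charset="?gb2312"?[^>]*>#is';
    $html5Pattern = '#<meta charset="utf-8"[^>]*>#is';

    if (preg_match($gbPattern, $html)) {
        // Declared GB2312: transcode the bytes, then rewrite the declaration.
        $converted = iconv('gb2312', 'utf-8//IGNORE', $html);
        if ($converted === false) {
            return $html; // conversion failed, leave the input untouched
        }
        return preg_replace($gbPattern, $utf8Meta, $converted);
    }
    if (preg_match($html5Pattern, $html)) {
        // HTML5 short form: swap it for the http-equiv form.
        return preg_replace($html5Pattern, $utf8Meta, $html);
    }
    if (stripos($html, 'http-equiv="Content-Type"') === false) {
        // No declaration found: add one right after <head>.
        return str_replace('<head>', '<head>' . $utf8Meta, $html);
    }
    return $html;
}

// Sample page: "\xD6\xD0\xCE\xC4" is the GB2312 byte sequence for "中文".
$page = "<html><head><meta charset=\"gb2312\"></head><body>\xD6\xD0\xCE\xC4</body></html>";
$result = normalize_charset($page);
```

Unlike curl_get(), this sketch returns as soon as one rule has applied, so a page never ends up with two charset declarations.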

/**
 * Build a DOMDocument object for the page at the given URL
 * @param string $url
 * @return DOMDocument|false
 */
function getDom($url) {
  $html_content = curl_get($url);
  if(empty($html_content)) {
    //saveLog($url, 'Request failed');
    return false;
  }
  $dom = new DOMDocument('1.0', 'utf-8');
  libxml_use_internal_errors(true);
  $dom->loadHTML($html_content);
  return $dom;
}
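
To see why the http-equiv declaration matters, the loading step of getDom() can be run against an inline string. This is only an illustrative sketch: the sample HTML and the XPath query are made up for the example, and it assumes the dom extension is enabled.

```php
<?php
// Hypothetical sample page: already UTF-8 and carrying the http-equiv
// declaration that curl_get() is meant to produce.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
      . '<title>demo</title></head>'
      . '<body><p class="msg">中文</p></body></html>';

$dom = new DOMDocument('1.0', 'utf-8');

// Collect parser warnings instead of printing them, then restore the old setting.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Read the paragraph back; the text should come out as intact UTF-8.
$xpath = new DOMXPath($dom);
$node = $xpath->query('//p[@class="msg"]')->item(0);
$text = $node->textContent;
```

With the declaration in place the parser decodes the input as UTF-8, so $text holds the original characters rather than mojibake.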

$html_content = mb_convert_encoding($html_content, 'UTF-8', 'gb2312');
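
The line above converts unconditionally, which would garble content that is already UTF-8 (for example, a page that curl_get() has already converted). One possible guard, sketched here under the assumption that the mbstring extension is installed, is to convert only when the bytes are not valid UTF-8. The helper name ensure_utf8() and its fallback parameter are hypothetical.

```php
<?php
/**
 * Hypothetical helper: return $html as UTF-8, converting from a fallback
 * encoding only when the input is not already valid UTF-8.
 */
function ensure_utf8(string $html, string $fallback = 'gb2312'): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html; // already UTF-8, converting again would corrupt it
    }
    return mb_convert_encoding($html, 'UTF-8', $fallback);
}

$gbBytes = "\xD6\xD0\xCE\xC4";          // "中文" encoded as GB2312
$converted = ensure_utf8($gbBytes);     // transcoded to UTF-8
$untouched = ensure_utf8('中文');        // valid UTF-8, returned as-is
```

The validity check is a heuristic: a short GB2312 string can occasionally happen to be valid UTF-8, so a declared charset, when the page has one, is the more reliable signal.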

source:php.cn