PHP method to solve DOM garbled code

墨辰丷
Release: 2023-03-28 14:18:02
Original
1442 people have browsed it

I recently encountered a problem at work. When using DOM, I discovered the problem of garbled characters. Later, I finally solved it by searching for information on the Internet. Now I will share the solution with everyone. Interested friends can For reference, friends in need can come and learn together.

Preface

DOM is a relatively new xml and html processing class in PHP. It can operate the DOM tree as conveniently as javascript. More on the Internet The purpose of this article is to introduce its processing of XML. Today’s article will introduce PHP’s method of solving DOM garbled characters. I won’t go into details below and just look at the solution below.

The solution is as follows

/**
 * 请求url页面信息
 * @param str $url
 * @return str mixed|boolean
 */
function curl_get($url) {
  $curl = curl_init();
  curl_setopt($curl, CURLOPT_URL, $url);
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
  //302跳转
  curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0');
  curl_setopt($curl, CURLOPT_REFERER, $url);
  $data = curl_exec($curl);
  $code = curl_getinfo($curl,CURLINFO_HTTP_CODE); //输出请求状态码
  curl_close($curl);
  if(200 == $code) {
    //解决乱码
    if (preg_match(&#39;#<meta[^>]*charset="?gb2312"[^>]*>#&#39;, $data)) {
      $data = iconv("gb2312","utf-8//IGNORE",$data);
      $data = preg_replace(&#39;#<meta[^>]*charset="?gb2312"[^>]*>#is&#39;, &#39;<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">&#39;, $data);
    }

    if (!preg_match(&#39;#<meta charset="utf-8"[^>]*>#is&#39;, $data)) {
      $data = str_replace(&#39;<head>&#39;, &#39;<head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8">&#39;, $data);
    }

    if (preg_match(&#39;#<meta charset="utf-8"[^>]*>#is&#39;, $data)) {
      $data = preg_replace(&#39;#<meta charset="utf-8"[^>]*>#is&#39;, &#39;<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">&#39;, $data);
    }

    return $data;
  } else {
    return false;
  }

}
Copy after login

/**
 * 获取 DOMDocument 对象
 * @param str $url
 * @return boolean|DOM
 */
function getDom($url) {
  $html_content = curl_get($url);
  if(empty($html_content)) {
    //saveLog($url, &#39;请求失败&#39;);
    return false;
  }
  $dom = new DOMDocument(&#39;1.0&#39;, &#39;utf-8&#39;);
  libxml_use_internal_errors(true);
  $dom->loadHTML($html_content);
  return $dom;
}
Copy after login

$html_content = mb_convert_encoding($html_content, &#39;UTF-8&#39;, &#39;gb2312&#39;);
Copy after login

The above is the entire content of this article, I hope it will be helpful to everyone's study.


Related recommendations:

phpDetailed explanation of avatar upload preview example

phpDetailed explanation of avatar upload preview example

Detailed graphic explanation of PHP serialization and deserialization functions

The above is the detailed content of PHP method to solve DOM garbled code. For more information, please follow other related articles on the PHP Chinese website!

Related labels:
source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template