
PHP: fetched data comes back garbled

WBOY
Release: 2016-06-23 13:41:38

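I'm using file_get_contents to scrape a page, but the data I get back is garbled. I already run encoding detection on it,
and it reports UTF-8. My own page is UTF-8 as well, yet the output still displays as garbage, and I can't figure out why.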

$url = "xxx";
$opts = array(
    'http' => array(
        'user_agent' => "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)",
    )
);
$context = stream_context_create($opts);
$neirong = file_get_contents($url, false, $context);

header("content-Type: text/html; charset=Utf-8");
ob_end_flush();

// Detect the encoding and convert to UTF-8 if it is anything else
$encode = mb_detect_encoding($neirong, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));
echo $encode . "<br>";
if ($encode != "UTF-8") {
    $neirong = mb_convert_encoding($neirong, "UTF-8", $encode);
}
echo $neirong;


$encode prints: utf-8
$neirong prints garbled text
My page encoding is UTF-8
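
A detection result of UTF-8 only says the bytes are well-formed UTF-8; it does not say the characters are the ones the site meant to show. A minimal sketch for looking at the raw bytes before echoing them follows. `inspect_payload` is a hypothetical helper name, not something from the thread.

```php
<?php
// Hypothetical helper (not from the thread): summarize what the raw bytes look like.
function inspect_payload(string $raw): array
{
    return [
        // A UTF-8 byte order mark is the three bytes EF BB BF.
        'has_bom'    => strncmp($raw, "\xEF\xBB\xBF", 3) === 0,
        // True when the bytes form a well-formed UTF-8 sequence.
        'valid_utf8' => mb_check_encoding($raw, 'UTF-8'),
        // Hex of the first 12 bytes, for checking by eye.
        'head_hex'   => bin2hex(substr($raw, 0, 12)),
    ];
}

// Example with a made-up payload: a BOM followed by "abc".
print_r(inspect_payload("\xEF\xBB\xBF" . "abc"));
```

If `valid_utf8` is true and the text is still unreadable, the problem is probably in the content itself rather than in the charset.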


Replies (solutions)

...
$neirong = file_get_contents($url, false, $context);
echo base64_encode($neirong);

Post the result.




It's a whole article, so the result is too long. I'll paste a fragment:

77u/ICAgIOiwjeeWlOahteaYlOaZhOixkeebqOaak++8jOmAveS6juWHkeS+k+WyqeeGmueeg+ebqOa0headremAv

$c = '77u/ICAgIOiwjeeWlOahteaYlOaZhOixkeebqOaak++8jOmAveS6juWHkeS+k+WyqeeGmueeg+ebqOa0headremAv';
echo base64_decode($c);


谍疔?昔????,?于凑?岩????杭??
There's no encoding corruption here. The last character is broken only because your base64 fragment is incomplete.
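
The pasted fragment starts with `77u/`, which is the base64 form of the bytes EF BB BF, a UTF-8 byte order mark. A small sketch of checking for and dropping it; `strip_utf8_bom` is a hypothetical helper name, and the input here is just the first eight characters of the fragment above.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 BOM (EF BB BF) if present.
function strip_utf8_bom(string $raw): string
{
    return strncmp($raw, "\xEF\xBB\xBF", 3) === 0 ? substr($raw, 3) : $raw;
}

// "77u/" decodes to the BOM, "ICAg" to three spaces.
$head = base64_decode('77u/ICAg');
echo bin2hex($head), "\n";                 // efbbbf202020
echo bin2hex(strip_utf8_bom($head)), "\n"; // 202020
```

Cutting a base64 string at an arbitrary point can also split a multi-byte character, which is why the tail of a partial paste may decode to a broken character.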



The correct output should be “让田树新无语的是,这个六伯居然真的没有这方面的意思。”
So what I get really is garbled.

Post the URL you are scraping.




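This is the URL I scrape the data from:
http://www.ziyouge.com/conbdhekbefiab

This is the page where the site displays it:
http://www.ziyouge.com/zy/4/4980/1333249.html

The data from the scrape URL is not normal text, but the site's own page displays correctly.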

The scraped data needs some processing before it is usable.

<?php
// Scraped sample text (excerpt only; the paste in the post is long and partly damaged)
$content = <<<TXT
?缳?暾?厥??仪,?逝???塞析泶伤京伞仪,乱?逼杭?趋??嘁??鬼???,[...]
TXT;

$result = '';
$length = strlen($content);
$i = 0;
// Walk the string byte by byte: a lead byte >= 224 (0xE0) starts a 3-byte UTF-8 character
while ($i < $length) {
    if (ord($content[$i]) >= 224) {
        $result .= change(substr($content, $i, 3));
        $i += 3;
    } else {
        $result .= $content[$i];
        $i += 1;
    }
}
echo $result;

// Processing: decode one 3-byte character
function change($str) {
    // Punctuation that is passed through unchanged
    $ignore = array('“','”','!','…',':',',',',');
    if (in_array($str, $ignore)) {
        return $str;
    }
    $prefix = "%u";
    $postfix = "";
    $str = iconv('UTF-8', 'UCS-2', $str);
    $arrstr = str_split($str, 2);
    $unistr = '';
    for ($i = 0, $len = count($arrstr); $i < $len; $i++) {
        $tmp = hexdec(bin2hex($arrstr[$i]));
        $tmp = str_pad(dechex($tmp), 4, '0', STR_PAD_LEFT);
        // Swap the two bytes, then undo the offset
        $tmp = decrypt(substr($tmp, 2, 2) . substr($tmp, 0, 2));
        $unistr .= $prefix . $tmp . $postfix;
    }
    return unescape($unistr);
}

// Decrypt: subtract 100 from the 16-bit value
function decrypt($d) {
    $result = str_pad(dechex(hexdec($d) - 100), 4, '0', STR_PAD_LEFT);
    return $result;
}

// Convert %uXXXX escapes back to UTF-8 text
function unescape($str) {
    $ret = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ($str[$i] == '%' && $str[$i + 1] == 'u') {
            $val = hexdec(substr($str, $i + 2, 4));
            if ($val < 0x7f) {
                $ret .= chr($val);
            } else if ($val < 0x800) {
                $ret .= chr(0xc0 | ($val >> 6)) . chr(0x80 | ($val & 0x3f));
            } else {
                $ret .= chr(0xe0 | ($val >> 12)) . chr(0x80 | (($val >> 6) & 0x3f)) . chr(0x80 | ($val & 0x3f));
            }
            $i += 5;
        } else if ($str[$i] == '%') {
            $ret .= urldecode(substr($str, $i, 3));
            $i += 2;
        } else {
            $ret .= $str[$i];
        }
    }
    return $ret;
}
?>



Output: 已经是晚上十一点了,路边的拍档基本没什么人了,不过还有三货依然喝的兴高采烈的,而且大有不干到天明,誓不罢休的意思,

fdipzone, with your method I still get garbled output. I don't know much about decryption.

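Add a UTF-8 charset meta tag to the HTML you output.

The source data has had some processing applied to it, and my program already reverses that.

I'll post the scraping part as well, so you can just run it directly.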

<?php
// http://www.ziyouge.com/conbdhekbefiab
// http://www.ziyouge.com/zy/4/4980/1333249.html

// Scraping part
$url = 'http://www.ziyouge.com/conbdhekbefiab';
$headers = array();
$headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36';
$headerArr = array();
foreach ($headers as $n => $v) {
    $headerArr[] = $n . ':' . $v;
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headerArr);               // send the forged request headers
curl_setopt($ch, CURLOPT_REFERER, 'http://www.ziyouge.com/');   // forge the referer
$content = curl_exec($ch);
if ($error = curl_error($ch)) {
    die($error);
}
curl_close($ch);
$content = substr($content, 3);   // drop the 3-byte UTF-8 BOM at the start

// Parsing part
$result = '';
$length = strlen($content);
$i = 0;
// Walk the string byte by byte: a lead byte >= 224 (0xE0) starts a 3-byte UTF-8 character
while ($i < $length) {
    if (ord($content[$i]) >= 224) {
        $result .= change(substr($content, $i, 3));
        $i += 3;
    } else {
        $result .= $content[$i];
        $i += 1;
    }
}
echo '<meta http-equiv="content-type" content="text/html;charset=utf-8">';
echo $result;

// Processing: decode one 3-byte character
function change($str) {
    // Punctuation that is passed through unchanged
    $ignore = array('“','”','!','…',':',',',',');
    if (in_array($str, $ignore)) {
        return $str;
    }
    $prefix = "%u";
    $postfix = "";
    $str = iconv('UTF-8', 'UCS-2', $str);
    $arrstr = str_split($str, 2);
    $unistr = '';
    for ($i = 0, $len = count($arrstr); $i < $len; $i++) {
        $tmp = hexdec(bin2hex($arrstr[$i]));
        $tmp = str_pad(dechex($tmp), 4, '0', STR_PAD_LEFT);
        // Swap the two bytes, then undo the offset
        $tmp = decrypt(substr($tmp, 2, 2) . substr($tmp, 0, 2));
        $unistr .= $prefix . $tmp . $postfix;
    }
    return unescape($unistr);
}

// Decrypt: subtract 100 from the 16-bit value
function decrypt($d) {
    $result = str_pad(dechex(hexdec($d) - 100), 4, '0', STR_PAD_LEFT);
    return $result;
}

// Convert %uXXXX escapes back to UTF-8 text
function unescape($str) {
    $ret = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ($str[$i] == '%' && $str[$i + 1] == 'u') {
            $val = hexdec(substr($str, $i + 2, 4));
            if ($val < 0x7f) {
                $ret .= chr($val);
            } else if ($val < 0x800) {
                $ret .= chr(0xc0 | ($val >> 6)) . chr(0x80 | ($val & 0x3f));
            } else {
                $ret .= chr(0xe0 | ($val >> 12)) . chr(0x80 | (($val >> 6) & 0x3f)) . chr(0x80 | ($val & 0x3f));
            }
            $i += 5;
        } else if ($str[$i] == '%') {
            $ret .= urldecode(substr($str, $i, 3));
            $i += 2;
        } else {
            $ret .= $str[$i];
        }
    }
    return $ret;
}
?>
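
Read off the code above, the site's scrambling appears to add 100 to the code point of each three-byte UTF-8 character while leaving certain punctuation alone. Under that assumption, the same decoding can be sketched without iconv at all, which sidesteps any byte-order question. This needs PHP 7.2+ for `mb_ord`/`mb_chr`; `unshift_text`, its `$offset` default and its `$keep` list are hypothetical names and values chosen for illustration.

```php
<?php
// Hypothetical helper: subtract $offset from the code point of every
// 3-byte UTF-8 character, except those listed in $keep.
function unshift_text(string $text, int $offset = 100, array $keep = ['“', '”', '…']): string
{
    // The /u flag splits on whole UTF-8 characters regardless of mbstring settings.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $text; // not valid UTF-8; leave it untouched
    }
    $out = '';
    foreach ($chars as $ch) {
        if (strlen($ch) === 3 && !in_array($ch, $keep, true)) {
            $decoded = mb_chr(mb_ord($ch, 'UTF-8') - $offset, 'UTF-8');
            if ($decoded !== false) {
                $ch = $decoded;
            }
        }
        $out .= $ch;
    }
    return $out;
}

// U+7F33 is 100 above U+7ECF, so it should come back as U+7ECF.
echo unshift_text("\u{7F33}") === "\u{7ECF}" ? "ok\n" : "mismatch\n";
```

ASCII and other one- or two-byte characters pass through unchanged, matching the `>= 224` lead-byte test in the posted program.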



The garbling seems to come from the PHP version. My test on 5.3.28 works fine, but on PHP 6.0.0-dev the output is garbled. Is PHP 6.0.0-dev missing some component?

Possibly. It is a dev build, after all.



It works locally on 5.3.28, but on the server, also 5.3.28, the output is garbled again...
Both are Linux environments: local is Ubuntu, the server is Debian.

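My guess is that it is related to the version of PHP's mbstring.
Environment problems are something you will have to sort out yourself; I don't have that many environments here.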
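
One mbstring setting that can differ between servers is the internal encoding: when no encoding argument is passed, functions such as `mb_strlen` and `mb_substr` fall back to `mb_internal_encoding()`, so the same call may count bytes on one machine and characters on another. The thread does not establish that this was the cause here; the sketch below only shows how passing the encoding explicitly removes that variable. `char_count` and `byte_count` are hypothetical helper names.

```php
<?php
// Hypothetical helpers: make the unit being counted explicit.
function char_count(string $s): int
{
    return mb_strlen($s, 'UTF-8'); // characters, independent of server configuration
}

function byte_count(string $s): int
{
    return strlen($s);             // raw bytes
}

$char = "\xE4\xB8\xAD"; // U+4E2D encoded as UTF-8 (three bytes)
echo char_count($char), "\n"; // 1
echo byte_count($char), "\n"; // 3
```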



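I've found where the problem is. On different platforms, this line produces different results:
$str = iconv('UTF-8', 'UCS-2', $str);
// For example: $str="?"; $str = iconv('UTF-8', 'UCS-2', $str); the correct result is “V^”, the wrong result is “^V”. How can I fix this?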

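Found a fix. Different platforms produce UCS-2 in different byte orders.
From what I found, a bare UCS-2 defaults to UCS-2BE on Linux, so iconv called with plain UCS-2 there produces big-endian output (in practice the default depends on the iconv implementation PHP is built against). To get the byte order that the Windows platform produces, UCS-2LE has to be specified explicitly.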



Change
$str = iconv('UTF-8', 'UCS-2', $str);
to
$str = iconv('UTF-8', 'UCS-2LE', $str);

and it works.
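
The difference between the two byte orders is easy to see on a single character. The following is a small sketch, assuming the iconv extension is available; `bmp_codepoint` is a hypothetical helper name. Naming the byte order explicitly (UCS-2BE or UCS-2LE) makes the result independent of the platform default.

```php
<?php
// Hypothetical helper: code point of a single character in the Basic Multilingual Plane.
function bmp_codepoint(string $utf8Char): int
{
    $be = iconv('UTF-8', 'UCS-2BE', $utf8Char); // explicit big-endian byte order
    return unpack('n', $be)[1];                  // 'n' = unsigned 16-bit, big-endian
}

$char = "\xE4\xB8\xAD"; // U+4E2D encoded as UTF-8
echo bin2hex(iconv('UTF-8', 'UCS-2BE', $char)), "\n"; // 4e2d
echo bin2hex(iconv('UTF-8', 'UCS-2LE', $char)), "\n"; // 2d4e
printf("U+%04X\n", bmp_codepoint($char));             // U+4E2D
```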

source:php.cn