Why Are UTF-8 Characters Corrupted When Using `file_get_contents()`?


Susan Sarandon

Published: 2024-12-09 22:42:13



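file_get_contents() Breaks UTF-8 Characters

This problem appears when loading HTML from an external server that uses UTF-8 encoding. Characters such as ľ, š, č, ť, and ž come out corrupted and are replaced with invalid characters.

Root of the Problem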

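The problem is one of encoding. file_get_contents() returns the response body as a raw byte string and does not interpret or convert its encoding. The characters break when those bytes are later handled as a different encoding than the one they were written in, for example by a byte-oriented string function or by a browser that receives no charset declaration.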
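As a rough illustration of how this kind of corruption can arise, the sketch below takes the two UTF-8 bytes of š and re-encodes them as if they were ISO-8859-1. The variable names are illustrative, and ISO-8859-1 is only an assumed stand-in for whatever wrong encoding gets applied in practice.

```php
<?php
// "š" (U+0161) written as explicit bytes; UTF-8 encodes it as 0xC5 0xA1.
$utf8 = "\xC5\xA1";

// strlen() counts bytes, mb_strlen() counts characters in the given encoding.
$byte_length = strlen($utf8);
$char_length = mb_strlen($utf8, 'UTF-8');

// Treating the same bytes as ISO-8859-1 turns each byte into its own character.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
```

Printed in a UTF-8 page, `$misread` shows up as two unrelated characters instead of š, which matches the symptom described above.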

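Suggested Solutions

To resolve this issue, consider one of the following approaches.

1. Manual Encoding Conversion

Use the mb_convert_encoding() function to convert the fetched HTML to UTF-8: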

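$html = file_get_contents('http://example.com/foreign.html');
// The candidate list is an assumption; with only 'UTF-8' listed, a non-UTF-8 page makes detection return false.
$source_encoding = mb_detect_encoding($html, ['UTF-8', 'ISO-8859-2'], true);
$utf8_html = mb_convert_encoding($html, 'UTF-8', $source_encoding);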
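When the remote encoding is known in advance, a stricter variant is to check validity rather than guess. The following is a minimal sketch under that assumption: `to_utf8()` is a hypothetical helper name, and the `ISO-8859-2` fallback is a guess based on the Central European characters mentioned earlier, not something the source page confirms.

```php
<?php
// Hypothetical helper: returns $bytes as UTF-8.
// $fallback is an assumption about the remote site's legacy encoding.
function to_utf8(string $bytes, string $fallback = 'ISO-8859-2'): string
{
    // Valid UTF-8 is returned unchanged, so text is never converted twice.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Otherwise treat the bytes as the fallback encoding and convert.
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

$already_utf8 = to_utf8("\xC5\xA1"); // UTF-8 bytes of š, left as-is
$from_latin2  = to_utf8("\xB9");     // ISO-8859-2 byte for š, converted
```

Checking first avoids the double-encoding that occurs when text that is already UTF-8 is converted a second time.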


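2. Output Encoding

Make sure the browser is told that the output is UTF-8 by adding the following line to your script, before any output is sent: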

header('Content-Type: text/html; charset=UTF-8');
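Because the header has to go out before any output, it may help to guard the call. This is only a sketch: `send_utf8_header()` is a hypothetical helper name, and returning `false` is just one possible way to report that it was too late.

```php
<?php
// Hypothetical helper: declares UTF-8 output if that is still possible.
function send_utf8_header(): bool
{
    // Headers cannot be changed once output has begun, so check first.
    if (headers_sent()) {
        return false;
    }
    header('Content-Type: text/html; charset=UTF-8');
    return true;
}

$declared = send_utf8_header();
```

Calling the helper at the very top of the script, before any `echo`, keeps the declaration from being skipped.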


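3. HTML Entity Conversion

Convert the fetched HTML to HTML entities before output. Note that htmlentities() also escapes the markup characters, so the page is displayed as source text rather than rendered: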

$html = file_get_contents('http://example.com/foreign.html');
$html_entities = htmlentities($html, ENT_COMPAT, 'UTF-8');
echo $html_entities;
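If the goal is to keep the markup renderable and only protect the non-ASCII characters, one possible alternative is `mb_encode_numericentity()`. The sketch below swaps in that function for comparison; the sample string and variable names are illustrative.

```php
<?php
// Sample markup containing š, written as explicit UTF-8 bytes.
$html = "<p>\xC5\xA1</p>";

// htmlentities() escapes the tag delimiters as well as eligible characters.
$escaped_all = htmlentities($html, ENT_COMPAT, 'UTF-8');

// Map covering every code point from U+0080 upward: [start, end, offset, mask].
$non_ascii_map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Only non-ASCII characters become numeric entities; tags stay intact.
$entities_only = mb_encode_numericentity($html, $non_ascii_map, 'UTF-8');
```

The second result is plain ASCII, so it displays correctly even when the page's charset declaration is missing or wrong.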


4. JSON Decoding

If the external HTML is delivered inside a JSON document, decode it with json_decode():

$json = file_get_contents('http://example.com/foreign.html');
$decoded_json = json_decode($json, true);
$html = $decoded_json['html'];
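The decoding step can fail on malformed input, so a more defensive version might look like the sketch below. `extract_html()` is a hypothetical helper name, and the `html` key simply follows the example above. `JSON_THROW_ON_ERROR` requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: pulls the 'html' field out of a JSON payload.
function extract_html(string $json): ?string
{
    try {
        // JSON_THROW_ON_ERROR (PHP 7.3+) reports malformed input as an exception.
        $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        return null;
    }
    // The payload may be valid JSON without being the expected object.
    if (!is_array($data) || !isset($data['html']) || !is_string($data['html'])) {
        return null;
    }
    return $data['html'];
}

// \u0161 is the JSON escape for š; json_decode() returns it as UTF-8 bytes.
$html = extract_html('{"html":"<p>\u0161</p>"}');
$bad  = extract_html('not json');
```

json_decode() expects UTF-8 input, so a payload in another encoding would need to be converted before it is decoded.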


With these techniques, you can work around the encoding problems encountered with file_get_contents() and ensure that UTF-8 characters display correctly.
