How to Display UTF-8 Strings Correctly on Windows Console: A Comprehensive Guide?

Oct 29, 2024 pm 06:41 PM

UTF-8 Output in Windows: A Comprehensive Guide

Windows presents a particular challenge for cross-platform C++ applications that rely on UTF-8 encoded strings. On most Unix systems the terminal interprets 8-bit strings as UTF-8 by default, whereas the Windows console must be configured explicitly to do the same.

Consider the following code:

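<code class="cpp">#include &lt;string&gt;
#include &lt;iostream&gt;

int main() {
    // Before C++20, u8"..." is a const char array holding UTF-8 bytes.
    // From C++20 on it is char8_t and no longer initializes std::string.
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
    // std::cout writes the raw bytes; the console decides how to decode them.
    std::cout &lt;&lt; test;
    return 0;
}</code>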

On most Unix systems, this code prints the characters correctly. On Windows, the output is garbled: std::cout passes the UTF-8 bytes through unchanged, but the console decodes them with its active output code page, which by default is a legacy non-Unicode code page (Latin-1 or similar) rather than UTF-8.
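
To see what the console actually receives, it can help to dump the bytes of a string. The sketch below is an illustration only; `to_hex_bytes` is a hypothetical helper name, not part of the original answer or of any library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of `s` as uppercase hex pairs
// separated by single spaces, e.g. "CE B1".
std::string to_hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char byte : s) {
        // Format one byte as exactly two hex digits.
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(byte));
        if (!out.empty()) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

For example, `to_hex_bytes("\xCE\xB1")` returns `"CE B1"`, the two-byte UTF-8 encoding of α (U+03B1). A console that uses a single-byte code page decodes those two bytes as two unrelated characters, which is the garbling described above.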

To resolve this issue, Windows requires two configuration steps:

1. Setting Console Code Page to UTF-8

This tells the console to interpret the byte stream it receives as UTF-8. SetConsoleOutputCP is a Win32 function declared in Windows.h:

<code class="cpp">SetConsoleOutputCP(CP_UTF8);</code>
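
The code page set this way belongs to the console rather than to the program, so a program may want to put the previous value back when it finishes. The following is a minimal sketch of that idea, not something the original answer prescribes; `Utf8ConsoleGuard` is a hypothetical name, and on non-Windows platforms the class is assumed to do nothing.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical RAII helper: switches the console output code page to UTF-8
// and restores the previous code page on destruction.
class Utf8ConsoleGuard {
public:
    Utf8ConsoleGuard() {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();           // 0 means the call failed
        ok_ = SetConsoleOutputCP(CP_UTF8) != 0;     // nonzero means success
#endif
    }

    ~Utf8ConsoleGuard() {
#ifdef _WIN32
        if (ok_ && previous_ != 0) {
            SetConsoleOutputCP(previous_);
        }
#endif
    }

    Utf8ConsoleGuard(const Utf8ConsoleGuard&) = delete;
    Utf8ConsoleGuard& operator=(const Utf8ConsoleGuard&) = delete;

    // True if the code page was switched (always true off Windows,
    // where this sketch assumes no switch is needed).
    bool ok() const { return ok_; }

private:
#ifdef _WIN32
    unsigned int previous_ = 0;
    bool ok_ = false;
#else
    bool ok_ = true;
#endif
};
```

Declaring a `Utf8ConsoleGuard` at the top of `main` keeps the UTF-8 code page active for the lifetime of the program.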

2. Enabling Buffering in std::cout

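The Visual Studio STL implementation of std::basic_filebuf can pass UTF-8 sequences to the underlying stream one byte at a time, so the console sees fragments of a multi-byte character and cannot decode them. Enabling full buffering on stdout ensures that each sequence reaches the console in one piece. setvbuf is declared in cstdio and must be called before anything is written to stdout: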

<code class="cpp">setvbuf(stdout, nullptr, _IOFBF, 1000);</code>
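
The two steps can be combined in one setup function that compiles on both Windows and Unix. This is a sketch under stated assumptions: `prepare_utf8_stdout` is a hypothetical name, the buffer size of 1000 is simply the value used above, and on non-Windows platforms only the buffering step is applied.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: applies both configuration steps and reports success.
// It must run before anything is written to stdout, because setvbuf is only
// valid before the first I/O operation on a stream.
//
// Typical use, as the first statements of main():
//     prepare_utf8_stdout();
//     std::cout << text << std::endl;   // std::endl also flushes the buffer
bool prepare_utf8_stdout() {
    bool ok = true;
#ifdef _WIN32
    // Step 1: have the console decode our output bytes as UTF-8.
    ok = SetConsoleOutputCP(CP_UTF8) != 0;
#endif
    // Step 2: switch stdout to full buffering. setvbuf returns 0 on success.
    return std::setvbuf(stdout, nullptr, _IOFBF, 1000) == 0 && ok;
}
```

Because stdout is fully buffered after this call, text reaches the console only when the buffer is flushed, for example by `std::endl`, by `std::flush`, or at normal program exit.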

With both settings in place, UTF-8 strings display correctly on the Windows console. Because stdout is now fully buffered, output appears only once the stream is flushed, for example with std::flush or at program exit. Two legacy issues remain:

Raster fonts: These fonts ignore the console's code page, so a TrueType font is required to display non-ASCII Unicode characters.
Default font before Windows 10: Until Windows 10, the console's default font was a raster font, so a TrueType font had to be selected manually.
