UTF-8 Output in Windows: A Comprehensive Guide
Windows presents a particular challenge when writing cross-platform C++ applications that rely on UTF-8 encoded strings. Most modern Unix terminals interpret the bytes they receive as UTF-8 by default, whereas the Windows console must be configured explicitly to do the same.
Consider the following code:
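<code class="cpp">#include <string>
#include <iostream>

int main() {
    // Note: in C++20, u8"" literals have type char8_t, so this line
    // compiles as written only in C++17 or earlier.
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
    std::cout << test;
    return 0;
}</code>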
On Unix systems, this code renders the desired characters correctly. On Windows, however, it displays garbled text: std::cout passes the bytes through unchanged, and the console interprets them using its default legacy, non-Unicode code page (for example, code page 437 or 850) rather than as UTF-8.
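To see why the output is garbled, it can help to look at the raw bytes. The sketch below is illustrative only: `to_hex_bytes` and `count_code_points` are hypothetical helper names, not part of any library. It shows that one visible character such as α occupies two bytes in UTF-8, which a console using a single-byte code page will draw as two unrelated characters.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of a string as two uppercase
// hex digits, separated by spaces.
std::string to_hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(s[i])));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical helper: counts code points in a valid UTF-8 string by
// skipping continuation bytes, which always have the bit pattern 10xxxxxx.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For example, `to_hex_bytes("\xCE\xB1")` (the UTF-8 encoding of α) yields `CE B1`, while `count_code_points` reports a single code point for the same string. Read byte by byte under a Latin-1-style code page, those two bytes appear as `Î±`.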
To resolve this issue, Windows requires two configuration steps:
1. Setting Console Code Page to UTF-8
This informs the console to interpret the byte stream it receives as UTF-8:
<code class="cpp">SetConsoleOutputCP(CP_UTF8);</code>
2. Enabling Buffering in std::cout
The Visual Studio STL implementation of std::basic_filebuf can pass UTF-8 sequences as individual bytes, leading to incorrect console interpretation. By enabling buffering, we ensure that strings are passed in their entirety:
<code class="cpp">setvbuf(stdout, nullptr, _IOFBF, 1000);</code>
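The two steps can be combined into a single setup routine. The following is a minimal sketch, not a definitive implementation: the function name `prepare_utf8_stdout` and its `buffer_size` parameter are assumptions made for illustration, and the Windows-specific call is guarded so the code also compiles elsewhere.

```cpp
#include <cstddef>
#include <cstdio>
#include <iostream>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper combining both steps. Returns true if every step
// that applies on the current platform reported success.
bool prepare_utf8_stdout(std::size_t buffer_size = 1000) {
    bool ok = true;
#ifdef _WIN32
    // Step 1: ask the console to decode the incoming bytes as UTF-8.
    ok = SetConsoleOutputCP(CP_UTF8) != 0;
#endif
    // Step 2: fully buffer stdout so multi-byte sequences are written
    // together rather than one byte at a time. Ideally this is called
    // before any other output on the stream.
    ok = (std::setvbuf(stdout, nullptr, _IOFBF, buffer_size) == 0) && ok;
    return ok;
}
```

A program would call `prepare_utf8_stdout()` at the start of `main`, before writing anything to `std::cout`. Because the stream is now fully buffered, output appears only when the buffer fills or is flushed, so interactive programs may want to use `std::flush` or `std::endl` at suitable points.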
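With these configurations, UTF-8 strings will be displayed accurately on the Windows console. Note, however, that Windows consoles still have legacy limitations. For example, older consoles that default to raster fonts may be unable to show many non-ASCII characters until a TrueType font such as Consolas or Lucida Console is selected.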