Wide characters (wchar_t) and wide strings (wstring) have drawn criticism within the C++ community, particularly because of their use in the Windows API. This article examines the shortcomings of these types and explores alternative approaches to internationalization.
wchar_t was designed to hold any character code from any supported locale in a single value, giving a one-to-one mapping between code units and characters. That design assumes every character corresponds to exactly one code point, an assumption Unicode does not satisfy: a single user-perceived character can be built from several code points, such as a base letter followed by a combining mark. As a result, wchar_t is impractical as a universal text representation and does not actually simplify text algorithms.
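To make the mismatch between code points and characters concrete, here is a minimal sketch. It uses char32_t strings so that one element is one code point on every platform; the helper names (decomposed_e_acute, precomposed_e_acute, code_point_count) are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <string>

// "é" written as a base letter plus a combining mark: two code points,
// but a single character as far as a reader is concerned.
// (Hypothetical helper, for illustration only.)
inline std::u32string decomposed_e_acute() {
    // U+0065 LATIN SMALL LETTER E, U+0301 COMBINING ACUTE ACCENT
    return std::u32string{char32_t(0x0065), char32_t(0x0301)};
}

// The same character in precomposed form: one code point.
inline std::u32string precomposed_e_acute() {
    // U+00E9 LATIN SMALL LETTER E WITH ACUTE
    return std::u32string{char32_t(0x00E9)};
}

// For a UTF-32 string, size() counts code points, not user-perceived characters.
inline std::size_t code_point_count(const std::u32string& s) {
    return s.size();
}
```

Both strings display as the same character, yet they have different lengths and compare unequal. A fixed-width element type does not remove that problem, which is why the "one element per character" promise does not hold for Unicode text.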
For portable code, wchar_t is of limited use. When the macro __STDC_ISO_10646__ is defined, wchar_t values map directly to Unicode code points, but not every platform defines it, so portable code cannot rely on it. Windows, for instance, uses UTF-16 as its wchar_t encoding, which means a single wchar_t cannot hold every code point and some characters need two code units.
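A program can at least inspect what the current platform offers. The following is a small sketch, not a complete portability layer; the function names are hypothetical, and the comments about typical widths describe common implementations rather than anything the standard requires.

```cpp
#include <climits>
#include <cstddef>

// True when the implementation states that wchar_t values are
// ISO 10646 (Unicode) code points. (Hypothetical helper name.)
constexpr bool wchar_is_iso_10646() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// Width of wchar_t in bits. Commonly 16 on Windows and 32 on Linux,
// but neither value is required by the standard.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Unicode code points go up to U+10FFFF, which needs 21 bits.
// A narrower wchar_t cannot hold every code point in one element.
constexpr bool wchar_can_hold_any_code_point() {
    return wchar_bits() >= 21;
}
```

Code that branches on these results ends up maintaining two text-handling paths, which is a large part of why wchar_t is awkward in portable code.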
UTF-8 Encoded C Strings:
Storing text as UTF-8 in ordinary char strings gives a portable representation and avoids the complications of wide characters. Most modern platforms use UTF-8 natively. Because UTF-8 is a variable-length encoding it does not permit simple per-element text algorithms, but its structure makes encoding errors easy to detect and recover from.
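The error-detection property can be sketched with a small validator. This is an illustrative implementation written for this article, not a library API; the function name count_utf8_code_points and its convention of returning -1 on malformed input are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>

// Returns the number of code points in `s`, or -1 if `s` is not
// well-formed UTF-8. (Hypothetical helper, for illustration only.)
inline long count_utf8_code_points(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    long count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return -1;  // stray continuation byte or invalid lead byte
        if (s.size() - i < len) return -1;  // sequence cut off at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return -1;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (len > 1 && cp < min_for_len[len]) return -1;  // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return -1;      // surrogate code point
        if (cp > 0x10FFFF) return -1;                     // beyond Unicode range
        ++count;
        i += len;
    }
    return count;
}
```

Note that the byte length of a string and its code point count differ as soon as non-ASCII text appears, so code that indexes a UTF-8 string by byte offset must not assume it is indexing characters.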
Cross-Platform Representations:
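Some software defines its own cross-platform representation, such as arrays of unsigned short holding UTF-16 data. Projects that take this route must supply their own library support and accept language limitations, for example the lack of string literals for the custom type.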
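As a rough sketch of what such a custom representation involves, the following encodes one code point into UTF-16 code units stored as unsigned short. The typedef utf16_unit and the function encode_utf16 are hypothetical names chosen for this example; they are not taken from any particular project.

```cpp
#include <cstddef>

// Hypothetical code unit type for a hand-rolled UTF-16 representation.
typedef unsigned short utf16_unit;

// Writes the UTF-16 encoding of `cp` into `out` and returns the number of
// code units written (1 or 2). Returns 0 if `cp` is not a valid Unicode
// scalar value, in which case `out` is left untouched.
inline std::size_t encode_utf16(unsigned long cp, utf16_unit out[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;  // out of range, or a surrogate code point
    }
    if (cp < 0x10000) {
        out[0] = static_cast<utf16_unit>(cp);  // fits in a single unit
        return 1;
    }
    cp -= 0x10000;  // 20 bits remain, split across a surrogate pair
    out[0] = static_cast<utf16_unit>(0xD800 + (cp >> 10));    // high surrogate
    out[1] = static_cast<utf16_unit>(0xDC00 + (cp & 0x3FF));  // low surrogate
    return 2;
}
```

Because there is no literal syntax that produces an unsigned short string, constant text in such a code base has to be written as numeric initializer lists or converted at run time, which is the kind of language limitation mentioned above.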
C++11 Wide Characters:
C++11 introduces char16_t and char32_t as alternatives to wchar_t. The types are not strictly guaranteed to hold UTF-16 and UTF-32 respectively, but major implementations are very likely to follow that convention. Improved UTF-8 support, including UTF-8 string literals, makes C++11 more useful for internationalized applications.
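The difference between the literal forms can be seen by counting code units for one character outside the Basic Multilingual Plane. This is a minimal sketch; the helper code_units and the constant names are hypothetical. The template deduces the element type so that it also compiles under C++20, where u8 literals have type char8_t instead of char.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
// (Hypothetical helper, for illustration only.)
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+1F600 written with each C++11 Unicode literal prefix.
constexpr std::size_t utf8_units  = code_units(u8"\U0001F600");  // UTF-8 bytes
constexpr std::size_t utf16_units = code_units(u"\U0001F600");   // char16_t units
constexpr std::size_t utf32_units = code_units(U"\U0001F600");   // char32_t units

// The same code point takes a different number of units in each encoding.
static_assert(utf8_units == 4,  "U+1F600 is four bytes in UTF-8");
static_assert(utf16_units == 2, "U+1F600 is a surrogate pair in UTF-16");
static_assert(utf32_units == 1, "U+1F600 is one unit in UTF-32");
```

Unlike wchar_t, these results do not vary with the platform, which is the main practical advantage of the C++11 types and literals.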
TCHAR:
TCHAR exists mainly to help migrate legacy Windows programs. It is not portable, neither its encoding nor its underlying data type is fixed (it resolves to char or wchar_t depending on build settings), and it has no value outside TCHAR-based APIs.
In conclusion, wchar_t and wstring make cross-platform internationalization harder because their size and encoding are not the same on every platform. The alternatives discussed above offer more portable ways to handle internationalized text.