This question stems from the suggestion that casting to unsigned char is necessary before calling character manipulation functions like std::toupper and std::tolower. However, Bjarne Stroustrup's code appears to use these functions without casting.
Char Representation
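char, signed char, and unsigned char are three distinct types in C++. Plain char has the same range and representation as either signed char or unsigned char, depending on the implementation. Even on systems where plain char is signed, every member of the basic character set has a non-negative value; characters outside that set, such as accented letters in an 8-bit encoding, may be negative.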
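As a small illustration of the points above, the following sketch checks at compile time that plain char is its own type and reports whether it is signed on the current platform. The function name plain_char_is_signed is made up for this example.

```cpp
#include <limits>
#include <type_traits>

// Plain char is a distinct type, even though it shares its range and
// representation with one of the other two character types.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are different types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are different types");

// Hypothetical helper: true when plain char can hold negative values here.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}
```

Whether plain_char_is_signed() returns true depends on the compiler and target, which is why portable code cannot assume either answer.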
toupper Function
According to the C standard, toupper accepts an int argument and returns an int result. The input value must be representable as an unsigned char or equal to EOF. If not, the behavior is undefined.
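The precondition can be written down as a predicate. This is only a sketch to make the rule concrete; is_valid_ctype_argument is a hypothetical name, not a standard function.

```cpp
#include <climits>  // UCHAR_MAX
#include <cstdio>   // EOF

// Hypothetical helper: returns true when `value` is something toupper and
// its siblings are defined for, i.e. it equals EOF or lies in the range
// 0..UCHAR_MAX (the values representable as unsigned char).
bool is_valid_ctype_argument(int value) {
    return value == EOF || (value >= 0 && value <= UCHAR_MAX);
}
```

Any int for which this predicate is false must not be passed to toupper.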
Undefined Behavior
If plain char is signed and the value passed to toupper is negative, the behavior is undefined. The implicit conversion from char to int preserves the negative value, and that value is not representable as an unsigned char and, in general, is not equal to EOF.
Casting to Unsigned Char
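Casting the char argument to unsigned char ensures that the value is non-negative, avoiding undefined behavior. Even though char and unsigned char have the same size, they can represent different ranges of values: when plain char is signed, a negative char maps to a value in the upper half of the unsigned char range.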
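A minimal sketch of the cast, wrapped in a helper so the conversion happens in one place. The name to_upper_safe is an assumption made for this example.

```cpp
#include <cctype>

// Hypothetical wrapper: converts to unsigned char first, so the int that
// reaches std::toupper is always in the range 0..UCHAR_MAX.
char to_upper_safe(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```

The outer cast back to char is there because std::toupper returns an int.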
Implementation
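These functions are typically implemented using lookup tables, and indexing such a table with a negative value reads outside its bounds. The conversion to unsigned char must therefore be applied to the value actually passed to the function: if the converted result is first stored back into a plain char, it is implicitly converted back to a negative value on systems where char is signed, and the problem remains.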
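To show why a negative argument is a problem for a table-driven design, here is a toy model. It assumes an 8-bit, ASCII-only character set and is not taken from any real library; toy_toupper, make_upper_table and index_in_bounds are invented names.

```cpp
#include <array>
#include <cstddef>

// Toy model only: one table entry per unsigned char value (8-bit assumed).
constexpr std::size_t kTableSize = 256;

std::array<unsigned char, kTableSize> make_upper_table() {
    std::array<unsigned char, kTableSize> table{};
    for (std::size_t i = 0; i < kTableSize; ++i) {
        table[i] = static_cast<unsigned char>(i);  // identity by default
    }
    for (unsigned char c = 'a'; c <= 'z'; ++c) {   // ASCII layout assumed
        table[c] = static_cast<unsigned char>(c - 'a' + 'A');
    }
    return table;
}

// True when `value` can safely index the toy table.
bool index_in_bounds(int value) {
    return value >= 0 && static_cast<std::size_t>(value) < kTableSize;
}

int toy_toupper(int value) {
    static const std::array<unsigned char, kTableSize> table = make_upper_table();
    // An implementation is not required to perform this check; without it,
    // a negative `value` would read memory in front of the table.
    if (!index_in_bounds(value)) {
        return value;
    }
    return table[static_cast<std::size_t>(value)];
}
```

With a signed 8-bit char, the byte 0xE9 becomes the int -23, for which index_in_bounds returns false; after a cast to unsigned char it becomes 233, which is a valid index.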
Exception: EOF
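The functions in <cctype> (C's <ctype.h>) also accept EOF as an argument. EOF is a negative int value, typically -1, and it is the only negative value for which the behavior is defined.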
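One place where the EOF allowance shows up is character-at-a-time input, where the read function already returns an int. The sketch below uses std::istream::get(), whose result is either the end-of-file marker or a character value already converted through unsigned char, so no cast is needed before std::toupper. The helper names read_upper and upper_via_stream are hypothetical.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads every character from `in` and returns the
// upper-cased text. `ch` stays an int for the whole loop, so it never
// passes through a (possibly signed) plain char on its way to toupper.
std::string read_upper(std::istream& in) {
    std::string out;
    for (int ch = in.get(); ch != std::char_traits<char>::eof(); ch = in.get()) {
        out.push_back(static_cast<char>(std::toupper(ch)));
    }
    return out;
}

// Convenience wrapper for trying the helper on an in-memory string.
std::string upper_via_stream(const std::string& text) {
    std::istringstream in(text);
    return read_upper(in);
}
```

The cast becomes necessary only once the value has been stored in a plain char.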
C++ Modifications
The C++ standard modifies only certain C standard library functions, and it makes no such adjustments to the functions in <cctype>. The requirements on the argument are therefore the same as in C.
Conclusion
To avoid undefined behavior, cast the char argument to unsigned char before calling toupper, tolower, or similar functions. The cast is harmless when the value is already non-negative, and it is required whenever plain char is signed and the value may be negative.
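For whole strings, one common way to apply this advice is to let a lambda parameter of type unsigned char perform the conversion. The following is a sketch, with to_upper_copy as an invented name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns an upper-cased copy of `text`.
// Each char is converted to unsigned char when it is passed to the lambda,
// so std::toupper never receives a negative value.
std::string to_upper_copy(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return text;
}
```

For example, to_upper_copy("Hello, world") yields "HELLO, WORLD" in the default "C" locale.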