Stripping Control Characters from PHP Strings
Q: Modifying Regular Expression for Control Character Removal
A PHP developer previously employed the following expression to purge control characters like STX from a string:
preg_replace("/[^a-zA-Z0-9 .\-_;!:?äÄöÖüÜß<>='\"]/","",$pString)
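However, this pattern is a whitelist: it removes every character that is not explicitly listed, which proved far too restrictive. The question is how to remove only the control characters and leave everything else intact.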
A: Utilizing Specific Character Classes for Control Character Identification
To precisely target control characters, a more specific character class can be utilized:
preg_replace('/[\x00-\x1F\x7F]/', '', $input);
This expression matches every character in the range \x00-\x1F (the first 32 ASCII characters, which include tab, line feed, and carriage return) as well as \x7F (DEL). All other characters, including non-ASCII letters such as umlauts, are left untouched.
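As a minimal sketch of the pattern above in use, the following wraps it in a helper and applies it to a string containing STX. The function name stripControlChars and the sample string are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper name; it only wraps the pattern shown above.
function stripControlChars(string $input): string
{
    // Single-quoted pattern: the \xNN escapes are interpreted by PCRE itself.
    // preg_replace() returns null on error, so fall back to the original input.
    return preg_replace('/[\x00-\x1F\x7F]/', '', $input) ?? $input;
}

// "\x02" is STX and "\x03" is ETX; double quotes let PHP expand the escapes.
$raw = "\x02Grüße, Welt!\x03\r\n";
$clean = stripControlChars($raw);

// Letters, punctuation, and umlauts survive; STX, ETX, CR, and LF are removed.
var_dump($clean === "Grüße, Welt!");
```

Because the pattern only targets bytes below \x20 and the single byte \x7F, it does not touch the bytes that make up multibyte UTF-8 characters.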
Preserving Essential Characters like Line Breaks
If specific characters, such as line feeds or carriage returns, need to be preserved, their codes can be left out of the character class:
preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
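In this modified expression, \x0A (line feed) and \x0D (carriage return) are left out of the character class and therefore survive. Note that \x09 (tab) is still inside the range \x00-\x09, so tabs are removed.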
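The sketch below shows the line-break-preserving pattern in use, together with an optional variant that also keeps tab by ending the first range at \x08. The helper name stripControlCharsKeepBreaks, the $keepTab parameter, and the tab-preserving variant are assumptions added for illustration.

```php
<?php
// Hypothetical helper; optionally keeps tab in addition to LF and CR.
function stripControlCharsKeepBreaks(string $input, bool $keepTab = false): string
{
    $pattern = $keepTab
        ? '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/'   // keeps \x09, \x0A, and \x0D
        : '/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/';  // keeps \x0A and \x0D only

    // preg_replace() returns null on error, so fall back to the original input.
    return preg_replace($pattern, '', $input) ?? $input;
}

// STX, text, CRLF, a tab-indented second line, NUL, and a trailing LF.
$raw = "\x02line one\r\n\tline two\x00\n";

var_dump(stripControlCharsKeepBreaks($raw) === "line one\r\nline two\n");
var_dump(stripControlCharsKeepBreaks($raw, true) === "line one\r\n\tline two\n");
```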
Modern Syntax and Deprecation
Note that ereg_replace was deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so current code must use preg_replace instead.
Finally, a Character Class for Control Characters
For a more concise alternative, the POSIX character class [:cntrl:] can be used. It must appear inside a bracket expression, hence the doubled brackets:
preg_replace('/[[:cntrl:]]/', '', $input);
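To see which bytes [[:cntrl:]] covers on a given installation, one option is to test every single-byte value against it. The sketch below is an illustrative check, not part of the original answer. It assumes the default C locale and no /u modifier, under which the class is expected to correspond to \x00-\x1F plus \x7F.

```php
<?php
// Collect every byte value (0-255) that [[:cntrl:]] matches.
$matched = [];
for ($byte = 0; $byte <= 0xFF; $byte++) {
    if (preg_match('/[[:cntrl:]]/', chr($byte)) === 1) {
        $matched[] = $byte;
    }
}

// The explicit set used by the earlier pattern: \x00-\x1F plus \x7F.
$expected = array_merge(range(0x00, 0x1F), [0x7F]);

// Reports whether the POSIX class and the explicit range agree here.
var_dump($matched === $expected);
```

If the two sets differ on a particular setup, the explicit \x00-\x1F\x7F range is the more predictable choice.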