PHP trim unicode spaces

Question

I'm trying to trim Unicode spaces such as this character, and I was able to do it using this solution. The problem with this solution is that it does not remove Unicode spaces between normal characters. For example, this string uses thin spaces:

$string = " test test string ";
echo preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $string);
// outputs: test test string (the thin spaces between the words are not removed)
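Invisible characters are easily lost when code is copied, so here is a minimal reproduction that writes the thin space as the PHP 7+ escape \u{2009}. The variable names are illustrative. Because both alternatives of the pattern are anchored, only the leading and trailing runs are stripped and the inner thin space survives.

```php
<?php
// Thin space U+2009 written as an escape so it survives copy/paste.
$thin = "\u{2009}";
$string = $thin . 'test' . $thin . 'string' . $thin;

// Anchored alternation: only leading (^) and trailing ($) runs are removed.
$result = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $string);

// The thin space between the two words is still there.
var_dump($result === "test\u{2009}string"); // bool(true)
```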

P粉557957970 · Answer

To remove all Unicode whitespace and control characters at the beginning and end of a string, and to remove all Unicode whitespace and control characters except regular spaces anywhere else in the string, you can use:

preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$|(?! )[\pZ\pC]/u', '', $string)
// Or, simply
preg_replace('/^\s+|\s+$|[^\S ]/u', '', $string)

See Regular Expression Demo #1 and Regular Expression Demo #2.

Details:

^[\pZ\pC]+ - one or more whitespace or control characters at the beginning of the string
| - or
[\pZ\pC]+$ - one or more whitespace or control characters at the end of the string
| - or
(?! )[\pZ\pC] - any single whitespace or control character other than a regular space, anywhere in the string
[^\S ] - (second regex) any whitespace character except a regular space (\x20)
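The two patterns are close but not identical. [\pZ\pC] also matches format characters such as U+200B ZERO WIDTH SPACE (category Cf), whereas \s, which PHP's /u modifier makes Unicode-aware, covers separators plus the ASCII whitespace controls but not U+200B. A small sketch with an illustrative input string:

```php
<?php
// Illustrative input: a thin space (U+2009, category Zs) and a
// zero-width space (U+200B, category Cf) inside the text, plus a regular space.
$input = "a\u{2009}b\u{200B}c d";

// Pattern 1: \pZ (separators) and \pC (control/format/other), keeping \x20.
$viaProperties = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$|(?! )[\pZ\pC]/u', '', $input);

// Pattern 2: \s is Unicode-aware under /u, but U+200B is not whitespace.
$viaWhitespace = preg_replace('/^\s+|\s+$|[^\S ]/u', '', $input);

// Pattern 1 drops the zero-width space; pattern 2 keeps it.
var_dump($viaProperties === 'abc d');         // bool(true)
var_dump($viaWhitespace === "ab\u{200B}c d"); // bool(true)
```

If zero-width or other invisible format characters may appear in your input, the \pZ\pC version is the safer choice.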

If you also need to keep ordinary line breaks, replace (?! )[\pZ\pC] with (?![ \r\n])[\pZ\pC] (as suggested by @MonkeyZeus); in the second regex, this means using [^\S \r\n].
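A sketch of the line-break-preserving variants, assuming the input uses \n or \r\n line endings; the sample text and variable names are illustrative. Note that \r and \n inside a single-quoted PHP string reach PCRE as escape sequences, which PCRE reads as CR and LF.

```php
<?php
// Illustrative two-line input: thin spaces at the start, between words,
// and at the end, with a CRLF line break in the middle.
$text = "\u{2009}one\u{2009}two\r\nthree\u{2009}";

// Variant 1: keep regular spaces, CR and LF inside the string.
$kept1 = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$|(?![ \r\n])[\pZ\pC]/u', '', $text);

// Variant 2: [^\S \r\n] is any whitespace other than space, CR and LF.
$kept2 = preg_replace('/^\s+|\s+$|[^\S \r\n]/u', '', $text);

var_dump($kept1 === "onetwo\r\nthree"); // bool(true)
var_dump($kept2 === "onetwo\r\nthree"); // bool(true)
```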

See the PHP demo:

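// In these literals, the character between "def" and "ghi" is not a regular space (\x20); otherwise it would be kept.
echo preg_replace('~^[\pZ\pC]+|[\pZ\pC]+$|(?! )[\pZ\pC]~u', '', 'abc def ghi      ');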
// => abc defghi
echo preg_replace('/^\s+|\s+$|[^\S ]/u', '', 'abc def ghi     ');
// => abc defghi
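The same check with the special characters spelled out as \u{...} escapes, so it can be copied safely. The specific characters chosen here, a thin space (U+2009) between "def" and "ghi" and trailing ideographic spaces (U+3000), are illustrative assumptions.

```php
<?php
// Illustrative input: a regular space between "abc" and "def", a thin space
// (U+2009) between "def" and "ghi", and two trailing ideographic spaces (U+3000).
$input = "abc def\u{2009}ghi\u{3000}\u{3000}";

$a = preg_replace('~^[\pZ\pC]+|[\pZ\pC]+$|(?! )[\pZ\pC]~u', '', $input);
$b = preg_replace('/^\s+|\s+$|[^\S ]/u', '', $input);

echo $a, PHP_EOL; // abc defghi
echo $b, PHP_EOL; // abc defghi
```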

P粉445750942 · Answer

Unicode spaces such as \u{2009} can cause problems in various places, so I would replace all Unicode spaces with regular spaces and then apply trim().

$string = "   test   string and XY 	 ";
//\u{2009}\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string\u{2009}and\x20XY\x20\x09\u{2009}

$trimString = trim(preg_replace('/[\pZ\pC]/u', ' ', $string));
//test\x20\x20\x20string\x20and\x20XY

Note: The strings in the comments were displayed with debug::writeUni($string, $trimString); which is implemented in this class.
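The approach can also be checked without the debug helper. PHP's trim() only strips its default ASCII character set (space, tab, newline, carriage return, NUL and vertical tab), so a thin space passes through untouched; replacing \pZ/\pC characters first is what makes trim() effective. The sample string is illustrative, and the last step, which collapses each run into one space by using [\pZ\pC]+, is a hypothetical variant rather than part of the answer.

```php
<?php
$thin = "\u{2009}";
$string = $thin . 'test' . $thin . $thin . 'string' . $thin;

// trim() with its default character list leaves U+2009 untouched.
var_dump(trim($string) === $string); // bool(true)

// The answer's approach: map every \pZ/\pC character to a regular space, then trim.
$trimString = trim(preg_replace('/[\pZ\pC]/u', ' ', $string));
var_dump($trimString === 'test  string'); // bool(true)

// Hypothetical variant: collapse each run of such characters into a single space.
$collapsed = trim(preg_replace('/[\pZ\pC]+/u', ' ', $string));
var_dump($collapsed === 'test string'); // bool(true)
```

Keep in mind that \pC includes \r and \n, so both versions also turn line breaks into spaces.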