Filtering out UTF-8 characters longer than three bytes, and non-UTF-8 bytes

Jul 29, 2016 09:03 AM

function filterUtf8($str)
    {
        /* UTF-8 encoding table:
        * Unicode code point range  | UTF-8 byte pattern
        * U+0000 - U+007F           | 0xxxxxxx
        * U+0080 - U+07FF           | 110xxxxx 10xxxxxx
        * U+0800 - U+FFFF           | 1110xxxx 10xxxxxx 10xxxxxx
        *
        */
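        $re = '';
        // Nothing to filter; on PHP < 8.2 str_split('') returns [''], which would yield a NUL byte below
        if ($str === '')
        {
            return $re;
        }
        // Split the input into an array of two-digit hex strings, one per byte
        $str = str_split(bin2hex($str), 2);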

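        $mo =  1<<7;                    // 10000000: high bit, set on every non-ASCII byte
        $mo2 = $mo | (1 << 6);          // 11000000: lead byte of a two-byte sequence
        $mo3 = $mo2 | (1 << 5);         // 11100000: three bytes
        $mo4 = $mo3 | (1 << 4);          // 11110000: four bytes
        $mo5 = $mo4 | (1 << 3);          // 11111000: five bytes
        $mo6 = $mo5 | (1 << 2);          // 11111100: six bytes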


        for ($i = 0; $i < count($str); $i++)
        {
            if ((hexdec($str[$i]) & ($mo)) == 0)
            {
                $re .=  chr(hexdec($str[$i]));
                continue;
            }

            // Discard sequences of four bytes or more
            if ((hexdec($str[$i]) & ($mo6) )  == $mo6)
            {
                $i = $i +5;
                continue;
            }

            if ((hexdec($str[$i]) & ($mo5) )  == $mo5)
            {
                $i = $i +4;
                continue;
            }

            if ((hexdec($str[$i]) & ($mo4) )  == $mo4)
            {
                $i = $i +3;
                continue;
            }

            if ((hexdec($str[$i]) & ($mo3) )  == $mo3 )
            {
                $i = $i +2;
                // Keep the sequence only if both trailing bytes exist and are continuation bytes (10xxxxxx)
                if (isset($str[$i]) && ((hexdec($str[$i]) & ($mo2) )  == $mo) &&  ((hexdec($str[$i - 1]) & ($mo2) )  == $mo)  )
                {
                    $r = chr(hexdec($str[$i - 2])).
                        chr(hexdec($str[$i - 1])).
                        chr(hexdec($str[$i]));
                    $re .= $r;
                }
                continue;
            }



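            if ((hexdec($str[$i]) & ($mo2) )  == $mo2 )
            {
                $i = $i +1;
                // Keep the sequence only if the trailing byte exists and is a continuation byte (10xxxxxx)
                if (isset($str[$i]) && ((hexdec($str[$i]) & ($mo2) )  == $mo))
                {
                    $re .= chr(hexdec($str[$i - 1])) . chr(hexdec($str[$i]));
                }
                continue;
            }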
        }
        return $re;
    }
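
The masks in the function correspond to the prefix bits of a UTF-8 lead byte. As a rough illustration, the sketch below maps a lead byte to the sequence length it announces. The helper name `utf8LeadLength` is hypothetical and not part of the original function, and it looks only at the prefix bits, so it does not reject bytes such as 0xC0 that strict UTF-8 forbids.

```php
<?php
// Hypothetical helper (an assumption for illustration, not from the article):
// returns the sequence length announced by a lead byte's prefix bits,
// or 0 if the byte is a continuation byte or has a prefix longer than 11110.
function utf8LeadLength(int $byte): int
{
    if (($byte & 0x80) === 0x00) return 1; // 0xxxxxxx
    if (($byte & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($byte & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($byte & 0xF8) === 0xF0) return 4; // 11110xxx
    return 0;                              // 10xxxxxx or 11111xxx
}

foreach ([0x41, 0xC3, 0xE4, 0xF0, 0x80] as $b) {
    printf("0x%02X -> %d\n", $b, utf8LeadLength($b));
}
// Prints:
// 0x41 -> 1
// 0xC3 -> 2
// 0xE4 -> 3
// 0xF0 -> 4
// 0x80 -> 0
```

The function above reaches the same classification differently: its masks are cumulative (0x80, 0xC0, 0xE0, ...) and are tested from the longest prefix down to the shortest, so a byte is matched by the longest run of leading 1 bits it carries.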

The function above filters out UTF-8 sequences longer than three bytes, as well as bytes that do not form a UTF-8 sequence. I hope it is helpful to readers interested in PHP.
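
For comparison, the same goal can be sketched with a byte-level regular expression instead of manual bit masks. This is an alternative technique, not the article's method: the function name `keepUpToThreeByteUtf8` is hypothetical, and the pattern is the commonly used well-formedness pattern for one- to three-byte sequences, applied without the `/u` modifier so that PCRE matches raw bytes.

```php
<?php
// Hypothetical helper (an assumption for illustration): keeps only well-formed
// UTF-8 sequences of one to three bytes and drops every other byte.
function keepUpToThreeByteUtf8(string $input): string
{
    $pattern = '/[\x00-\x7F]'                  // ASCII
        . '|[\xC2-\xDF][\x80-\xBF]'            // two bytes, no overlong forms
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'        // three bytes, no overlong forms
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // three bytes
        . '|\xED[\x80-\x9F][\x80-\xBF]'        // three bytes, no surrogates
        . '/';
    // Bytes that match no alternative are skipped by the scan.
    preg_match_all($pattern, $input, $matches);
    return implode('', $matches[0]);
}

echo keepUpToThreeByteUtf8("a\u{1F600}b"), "\n"; // prints "ab"
```

One behavioural difference from the function above: when a lead byte is not followed by valid continuation bytes, this sketch drops only the offending byte and resumes at the next one, whereas the function above skips a fixed number of bytes and can discard ASCII characters that follow a malformed lead byte.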
