Re-flowing Paragraphs in Mixed-Encoding Text

Jun 06, 2016 pm 07:34 PM
http text paragraph coding rearrange

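http://www.sgcha.cn/cha.php
When processing text, especially long passages, HTML's markup rules and differing text encodings can easily garble the characters. Line-break conventions also differ, so the text often has to be re-flowed into paragraphs.
Paragraph boundaries are identified mainly by specific punctuation marks; see the comments in the code for details.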
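
The punctuation-based rule described above can be sketched in a few lines. This is a minimal illustration and not the article's own code: the function name `reflow_paragraphs` and its parameters are hypothetical, the input is assumed to be UTF-8 already, and the default terminator list simply mirrors the marks used in the script that follows.

```php
<?php
/**
 * Hypothetical helper (not from the original article): joins hard-wrapped
 * lines until the accumulated text ends with a paragraph-ending mark.
 * Assumes $text is already valid UTF-8.
 */
function reflow_paragraphs(
    string $text,
    array $terminators = [
        "\u{2026}", // horizontal ellipsis
        "\u{201D}", // right double quotation mark
        "\u{3002}", // ideographic full stop
        "\u{FF1F}", // full-width question mark
        ":",        // ASCII colon
        "\u{FF1A}", // full-width colon
    ]
): string {
    // Split on any newline convention: \r\n, \r or \n.
    $lines = preg_split('/\r\n|\r|\n/', $text);
    $paragraphs = [];
    $buffer = '';
    foreach ($lines as $line) {
        $buffer .= trim($line);
        if ($buffer === '') {
            continue; // nothing collected yet
        }
        // Compare the last character, not the last byte.
        $last = mb_substr($buffer, -1, 1, 'UTF-8');
        if (in_array($last, $terminators, true)) {
            $paragraphs[] = $buffer;
            $buffer = '';
        }
    }
    if ($buffer !== '') {
        $paragraphs[] = $buffer; // keep trailing text that never reached a terminator
    }
    return implode("\n", $paragraphs);
}
```

Comparing characters directly avoids the round trip through a `\uXXXX` string, at the cost of requiring the input to be converted to UTF-8 beforehand.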


// Sample input: hard-wrapped text whose lines break in mid-sentence.
$strtest = '这个是第一个
句子,
的第一部分。
的反对法 的飞洒?
\u3434,
';
$strtest = cut_str_by_mb($strtest);

echo '<pre>';
echo $strtest;

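/*********************************************************************
Takes a string and returns it with the paragraphs re-flowed.
A paragraph ends at a line whose last character is one of the
punctuation marks in $arr_tag; the text is handled as UTF-8.
*********************************************************************/
function cut_str_by_mb($str, $arr_tag = null) {
    if ($arr_tag === null) {
        // Paragraph-ending marks, in the notation produced by unicode_encode().
        // That function does not zero-pad each byte, so U+3002 appears as '\u302'.
        $arr_tag = array(
            '\u2026', // horizontal ellipsis
            '\u201d', // right double quotation mark
            '\u302',  // ideographic full stop (U+3002)
            '\uff1f', // full-width question mark
            ':',      // ASCII colon
            '\uff1a', // full-width colon
        );
    }
    $str = set_char_set($str); // detect the encoding and convert to UTF-8 first
    $str = unescape($str);     // decode %uXXXX and &#...; escapes to UTF-8
    // Split into lines on a lone \r, a lone \n, or \r\n.
    $tmp_array = preg_split("/((\r(?!\n))|((?<!\r)\n)|(\r\n))/", $str);
    $return_arr = array(); // initialised so implode() always receives an array
    $tmp_val = '';
    foreach ($tmp_array as $v) {
        // Keep the indentation of a paragraph's first line; trim continuation lines fully.
        $v = ($tmp_val === '') ? rtrim($v) : trim($v);
        $tmp_val = $tmp_val . $v; // append the line to the current paragraph
        $len = mb_strlen($tmp_val, 'utf-8');
        $endtag = mb_substr($tmp_val, $len - 1, 1, 'utf-8'); // last character
        $u_tag = unicode_encode($endtag);
        if (in_array($u_tag, $arr_tag, true)) {
            $return_arr[] = $tmp_val;
            $tmp_val = '';
        }
    }
    if ($tmp_val !== '') {
        $return_arr[] = $tmp_val; // keep trailing text that has no closing mark
    }
    $return_str = implode("\r\n", $return_arr);
    return $return_str;
}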

 

/****************************************************************
    Detect the encoding and normalise everything to UTF-8
**********************************************************************/
function set_char_set($data) {
    if (!empty($data)) {
        $fileType = mb_detect_encoding($data, array('UTF-8', 'GBK', 'LATIN1', 'BIG5'));
        // Convert only when detection succeeded and the text is not already UTF-8.
        if ($fileType !== false && $fileType != 'UTF-8') {
            $data = mb_convert_encoding($data, 'utf-8', $fileType);
        }
    }
    return $data;
}

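/****************************************************************
    Convert %uXXXX escapes and &#...; numeric entities to UTF-8 characters
**********************************************************************/
function unescape($str) {
    $str = rawurldecode($str);
    // The "s" modifier lets "." match newlines; without it every \n would be
    // dropped here and the later split into lines would find nothing to split on.
    preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/Us", $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        if (substr($v, 0, 2) == "%u") {
            // %uXXXX: four hex digits, big-endian UCS-2
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, -4)));
        } elseif (substr($v, 0, 3) == "&#x") {
            // &#xXXXX; hexadecimal entity
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, 3, -1)));
        } elseif (substr($v, 0, 2) == "&#") {
            // &#NNNNN; decimal entity
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("n", (int) substr($v, 2, -1)));
        }
    }
    return join("", $ar);
}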

 

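/****************************************************************
    Return the \u notation of a UTF-8 string
**********************************************************************/
function unicode_encode($name) {
    // Ask for big-endian explicitly: plain 'UCS-2' uses a platform-dependent
    // byte order, which would swap the two bytes on little-endian machines.
    $name = iconv('UTF-8', 'UCS-2BE', $name);
    if ($name === false) {
        return ''; // the input could not be represented in UCS-2
    }
    $len = strlen($name);
    $str = '';
    for ($i = 0; $i < $len - 1; $i = $i + 2) {
        $c = $name[$i];      // high byte
        $c2 = $name[$i + 1]; // low byte
        if (ord($c) > 0) {
            // Non-zero high byte: emit both bytes as hex.
            // The bytes are not zero-padded, so U+3002 becomes '\u302'.
            $str .= '\u' . dechex(ord($c)) . dechex(ord($c2));
        } else {
            // Zero high byte: emit the low byte as-is.
            $str .= $c2;
        }
    }
    return $str;
}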
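
As a point of comparison with `unescape()`, the same decoding can be sketched with built-in functions. This is an illustrative alternative under stated assumptions and not the article's method: the name `decode_escapes` is hypothetical, `mb_chr` needs PHP 7.2 or later, and handling of literal `\uXXXX` escapes is an addition that the original function does not perform. Note that `html_entity_decode` also decodes named entities such as `&amp;`, which may or may not be wanted.

```php
<?php
/**
 * Hypothetical helper (not from the original article): decodes %uXXXX and
 * \uXXXX escapes plus HTML character references into UTF-8.
 */
function decode_escapes(string $str): string
{
    // Replace %uXXXX and literal \uXXXX escapes with the matching character.
    $str = preg_replace_callback(
        '/(?:%u|\\\\u)([0-9a-fA-F]{4})/',
        static function (array $m): string {
            $ch = mb_chr(hexdec($m[1]), 'UTF-8');
            // Leave the escape untouched if the code point cannot be encoded.
            return $ch === false ? $m[0] : $ch;
        },
        $str
    );
    // Decode &#NNNN; and &#xXXXX; references (and named entities) to UTF-8.
    return html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
```

Because the callback only touches the escape sequences, line breaks and every other character pass through unchanged.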
