Author | Sun Yue, Unit: China Mobile (Hangzhou) Information Technology Co., Ltd. | China Mobile Hangzhou R&D Center
As 5G networks develop and gain popularity, a large number of users are beginning to come into contact with and use them. 5G networks not only carry the voice, video, text, and other information of traditional networks; with lower latency and high-precision positioning, they also serve more demanding application scenarios such as live battlefield information, satellite positioning, and navigation.
Internet information is often mixed with bad information, such as politically sensitive, pornographic, gang-related, and fraudulent content and commercial advertising. The volume of bad information is increasing year by year and is a serious nuisance to users. To purify the network environment and effectively control the spread of bad information, China Mobile's 5G bad information security management and control platform came into being.
Data source: China Mobile Group Information Security Center
1. Application scenarios of the 5G bad information management and control platform
When faced with a complex network information environment, the platform classifies messages of every type (text, voice, video, rich media, and so on) as politically sensitive, pornographic, gang-related, fraudulent, commercial advertising, or normal, and then intercepts bad messages promptly through the corresponding strategies. Follow-up penalties are applied according to the severity of the bad information, so as to purify the network environment at the root and create a healthy cyberspace.
2. Key technical points of the existing 5G bad information management and control platform
The platform mainly intercepts bad information through the following methods:
① First-level keywords: First-level keywords are usually extremely sensitive words. If a user sends a message containing a first-level keyword, the message is intercepted immediately, its content is not delivered, and the user is flagged.
② Common keywords: Common keywords are moderately sensitive words. If a user sends messages containing common keywords and, within a given period, the number of such messages exceeds the system's preset interception threshold, the system adds the user to the blacklist, and for a set period the user cannot use full 5G network services.
③ Complex text information monitoring: If a user sends a file that contains both text and pictures, such as a PDF, the text is extracted and filtered by the first-level and common keyword mechanisms, and the pictures are filtered by the rich media mechanism. Of the separate results for text and pictures, the more severe one is taken as the result for the whole file.
3. Technical weaknesses of the existing 5G bad information management and control platform
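As a baseline for the weaknesses discussed in this section, the three keyword mechanisms described in section 2 can be sketched in Python as follows. This is a minimal sketch, not the platform's actual code: the word lists, the threshold, the window length, the outcome names, and the ordering of outcomes by severity are all assumptions made for illustration, and the result of the rich media picture filter is simply passed in as a value.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

# Hypothetical word lists and limits; the source does not publish real values.
FIRST_LEVEL = {"forbidden-term"}
COMMON = {"sensitive-term"}
THRESHOLD = 3            # common-keyword hits tolerated within the window
WINDOW_SECONDS = 3600    # length of the counting window

# Outcomes ordered from mildest to most severe (the ordering is an assumption).
PASS, COUNTED, BLACKLIST, INTERCEPT = 0, 1, 2, 3


@dataclass
class KeywordFilter:
    # Per-user timestamps of recent common-keyword hits.
    hits: dict = field(default_factory=lambda: defaultdict(deque))
    blacklist: set = field(default_factory=set)

    def check_text(self, user: str, text: str, now: float) -> int:
        # Mechanism 1: a first-level keyword blocks the message at once.
        if any(word in text for word in FIRST_LEVEL):
            return INTERCEPT
        # Mechanism 2: common keywords are counted inside a sliding window.
        if any(word in text for word in COMMON):
            recent = self.hits[user]
            recent.append(now)
            while recent and now - recent[0] > WINDOW_SECONDS:
                recent.popleft()
            if len(recent) > THRESHOLD:
                self.blacklist.add(user)
                return BLACKLIST
            return COUNTED
        return PASS

    def check_file(self, user: str, text: str, image_result: int, now: float) -> int:
        # Mechanism 3: the more severe of the text and picture results wins.
        return max(self.check_text(user, text, now), image_result)
```

A message that contains none of the listed words always returns `PASS`, whatever its tone, which is the gap the rest of this section describes.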
The filtering mechanism of the existing platform can only filter specified, limited phrases and short sentences. With the popularity of the Internet, new words emerge in large numbers every day, and a vocabulary library that relies on manual additions alone can no longer be updated quickly enough. Moreover, many text messages sent today contain no prohibited words at all, yet the thoughts and emotions they express may be strongly negative. Words and short sentences alone cannot intercept such negative emotional content.
Therefore, using text sentiment analysis to submit sentences with strong negative emotional tendencies for review and interception can further strengthen bad information control and reduce the harm that spam does to users. A text emotion library is built from popular Internet phrases and news messages; the emotion each text carries is assigned to one of three categories (positive, neutral, or negative), each text is labeled accordingly, and a deep learning network is trained on the texts in the library. The trained model can then be used in the 5G bad information management and control platform to intercept messages with bad emotional content.
4. Technical implementation details of the 5G bad information management and control system based on deep learning
This technology comprises three main components: the jieba word segmentation system, phrase vectorization, and the text emotion recognition algorithm. The components interact as follows:
Interaction flow chart of each module
Crawler technology is used to collect Internet words and news messages as the original text, which is divided into a training set and a test set in a ratio of 8:2. The text in the training set is labeled, and the text is then segmented with the jieba word segmentation tool. For example, the sentence "He came to the Mobile Hangyan Building" is segmented into he / came to / Mobile / Hangyan / Building. Finally, the segmented data is organized into a corpus. Because the training and test sets are very large (usually millions of records), the segmented corpus is larger still (tens of millions of records). Although these phrases can be stored in the corpus in numbered form, the sheer volume of data easily leads to the curse of dimensionality. Modal particles and similar function words such as "了" (le), "的" (de), and "我" (wo) appear very frequently in text but contribute little to its emotional content, so they are removed from the corpus to reduce its dimensionality.
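The preparation steps above (8:2 split, segmentation, stop-word removal, numbered storage) can be sketched as follows. This is a sketch under stated assumptions: the function names, the stop-word list, and the demo samples are hypothetical, and a simple stand-in segmenter that splits on "/" is used so the example runs without extra packages. With jieba installed, `jieba.lcut` can be passed as the `segment` argument instead.

```python
import random
from collections import Counter

# Hypothetical stop-word list; a production list would be far longer.
STOP_WORDS = {"了", "的", "我"}


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and split them into training and test sets (8:2)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def tokenize(text, segment):
    """Segment the text, then drop blank tokens and stop words."""
    return [w for w in segment(text) if w.strip() and w not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Give every distinct phrase an integer index, most frequent first."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    return {w: i for i, (w, _) in enumerate(counts.most_common())}


def slash_segment(text):
    """Stand-in segmenter for the pre-segmented demo strings below."""
    return text.split("/")


# Made-up, pre-segmented samples with a placeholder label.
samples = [(f"sample{i}/的/text", "neutral") for i in range(10)]
train_set, test_set = split_dataset(samples)
corpus = [tokenize(text, slash_segment) for text, _ in train_set]
vocab = build_vocabulary(corpus)
```

Dropping stop words before the vocabulary is built keeps them out of the index entirely, so they never add a dimension to the vectors derived from it.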
The vectorized phrases in the training set are fed into the deep learning network for training to obtain a model, and the test set is then run through the model to check its recognition results. Once the model reaches a satisfactory accuracy, it is connected to the 5G bad information management and control platform to filter the messages users send end to end. Any bad information found during filtering is intercepted promptly, making the platform's interception of bad information more systematic and comprehensive.
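The train, evaluate, and connect flow described above might look like the sketch below. The article does not specify the network architecture or the vectorization method, so a bag-of-words vector and a single-layer softmax classifier stand in for them here; that is a deliberate simplification, not the deep network itself. The toy English tokens, the `MIN_ACCURACY` gate, and all function names are assumptions made for illustration.

```python
import math

LABELS = ["positive", "neutral", "negative"]


def vectorize(tokens, vocab):
    """Bag-of-words count vector over a phrase -> index vocabulary."""
    vec = [0.0] * len(vocab)
    for w in tokens:
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, vec):
    return softmax([sum(w * x for w, x in zip(row, vec)) for row in weights])


def train(data, vocab, epochs=50, lr=0.5):
    """Fit one weight row per label with plain stochastic gradient descent."""
    weights = [[0.0] * len(vocab) for _ in LABELS]
    for _ in range(epochs):
        for tokens, label in data:
            vec = vectorize(tokens, vocab)
            probs = predict_proba(weights, vec)
            for k, row in enumerate(weights):
                error = probs[k] - (1.0 if LABELS[k] == label else 0.0)
                for j, x in enumerate(vec):
                    row[j] -= lr * error * x
    return weights


def classify(weights, tokens, vocab):
    probs = predict_proba(weights, vectorize(tokens, vocab))
    return LABELS[probs.index(max(probs))]


def accuracy(weights, data, vocab):
    hits = sum(classify(weights, tokens, vocab) == label for tokens, label in data)
    return hits / len(data)


# Made-up, already segmented samples.
train_set = [
    (["great", "service", "thanks"], "positive"),
    (["love", "this", "network"], "positive"),
    (["meeting", "at", "noon"], "neutral"),
    (["report", "attached"], "neutral"),
    (["hate", "everything"], "negative"),
    (["hopeless", "and", "miserable"], "negative"),
]
test_set = [(["thanks", "great"], "positive"), (["hate", "miserable"], "negative")]
vocab = {w: i for i, w in enumerate(sorted({w for t, _ in train_set for w in t}))}
weights = train(train_set, vocab)

MIN_ACCURACY = 0.9  # hypothetical gate before the model is connected
model_ready = accuracy(weights, test_set, vocab) >= MIN_ACCURACY


def should_intercept(tokens):
    """Flag a message only when the gated model labels it negative."""
    return model_ready and classify(weights, tokens, vocab) == "negative"
```

Gating on test-set accuracy mirrors the requirement that the model reach a satisfactory accuracy before it is connected to the platform; the threshold itself would have to be chosen by the operator.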
Specific steps are as follows:
Compared with the existing 5G interception system, the 5G interception system integrated with deep learning has the following advantages:
Deep learning is now applied in a very wide range of fields. Relying on repeated training and self-learning, it can greatly reduce manual workload and improve efficiency and accuracy. It is suited not only to the bad information interception system described above; I believe that in the near future this technology will also shine in other emerging fields. Of course, deep learning itself is not perfect and cannot solve every thorny problem. For that reason, we should keep applying deep learning to new scenarios and new fields in order to achieve new breakthroughs and create a better, smarter future.