How SpringBoot implements filtering sensitive words-javaTutorial-php.cn

Home

Java

javaTutorial

How SpringBoot implements filtering sensitive words

王林

May 20, 2023 pm 07:28 PM

springboot

Filter sensitive words

How SpringBoot implements filtering sensitive words

1. Create a text file that stores the sensitive words to be filtered

First create a text file to store the sensitive words to be filtered

How SpringBoot implements filtering sensitive words

In the following tool class we will read this text file, here is given in advance

@PostConstruct   // 这个注解表示当容器实例化这个bean（服务启动的时候）之后在调用构造器之后这个方法会自动的调用
public void init(){
    try(
            // 读取写有“敏感词”的文件，getClass表示从程序编译之后的target/classes读配置文件，读之后是字节流
            // java7语法，在这里的句子最后会自动执行close语句
            InputStream is = this.getClass().getClassLoader().getResourceAsStream("sensitive-words.txt");
            // 字节流  ->   字符流  ->  缓冲流
            BufferedReader reader = new BufferedReader(new InputStreamReader(is));

    ) {
        String keyword;
        // 从文件中一行一行读
        while ((keyword = reader.readLine()) != null){
            // 添加到前缀树
            this.addKeyword(keyword);
        }
    } catch (IOException e) {
        logger.error("加载敏感词文件失败: " + e.getMessage());
    }
}

Copy after login

2. Develop a tool class for filtering sensitive words

Develop a component for filtering sensitive words

In order to facilitate future reuse, we write a tool class for filtering sensitive words SensitiveFilter.

@Component
public class SensitiveFilter {

    private static final Logger logger = LoggerFactory.getLogger(SensitiveFilter.class);

    // 当检测到敏感词后我们要把敏感词替换成什么符号
    private static final String REPLACEMENT = "***";

    // 根节点
    private TrieNode rootNode = new TrieNode();

    @PostConstruct   // 这个注解表示当容器实例化这个bean（服务启动的时候）之后在调用构造器之后这个方法会自动的调用
    public void init(){
        try(
                // 读取写有“敏感词”的文件，getClass表示从程序编译之后的target/classes读配置文件，读之后是字节流
                // java7语法，在这里的句子最后会自动执行close语句
                InputStream is = this.getClass().getClassLoader().getResourceAsStream("sensitive-words.txt");
                // 字节流  ->   字符流  ->  缓冲流
                BufferedReader reader = new BufferedReader(new InputStreamReader(is));

        ) {
            String keyword;
            // 从文件中一行一行读
            while ((keyword = reader.readLine()) != null){
                // 添加到前缀树
                this.addKeyword(keyword);
            }
        } catch (IOException e) {
            logger.error("加载敏感词文件失败: " + e.getMessage());
        }
    }

    // 将一个敏感词添加到前缀树中
    private void addKeyword(String keyword){
        // 首先默认指向根
        TrieNode tempNode = rootNode;
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            TrieNode subNode = tempNode.getSubNode(c);
            if(subNode == null){
                // subNode为空，初始化子节点；subNode不为空，直接用就可以了
                subNode = new TrieNode();
                tempNode.addSubNode(c, subNode);
            }
            // 指针指向子节点，进入下一轮循环
            tempNode = subNode;
        }
        // 最后要设置结束标识
        tempNode.setKeywordEnd(true);
    }

    /**
     * 过滤敏感词
     * @param text 待过滤的文本
     * @return  过滤后的文本
     */
    public String filter(String text){
        if(StringUtils.isBlank(text)){
            // 待过滤的文本为空，直接返回null
            return null;
        }
        // 指针1，指向树
        TrieNode tempNode = rootNode;
        // 指针2，指向正在检测的字符串段的首
        int begin = 0;
        // 指针3，指向正在检测的字符串段的尾
        int position = 0;
        // 储存过滤后的文本
        StringBuilder sb = new StringBuilder();
        while (begin < text.length()){
            char c = text.charAt(position);

            // 跳过符号，比如 “开票”是敏感词 #开#票# 这个字符串中间的 &#39;#&#39; 应该跳过
            if(isSymbol(c)){
                // 是特殊字符
                // 若指针1处于根节点，将此符号计入结果，指针2、3向右走一步
                if(tempNode == rootNode){
                    sb.append(c);
                    begin++;
                }
                // 无论符号在开头或中间，指针3都向下走一步
                position++;
                // 符号处理完，进入下一轮循环
                continue;
            }
            // 执行到这里说明字符不是特殊符号
            // 检查下级节点
            tempNode = tempNode.getSubNode(c);
            if(tempNode == null){
                // 以begin开头的字符串不是敏感词
                sb.append(text.charAt(begin));
                // 进入下一个位置
                position = ++begin;
                // 重新指向根节点
                tempNode = rootNode;
            } else if(tempNode.isKeywordEnd()){
                // 发现敏感词，将begin~position字符串替换掉，存 REPLACEMENT （里面是***）
                sb.append(REPLACEMENT);
                // 进入下一个位置
                begin = ++position;
                // 重新指向根节点
                tempNode = rootNode;
            } else {
                // 检查下一个字符
                position++;
            }
        }
        return sb.toString();
    }

    // 判断是否为特殊符号，是则返回true，不是则返回false
    private boolean isSymbol(Character c){
        // CharUtils.isAsciiAlphanumeric(c)方法：a、b、1、2···返回true，特殊字符返回false
        // 0x2E80  ~  0x9FFF 是东亚的文字范围，东亚文字范围我们不认为是符号
        return  !CharUtils.isAsciiAlphanumeric(c) && (c < 0x2E80 || c > 0x9FFF);
    }

    // 前缀树
    private class TrieNode{

        // 关键词结束标识
        private boolean isKeywordEnd = false;

        // 当前节点的子节点(key是下级字符、value是下级节点)
        private Map<Character, TrieNode> subNodes = new HashMap<>();

        public boolean isKeywordEnd() {
            return isKeywordEnd;
        }

        public void setKeywordEnd(boolean keywordEnd) {
            isKeywordEnd = keywordEnd;
        }

        // 添加子节点
        public void addSubNode(Character c, TrieNode node){
            subNodes.put(c, node);
        }

        // 获取子节点
        public TrieNode getSubNode(Character c){
            return subNodes.get(c);
        }
    }
}

Copy after login

How SpringBoot implements filtering sensitive words

The above is all the code for the filtering sensitive word tool class. Next, let’s explain the development steps

There are three steps to developing the sensitive word filtering component ：

1. Define the prefix tree (Tree)

We will define the prefix tree and write it as the internal class# of the SensitiveFilter tool class ##

// 前缀树
private class TrieNode{

    // 关键词结束标识
    private boolean isKeywordEnd = false;

    // 当前节点的子节点(key是下级字符、value是下级节点)
    private Map<Character, TrieNode> subNodes = new HashMap<>();

    public boolean isKeywordEnd() {
        return isKeywordEnd;
    }

    public void setKeywordEnd(boolean keywordEnd) {
        isKeywordEnd = keywordEnd;
    }

    // 添加子节点
    public void addSubNode(Character c, TrieNode node){
        subNodes.put(c, node);
    }

    // 获取子节点
    public TrieNode getSubNode(Character c){
        return subNodes.get(c);
    }
}

Copy after login

How SpringBoot implements filtering sensitive words

2. Initialize the prefix tree based on sensitive words

Add sensitive words to the prefix tree

// 将一个敏感词添加到前缀树中
private void addKeyword(String keyword){
    // 首先默认指向根
    TrieNode tempNode = rootNode;
    for (int i = 0; i < keyword.length(); i++) {
        char c = keyword.charAt(i);
        TrieNode subNode = tempNode.getSubNode(c);
        if(subNode == null){
            // subNode为空，初始化子节点；subNode不为空，直接用就可以了
            subNode = new TrieNode();
            tempNode.addSubNode(c, subNode);
        }
        // 指针指向子节点，进入下一轮循环
        tempNode = subNode;
    }
    // 最后要设置结束标识
    tempNode.setKeywordEnd(true);
}

Copy after login

How SpringBoot implements filtering sensitive words

3. Write a method to filter sensitive words

How to filter sensitive words in text:

How SpringBoot implements filtering sensitive words

How to deal with special symbols:

How SpringBoot implements filtering sensitive words ##After the sensitive word prefix tree is initialized, the algorithm for filtering sensitive words in the text should be as follows:

Define three pointers:

Pointer 1
points to the Tree tree
Pointer 2
points to the header of string segment to be filtered

points to

to be filtered Filter the tail of the string segment ##

/**
 * 过滤敏感词
 * @param text 待过滤的文本
 * @return  过滤后的文本
 */
public String filter(String text){
    if(StringUtils.isBlank(text)){
        // 待过滤的文本为空，直接返回null
        return null;
    }
    // 指针1，指向树
    TrieNode tempNode = rootNode;
    // 指针2，指向正在检测的字符串段的首
    int begin = 0;
    // 指针3，指向正在检测的字符串段的尾
    int position = 0;
    // 储存过滤后的文本
    StringBuilder sb = new StringBuilder();
    while (begin < text.length()){
        char c = text.charAt(position);

        // 跳过符号，比如 “开票”是敏感词 #开#票# 这个字符串中间的 &#39;#&#39; 应该跳过
        if(isSymbol(c)){
            // 是特殊字符
            // 若指针1处于根节点，将此符号计入结果，指针2、3向右走一步
            if(tempNode == rootNode){
                sb.append(c);
                begin++;
            }
            // 无论符号在开头或中间，指针3都向下走一步
            position++;
            // 符号处理完，进入下一轮循环
            continue;
        }
        // 执行到这里说明字符不是特殊符号
        // 检查下级节点
        tempNode = tempNode.getSubNode(c);
        if(tempNode == null){
            // 以begin开头的字符串不是敏感词
            sb.append(text.charAt(begin));
            // 进入下一个位置
            position = ++begin;
            // 重新指向根节点
            tempNode = rootNode;
        } else if(tempNode.isKeywordEnd()){
            // 发现敏感词，将begin~position字符串替换掉，存 REPLACEMENT （里面是***）
            sb.append(REPLACEMENT);
            // 进入下一个位置
            begin = ++position;
            // 重新指向根节点
            tempNode = rootNode;
        } else {
            // 检查下一个字符
            position++;
        }
    }
    return sb.toString();
}

// 判断是否为特殊符号，是则返回true，不是则返回false
private boolean isSymbol(Character c){
    // CharUtils.isAsciiAlphanumeric(c)方法：a、b、1、2···返回true，特殊字符返回false
    // 0x2E80  ~  0x9FFF 是东亚的文字范围，东亚文字范围我们不认为是符号
    return  !CharUtils.isAsciiAlphanumeric(c) && (c < 0x2E80 || c > 0x9FFF);
}

Copy after login

##Finally: It is recommended to test it in the test class

After testing, the development of a tool for filtering sensitive words has been completed. This tool will be used in the next
publishing posts
function.
The above is the detailed content of How SpringBoot implements filtering sensitive words. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Show More

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution
1 weeks ago By DDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Where to find the Crane Control Keycard in Atomfall
1 weeks ago By DDD

Show More

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Show More

Hot Topics

Where is the login entrance for gmail email?

7430

15

CakePHP Tutorial

1359

52

What is the format of the account name of steam

76

11

win11 activation key permanent

28

19

Show More

Related knowledge

How Springboot integrates Jasypt to implement configuration file encryption Jun 01, 2023 am 08:55 AM
Introduction to Jasypt Jasypt is a java library that allows a developer to add basic encryption functionality to his/her project with minimal effort and does not require a deep understanding of how encryption works. High security for one-way and two-way encryption. , standards-based encryption technology. Encrypt passwords, text, numbers, binaries... Suitable for integration into Spring-based applications, open API, for use with any JCE provider... Add the following dependency: com.github.ulisesbocchiojasypt-spring-boot-starter2. 1.1Jasypt benefits protect our system security. Even if the code is leaked, the data source can be guaranteed.

How SpringBoot integrates Redisson to implement delay queue May 30, 2023 pm 02:40 PM
Usage scenario 1. The order was placed successfully but the payment was not made within 30 minutes. The payment timed out and the order was automatically canceled. 2. The order was signed and no evaluation was conducted for 7 days after signing. If the order times out and is not evaluated, the system defaults to a positive rating. 3. The order is placed successfully. If the merchant does not receive the order for 5 minutes, the order is cancelled. 4. The delivery times out, and push SMS reminder... For scenarios with long delays and low real-time performance, we can Use task scheduling to perform regular polling processing. For example: xxl-job Today we will pick

How to use Redis to implement distributed locks in SpringBoot Jun 03, 2023 am 08:16 AM
1. Redis implements distributed lock principle and why distributed locks are needed. Before talking about distributed locks, it is necessary to explain why distributed locks are needed. The opposite of distributed locks is stand-alone locks. When we write multi-threaded programs, we avoid data problems caused by operating a shared variable at the same time. We usually use a lock to mutually exclude the shared variables to ensure the correctness of the shared variables. Its scope of use is in the same process. If there are multiple processes that need to operate a shared resource at the same time, how can they be mutually exclusive? Today's business applications are usually microservice architecture, which also means that one application will deploy multiple processes. If multiple processes need to modify the same row of records in MySQL, in order to avoid dirty data caused by out-of-order operations, distribution needs to be introduced at this time. The style is locked. Want to achieve points

How to solve the problem that springboot cannot access the file after reading it into a jar package Jun 03, 2023 pm 04:38 PM
Springboot reads the file, but cannot access the latest development after packaging it into a jar package. There is a situation where springboot cannot read the file after packaging it into a jar package. The reason is that after packaging, the virtual path of the file is invalid and can only be accessed through the stream. Read. The file is under resources publicvoidtest(){Listnames=newArrayList();InputStreamReaderread=null;try{ClassPathResourceresource=newClassPathResource("name.txt");Input

Comparison and difference analysis between SpringBoot and SpringMVC Dec 29, 2023 am 11:02 AM
SpringBoot and SpringMVC are both commonly used frameworks in Java development, but there are some obvious differences between them. This article will explore the features and uses of these two frameworks and compare their differences. First, let's learn about SpringBoot. SpringBoot was developed by the Pivotal team to simplify the creation and deployment of applications based on the Spring framework. It provides a fast, lightweight way to build stand-alone, executable

How to implement Springboot+Mybatis-plus without using SQL statements to add multiple tables Jun 02, 2023 am 11:07 AM
When Springboot+Mybatis-plus does not use SQL statements to perform multi-table adding operations, the problems I encountered are decomposed by simulating thinking in the test environment: Create a BrandDTO object with parameters to simulate passing parameters to the background. We all know that it is extremely difficult to perform multi-table operations in Mybatis-plus. If you do not use tools such as Mybatis-plus-join, you can only configure the corresponding Mapper.xml file and configure The smelly and long ResultMap, and then write the corresponding sql statement. Although this method seems cumbersome, it is highly flexible and allows us to

How SpringBoot customizes Redis to implement cache serialization Jun 03, 2023 am 11:32 AM
1. Customize RedisTemplate1.1, RedisAPI default serialization mechanism. The API-based Redis cache implementation uses the RedisTemplate template for data caching operations. Here, open the RedisTemplate class and view the source code information of the class. publicclassRedisTemplateextendsRedisAccessorimplementsRedisOperations, BeanClassLoaderAware{//Declare key, Various serialization methods of value, the initial value is empty @NullableprivateRedisSe

How to get the value in application.yml in springboot Jun 03, 2023 pm 06:43 PM
In projects, some configuration information is often needed. This information may have different configurations in the test environment and the production environment, and may need to be modified later based on actual business conditions. We cannot hard-code these configurations in the code. It is best to write them in the configuration file. For example, you can write this information in the application.yml file. So, how to get or use this address in the code? There are 2 methods. Method 1: We can get the value corresponding to the key in the configuration file (application.yml) through the ${key} annotated with @Value. This method is suitable for situations where there are relatively few microservices. Method 2: In actual projects, When business is complicated, logic

See all articles