
HTML Security Filtering of User Input with Jsoup

Jun 21, 2016 am 09:17 AM

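Websites often let users enter content through input or textarea fields, for example when creating posts, publishing articles, or writing comments. The back end then has to filter that input for security, removing tags such as <script> that can open security holes.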

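Java has an open-source library called Jsoup. It is designed for parsing HTML and XML documents, and its distinguishing feature is a jQuery-like selector syntax.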
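To illustrate the jQuery-like selector syntax, here is a minimal sketch that parses an HTML fragment, collects link targets with the selector `a[href]`, and uses the child combinator `#post > p`. It assumes jsoup is on the classpath; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorDemo {
    // Returns the href value of every <a> element that actually has an href attribute.
    public static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<div id='post'><a href='/a'>A</a> <a>no href</a> <a href='/b'>B</a></div>";
        System.out.println(linkTargets(html)); // [/a, /b]

        // Child combinator: only <p> elements that are direct children of #post match.
        Document doc = Jsoup.parse("<div id='post'><p>first</p><span><p>nested</p></span></div>");
        System.out.println(doc.select("#post > p").text()); // first
    }
}
```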

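Recently, while working on content security filtering, I found via Google that Jsoup offers exactly this through a customizable Whitelist (a whitelist of trusted tags), and it is very easy to use.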

A simple demonstration:

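```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class CleanDemo {
    public static void main(String[] args) {
        // Untrusted input mixing harmless markup with an event handler, <object> and <script>
        String unsafe = "<table><tr><td>1</td></tr></table>" +
                "<img src='' alt='' />" +
                "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a>" +
                "<object></object>" +
                "<script>alert(1);</script>" +
                "</p>";
        // Keep only the tags and attributes allowed by the relaxed() whitelist
        String safe = Jsoup.clean(unsafe, Whitelist.relaxed());
        System.out.println("safe: " + safe);
    }
}
```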
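`Jsoup.clean` silently strips whatever the whitelist does not allow. If you would rather reject a submission outright, jsoup also provides `Jsoup.isValid`. The sketch below contrasts the two. It is written against jsoup 1.14.1 or later, where `Whitelist` was renamed `Safelist` (on older versions, substitute `Whitelist`); the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CleanVsValidate {
    // Sanitize: drop disallowed tags/attributes and keep the rest.
    public static String sanitize(String html) {
        return Jsoup.clean(html, Safelist.relaxed());
    }

    // Validate: true only if the input already contains nothing but allowed markup,
    // e.g. to reject a form submission instead of silently altering it.
    public static boolean isAllowed(String html) {
        return Jsoup.isValid(html, Safelist.relaxed());
    }

    public static void main(String[] args) {
        String unsafe = "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a>"
                + "<script>alert(1);</script></p>";
        System.out.println(sanitize(unsafe));
        System.out.println(isAllowed(unsafe));               // false
        System.out.println(isAllowed("<p><b>bold</b></p>")); // true
    }
}
```

Cleaning suits rich-text fields where small losses are acceptable; validation suits cases where altering the user's input without telling them would be surprising.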

Official API documentation: http://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html

Where I found it: http://www.oschina.net/question/12_10232. Based on that, I wrote my own custom helper class:

```java
package com.cssor.safety;

import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class ContentSafeFilter {
    private final static Whitelist user_content_filter = Whitelist.relaxed();

    static {
        // Add trusted tags to the whitelist
        user_content_filter.addTags("embed", "object", "param", "span", "div");
        // Add trusted attributes (":all" applies the attribute to every allowed tag)
        user_content_filter.addAttributes(":all", "style", "class", "id", "name");
        user_content_filter.addAttributes("object", "width", "height", "classid", "codebase");
        user_content_filter.addAttributes("param", "name", "value");
        user_content_filter.addAttributes("embed", "src", "quality", "width", "height",
                "allowFullScreen", "allowScriptAccess", "flashvars", "name", "type", "pluginspage");
    }

    /**
     * Filters user input content with the custom whitelist above.
     * @param html untrusted HTML
     * @return sanitized HTML, or "" for null/blank input
     */
    public static String filter(String html) {
        // Plain blank check; avoids jsoup's StringUtil, which newer releases moved to org.jsoup.internal
        if (html == null || html.trim().isEmpty()) return "";
        return Jsoup.clean(html, user_content_filter);
    }

    /**
     * Fairly lenient filtering, but removes tags such as object, script, span and div.
     * Suitable for rich-text editor content or other HTML content.
     * @param html untrusted HTML
     * @return sanitized HTML
     */
    public static String relaxed(String html) {
        return Jsoup.clean(html, Whitelist.relaxed());
    }

    /**
     * Removes all tags and returns plain text. Suitable for textarea and input fields.
     * @param html untrusted HTML
     * @return the input with all tags removed
     */
    public static String pureText(String html) {
        return Jsoup.clean(html, Whitelist.none());
    }

    public static void main(String[] args) {
        String unsafe = "<table><tr><td>1</td></tr></table>" +
                "<img src='' alt='' />" +
                "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a>" +
                "<object></object>" +
                "<script>alert(1);</script>" +
                "</p>";
        String safe = ContentSafeFilter.filter(unsafe);
        System.out.println("safe: " + safe);
    }
}
```
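The helper starts from `Whitelist.relaxed()` and extends it with `addTags` and `addAttributes`, where the pseudo tag name `:all` makes an attribute valid on every allowed tag. Because `relaxed()` builds a new instance on each call, extending it does not affect other code that calls `relaxed()`. A minimal sketch of the same idea, assuming jsoup 1.14.1+ (`Safelist`) and using hypothetical names:

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CustomSafelistDemo {
    // Extend a fresh relaxed() safelist, mirroring part of the helper's static block.
    // ":all" means the listed attributes are allowed on every allowed tag.
    private static final Safelist SAFELIST = Safelist.relaxed()
            .addTags("span", "div")
            .addAttributes(":all", "style", "class", "id");

    public static String filter(String html) {
        return Jsoup.clean(html, SAFELIST);
    }

    public static void main(String[] args) {
        // class survives on <span>; onclick is not in the safelist and is removed
        System.out.println(filter("<span class=\"note\" onclick=\"steal()\">hi</span>"));
        // id survives on <p> thanks to the ":all" rule
        System.out.println(filter("<p id=\"intro\">text</p>"));
    }
}
```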

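By default, Jsoup does not keep images with relative paths: an image whose src is a relative path has its src attribute removed, because without a base URI jsoup cannot resolve the URL to an allowed protocol such as http. I came up with a simple workaround: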

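```java
/**
 * Filters user input with the custom tag whitelist while keeping relative image paths.
 * @param html untrusted HTML
 * @return sanitized HTML, or "" for null/blank input
 */
public static String filter(String html) {
    if (html == null || html.trim().isEmpty()) return "";
    // Placeholder base URI so jsoup can resolve relative src values to http:// and accept them
    String baseUri = "http://baseuri";
    // Strip the placeholder again (literal replace). A document-relative path such as
    // images/a.jpg comes back as /images/a.jpg, since it was resolved against the placeholder's root.
    return Jsoup.clean(html, baseUri, user_content_filter).replace("src=\"http://baseuri", "src=\"");
}
```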
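An alternative to the string replacement is jsoup's built-in `preserveRelativeLinks(true)` setting, sketched below assuming jsoup 1.14.1+ (`Safelist`), with a hypothetical class name and placeholder base URI. jsoup still needs a base URI to confirm that a relative link resolves to an allowed protocol, but with this option enabled it leaves the original relative value in the output, so no post-processing is needed and document-relative paths are not rewritten.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class RelativeImageFilter {
    // Hypothetical placeholder base URI: used only so jsoup can resolve relative URLs
    // and check them against the allowed protocols (http/https for img src in relaxed()).
    private static final String BASE_URI = "http://example.invalid/";

    // preserveRelativeLinks(true): keep the attribute's original relative value
    // instead of replacing it with the resolved absolute URL.
    private static final Safelist SAFELIST = Safelist.relaxed().preserveRelativeLinks(true);

    public static String filter(String html) {
        if (html == null || html.trim().isEmpty()) return "";
        return Jsoup.clean(html, BASE_URI, SAFELIST);
    }

    public static void main(String[] args) {
        // The relative path is kept exactly as written
        System.out.println(filter("<img src=\"images/a.jpg\" alt=\"a\">"));
        // A javascript: URL does not resolve to http/https, so its src attribute is dropped
        System.out.println(filter("<img src=\"javascript:alert(1)\">"));
    }
}
```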

http://cssor.com/jsoup-whitelist-clean-html-for-user-content.html

