Websites often let users post comments. A malicious user can embed scripts in a comment, and those scripts can break the behavior of the whole page or, more seriously, steal confidential information. To prevent such cross-site scripting (XSS) attacks, the submitted HTML needs to be cleaned.
Use jsoup's HTML Cleaner for this. It requires a configurable Whitelist that specifies which tags and attributes are allowed.
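import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// The onclick handler is untrusted input and must not reach the page.
String unsafe = "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
// now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>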
XSS stands for cross-site scripting. It is sometimes abbreviated CSS (Cross Site Script), but XSS is preferred to avoid confusion with Cascading Style Sheets. In an XSS attack, an attacker inserts malicious HTML or script code into a Web page. When a user views the page, the embedded code runs in that user's browser, which is how the attacker reaches the user. XSS is a passive attack: the attacker has to wait for a victim to load the page. Because it is passive and considered hard to exploit, many people underestimate the harm it can do. A common defense is to accept only plain text from users, but this results in a poor user experience.
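The plain-text approach usually comes down to escaping HTML metacharacters before the comment is written into the page. The following is a minimal sketch of that idea using only the Java standard library. The class PlainTextComments and the method escapeHtml are hypothetical names chosen for illustration and are not part of jsoup.

```java
// Hypothetical helper for illustration; not part of jsoup.
final class PlainTextComments {

    // Replace the characters that have special meaning in HTML with entities,
    // so the browser displays the comment as text instead of parsing it as markup.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String comment = "<script>stealCookies()</script> nice post!";
        System.out.println(escapeHtml(comment));
        // prints: &lt;script&gt;stealCookies()&lt;/script&gt; nice post!
    }
}
```

Every tag is neutralized this way, including harmless ones such as <b> or <a>, which is why this approach rules out any formatting in comments.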
A better solution is to use a WYSIWYG rich text editor such as CKEditor or TinyMCE. These let the user edit visually and output HTML. They can validate input on the client side, but that is not secure enough: an attacker can bypass client-side JavaScript validation and inject unsafe HTML directly into your website. The input must also be validated on the server side, and harmful HTML removed, to ensure that the HTML entering your website is safe.
jsoup's whitelist cleaner filters user-submitted HTML on the server side and outputs only the tags and attributes that the whitelist considers safe.
jsoup provides a set of basic Whitelist configurations that meet most requirements. They can be modified if necessary, but do so with care, because every tag or attribute you add widens what users are able to submit.
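As a rough sketch of such a modification, the example below starts from Whitelist.basic() and allows two heading tags plus a title attribute on links. The class name CommentCleaner and the particular tags and attributes are assumptions made for illustration, not a recommended policy. It also assumes a jsoup version that still ships org.jsoup.safety.Whitelist; newer releases name this class Safelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// Illustrative class; the extra tags and attributes below are example choices only.
final class CommentCleaner {

    // Start from the built-in basic() configuration and extend it slightly.
    static Whitelist commentWhitelist() {
        return Whitelist.basic()
                .addTags("h2", "h3")            // additionally allow two heading levels
                .addAttributes("a", "title");   // additionally allow title on links
    }

    // Clean untrusted HTML on the server before storing or rendering it.
    static String clean(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, commentWhitelist());
    }

    public static void main(String[] args) {
        String unsafe = "<h2 onclick='stealCookies()'>Hi</h2>"
                + "<script>stealCookies()</script>"
                + "<p><a href='http://example.com/' title='Example'>Link</a></p>";
        // The script element and the onclick attribute are not in the whitelist,
        // so they should be absent from the cleaned output.
        System.out.println(clean(unsafe));
    }
}
```

Building the whitelist in one place keeps the policy easy to review, which matters because each addition is a potential new entry point.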
The cleaner is easy to use. It not only guards against XSS attacks but also restricts the set of tags that users can enter.