How to handle Chinese tag names when reading a PDF with Java - PHP Chinese Network Q&A


黄舟 2017-04-17 11:43:26

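I am using iText to read information from a PDF. The method below converts a tagged PDF to XML, but when the tag names themselves are Chinese (as opposed to Chinese appearing in the body text), the tag names come out garbled.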

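// Package names below assume iText 5.
import java.io.FileOutputStream;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.TaggedPdfReaderTool;

TaggedPdfReaderTool readertool = new TaggedPdfReaderTool();
PdfReader reader = new PdfReader(pdfPath);
readertool.convertToXml(reader, new FileOutputStream(xmlPath));
reader.close();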

The output looks something like this:

<??-??-??>标题</??-??-??>

The correct output should be:

<标题>标题</标题>

Is there any way to get rid of this garbled output?
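One possible cause, offered only as a guess and not confirmed anywhere in this thread, is that the tag name is stored in the PDF as UTF-8 bytes and those bytes get decoded one byte per character before the XML is written. If that is what is happening, the original text can be recovered by turning the characters back into bytes and decoding them as UTF-8. The sketch below uses only the JDK; `TagNameRepair` and `redecodeAsUtf8` are hypothetical names for illustration, not part of iText.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an iText class.
final class TagNameRepair {

    // Assumes every char in the input holds one raw byte (0-255),
    // and that those bytes together form valid UTF-8 text.
    static String redecodeAsUtf8(String byteCharString) {
        byte[] raw = byteCharString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u6807\u9898" is the tag name from the question (the word for "title").
        String expected = "\u6807\u9898";

        // Simulate the suspected failure: UTF-8 bytes read one byte per char.
        String garbled = new String(
                expected.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        String repaired = redecodeAsUtf8(garbled);
        System.out.println(repaired.equals(expected)); // prints true
    }
}
```

This only helps if the mis-decoded name is still available before it is written out. Once unsupported characters have been replaced with `?` or `-` in the XML file, the original bytes are gone and cannot be restored from the output.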
