Home > Java > javaTutorial > Detailed explanation of how to read PDF and WORD documents in JAVA

Detailed explanation of how to read PDF and WORD documents in JAVA

Y2J
Release: 2017-04-28 09:56:53
Original
3185 people have browsed it

This article mainly introduces JAVA to read PDF and WORD documents through example code. Friends in need can refer to the following

Read PDF file jar reference

<dependency>
  <groupid>org.apache.pdfbox</groupid>
  pdfbox</artifactid>
  <version>1.8.13</version>
</dependency>
Copy after login

Read WORD file jar reference

<dependency>
  <groupid>org.apache.poi</groupid>
  poi-scratchpad</artifactid>
  <version>3.16-beta1</version>
</dependency>
<dependency>
  <groupid>org.apache.poi</groupid>
  poi</artifactid>
  <version>3.16-beta1</version>
</dependency>
Copy after login

Read WORD file method

/**
   * 
   * @Title: getTextFromWord
   * @Description: 读取word
   * @param filePath
   *      文件路径
   * @return: String 读出的Word的内容
   */
  public static String getTextFromWord(String filePath) {
    String result = null;
    File file = new File(filePath);
    FileInputStream fis = null;
    try {
      fis = new FileInputStream(file);
      @SuppressWarnings("resource")
      WordExtractor wordExtractor = new WordExtractor(fis);
      result = wordExtractor.getText();
    } catch (FileNotFoundException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    } finally {
      if (fis != null) {
        try {
          fis.close();
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    }
    return result;
  }
Copy after login

Read PDF file method

/**
 * 
 * @Title: getTextFromPdf
 * @Description: 读取pdf文件内容
 * @param filePath
 * @return: 读出的pdf的内容
 */
public static String getTextFromPdf(String filePath) {
  String result = null;
  FileInputStream is = null;
  PDDocument document = null;
  try {
    is = new FileInputStream(filePath);
    PDFParser parser = new PDFParser(is);
    parser.parse();
    document = parser.getPDDocument();
    PDFTextStripper stripper = new PDFTextStripper();
    result = stripper.getText(document);
  } catch (FileNotFoundException e) {
    e.printStackTrace();
  } catch (IOException e) {
    e.printStackTrace();
  } finally {
    if (is != null) {
      try {
        is.close();
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
    if (document != null) {
      try {
        document.close();
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  }
  return result;
}
Copy after login

The above is the detailed content of Detailed explanation of how to read PDF and WORD documents in JAVA. For more information, please follow other related articles on the PHP Chinese website!

Related labels:
source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template