
org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be i

Date: 2022-12-06 15:14:53
Tags: XML, Documents, reading, poi, part, POI, ppt, apache, Open

Exception: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

1. Scenario

The project needs to read the contents of Word documents, and uses Apache POI to read Word, PowerPoint, and Excel files. During development, reading a file threw the following exception: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)


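2. Analysis

Office saves presentations in two formats: ppt (the OLE2 binary format of Office 97-2003) and pptx (the Office Open XML format introduced with Office 2007). Apache POI supports each format with a different class: HSLFSlideShow for ppt and XMLSlideShow for pptx.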
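The two formats differ at the container level, which is why the exception message can tell them apart: an OLE2 file starts with the signature D0 CF 11 E0 A1 B1 1A E1, while an OOXML file is a ZIP package and starts with "PK\003\004". A minimal sketch of that check, using only the JDK (Java 11+), might look like the following. OfficeFormatSniffer and its Kind enum are hypothetical names invented for this illustration and are not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of POI): classifies a file by its leading bytes. */
final class OfficeFormatSniffer {

    /** OOXML here means "starts like a ZIP archive"; any ZIP file would match. */
    enum Kind { OLE2, OOXML, UNKNOWN }

    // OLE2 compound file signature: D0 CF 11 E0 A1 B1 1A E1
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature: 'P' 'K' 03 04
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a buffer holding the first bytes of a file. */
    static Kind detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Kind.OOXML;
        }
        return Kind.UNKNOWN;
    }

    /** Reads up to 8 bytes from the file and classifies them. */
    static Kind detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            // Mask to compare the byte as an unsigned value.
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With such a check, a caller could pick HSLFSlideShow for Kind.OLE2 and XMLSlideShow for Kind.OOXML regardless of the file extension. Recent POI releases also appear to offer their own detection (for example FileMagic and SlideShowFactory); check the javadocs for the version in use before relying on them.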

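The screenshot in the original post used the XMLSlideShow class to read a ppt document. XMLSlideShow only supports the pptx format, so the read fails with the exception above.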

Incorrect example: (screenshot not reproduced here)


3. Reading ppt and pptx documents in detail

The following examples show the correct way to read each of the two presentation formats:

Reading ppt

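// Read a ppt document with the HSLFSlideShow class
// (log is assumed to be an SLF4J-style logger declared by the enclosing class;
//  SlideShowExtractor is taken from org.apache.poi.sl.extractor, as in POI 4.x/5.x)
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.sl.extractor.SlideShowExtractor;

// --------- ppt -----------
File file = new File("E:\\search-file\\44.ppt");
// try-with-resources closes the slide show and the stream even if extraction fails
try (FileInputStream fis = new FileInputStream(file);
     HSLFSlideShow document = new HSLFSlideShow(fis)) {
    SlideShowExtractor<?, ?> extractor = new SlideShowExtractor<>(document);
    log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
    // Also catches OfficeXmlFileException when the file is really a pptx
    log.error("Failed to read {}", file, e);
}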

If HSLFSlideShow is given a file in the wrong format (for example a pptx), it throws: org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

Reading pptx

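// Read a pptx document with the XMLSlideShow class
// (log is assumed to be an SLF4J-style logger declared by the enclosing class;
//  SlideShowExtractor is taken from org.apache.poi.sl.extractor, as in POI 4.x/5.x)
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.sl.extractor.SlideShowExtractor;
import org.apache.poi.xslf.usermodel.XMLSlideShow;

// --------- pptx -----------
File file = new File("E:\\search-file\\33.pptx");
// try-with-resources closes the slide show and the stream even if extraction fails
try (FileInputStream fis = new FileInputStream(file);
     XMLSlideShow document = new XMLSlideShow(fis)) {
    SlideShowExtractor<?, ?> extractor = new SlideShowExtractor<>(document);
    log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
    // Also catches OLE2NotOfficeXmlFileException when the file is really a ppt
    log.error("Failed to read {}", file, e);
}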

The same mismatch applies to Word documents: using the XWPFDocument class to read a doc file (instead of a docx file) throws: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)
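The pairing of format and class follows the same pattern for presentations, Word documents, and spreadsheets. As a rough illustration, the lookup below maps a file extension to the fully qualified name of the POI class that reads it. PoiReaderLookup is a hypothetical helper written for this note, not a POI API, and it returns class names as strings so that it runs without POI on the classpath.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical lookup (not part of POI): file extension -> POI class that reads it. */
final class PoiReaderLookup {

    // OLE2 formats (ppt, doc, xls) and OOXML formats (pptx, docx, xlsx) use different classes.
    private static final Map<String, String> READER_BY_EXTENSION = Map.of(
            "ppt", "org.apache.poi.hslf.usermodel.HSLFSlideShow",
            "pptx", "org.apache.poi.xslf.usermodel.XMLSlideShow",
            "doc", "org.apache.poi.hwpf.HWPFDocument",
            "docx", "org.apache.poi.xwpf.usermodel.XWPFDocument",
            "xls", "org.apache.poi.hssf.usermodel.HSSFWorkbook",
            "xlsx", "org.apache.poi.xssf.usermodel.XSSFWorkbook");

    /** Returns the reader class name for a file name, based only on its extension. */
    static String readerFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("No extension in: " + fileName);
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String reader = READER_BY_EXTENSION.get(extension);
        if (reader == null) {
            throw new IllegalArgumentException("Unsupported extension: " + extension);
        }
        return reader;
    }
}
```

An extension can be wrong (for example a pptx file renamed to ppt), and that is exactly the situation in which these two exceptions appear, so inspecting the file's leading bytes is the more robust check when the source of the files is not trusted.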

4. Summary

Apache POI is a powerful library with a wide range of features. For details on specific usage, see the official Apache POI API documentation:

https://poi.apache.org/apidocs/index.html

Check which version of Apache POI you are using, and refer to the javadocs for that version.

From: https://www.cnblogs.com/xiangningdeguang/p/16955308.html
