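Summary of options:
1. poi (supports HTML attributes): has a bug where a table cell that contains both text and an image loses the image after conversion.
2. tika (mainly for content extraction; the HTML it produces is not very good).
3. openoffice (requires a local installation; the HTML it produces is not very good).
4. aspose (powerful but commercial, although it can also be used for free). Drawback: it cannot be extended, because the code is not open source.
5. mammoth (compared with poi, it drops tag attributes such as color and font). It does not have the problem of losing the image when a cell contains both text and an image, its style handling can be extended, and there are plenty of examples.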
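One way to compare converters on the image-loss problem noted for poi is to count the <img> tags that end up inside table cells of the generated HTML and compare that with the number of images in the source document. The sketch below is only an illustration under stated assumptions: the class name CellImageCheck and the method countImagesInCells are hypothetical and not part of poi, mammoth, or aspose, and the regex matching only suits simple tables without nesting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting converter output; not part of any library above.
class CellImageCheck {
    // Matches one <td>...</td> cell (non-greedy). Nested tables are not handled.
    private static final Pattern CELL =
            Pattern.compile("<td\\b[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches the start of an <img> tag.
    private static final Pattern IMG = Pattern.compile("<img\\b", Pattern.CASE_INSENSITIVE);

    // Counts the <img> tags that appear inside table cells of the given HTML.
    static int countImagesInCells(String html) {
        int count = 0;
        Matcher cell = CELL.matcher(html);
        while (cell.find()) {
            Matcher img = IMG.matcher(cell.group(1));
            while (img.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>text <img src=\"a.png\"></td><td>text only</td></tr></table>";
        System.out.println(countImagesInCells(html)); // prints 1
    }
}
```

If the source document has an image in a mixed text-and-image cell and the count for the converted HTML is lower than expected, the converter has likely dropped it.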
1. Maven dependencies
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-examples</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-excelant</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>18.6</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/lib/aspose-words-18.6-jdk16.jar</systemPath>
</dependency>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-cells</artifactId>
    <version>8.5.2</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/lib/aspose-cells-8.5.2.jar</systemPath>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.9</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>1.0.6</version>
</dependency>
<dependency>
    <groupId>org.apache.xmlbeans</groupId>
    <artifactId>xmlbeans</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>net.sf.cssbox</groupId>
    <artifactId>pdf2dom</artifactId>
    <version>1.8</version>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.16.10</version>
</dependency>
<!-- Hutool utility classes -->
<!-- https://mvnrepository.com/artifact/cn.hutool/hutool-all -->
<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.3.8</version>
</dependency>
2. Code implementation
2.1 The wordBytes2HtmlFile method
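// `log` is assumed to be an SLF4J logger, e.g. from Lombok's @Slf4j on the enclosing class.
// License and SaveFormat come from com.aspose.words.
public static File wordBytes2HtmlFile(byte[] wordBytes, String htmlFilePath) {
    // Apply the aspose-words license so the output does not carry the evaluation watermark:
    // `Evaluation Only. Created with Aspose.Words. Copyright 2003-2018 Aspose Pty Ltd.` |
    // `Evaluation Only. Created with Aspose.Cells for Java. Copyright 2003 - 2020 Aspose Pty Ltd.`
    // Load license.xml through the context class loader
    // (alternative: new ClassPathResource("license.xml").getInputStream()).
    try (InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream("license.xml")) {
        log.info("Applying the `aspose-words` license to remove the header watermark");
        if (is != null) {
            License license = new License();
            license.setLicense(is);
        }
    } catch (Exception e) {
        log.error("Applying the `aspose-words` license failed: {}", e.getMessage());
    }
    try {
        // Load the Word document from the in-memory bytes.
        com.aspose.words.Document doc = new com.aspose.words.Document(new ByteArrayInputStream(wordBytes));
        // Save the document as HTML.
        doc.save(htmlFilePath, SaveFormat.HTML);
    } catch (Exception e) {
        // The Aspose load and save calls declare a checked Exception; rethrow it unchecked.
        throw new IllegalStateException("Word to HTML conversion failed: " + htmlFilePath, e);
    }
    return new File(htmlFilePath);
}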
2.2 The readBytes method
// FileUtil is Hutool's cn.hutool.core.io.FileUtil.
public static byte[] readBytes(String filePath) {
    return FileUtil.readBytes(filePath);
}
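If Hutool is only needed for this one call, the JDK's java.nio.file.Files.readAllBytes can do the same job. The following is a minimal sketch under that assumption; the class name ReadBytesSketch is hypothetical, and the checked IOException is wrapped in UncheckedIOException so the method keeps the same signature as readBytes above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical JDK-only variant of readBytes; not the author's code.
class ReadBytesSketch {
    // Reads the whole file into memory, like the Hutool-based readBytes.
    static byte[] readBytes(String filePath) {
        try {
            return Files.readAllBytes(Paths.get(filePath));
        } catch (IOException e) {
            // Wrap the checked exception so callers need no throws clause.
            throw new UncheckedIOException("Cannot read " + filePath, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file and read it back.
        Path tmp = Files.createTempFile("sample", ".docx");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readBytes(tmp.toString()).length); // prints 5
        Files.delete(tmp);
    }
}
```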
2.3 The main method
public static void main(String[] args) {
    // word2HtmlFile("D:\\doc", "JKLJLJLGJ.docx", "JKLJLJLGJ.1111.html");
    File htmlFile = wordBytes2HtmlFile(readBytes("D:\\doc\\xxxx.docx"), "D:\\doc\\xxxxxxxaaaa.html");
}
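The main method hard-codes both the input and the output path. When converting several documents, it may be more convenient to derive the HTML path from the Word path. The sketch below shows one way to do that with plain string handling; the class name HtmlPathSketch and the method htmlPathFor are hypothetical and do not appear in the original code.

```java
// Hypothetical helper for deriving the output path; not the author's code.
class HtmlPathSketch {
    // Replaces the file extension (if any) with ".html".
    static String htmlPathFor(String wordPath) {
        // Position of the last path separator, accepting both '/' and '\'.
        int sep = Math.max(wordPath.lastIndexOf('/'), wordPath.lastIndexOf('\\'));
        int dot = wordPath.lastIndexOf('.');
        // Treat the dot as an extension separator only if it lies inside the
        // file name and is not its first character.
        String base = dot > sep + 1 ? wordPath.substring(0, dot) : wordPath;
        return base + ".html";
    }

    public static void main(String[] args) {
        System.out.println(htmlPathFor("D:\\doc\\xxxx.docx")); // prints D:\doc\xxxx.html
    }
}
```

With such a helper, the call in main could pass htmlPathFor("D:\\doc\\xxxx.docx") as the second argument instead of a separately typed path.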
From: https://www.cnblogs.com/QAZLIU/p/18284031