
itextsharp upgrade to itext7

Date: 2024-01-10 14:34:52

Why am I getting duplicate pages extracted from iText7 C#?

Actually, it is not the same text being returned from sequential pages. Instead, you get:

  • the text from page 1 when you extract page 1;
  • the text from pages 1 and 2 when you extract page 2;
  • the text from pages 1, 2, and 3 when you extract page 3;
  • ...

Often this happens in code that re-uses a single text extraction strategy object for multiple pages. But that's not the case in your code: you correctly create a new strategy object for each page. Thus, the cause must be in the PDF itself.
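For comparison, here is a minimal sketch of the strategy re-use pitfall mentioned above. It assumes the itext7 NuGet package is referenced; the class name StrategyReuseDemo, the helper names, and the two-page test PDF built in memory are illustrative and not part of the original question. With a shared SimpleTextExtractionStrategy the result for page 2 still contains the text of page 1, while a fresh strategy per page does not.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;
using iText.Layout;
using iText.Layout.Element;

public static class StrategyReuseDemo
{
    // Builds a small two-page PDF in memory so the sketch needs no input file.
    public static byte[] BuildTwoPagePdf()
    {
        var buffer = new MemoryStream();
        var document = new Document(new PdfDocument(new PdfWriter(buffer)));
        document.Add(new Paragraph("Page one"));
        document.Add(new AreaBreak());            // continue on a new page
        document.Add(new Paragraph("Page two"));
        document.Close();                         // also closes the PdfDocument
        return buffer.ToArray();
    }

    // Extracts the text of every page, either with one shared strategy
    // (the pitfall) or with a new strategy per page (the correct way).
    public static List<string> Extract(byte[] pdfBytes, bool reuseStrategy)
    {
        var results = new List<string>();
        var pdfDoc = new PdfDocument(new PdfReader(new MemoryStream(pdfBytes)));
        ITextExtractionStrategy shared = new SimpleTextExtractionStrategy();

        for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
        {
            ITextExtractionStrategy strategy =
                reuseStrategy ? shared : new SimpleTextExtractionStrategy();
            results.Add(PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(i), strategy));
        }

        pdfDoc.Close();
        return results;
    }

    public static void Main()
    {
        byte[] pdf = BuildTwoPagePdf();
        Console.WriteLine("Shared strategy, page 2:");
        Console.WriteLine(Extract(pdf, true)[1]);
        Console.WriteLine("Fresh strategy, page 2:");
        Console.WriteLine(Extract(pdf, false)[1]);
    }
}
```

If the accumulation persists even with a fresh strategy per page, as in the question, the extra text is coming from the page content itself.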

And indeed, each page of your document also contains the contents of all previous pages, merely outside its crop box. To extract only the text inside each page's crop box, you have to filter the text events, e.g. like this:

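using System;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Filter;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

string SRC = @"285187.pdf";

PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC));

Console.WriteLine("\n285187 Filtered\n============\n");

for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
{
    // A fresh strategy per page, so text does not accumulate across pages.
    var strategy = new SimpleTextExtractionStrategy();
    var pdfPage = pdfDoc.GetPage(i);

    // Accept only text events that fall inside this page's crop box.
    var filter = new IEventFilter[1];
    filter[0] = new TextRegionEventFilter(pdfPage.GetCropBox());
    var filteredTextEventListener = new FilteredTextEventListener(strategy, filter);

    var currentText = PdfTextExtractor.GetTextFromPage(pdfPage, filteredTextEventListener);

    Console.WriteLine("PAGE {0}", i);
    Console.WriteLine(currentText);
}

pdfDoc.Close();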
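To check whether a page really carries text outside its crop box, one option is to compare unfiltered and filtered extraction for the same page. The following is a sketch under assumptions: the class name CropBoxCheckDemo, the helper names, the coordinates, and the one-page test PDF built in memory are all made up for illustration, and the itext7 NuGet package is assumed to be referenced. The test page is A4 with a crop box covering only its upper part; one string is drawn inside the crop box and one below it.

```csharp
using System;
using System.IO;
using iText.IO.Font.Constants;
using iText.Kernel.Font;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Filter;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public static class CropBoxCheckDemo
{
    // Builds a one-page PDF whose content stream holds text both inside
    // and outside the crop box (illustrative coordinates, A4 is 595 x 842 pt).
    public static byte[] BuildPdfWithHiddenText()
    {
        var buffer = new MemoryStream();
        var pdfDoc = new PdfDocument(new PdfWriter(buffer));
        PdfPage page = pdfDoc.AddNewPage(PageSize.A4);
        PdfFont font = PdfFontFactory.CreateFont(StandardFonts.HELVETICA);

        var canvas = new PdfCanvas(page);
        canvas.BeginText().SetFontAndSize(font, 12).MoveText(50, 700).ShowText("INSIDE").EndText();
        canvas.BeginText().SetFontAndSize(font, 12).MoveText(50, 100).ShowText("OUTSIDE").EndText();

        // Crop box: x = 0, y = 400, width = 595, height = 442 (upper part only).
        page.SetCropBox(new Rectangle(0, 400, 595, 442));
        pdfDoc.Close();
        return buffer.ToArray();
    }

    // Extracts page 1 either unfiltered or restricted to its crop box.
    public static string Extract(byte[] pdfBytes, bool cropBoxOnly)
    {
        var pdfDoc = new PdfDocument(new PdfReader(new MemoryStream(pdfBytes)));
        PdfPage page = pdfDoc.GetPage(1);
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

        if (cropBoxOnly)
        {
            var filter = new IEventFilter[] { new TextRegionEventFilter(page.GetCropBox()) };
            strategy = new FilteredTextEventListener(strategy, filter);
        }

        string text = PdfTextExtractor.GetTextFromPage(page, strategy);
        pdfDoc.Close();
        return text;
    }

    public static void Main()
    {
        byte[] pdf = BuildPdfWithHiddenText();
        Console.WriteLine("Unfiltered:\n" + Extract(pdf, false));
        Console.WriteLine("Crop box only:\n" + Extract(pdf, true));
    }
}
```

If the unfiltered result contains text that the filtered result lacks, the page holds content outside its crop box, which is the situation described above.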




From: https://www.cnblogs.com/chucklu/p/17956407

