
Remove text from a document, leaving only the template

Date: 2024-07-29 07:12:28 · Views: 12
Tags: python opencv computer-vision ocr

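I use the doctr library to recognize text and get the coordinates of the text in a PDF document. However, I don't need the text in this document at all, only the document template.

I was looking for a way to remove the text and decided that the best approach is to iterate over the obtained coordinates and erase the text inside them.

I started researching how to do this in Python. Unfortunately, there is not much information, but I managed to find this video: https://www.youtube.com/watch?v=3RNPJbUHZKs. It made the principle clearer to me, and I tried to implement it. But there are problems:

  1. The video uses keras_ocr for text recognition, and its x and y coordinates come in a different data format from the coordinates produced by doctr.
  2. When the document contains a lot of text (usually more than 100 items), the program from the video runs for more than 10 minutes. As you can imagine, that is a long time.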

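About the coordinates:

  1. In doctr, the point (0, 0) is in the upper-left corner.
  2. The text coordinates I receive have the format ((x1, y1), (x2, y2)), where x1, y1 is the top-left point of the recognized word and x2, y2 is its bottom-right point.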
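For reference, a list in this format could be built from doctr's exported result roughly as follows. This is a minimal sketch, not the asker's actual code: it assumes the dictionary layout of `result.export()` (pages, blocks, lines, words), that each word's `geometry` is a relative ((xmin, ymin), (xmax, ymax)) box in the range 0 to 1, and that `dimensions` is ordered (height, width). The function name `word_boxes_in_pixels` is hypothetical.

```python
def word_boxes_in_pixels(export):
    """Return, for each page, a list of ((x1, y1), (x2, y2)) boxes in pixels.

    `export` is assumed to look like the dict returned by doctr's
    `result.export()`; rotated (4-point) boxes are not handled here.
    """
    pages = []
    for page in export["pages"]:
        height, width = page["dimensions"]  # assumption: (height, width)
        boxes = []
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    (rx1, ry1), (rx2, ry2) = word["geometry"]
                    # Scale relative coordinates to absolute pixel coordinates.
                    boxes.append(((rx1 * width, ry1 * height),
                                  (rx2 * width, ry2 * height)))
        pages.append(boxes)
    return pages
```

Keeping one list per page, instead of a single flat list, makes it easier to erase each word only on the page it came from.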

Below I use this file as an example. I ran the document through doctr. The result contains the words and their coordinates, so I extracted the coordinates into a separate list.

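As you can see, this is a very large list of coordinates, and the main problem is that the algorithm from the video runs slowly. Perhaps you can tell me how to solve my problem.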
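One possible way to cut the work for any approach that processes the boxes one by one is to merge neighbouring word boxes on the same text line into larger boxes first. The sketch below is only an illustration under assumptions: `merge_boxes` and the `max_gap` parameter are hypothetical names, and the merging is a single greedy pass, so it is not guaranteed to produce the smallest possible set of boxes.

```python
def merge_boxes(boxes, max_gap=10.0):
    """Greedily merge ((x1, y1), (x2, y2)) boxes that lie on the same row.

    Two boxes are merged when they overlap vertically by more than half of
    the smaller height and their horizontal gap is at most `max_gap`.
    """
    merged = []
    for (x1, y1), (x2, y2) in sorted(boxes, key=lambda b: (b[0][1], b[0][0])):
        for i, ((mx1, my1), (mx2, my2)) in enumerate(merged):
            overlap = min(y2, my2) - max(y1, my1)
            same_row = overlap > 0.5 * min(y2 - y1, my2 - my1)
            close = x1 <= mx2 + max_gap and mx1 <= x2 + max_gap
            if same_row and close:
                # Grow the existing box so that it covers both boxes.
                merged[i] = ((min(x1, mx1), min(y1, my1)),
                             (max(x2, mx2), max(y2, my2)))
                break
        else:
            merged.append(((x1, y1), (x2, y2)))
    return merged
```

Merging enlarges the erased area, so `max_gap` should stay small; otherwise template elements that sit between two words, such as table rules, could be covered as well.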

coordinates = [((50.66015624999998, 179.25390625), (96.70703124999997, 198.98828125)), ((53.949218750000014, 235.16796875), (111.50781249999999, 254.90234375)), ((53.949218750000014, 263.125), (129.59765625000003, 281.21484375)), ((53.949218750000014, 291.08203125), (170.7109375, 310.81640625)), ((52.30468750000003, 317.39453125), (103.28515624999999, 337.12890625)), ((53.949218750000014, 366.73046875), (111.50781249999999, 386.46484375)), ((53.949218750000014, 394.6875), (129.59765625000003, 412.77734375)), ((52.30468750000003, 422.64453125), (172.35546875, 447.3125)), ((50.66015624999998, 447.3125), (106.57421875000003, 471.98046875)), ((53.949218750000014, 499.9375), (111.50781249999999, 519.671875)), ((53.949218750000014, 526.25), (129.59765625000003, 545.984375)), ((53.949218750000014, 557.49609375), (170.7109375, 577.23046875)), ((52.30468750000003, 596.96484375), (134.53124999999997, 615.0546875)), ((137.82031250000003, 596.96484375), (282.53906249999994, 615.0546875)), ((76.97265625000001, 687.4140625), (175.64453125000003, 705.50390625)), ((175.64453125000003, 687.4140625), (241.42578125, 705.50390625)), ((241.42578125, 687.4140625), (317.07421875, 705.50390625)), ((479.8828125, 180.8984375), (553.88671875, 212.14453125)), ((552.2421875, 179.25390625), (603.22265625, 213.7890625)), ((606.51171875, 180.8984375), (723.2734375, 212.14453125)), ((943.6406249999999, 180.8984375), (1001.19921875, 207.2109375)), ((1002.8437499999999, 180.8984375), (1017.64453125, 203.921875)), ((1022.578125, 179.25390625), (1048.890625, 203.921875)), ((295.6953125, 213.7890625), (340.09765625, 235.16796875)), ((333.51953125, 212.14453125), (391.07812500000006, 236.8125)), ((647.625, 235.16796875), (682.16015625, 256.546875)), ((680.5156250000001, 235.16796875), (705.18359375, 256.546875)), ((705.18359375, 236.8125), (767.67578125, 256.546875)), ((770.96484375, 236.8125), (849.90234375, 256.546875)), ((769.3203125, 289.4375), (828.5234375, 314.10546875)), ((835.1015624999999, 
289.4375), (914.0390625, 314.10546875)), ((922.26171875, 292.7265625), (1012.7109375, 312.4609375)), ((544.01953125, 319.0390625), (593.35546875, 345.3515625)), ((596.64453125, 320.68359375), (613.08984375, 337.12890625)), ((307.20703125, 345.3515625), (356.54296875, 365.0859375)), ((346.67578125, 345.3515625), (377.921875, 366.73046875)), ((649.26953125, 346.99609375), (726.5624999999999, 365.0859375)), ((726.5624999999999, 346.99609375), (785.765625, 365.0859375)), ((649.26953125, 366.73046875), (703.5390625, 386.46484375)), ((705.18359375, 366.73046875), (775.8984375, 386.46484375)), ((647.625, 393.04296875), (688.73828125, 414.421875)), ((690.3828125, 393.04296875), (784.12109375, 417.7109375)), ((647.625, 422.64453125), (710.1171874999999, 442.37890625)), ((645.98046875, 447.3125), (685.44921875, 473.625)), ((685.44921875, 450.6015625), (761.09765625, 470.3359375)), ((399.30078125, 366.73046875), (471.66015625, 386.46484375)), ((471.66015625, 365.0859375), (494.68359375, 388.109375)), ((544.01953125, 453.890625), (593.35546875, 480.203125)), ((172.35546875, 476.9140625), (231.55859375, 501.58203125)), ((224.98046875, 476.9140625), (361.4765625, 499.9375)), ((354.8984375, 478.55859375), (440.4140625, 498.29296875)), ((435.48046875, 475.26953125), (481.52734375, 501.58203125)), ((476.59375, 478.55859375), (509.484375, 499.9375)), ((770.96484375, 501.58203125), (825.234375, 521.31640625)), ((836.74609375, 501.58203125), (912.39453125, 521.31640625)), ((922.26171875, 501.58203125), (1012.7109375, 521.31640625)), ((647.625, 554.20703125), (715.05078125, 580.51953125)), ((715.05078125, 554.20703125), (782.4765625, 580.51953125)), ((784.12109375, 555.8515625), (849.90234375, 575.5859375)), ((851.546875, 555.8515625), (910.7499999999999, 580.51953125)), ((909.10546875, 555.8515625), (974.88671875, 580.51953125)), ((971.59765625, 557.49609375), (1002.8437499999999, 577.23046875)), ((1002.8437499999999, 559.140625), (1063.69140625, 578.875)), ((1063.69140625, 
557.49609375), (1116.31640625, 575.5859375)), ((647.625, 575.5859375), (708.47265625, 595.3203125)), ((708.47265625, 573.94140625), (792.34375, 598.609375)), ((647.625, 595.3203125), (713.40625, 619.98828125)), ((831.8125, 596.96484375), (891.015625, 616.69921875)), ((988.04296875, 592.03125), (1016.0000000000001, 616.69921875)), ((1014.35546875, 596.96484375), (1062.046875, 619.98828125)), ((782.4765625, 626.56640625), (840.0351562500001, 646.30078125)), ((840.0351562500001, 628.2109375), (866.34765625, 646.30078125)), ((863.05859375, 624.921875), (886.08203125, 647.9453125)), ((882.79296875, 624.921875), (941.99609375, 649.58984375)), ((940.3515625000001, 624.921875), (978.17578125, 646.30078125)), ((976.53125, 628.2109375), (1043.95703125, 646.30078125)), ((1045.6015625, 626.56640625), (1126.18359375, 651.234375)), ((798.921875, 687.4140625), (904.171875, 705.50390625)), ((904.171875, 687.4140625), (979.8203125, 705.50390625)), ((979.8203125, 685.76953125), (1024.22265625, 705.50390625)), ((669.00390625, 651.234375), (715.05078125, 670.96875)), ((711.76171875, 649.58984375), (746.296875, 670.96875)), ((782.4765625, 644.65625), (818.6562499999999, 666.03515625)), ((817.01171875, 644.65625), (840.0351562500001, 667.6796875)), ((835.1015624999999, 644.65625), (892.66015625, 669.32421875)), ((460.14843749999994, 664.390625), (619.66796875, 689.05859375)), ((614.734375, 664.390625), (733.140625, 689.05859375)), ((364.765625, 687.4140625), (381.2109375, 705.50390625)), ((381.2109375, 685.76953125), (432.19140625, 705.50390625)), ((463.4375, 687.4140625), (535.796875, 705.50390625)), ((562.109375, 685.76953125), (664.0703125000001, 705.50390625)), ((570.33203125, 702.21484375), (622.95703125, 721.94921875)), ((616.37890625, 702.21484375), (655.8476562499999, 721.94921875)), ((573.62109375, 718.66015625), (591.7109375, 738.39453125)), ((632.82421875, 717.015625), (655.8476562499999, 740.0390625)), ((573.62109375, 746.6171875), (591.7109375, 764.70703125)), 
((631.1796875, 744.97265625), (655.8476562499999, 766.3515625)), ((573.62109375, 774.57421875), (591.7109375, 792.6640625)), ((631.1796875, 772.9296875), (655.8476562499999, 794.30859375)), ((573.62109375, 800.88671875), (591.7109375, 818.9765625)), ((632.82421875, 800.88671875), (652.55859375, 818.9765625)), ((573.62109375, 828.84375), (591.7109375, 845.2890625)), ((634.4687500000001, 827.19921875), (652.55859375, 846.93359375)), ((575.265625, 855.15625), (591.7109375, 873.24609375)), ((634.4687500000001, 855.15625), (652.55859375, 873.24609375)), ((575.265625, 883.11328125), (591.7109375, 901.203125)), ((634.4687500000001, 881.46875), (652.55859375, 901.203125)), ((575.265625, 909.42578125), (590.06640625, 927.515625)), ((634.4687500000001, 911.0703125), (652.55859375, 927.515625)), ((499.6171875, 960.40625), (578.5546875, 985.07421875)), ((573.62109375, 960.40625), (692.02734375, 985.07421875)), ((52.30468750000003, 939.02734375), (123.01953125000001, 957.1171875)), ((124.6640625, 937.3828125), (190.44531249999997, 957.1171875)), ((58.88281249999997, 981.78515625), (150.97656249999997, 1001.51953125)), ((142.75390624999997, 981.78515625), (187.15625, 1001.51953125)), ((231.55859375, 983.4296875), (317.07421875, 1001.51953125)), ((601.578125, 983.4296875), (719.984375, 1003.1640625)), ((721.62890625, 981.78515625), (853.19140625, 1004.80859375)), ((678.87109375, 1019.609375), (690.3828125, 1029.4765625)), ((565.3984375, 1016.3203125), (603.22265625, 1031.12109375)), ((599.93359375, 1016.3203125), (622.95703125, 1031.12109375)), ((619.66796875, 1014.67578125), (665.71484375, 1034.41015625)), ((667.359375, 1019.609375), (683.8046875, 1031.12109375)), ((692.02734375, 1016.3203125), (721.62890625, 1031.12109375)), ((719.984375, 1016.3203125), (744.65234375, 1031.12109375)), ((741.36328125, 1014.67578125), (805.5, 1032.765625)), ((803.85546875, 1016.3203125), (828.5234375, 1031.12109375)), ((825.234375, 1016.3203125), (863.05859375, 1031.12109375)), ((861.4140625, 
1016.3203125), (887.7265625, 1031.12109375)), ((649.26953125, 1026.1875), (667.359375, 1040.98828125)), ((665.71484375, 1027.83203125), (705.18359375, 1040.98828125)), ((703.5390625, 1027.83203125), (726.5624999999999, 1042.6328125)), ((724.91796875, 1029.4765625), (738.07421875, 1039.34375)), ((736.4296875, 1026.1875), (769.3203125, 1040.98828125)), ((767.67578125, 1027.83203125), (792.34375, 1040.98828125)), ((789.0546875, 1026.1875), (812.078125, 1040.98828125)), ((997.91015625, 980.140625), (1040.66796875, 1006.453125)), ((1037.37890625, 980.140625), (1094.9375, 1004.80859375)), ((1020.93359375, 1006.453125), (1037.37890625, 1026.1875)), ((968.3085937499999, 1003.1640625), (1024.22265625, 1027.83203125)), ((1071.9140625, 1003.1640625), (1134.40625, 1027.83203125)), ((62.17187500000001, 1004.80859375), (103.28515624999999, 1026.1875)), ((132.88671875, 1003.1640625), (187.15625, 1027.83203125)), ((216.7578125, 1004.80859375), (256.2265625, 1026.1875)), ((284.18359375, 1006.453125), (333.51953125, 1026.1875)), ((358.18750000000006, 1008.09765625), (435.48046875, 1026.1875)), ((458.50390625, 1004.80859375), (497.97265625, 1024.54296875)), ((519.3515625, 1003.1640625), (583.48828125, 1021.25390625)), ((580.19921875, 1003.1640625), (626.2460937500001, 1022.8984375)), ((619.66796875, 1004.80859375), (657.4921875, 1019.609375)), ((655.8476562499999, 1008.09765625), (665.71484375, 1017.96484375)), ((665.71484375, 1003.1640625), (715.05078125, 1021.25390625)), ((710.1171874999999, 1004.80859375), (734.78515625, 1019.609375)), ((733.140625, 1006.453125), (743.0078125, 1017.96484375)), ((746.296875, 1004.80859375), (784.12109375, 1019.609375)), ((795.6328125, 1004.80859375), (838.390625, 1021.25390625)), ((833.45703125, 1004.80859375), (849.90234375, 1019.609375)), ((846.61328125, 1006.453125), (882.79296875, 1019.609375)), ((882.79296875, 1006.453125), (905.81640625, 1019.609375)), ((905.81640625, 1004.80859375), (922.26171875, 1019.609375)), ((920.6171875, 1006.453125), 
(933.7734375, 1019.609375)), ((463.4375, 1024.54296875), (491.39453125000006, 1045.921875)), ((941.99609375, 1133.08203125), (1091.6484375, 1162.68359375)), ((953.5078125, 1164.328125), (1040.66796875, 1184.0625)), ((1048.890625, 1164.328125), (1136.05078125, 1184.0625)), ((659.13671875, 1235.04296875), (729.8515625, 1253.1328125)), ((729.8515625, 1233.3984375), (795.6328125, 1253.1328125)), ((678.87109375, 1259.7109375), (726.5624999999999, 1282.734375)), ((724.91796875, 1261.35546875), (803.85546875, 1281.08984375)), ((52.30468750000003, 1258.06640625), (91.77343750000001, 1272.8671875)), ((90.12890624999997, 1258.06640625), (113.15234374999997, 1274.51171875)), ((108.21875000000001, 1256.421875), (139.46484375, 1276.15625)), ((132.88671875, 1258.06640625), (149.33203125, 1274.51171875)), ((144.39843750000003, 1258.06640625), (208.53515625, 1276.15625)), ((203.6015625, 1258.06640625), (224.98046875, 1274.51171875)), ((218.40234375, 1256.421875), (257.87109375, 1276.15625)), ((254.58203125, 1258.06640625), (305.56249999999994, 1276.15625)), ((302.2734375, 1259.7109375), (326.94140625, 1274.51171875)), ((322.00781249999994, 1258.06640625), (374.63281250000006, 1276.15625)), ((369.69921875, 1258.06640625), (387.7890625, 1274.51171875)), ((384.49999999999994, 1259.7109375), (414.10156249999994, 1274.51171875)), ((412.45703125000006, 1258.06640625), (476.59375, 1276.15625)), ((471.66015625, 1258.06640625), (488.10546875, 1274.51171875)), ((486.4609375, 1259.7109375), (524.28515625, 1274.51171875)), ((522.640625, 1258.06640625), (545.6640625, 1274.51171875)), ((540.73046875, 1258.06640625), (598.2890625, 1276.15625)), ((52.30468750000003, 1282.734375), (83.55078125000003, 1302.46875)), ((78.6171875, 1284.37890625), (123.01953125000001, 1304.11328125)), ((118.0859375, 1286.0234375), (136.17578124999997, 1302.46875)), ((131.2421875, 1284.37890625), (185.51171875, 1302.46875)), ((182.22265624999997, 1286.0234375), (215.11328125000003, 1300.82421875)), 
((211.82421874999997, 1286.0234375), (229.91406250000003, 1300.82421875)), ((224.98046875, 1282.734375), (249.6484375, 1304.11328125)), ((244.71484375000003, 1286.0234375), (294.05078125, 1304.11328125)), ((290.76171875000006, 1284.37890625), (307.20703125, 1302.46875)), ((303.91796875, 1284.37890625), (366.41015625000006, 1304.11328125)), ((364.765625, 1287.66796875), (400.9453125, 1300.82421875)), ((399.30078125, 1286.0234375), (419.03515625, 1304.11328125)), ((415.74609375, 1284.37890625), (438.76953124999994, 1300.82421875)), ((435.48046875, 1284.37890625), (479.8828125, 1304.11328125)), ((476.59375, 1286.0234375), (493.0390625, 1302.46875)), ((489.75, 1286.0234375), (509.484375, 1300.82421875)), ((506.19531249999994, 1286.0234375), (529.21875, 1300.82421875)), ((527.57421875, 1284.37890625), (585.1328125, 1304.11328125)), ((49.01562499999999, 1350.16015625), (119.73046874999999, 1369.89453125)), ((118.0859375, 1351.8046875), (160.84375000000003, 1371.5390625)), ((157.55468749999997, 1351.8046875), (173.99999999999997, 1368.25)), ((170.7109375, 1351.8046875), (234.84765624999997, 1369.89453125)), ((233.203125, 1351.8046875), (297.33984375, 1369.89453125)), ((295.6953125, 1353.44921875), (325.296875, 1368.25)), ((323.65234375, 1351.8046875), (340.09765625, 1368.25)), ((336.80859375000006, 1350.16015625), (392.72265624999994, 1369.89453125)), ((386.14453125, 1350.16015625), (415.74609375, 1369.89453125)), ((412.45703125000006, 1355.09375), (442.05859375, 1368.25)), ((440.4140625, 1353.44921875), (470.015625, 1368.25)), ((468.37109374999994, 1350.16015625), (512.7734375, 1369.89453125)), ((511.12890625, 1355.09375), (539.0859375, 1369.89453125)), ((537.44140625, 1351.8046875), (553.88671875, 1368.25)), ((550.59765625, 1351.8046875), (591.7109375, 1371.5390625)), ((50.66015624999998, 1379.76171875), (118.0859375, 1397.8515625)), ((114.79687500000003, 1379.76171875), (132.88671875, 1397.8515625)), ((129.59765625000003, 1379.76171875), (152.62109375000003, 
1396.20703125)), ((150.97656249999997, 1381.40625), (188.80078124999997, 1396.20703125)), ((187.15625, 1379.76171875), (213.46874999999997, 1397.8515625)), ((210.1796875, 1379.76171875), (231.55859375, 1396.20703125)), ((229.91406250000003, 1378.1171875), (282.53906249999994, 1396.20703125)), ((279.25, 1379.76171875), (295.6953125, 1396.20703125)), ((292.40625, 1379.76171875), (315.4296875, 1396.20703125)), ((313.78515624999994, 1379.76171875), (359.83203124999994, 1399.49609375)), ((356.54296875, 1381.40625), (377.921875, 1396.20703125)), ((372.98828125, 1378.1171875), (422.32421874999994, 1397.8515625)), ((420.67968750000006, 1381.40625), (446.9921875, 1396.20703125)), ((445.34765625, 1383.05078125), (458.50390625, 1396.20703125)), ((456.859375, 1379.76171875), (474.94921875000006, 1396.20703125)), ((471.66015625, 1379.76171875), (530.86328125, 1397.8515625)), ((529.21875, 1381.40625), (558.8203125, 1394.5625)), ((555.53125, 1378.1171875), (585.1328125, 1397.8515625)), ((581.84375, 1381.40625), (619.66796875, 1394.5625)), ((101.640625, 1272.8671875), (134.53124999999997, 1287.66796875)), ((131.2421875, 1272.8671875), (149.33203125, 1287.66796875)), ((144.39843750000003, 1271.22265625), (167.42187500000003, 1287.66796875)), ((50.66015624999998, 1271.22265625), (104.92968749999997, 1289.3125)), ((164.1328125, 1271.22265625), (213.46874999999997, 1290.95703125)), ((210.1796875, 1272.8671875), (228.26953124999997, 1287.66796875)), ((224.98046875, 1271.22265625), (272.671875, 1289.3125)), ((170.7109375, 1313.98046875), (197.02343749999997, 1332.0703125)), ((807.14453125, 1258.06640625), (828.5234375, 1282.734375)), ((708.47265625, 1286.0234375), (743.0078125, 1305.7578125)), ((741.36328125, 1286.0234375), (800.56640625, 1305.7578125)), ((820.30078125, 1286.0234375), (886.08203125, 1305.7578125)), ((895.94921875, 1287.66796875), (914.0390625, 1304.11328125)), ((951.8632812499999, 1286.0234375), (1024.22265625, 1310.69140625)), ((1022.578125, 1287.66796875), (1042.3125, 
1305.7578125)), ((751.23046875, 1305.7578125), (831.8125, 1325.4921875)), ((831.8125, 1305.7578125), (884.4375, 1325.4921875)), ((882.79296875, 1304.11328125), (978.17578125, 1328.78125)), ((978.17578125, 1305.7578125), (997.91015625, 1323.84765625)), ((647.625, 1353.44921875), (675.58203125, 1368.25)), ((672.2929687500001, 1351.8046875), (718.3398437499999, 1371.5390625)), ((716.6953125, 1353.44921875), (749.5859375, 1368.25)), ((746.296875, 1350.16015625), (774.25390625, 1371.5390625)), ((769.3203125, 1351.8046875), (810.4335937499999, 1371.5390625)), ((807.14453125, 1351.8046875), (858.125, 1371.5390625)), ((856.4804687500001, 1353.44921875), (874.5703125, 1369.89453125)), ((872.9257812500001, 1353.44921875), (899.23828125, 1371.5390625)), ((895.94921875, 1351.8046875), (956.7968750000001, 1371.5390625)), ((955.15234375, 1351.8046875), (1002.8437499999999, 1369.89453125)), ((1001.19921875, 1353.44921875), (1058.7578125, 1371.5390625)), ((1055.46875, 1351.8046875), (1073.55859375, 1369.89453125)), ((1071.9140625, 1351.8046875), (1114.671875, 1373.18359375)), ((50.66015624999998, 1332.0703125), (103.28515624999999, 1351.8046875)), ((106.57421875000003, 1330.42578125), (177.2890625, 1355.09375)), ((175.64453125000003, 1332.0703125), (257.87109375, 1351.8046875)), ((257.87109375, 1332.0703125), (287.47265625, 1351.8046875)), ((284.18359375, 1332.0703125), (323.65234375, 1351.8046875)), ((320.36328125000006, 1332.0703125), (346.67578125, 1353.44921875)), ((345.03125000000006, 1333.71484375), (410.8125, 1353.44921875)), ((409.16796875, 1330.42578125), (432.19140625, 1353.44921875)), ((428.90234375000006, 1332.0703125), (465.08203125, 1351.8046875)), ((465.08203125, 1333.71484375), (540.73046875, 1353.44921875)), ((542.375, 1333.71484375), (580.19921875, 1355.09375)), ((580.19921875, 1333.71484375), (603.22265625, 1350.16015625)), ((603.22265625, 1330.42578125), (695.31640625, 1355.09375)), ((696.9609375, 1330.42578125), (733.140625, 1351.8046875)), ((729.8515625, 
1330.42578125), (756.1640625000001, 1351.8046875)), ((752.875, 1328.78125), (812.078125, 1353.44921875)), ((810.4335937499999, 1337.00390625), (820.30078125, 1348.515625)), ((818.6562499999999, 1328.78125), (938.70703125, 1358.3828125)), ((932.1289062500001, 1328.78125), (971.59765625, 1355.09375)), ((968.3085937499999, 1332.0703125), (1001.19921875, 1355.09375)), ((50.66015624999998, 1364.9609375), (101.640625, 1383.05078125)), ((98.35156250000003, 1366.60546875), (121.37500000000003, 1381.40625)), ((118.0859375, 1363.31640625), (159.19921874999997, 1383.05078125)), ((155.91015625, 1366.60546875), (182.22265624999997, 1381.40625)), ((178.93359375, 1364.9609375), (238.13671875000003, 1383.05078125)), ((233.203125, 1366.60546875), (295.6953125, 1384.6953125)), ((292.40625, 1364.9609375), (349.96484375, 1383.05078125)), ((345.03125000000006, 1366.60546875), (361.4765625, 1383.05078125)), ((359.83203124999994, 1366.60546875), (381.2109375, 1381.40625)), ((379.56640625, 1368.25), (412.45703125000006, 1383.05078125)), ((410.8125, 1364.9609375), (491.39453125000006, 1383.05078125)), ((488.10546875, 1366.60546875), (514.41796875, 1381.40625)), ((511.12890625, 1366.60546875), (542.375, 1381.40625)), ((540.73046875, 1366.60546875), (567.04296875, 1381.40625)), ((563.75390625, 1368.25), (591.7109375, 1381.40625)), ((591.7109375, 1366.60546875), (621.3125, 1381.40625)), ((647.625, 1369.89453125), (675.58203125, 1384.6953125)), ((673.9375, 1368.25), (692.02734375, 1386.33984375)), ((692.02734375, 1369.89453125), (724.91796875, 1384.6953125)), ((726.5624999999999, 1369.89453125), (762.7421875, 1384.6953125)), ((762.7421875, 1368.25), (820.30078125, 1387.984375)), ((50.66015624999998, 1392.91796875), (118.0859375, 1412.65234375)), ((984.75390625, 1386.33984375), (1047.24609375, 1406.07421875)), ((1048.890625, 1386.33984375), (1122.89453125, 1406.07421875)), ((50.66015624999998, 1409.36328125), (129.59765625000003, 1427.453125)), ((129.59765625000003, 1409.36328125), 
(231.55859375, 1427.453125)), ((229.91406250000003, 1412.65234375), (239.78125, 1424.1640625)), ((239.78125, 1409.36328125), (289.1171875, 1427.453125)), ((52.30468750000003, 1425.80859375), (78.6171875, 1443.8984375)), ((73.68359374999997, 1429.09765625), (86.83984375, 1442.25390625)), ((80.26171874999999, 1427.453125), (98.35156250000003, 1442.25390625)), ((93.41796875, 1427.453125), (124.6640625, 1442.25390625)), ((123.01953125000001, 1425.80859375), (146.04296875000003, 1442.25390625)), ((142.75390624999997, 1425.80859375), (162.48828125, 1442.25390625)), ((160.84375000000003, 1427.453125), (190.44531249999997, 1442.25390625)), ((190.44531249999997, 1427.453125), (224.98046875, 1442.25390625)), ((220.04687499999997, 1425.80859375), (267.73828125000006, 1443.8984375)), ((262.8046875, 1427.453125), (285.828125, 1442.25390625)), ((279.25, 1425.80859375), (323.65234375, 1445.54296875)), ((317.07421875, 1425.80859375), (368.05468749999994, 1443.8984375)), ((50.66015624999998, 1437.3203125), (103.28515624999999, 1455.41015625)), ((99.99609375000001, 1438.96484375), (137.82031250000003, 1453.765625)), ((136.17578124999997, 1438.96484375), (157.55468749999997, 1453.765625)), ((152.62109375000003, 1438.96484375), (192.08984375000003, 1453.765625)), ((190.44531249999997, 1438.96484375), (213.46874999999997, 1453.765625)), ((211.82421874999997, 1438.96484375), (229.91406250000003, 1453.765625)), ((228.26953124999997, 1440.609375), (239.78125, 1450.4765625)), ((238.13671875000003, 1438.96484375), (271.02734375, 1453.765625)), ((266.09375, 1435.67578125), (315.4296875, 1455.41015625)), ((310.49609375, 1438.96484375), (330.23046874999994, 1455.41015625)), ((52.30468750000003, 1450.4765625), (118.0859375, 1463.6328125)), ((113.15234374999997, 1448.83203125), (164.1328125, 1466.921875)), ((157.55468749999997, 1450.4765625), (173.99999999999997, 1465.27734375)), ((169.06640625000003, 1450.4765625), (190.44531249999997, 1465.27734375)), ((185.51171875, 1448.83203125), 
(238.13671875000003, 1466.921875)), ((233.203125, 1448.83203125), (287.47265625, 1466.921875)), ((284.18359375, 1450.4765625), (315.4296875, 1465.27734375)), ((312.14062500000006, 1448.83203125), (340.09765625, 1463.6328125)), ((389.43359375, 1406.07421875), (445.34765625, 1430.7421875)), ((438.76953124999994, 1406.07421875), (504.55078125, 1432.38671875)), ((525.9296875, 1409.36328125), (580.19921875, 1429.09765625)), ((578.5546875, 1409.36328125), (645.98046875, 1429.09765625)), ((427.2578125, 1435.67578125), (446.9921875, 1453.765625)), ((443.703125, 1432.38671875), (499.6171875, 1457.0546875)), ((420.67968750000006, 1458.69921875), (445.34765625, 1480.078125)), ((440.4140625, 1458.69921875), (483.17187500000006, 1478.43359375)), ((769.3203125, 1409.36328125), (851.546875, 1427.453125)), ((849.90234375, 1409.36328125), (951.8632812499999, 1427.453125)), ((950.21875, 1411.0078125), (961.73046875, 1427.453125)), ((961.73046875, 1409.36328125), (1029.15625, 1427.453125)), ((1025.8671875, 1406.07421875), (1078.4921875, 1430.7421875)), ((769.3203125, 1427.453125), (803.85546875, 1440.609375)), ((802.2109374999999, 1425.80859375), (869.63671875, 1443.8984375)), ((866.34765625, 1427.453125), (900.8828125, 1442.25390625)), ((900.8828125, 1429.09765625), (914.0390625, 1442.25390625)), ((909.10546875, 1425.80859375), (958.44140625, 1443.8984375)), ((955.15234375, 1427.453125), (976.53125, 1442.25390625)), ((973.2421875000001, 1427.453125), (1016.0000000000001, 1443.8984375)), ((1011.0664062499999, 1425.80859375), (1057.11328125, 1443.8984375)), ((1057.11328125, 1427.453125), (1091.6484375, 1442.25390625)), ((1088.359375, 1427.453125), (1127.828125, 1442.25390625)), ((767.67578125, 1437.3203125), (823.58984375, 1455.41015625)), ((820.30078125, 1437.3203125), (866.34765625, 1455.41015625)), ((863.05859375, 1438.96484375), (889.37109375, 1453.765625)), ((882.79296875, 1438.96484375), (918.9726562499999, 1453.765625)), ((915.6835937500001, 1438.96484375), (938.70703125, 
1453.765625)), ((935.4179687499999, 1438.96484375), (965.0195312500001, 1453.765625)), ((961.73046875, 1437.3203125), (1007.7773437500001, 1455.41015625)), ((1004.48828125, 1438.96484375), (1037.37890625, 1453.765625)), ((1035.734375, 1438.96484375), (1065.3359375, 1453.765625)), ((1063.69140625, 1438.96484375), (1086.71484375, 1453.765625)), ((1083.42578125, 1438.96484375), (1104.8046875, 1453.765625)), ((1098.2265625, 1435.67578125), (1127.828125, 1455.41015625)), ((769.3203125, 1448.83203125), (823.58984375, 1466.921875)), ((820.30078125, 1448.83203125), (867.9921875, 1466.921875)), ((863.05859375, 1448.83203125), (915.6835937500001, 1466.921875)), ((912.39453125, 1452.12109375), (925.55078125, 1465.27734375)), ((923.9062500000001, 1450.4765625), (950.21875, 1465.27734375)), ((945.28515625, 1450.4765625), (973.2421875000001, 1463.6328125)), ((971.59765625, 1450.4765625), (1039.0234375, 1463.6328125)), ((1037.37890625, 1450.4765625), (1066.98046875, 1463.6328125)), ((1063.69140625, 1450.4765625), (1101.515625, 1465.27734375)), ((552.2421875, 1434.03125), (576.91015625, 1455.41015625)), ((571.9765625, 1435.67578125), (624.6015625, 1455.41015625)), ((553.88671875, 1486.65625), (575.265625, 1503.1015625)), ((571.9765625, 1483.3671875), (659.13671875, 1503.1015625)), ((552.2421875, 1458.69921875), (576.91015625, 1480.078125)), ((571.9765625, 1460.34375), (655.8476562499999, 1478.43359375)), ((655.8476562499999, 1461.98828125), (685.44921875, 1476.7890625)), ((682.16015625, 1460.34375), (700.25, 1478.43359375)), ((698.60546875, 1460.34375), (746.296875, 1478.43359375)), ((772.6093750000001, 1461.98828125), (815.3671875, 1478.43359375)), ((812.078125, 1458.69921875), (866.34765625, 1476.7890625)), ((863.05859375, 1460.34375), (894.3046875, 1475.14453125)), ((894.3046875, 1463.6328125), (905.81640625, 1473.5)), ((904.171875, 1461.98828125), (948.5742187500001, 1475.14453125)), ((945.28515625, 1460.34375), (961.73046875, 1475.14453125)), ((958.44140625, 1461.98828125), 
(988.04296875, 1476.7890625)), ((986.3984375, 1461.98828125), (1017.64453125, 1476.7890625)), ((1016.0000000000001, 1461.98828125), (1053.82421875, 1476.7890625)), ((1047.24609375, 1461.98828125), (1066.98046875, 1475.14453125)), ((1063.69140625, 1460.34375), (1094.9375, 1475.14453125))]

That is, at the output, I would like to receive a file like this.



Sure, I can help you with that. Below is Python code you can use to remove the text from a PDF document and leave only the blank template:

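import fitz  # PyMuPDF

def redact_words(pdf_path, coordinates):
  """
  Remove the text at the given coordinates from a PDF document.

  Args:
    pdf_path: path to the PDF document
    coordinates: list of text boxes to remove, each in the form ((x1, y1), (x2, y2)),
                 where (x1, y1) is the top-left corner of the text and (x2, y2) is
                 the bottom-right corner. PyMuPDF expects page coordinates in points,
                 not pixels of a rendered image.
  """

  doc = fitz.open(pdf_path)

  for page in doc:
    # Note: the same list of boxes is applied to every page.
    for (x1, y1), (x2, y2) in coordinates:
      rect = fitz.Rect(x1, y1, x2, y2)
      page.add_redact_annot(rect, fill=(1, 1, 1))  # fill the rectangle with white to cover the text

    page.apply_redactions()  # remove the content under the redaction rectangles

  redacted_pdf_path = pdf_path.replace(".pdf", "_redacted.pdf")
  doc.save(redacted_pdf_path)
  print(f"Saved the document with the text removed: {redacted_pdf_path}")

# Usage example
coordinates = [((50.66015624999998, 179.25390625), (96.70703124999997, 198.98828125)),
            # ... other coordinates ...
           ]

pdf_path = "path/to/your.pdf"  # replace with the path to your PDF document
redact_words(pdf_path, coordinates)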

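Code explanation:

  1. Import the fitz library:
    • Make sure the PyMuPDF library is installed. You can install it with pip install pymupdf.
  2. Define the redact_words function:
    • It takes the path to the PDF document and the list of text coordinates as input.
    • It opens the PDF document with fitz.open(pdf_path).
    • It iterates over every page in the document.
    • For each set of coordinates, unpacked as ((x1, y1), (x2, y2)), it creates a fitz.Rect object for the text area to remove and calls page.add_redact_annot(rect, fill=(1, 1, 1)) to add a white redaction rectangle over it.
    • Once all rectangles for the page are added, it calls page.apply_redactions() to apply them and remove the text.
  3. Save the PDF document with the text removed.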
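One thing to check before running this on the list from the question: the sample coordinates reach roughly x = 1136 and y = 1503, which is larger than an A4 page (595 x 842 points), so they look like pixels of a rendered page image rather than PDF points. That is an inference from the numbers, not something the question states. If it holds, the boxes would need rescaling first, along these lines (`pixels_to_points`, `image_size` and `page_size` are hypothetical names):

```python
def pixels_to_points(boxes, image_size, page_size):
    """Rescale ((x1, y1), (x2, y2)) boxes from image pixels to page units.

    image_size: (width, height) of the rendered image the OCR ran on.
    page_size:  (width, height) of the PDF page in points.
    Both share a top-left origin, so only scaling is needed.
    """
    img_w, img_h = image_size
    page_w, page_h = page_size
    sx, sy = page_w / img_w, page_h / img_h
    return [((x1 * sx, y1 * sy), (x2 * sx, y2 * sy))
            for (x1, y1), (x2, y2) in boxes]
```

With PyMuPDF, the page size can be read from page.rect.width and page.rect.height, and the converted boxes can then be passed to fitz.Rect as in the function above.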

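How to use it:

  1. Replace the coordinates list in the example code with the list of text coordinates obtained from doctr.
  2. Replace the pdf_path variable with the path to your PDF document.
  3. Run the code. It creates a new PDF document named "your_redacted.pdf" in which the text at all of the specified coordinates has been removed.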
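The output name in the code is built with pdf_path.replace(".pdf", "_redacted.pdf"), which changes nothing when the extension is uppercase (".PDF") and rewrites every ".pdf" occurrence if one also appears in a folder name. A slightly more robust sketch using only the standard library (`redacted_path` is a hypothetical helper name):

```python
from pathlib import Path

def redacted_path(pdf_path):
    """Return the input path with '_redacted' inserted before the extension."""
    p = Path(pdf_path)
    # Only the file name is changed; the parent folders stay untouched.
    return str(p.with_name(f"{p.stem}_redacted{p.suffix}"))
```

The result can be passed to doc.save() in place of the string built with replace().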

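Optimization suggestions:

  • For large documents, pages could be processed in parallel to speed things up. PyMuPDF does not support multithreaded use, so this would mean separate processes (for example, one per page range) rather than threads.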

Hope this helps!

From: 78803532
