
Splitting Chinese characters out of stele inscription calligraphy: extracting the individual characters (advanced)

Date: 2022-10-21 21:45:39
Tags: inscription, img, cv2, split, single, segmentation, images, prcaligraphy, col

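The main purpose of this program is to crop the individual Chinese characters out of a stele inscription image and to trim the excess margin around each character, as preparation for later pattern recognition.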

It uses the OpenCV library, which makes noise removal and binarization more convenient.
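To make the binarization step concrete, here is a NumPy-only sketch of what Otsu thresholding computes. The program itself relies on `cv2.threshold` with `cv2.THRESH_OTSU`, so this is an illustration of the idea rather than the library's implementation; the helper names `otsu_threshold` and `binarize_inverted` and the synthetic test image are assumptions made for the example.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the gray level that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    weight_low = np.cumsum(hist)              # pixels with value <= t
    weight_high = hist.sum() - weight_low     # pixels with value > t
    sum_low = np.cumsum(hist * levels)
    sum_high = sum_low[-1] - sum_low
    # Class means; np.maximum avoids dividing by zero for an empty class.
    mean_low = sum_low / np.maximum(weight_low, 1)
    mean_high = sum_high / np.maximum(weight_high, 1)
    between = weight_low * weight_high * (mean_low - mean_high) ** 2
    return int(np.argmax(between))


def binarize_inverted(gray):
    """Inverse binary threshold: pixels above the threshold become 0, the rest 255."""
    t = otsu_threshold(gray)
    binary = np.where(gray > t, 0, 255).astype(np.uint8)
    return binary, t


# Synthetic stand-in for a rubbing: dark background with one bright stroke.
page = np.full((40, 40), 30, dtype=np.uint8)
page[10:30, 18:22] = 220
binary, t = binarize_inverted(page)
print(t, binary[20, 20], binary[0, 0])
```

With the inverse threshold, the bright stroke ends up as 0 and the dark background as 255, which is the polarity the later pixel-counting steps expect.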

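Along the way it touches on some problems you are likely to run into when using OpenCV, such as how to draw a bounding rectangle around contours and, once OpenCV has extracted the contours, how to decide which of them to keep.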
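One way to decide which contours to keep can be sketched in plain Python on `(x, y, w, h)` tuples of the kind `cv2.boundingRect` returns: drop tiny boxes as noise, merge the rest into one enclosing rectangle, and reject the result if it is too small to be a character. The thresholds 6 and 40 are the ones used in the program below; the function name `merge_boxes` and the sample boxes are made up for illustration.

```python
def merge_boxes(boxes, min_part=6, min_word=40):
    """Merge bounding boxes into one rectangle (x_min, y_min, x_max, y_max).

    boxes: iterable of (x, y, w, h) tuples.
    Boxes smaller than min_part in both width and height are ignored.
    Returns None when nothing is left or the merged box is under min_word.
    """
    kept = [(x, y, w, h) for x, y, w, h in boxes
            if w >= min_part or h >= min_part]
    if not kept:
        return None
    x_min = min(x for x, y, w, h in kept)
    y_min = min(y for x, y, w, h in kept)
    x_max = max(x + w for x, y, w, h in kept)
    y_max = max(y + h for x, y, w, h in kept)
    if x_max - x_min < min_word or y_max - y_min < min_word:
        return None
    return x_min, y_min, x_max, y_max


# Two strokes of one character plus a 2x2 speck of noise (hypothetical values).
sample = [(10, 12, 30, 50), (45, 20, 20, 40), (3, 3, 2, 2)]
word_box = merge_boxes(sample)
print(word_box)
```

The merged rectangle can then be used both to crop the character (`img[y_min:y_max, x_min:x_max]`) and to draw the outline with `cv2.rectangle`.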

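Each character is segmented with the method shown in the figure above: the black pixels are counted along every column and every row, and the image is cut at the valley points of those counts.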
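The valley search can be sketched as follows. This is a simplified variant, not the exact rule used in the program below: it treats every run of columns with at most `max_ink` black pixels as a valley and cuts at the middle of the run. The names `column_profile` and `valley_cuts` and the synthetic image are assumptions for the example.

```python
import numpy as np


def column_profile(binary, ink_value=0):
    """Count the pixels equal to ink_value in every column."""
    return (binary == ink_value).sum(axis=0)


def valley_cuts(profile, max_ink=0):
    """Return one cut index per run of columns whose count is <= max_ink."""
    low = profile <= max_ink
    cuts = []
    start = None
    for i, is_low in enumerate(low):
        if is_low and start is None:
            start = i                          # a valley begins
        elif not is_low and start is not None:
            cuts.append((start + i - 1) // 2)  # middle of the finished valley
            start = None
    if start is not None:                      # valley touching the right edge
        cuts.append((start + len(low) - 1) // 2)
    return cuts


# White page (255) with two black text columns at x = 5..9 and x = 18..24.
page = np.full((20, 30), 255, dtype=np.uint8)
page[:, 5:10] = 0
page[:, 18:25] = 0
profile = column_profile(page)
cuts = valley_cuts(profile)
pieces = [page[:, a:b] for a, b in zip(cuts, cuts[1:])]
print(cuts, [p.shape for p in pieces])
```

The same two functions work for rows by passing the transposed image, which mirrors how the program first splits the page into columns and then each column into characters.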

The directory structure is as follows:

calligraphies holds the original inscription images; split holds the images produced by the segmentation, in subfolders named after each original image.
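A minimal sketch of how that layout can be produced with `pathlib`, assuming one output folder per source image named after the file without its extension. The function name `output_dir_for` and the file name `stele01.jpg` are hypothetical, and the demo writes into a temporary directory rather than a real `split` folder.

```python
import tempfile
from pathlib import Path


def output_dir_for(image_path, split_root):
    """Create and return split_root/<image name without extension>/."""
    out_dir = Path(split_root) / Path(image_path).stem
    # exist_ok=True makes repeated runs harmless.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


with tempfile.TemporaryDirectory() as tmp:
    out = output_dir_for(Path("calligraphies") / "stele01.jpg", Path(tmp) / "split")
    created = out.is_dir()
    folder_name = out.name
    parent_name = out.parent.name

print(created, folder_name, parent_name)
```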

calligraphy_split.py is the main program.

From now on, inscriptions can be grabbed from this site: http://www.yamoke.com/

The code is below. It is a bit long, but the steps should be reasonably clear; the comments are in English:

import os

import cv2
import numpy as np
from matplotlib import pyplot as plt


class PrCalligraph(object):
    filename = 0   # running index used to name the saved characters
    dirname = ""   # output directory for the current source image

    # find the inner frame: scan the middle row and the middle column from
    # each side for the first pixel that is not brighter than flag_pi
    def cut_img(self, img, flag_pi):
        row, col = img.shape
        # fall back to the full image when no such pixel is found
        new_up_row, new_left_col = 0, 0
        new_down_row, new_right_col = row - 1, col - 1
        for i in range(row - 1):
            if img[i, col // 2] <= flag_pi:
                new_up_row = i
                break
        for i in range(col - 1):
            if img[row // 2, i] <= flag_pi:
                new_left_col = i
                break
        for i in range(row - 1, 0, -1):
            if img[i, col // 2] <= flag_pi:
                new_down_row = i
                break
        for i in range(col - 1, 0, -1):
            if img[row // 2, i] <= flag_pi:
                new_right_col = i
                break
        print(new_up_row, new_left_col, new_down_row, new_right_col)
        return new_up_row, new_left_col, new_down_row, new_right_col

    # binarize the image
    def thresh_binary(self, img):
        blur = cv2.GaussianBlur(img, (9, 9), 0)
        # Otsu's binarization, inverted
        ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # a small morphological opening removes isolated specks
        kernel = np.ones((2, 2), np.uint8)
        opening = cv2.morphologyEx(th3, cv2.MORPH_OPEN, kernel)
        return opening

    # count the black pixels in each column
    def hist_col(self, img):
        return (img < 200).sum(axis=0).tolist()

    # find the columns at which to cut
    def cut_col(self, img, hist_list):
        minlist = []
        images = []
        row, col = img.shape
        np_list = np.array(hist_list)
        avg = col // 8  # after a cut, jump ahead by 1/8 of the width
        i = 0
        while i < col - 1:
            if i >= col - 10:
                # near the right edge: compare with everything that is left
                if np_list[i] < 40 and np_list[i] <= np_list[i + 1:col].min():
                    minlist.append(i)
                    break
            elif np_list[i] < 40 and np_list[i] < np_list[i + 1:i + 10].min():
                minlist.append(i)
                i += avg
            i += 1
        print(minlist)
        for j in range(len(minlist) - 1):
            images.append(img[0:row, minlist[j]:minlist[j + 1]])
        return images

    # count the black pixels in each row, then cut along the rows
    def hist_row(self, img):
        row_list = (img < 200).sum(axis=1).tolist()
        return self.cut_row(img, row_list)

    # find the rows at which to cut
    def cut_row(self, img, row_list):
        minlist = []
        single_images_with_rect = []
        row, col = img.shape
        np_list = np.array(row_list)
        avg = row // 16  # after a cut, jump ahead by 1/16 of the height
        i = 0
        while i <= row - 1:
            if i >= row - 10 and np_list[i] == 0:
                minlist.append(i)
                break
            elif np_list[i] == 0 and (np_list[i + 1:i + 10] < 200).sum() >= 5:
                minlist.append(i)
                i += avg
            i += 1
        print(minlist)
        for j in range(len(minlist) - 1):
            single_img = img[minlist[j]:minlist[j + 1], 0:col]
            single_img_with_rect = self.single_cut(single_img)
            if single_img_with_rect is not None:
                single_images_with_rect.append(single_img_with_rect)
        return single_images_with_rect

    # find the contours of a single character and trim the redundant margin
    def single_cut(self, img):
        # work on a contiguous copy: cv2.rectangle draws in place
        img = img.copy()
        blur = cv2.GaussianBlur(img, (9, 9), 0)
        ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # [-2:] works whether findContours returns two values or three
        contours, hierarchy = cv2.findContours(th3, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]
        row, col = img.shape
        left, up = col, row
        right, down = 0, 0
        for cnt in contours:
            x, y, w, h = cv2.boundingRect(cnt)
            # ignore tiny contours
            if w < 6 and h < 6:
                continue
            left = min(left, x)
            up = min(up, y)
            right = max(right, x + w)
            down = max(down, y + h)
        # keep only boxes that are large enough to be a character
        if right - left >= 40 and down - up >= 40:
            word = img[up:down, left:right].copy()
            cv2.imwrite(self.dirname + str(self.filename) + '.png', word)
            cv2.rectangle(img, (left, up), (right, down), (0, 255, 0), 2)
            self.filename += 1
            return img
        return None


if __name__ == '__main__':
    prcaligraphy = PrCalligraph()
    # list the source directory
    origin_images = os.listdir('./calligraphies/')
    # handle each picture
    for im in origin_images:
        # read the image with OpenCV; skip files that cannot be read
        img = cv2.imread('./calligraphies/' + im, cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue
        # restart the numbering of the single-character files
        prcaligraphy.filename = 0
        # take the original picture's name without extension
        outdir = os.path.splitext(im)[0]
        # create the output directory
        prcaligraphy.dirname = "./split/" + outdir + '/'
        os.makedirs(prcaligraphy.dirname, exist_ok=True)
        # preprocess the original picture: cut off the redundant margin
        row, col = img.shape
        middle_pi = int(img[row // 2, col // 2])
        if middle_pi > 220:
            middle_pi = 220
        else:
            middle_pi += 10
        new_up_row, new_left_col, new_down_row, new_right_col = prcaligraphy.cut_img(img, middle_pi)
        cutedimg = img[new_up_row:new_down_row, new_left_col:new_right_col]
        # binarize the image
        opening = prcaligraphy.thresh_binary(cutedimg)
        # split the image into columns
        hist_list = prcaligraphy.hist_col(opening)
        images = prcaligraphy.cut_col(opening, hist_list)
        if not images:
            continue
        # split each column into single characters by rows
        words = [prcaligraphy.hist_row(piece) for piece in images]
        # create two figures; squeeze=False keeps both axes arrays 2-D
        n_words = max(12, max(len(w) for w in words))
        fig, axes = plt.subplots(1, len(images), sharex=True, sharey=True, squeeze=False)
        fig2, axes2 = plt.subplots(len(images), n_words, sharex=True, sharey=True, squeeze=False)
        for i in range(len(images)):
            axes[0, i].imshow(images[i], 'gray')
            for j in range(len(words[i])):
                axes2[i, j].imshow(words[i][j], 'gray')
        fig.savefig(prcaligraphy.dirname + 'cut_col.png')
        fig2.savefig(prcaligraphy.dirname + 'single.png')
        plt.close('all')
        # plt.show()

 

From: https://www.cnblogs.com/cihai123/p/16814872.html
