OpenCV: Find columns in Arabic journals (Python)
I am new to opencv and new to python. I tried to piece together code I found online to solve my research problem. I have an Arabic diary from 1870 that has hundreds of pages, each page contains two columns and has a thick black border. I want to extract two columns as image files so that I can run ocr on them individually while ignoring the header and footer. Here is an example page:
Page 3
I have ten pages of raw prints as separate png files. I wrote the following script to handle each one. It works as expected in 2 of the 10 pages, but fails to generate the columns in the other 8 pages. I don't understand all the functions well enough to know where I could use these values, or if my entire approach is misguided - I think the best way to learn is to ask the community how you would solve this problem.
import cv2 def cutpage(fname, pnum): image = cv2.imread(fname) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) blur = cv2.GaussianBlur(gray, (7,7), 0) thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1] kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 13)) dilate = cv2.dilate(thresh, kernel, iterations=1) dilatename = "temp/dilate" + str(pnum) + ".png" cv2.imwrite(dilatename, dilate) cnts = cv2.findContours(dilate, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE) cnts = cnts[0] if len(cnts) == 2 else cnts[1] cnts = sorted(cnts, key=lambda x: cv2.boundingRect(x)[0]) fullpage=1 column=1 for c in cnts: x, y, w, h = cv2.boundingRect(c) if h > 300 and w > 20: if (h/w)<2.5: print("Found full page: ", x, y, w, h) filename = "temp/p" + str(pnum) + "-full" + str(fullpage) + ".png" fullpage+=1 else: print("Found column: ", x, y, w, h) filename = "temp/p" + str(pnum) + "-col" + str(column) + ".png" column+=1 roi = image[y:y+h, x:x+w] cv2.imwrite(filename, roi) return (column-1) for nr in range(10): filename = "p"+str(nr)+".png" print("Checking page", nr) diditwork = cutpage(filename, nr) print("Found", diditwork, "columns")
Following the tutorial, I created a binary inversion of blur and dilation so that it could identify different rectangular areas by large white areas. I also saved a copy of each extended version so I could see what it looked like, here's the page above after processing:
Page 3 has been enlarged
The "for c in cnts" loop should find large rectangular areas in the image. If the aspect ratio is less than 2.5 I get a full page (without header and footer, which works fine), if the aspect ratio is greater than this I know it's a column and it saves this e.g. temp/ p2-col2.png
I get some nice full pages without headers and footers, that is, just large black borders, but not chopped into columns. In 2 pages out of 10 I got what I wanted, which is:
Success Column on Page 2
Since I sometimes get the desired results, there must be something working, but I don't know how to improve it further.
edit:
Here are more page examples:
p0
p1
p5
Correct answer
I tried something without any expansion because I wanted to see if I could just use the middle line as a "separator" . This is the code:
im = cv2.cvtcolor(cv2.imread("arabic.png"), cv2.color_bgr2rgb) # read im as rgb for better plots gray = cv2.cvtcolor(im, cv2.color_rgb2gray) # convert to gray _, threshold = cv2.threshold(gray, 250, 255, cv2.thresh_binary_inv) # inverse thresholding contours, _ = cv2.findcontours(threshold, cv2.retr_external, cv2.chain_approx_none) # find contours sortedcontours = sorted(contours, key = cv2.contourarea, reverse=true) # sort according to area, descending bigbox = sortedcontours[0] # get the contour of the big box middleline = sortedcontours[1] # get the contour of the vertical line xmiddleline, _, _, _ = cv2.boundingrect(middleline) # get x coordinate of middleline leftboxcontour = np.array([point for point in bigbox if point[0, 0] < xmiddleline]) # assign left of line as points from the big contour rightboxcontour = np.array([point for point in bigbox if point[0, 0] >= xmiddleline]) # assigh right of line as points from the big contour leftboxx, leftboxy, leftboxw, leftboxh = cv2.boundingrect(leftboxcontour) # get properties of box on left rightboxx, rightboxy, rightboxw, rightboxh = cv2.boundingrect(rightboxcontour) # get properties of box on right leftboxcrop = im[leftboxy:leftboxy + leftboxh, leftboxx:leftboxx + leftboxw] # crop left rightboxcrop = im[rightboxy:rightboxy + rightboxh, rightboxx:rightboxx + rightboxw] # crop right # maybe do you assertations about aspect ratio?? cv2.imwrite("right.png", rightboxcrop) # save image cv2.imwrite("left.png", leftboxcrop) # save image
I'm not using any assertions about aspect ratio, so maybe this is still something you need to do..
Basically, the most important lines in this method are generating left and right contours based on x-coordinates. This is the final result I get:
There are still some black parts on the edges, but that shouldn't be a problem for ocr.
FYI: I'm using the following packages in jupyter:
import cv2 import numpy as np %matplotlib notebook import matplotlib.pyplot as plt
v2.0: Implemented using only large box detection:
So I did some dilation and the big box was easily detectable. I use a horizontal kernel to ensure that the vertical lines of the large box are always thick enough to be detected. However, I cannot solve the problem with the middle line as it is very thin... Nonetheless, here is the code for the above method:
im = cv2.cvtcolor(cv2.imread("1.png"), cv2.color_bgr2rgb) # read im as rgb for better plots gray = cv2.cvtcolor(im, cv2.color_rgb2gray) # convert to gray gray[gray<255] = 0 # added some contrast to make it either completly black or white _, threshold = cv2.threshold(gray, 250, 255, cv2.thresh_binary_inv) # inverse thresholding thresholddilated = cv2.dilate(threshold, np.ones((1,10)), iterations = 1) # dilate horizontally contours, _ = cv2.findcontours(thresholddilated, cv2.retr_external, cv2.chain_approx_none) # find contours sortedcontours = sorted(contours, key = cv2.contourarea, reverse=true) # sort according to area, descending x, y, w, h = cv2.boundingrect(sortedcontours[0]) # get the bounding rect properties of the contour left = im[y:y+h, x:x+int(w/2)+10].copy() # generate left, i included 10 pix from the right just in case right = im[y:y+h, int(w/2)-10:w].copy() # and right, i included 10 pix from the left just in case fig, ax = plt.subplots(nrows = 2, ncols = 3) # plotting... ax[0,0].axis("off") ax[0,1].imshow(im) ax[0,1].axis("off") ax[0,2].axis("off") ax[1,0].imshow(left) ax[1,0].axis("off") ax[1,1].axis("off") ax[1,2].imshow(right) ax[1,2].axis("off")
These are the results, you can notice it's not perfect, but again, since your target is ocr, this shouldn't be a problem.
Please tell me if this works, if not I will rack my brain to find a better solution...
v3.0: A better way to get straighter images, which will improve the quality of ocr.
Inspired by my other answer here: answer. It makes sense to straighten the image so that the ocr has better results. Therefore, I used a four-point transform on the detected outer frame. This will straighten the image slightly and make the text more horizontal. This is the code:
im = cv2.cvtcolor(cv2.imread("2.png"), cv2.color_bgr2rgb) # read im as rgb for better plots gray = cv2.cvtcolor(im, cv2.color_rgb2gray) # convert to gray gray[gray<255] = 0 # added some contrast to make it either completly black or white _, threshold = cv2.threshold(gray, 250, 255, cv2.thresh_binary_inv) # inverse thresholding thresholddilated = cv2.dilate(threshold, np.ones((1,10)), iterations = 1) # dilate horizontally contours, _ = cv2.findcontours(thresholddilated, cv2.retr_external, cv2.chain_approx_none) # find contours largest_contour = max(contours, key = cv2.contourarea) # get largest contour hull = cv2.convexhull(largest_contour) # get the hull epsilon = 0.02 * cv2.arclength(largest_contour, true) # epsilon pts1 = np.float32(cv2.approxpolydp(hull, epsilon, true).reshape(-1, 2)) # get the points result = four_point_transform(im, pts1) # using imutils height, width = result.shape[:2] # get the dimensions of the transformed image left = result[:, 0:int(width/2)].copy() # from the beginning to half the width right = result[:, int(width/2): width].copy() # from half the width till the end fig, ax = plt.subplots(nrows = 2, ncols = 3) # plotting... ax[0,0].axis("off") ax[0,1].imshow(result) ax[0,1].axvline(width/2) ax[0,1].axis("off") ax[0,2].axis("off") ax[1,0].imshow(left) ax[1,0].axis("off") ax[1,1].axis("off") ax[1,2].imshow(right) ax[1,2].axis("off")
Has the following packages:
import cv2 import numpy as np %matplotlib notebook import matplotlib.pyplot as plt from imutils.perspective import four_point_transform
As you can see from the code, this is a better approach, you can force the image to be centered and horizontal thanks to the four point transform. Furthermore, there is no need to include some overlap since the images are well separated. Here is an example for your reference:
The above is the detailed content of OpenCV: Find columns in Arabic journals (Python). For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Since its inception in 2009, Bitcoin has become a leader in the cryptocurrency world and its price has experienced huge fluctuations. To provide a comprehensive historical overview, this article compiles Bitcoin price data from 2009 to 2025, covering major market events, changes in market sentiment, and important factors influencing price movements.

1. First, right-click on the blank space of the taskbar at the bottom of Windows 11 and select [Taskbar Settings]. 2. Find [taskbarcorneroverflow] on the right in the taskbar settings. 3. Then find [clock] or [Clock] above it and select to turn it on. Method 2: 1. Press the keyboard shortcut [win+r] to call up run, enter [regedit] and press Enter to confirm. 2. Open the registry editor, find [HKEY_CURRENT_USERControlPanel] in it, and delete it. 3. After deletion, restart the computer and you will be prompted for configuration. When you return to the system, the time will be displayed.

Answer: The following community forums and discussion groups are available for Java functional programming questions: StackOverflow: The world's largest programming Q&A website with a community of Java functional programming experts. JavaFunctionalProgramming: A community forum focused on Java functional programming, providing discussions on concepts, language features, and best practices. Redditr/functionaljava: A subreddit focused on functional programming in Java, focusing on tools, libraries, and technologies. Discord: JavaFunctional Programming: Discord service that provides real-time discussion, code sharing and collaboration

How do I use other people's Python code? Find code repositories: Find the code you need on platforms like PyPI and GitHub. Installation code: Use pip or clone the GitHub repository to install. Import modules: Use the import statement in your script to import installed modules. Working with code: Access functions and classes in modules. (Optional) Adapt the code: Modify the code as needed to fit your project.

What should I do if the time on my win11 computer is always wrong? We all set the time or calendar when using win11 system, but many users are asking that the computer time is always wrong, so what is going on? Users can directly click on the taskbar below, and then find taskbarcorneroverflow to set it up. Let this site introduce to users in detail how to adjust the time error on Win11 computers. How to adjust the computer time error in Windows 11. Method 1: 1. We first right-click on the blank space of the taskbar below and select Taskbar Settings. Method 2: 1. Press the keyboard shortcut win+r to call up run, enter regedit and press Enter to confirm.

Common exception types and their repair measures in Java function development During the development of Java functions, various exceptions may be encountered, which affect the correct execution of the function. The following are common exception types and their repair measures: 1. NullPointerException Description: Thrown when accessing an object that has not been initialized. Fix: Make sure you check the object for non-null before using it. Sample code: try{Stringname=null;System.out.println(name.length());}catch(NullPointerExceptione){

overflow is a property of CSS that is used to control the display mode of element content when it exceeds the container. Available values include: visible: the content is visible, the overflow container is hidden: the overflow content is cut scroll: the scroll bar is displayed to view the overflow content auto: the browser automatically determines Whether to display the scroll bar inherit: inherit the overflow attribute of the parent element

As a world-renowned short video platform, Douyin has a huge user base and content creators. However, as the platform rules are constantly updated and improved, some users may encounter account bans. This has raised public questions about the transparency and fairness of platform management. This article will discuss the issue of Douyin account bans and whether users have ways to appeal after their accounts are banned. There may be many reasons for being banned on the Douyin platform, including but not limited to illegal content, violation of platform regulations, infringement of other people's rights, etc. In order to maintain the order of the platform and the interests of users, Douyin has set up a series of rules and review mechanisms. When some users violate the rules, their accounts may be banned. However, some users may question or be dissatisfied with the reasons for the ban
