candev_statcan_egls

The problem is to extract tables from a set of scanned PDFs. We addressed it in two ways. We first convert the PDFs into images and extract the text from those images. In the first approach, we count the horizontal lines in each image, because images that contain tables have a higher number of horizontal lines. If the count exceeds a threshold, we treat the image as containing a table and extract its text. The JSON files for this approach are stored in the line_count folder. In the second approach, we count the numeric values in each image, motivated by the observation that images with tables contain a higher number of numeric values. The JSON files for the numeric value count approach are stored in the num_count folder.
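The two heuristics above can be sketched roughly as follows. This is a minimal illustration, not the code from EGLS_approach.ipynb: it assumes the page image has already been binarized into rows of 0 (background) and 1 (ink) and that the OCR text is available as a string. The function names, the `min_run_fraction` parameter, the number pattern, and the thresholds are all hypothetical.

```python
import re


def count_horizontal_lines(binary_image, min_run_fraction=0.5):
    """Count horizontal rulings in a binarized image.

    binary_image is a list of rows, each a list of 0 (background) or 1 (ink).
    A row counts as part of a ruling when it contains an unbroken run of ink
    at least min_run_fraction of the image width (an assumed heuristic).
    """
    if not binary_image or not binary_image[0]:
        return 0
    width = len(binary_image[0])
    min_run = max(1, int(width * min_run_fraction))
    lines = 0
    previous_row_was_line = False
    for row in binary_image:
        longest = current = 0
        for pixel in row:
            current = current + 1 if pixel else 0
            longest = max(longest, current)
        is_line = longest >= min_run
        # Adjacent qualifying rows form one thick ruling, so count it once.
        if is_line and not previous_row_was_line:
            lines += 1
        previous_row_was_line = is_line
    return lines


# Assumed definition of a "numeric value": digits with optional thousands
# separators and an optional decimal part.
NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def count_numeric_values(text):
    """Count numeric tokens in OCR text."""
    return len(NUMBER_PATTERN.findall(text))


def looks_like_table(score, threshold):
    """Flag a page as a table when its score exceeds the chosen threshold."""
    return score > threshold
```

Either score can be passed to `looks_like_table`; a suitable threshold would have to be tuned on the scanned pages themselves.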

We have also implemented a user interface where a user can upload an image. The table is extracted as a JSON file and displayed as a table on an HTML page. We also post-process the extracted text to remove garbage words. To run the program, first install the required Python packages (listed in requirements.txt), then run the command flask run from the directory that contains app.py.
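As an illustration of the post-processing and display steps, the sketch below cleans OCR cells and renders them as an HTML table. It is not taken from app.py: the JSON layout (a list of rows, each a list of cell strings), the rule that a token without any letter or digit is garbage, and all function names are assumptions made for this example.

```python
import html
import json


def is_garbage(token):
    """Assumed rule: a token with no letter or digit is OCR noise."""
    return not any(ch.isalnum() for ch in token)


def clean_cell(text):
    """Drop garbage tokens from one cell and normalize whitespace."""
    return " ".join(t for t in text.split() if not is_garbage(t))


def rows_to_html_table(rows):
    """Render a list of rows (lists of cell strings) as an HTML table."""
    parts = ["<table>"]
    for row in rows:
        cells = "".join(
            f"<td>{html.escape(clean_cell(cell))}</td>" for cell in row
        )
        parts.append(f"  <tr>{cells}</tr>")
    parts.append("</table>")
    return "\n".join(parts)


# Hypothetical extracted table, in the assumed JSON layout.
extracted = json.loads('[["Item", "2019"], ["Revenue ~|", "1,234"]]')
print(rows_to_html_table(extracted))
```

Escaping each cell with `html.escape` keeps stray OCR characters such as `<` from breaking the rendered page.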


ShawkhIbneRashid/candev_egls
