
A Python-based PDF parser tool that can process both searchable and non-searchable PDF files. The parser extracts the title, headings, subheadings, and content, then renders them as a custom-designed HTML representation that preserves the document's structure and formatting. The extracted data is also stored in a CSV file for easy retrieval and analysis.


SurekhaSuresh/PDF-Parser


PDF-Parser

This repository contains a Python-based PDF parser tool that can process both searchable and non-searchable PDF files. The parser extracts the title, headings, subheadings, and content from each PDF file and renders the processed data as a custom-designed HTML representation that preserves the structure and formatting of the original document. The extracted data is also stored in a CSV file for easy retrieval and analysis.
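The output stage described above might look roughly like the following sketch. It is not the repository's code: the record layout (`kind`, `text`), the tag mapping, and the function names `to_html` and `to_csv` are all assumptions made for illustration, and the PDF extraction step itself is left out. It only shows how already-extracted titles, headings, subheadings, and content could be turned into an HTML fragment and CSV rows using the standard library.

```python
import csv
import html
import io

# Hypothetical record layout: (kind, text), where kind is one of
# "title", "heading", "subheading", or "content".
# The mapping to HTML tags is an assumption, not the repository's design.
TAGS = {"title": "h1", "heading": "h2", "subheading": "h3", "content": "p"}


def to_html(records):
    """Render extracted records as an HTML fragment, one element per record."""
    lines = []
    for kind, text in records:
        tag = TAGS.get(kind, "p")  # unknown kinds fall back to a paragraph
        lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(lines)


def to_csv(records):
    """Serialize the same records as CSV text with a header row."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["kind", "text"])
    writer.writerows(records)
    return buf.getvalue()


# Sample records standing in for whatever the PDF extraction step produces.
records = [
    ("title", "Annual Report"),
    ("heading", "1. Overview"),
    ("subheading", "1.1 Scope"),
    ("content", "Revenue & costs for 2023."),
]

html_out = to_html(records)
csv_out = to_csv(records)
print(html_out)
print(csv_out)
```

Keeping the records in one flat list means both outputs are derived from the same data, so the HTML and the CSV cannot drift apart. Text is escaped with `html.escape` so characters such as `&` in the source PDF do not break the generated markup.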

