This repository contains a Python-based PDF parser that can process both searchable and non-searchable PDF files. The parser extracts the title, headings, subheadings, and content from each PDF and places the processed data into a custom-designed HTML representation that preserves the structure and formatting of the original document. The extracted data is also stored in a CSV file for easy retrieval and analysis.
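The output stage described above can be pictured with a small sketch. It is not the repository's code: the record layout (`level`, `text`), the tag mapping, and the function names `to_html` and `to_csv` are all hypothetical, and the sketch assumes the extraction step has already produced labelled text. It only illustrates how such records could be written to both an HTML fragment and CSV text using the Python standard library.

```python
import csv
import html
import io

# Hypothetical record layout: (level, text), where level is one of
# "title", "heading", "subheading", "content". The repository's real
# schema is not shown in its description, so this is an assumption.
TAGS = {"title": "h1", "heading": "h2", "subheading": "h3", "content": "p"}


def to_html(records):
    """Render extracted records as a minimal HTML fragment, one tag per record."""
    lines = []
    for level, text in records:
        tag = TAGS[level]
        # Escape the text so characters such as & and < stay valid HTML.
        lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(lines)


def to_csv(records):
    """Serialize extracted records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["level", "text"])
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample data standing in for the output of the extraction step.
sample = [
    ("title", "Annual Report"),
    ("heading", "Overview"),
    ("content", "Revenue & costs rose, as expected."),
]
html_out = to_html(sample)
csv_out = to_csv(sample)
```

Keeping one flat list of records and rendering it twice means the HTML and CSV outputs cannot drift apart; the actual project may organize this differently.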
SurekhaSuresh/PDF-Parser