PDF-Parser

This repository contains a Python-based PDF parser that can process both searchable and non-searchable PDF files. The parser extracts the title, headings, subheadings, and content from each PDF and places the processed data into a custom-designed HTML representation that preserves the structure and formatting of the original document. The extracted data is also stored in a CSV file for easy retrieval and analysis.
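As a rough illustration of the pipeline described above, the sketch below labels text spans by font size, renders them as HTML, and writes them to CSV. It is not the code in pdfParser.py: the input format (font size and text pairs, as a PDF library or OCR step might produce), the font-size ranking heuristic, and all names (`SAMPLE_SPANS`, `classify`, `to_html`, `to_csv`) are assumptions made for this example.

```python
import csv
import html
import io

# Hypothetical input: (font_size, text) pairs standing in for whatever the
# real extraction step produces. The values are made up for illustration.
SAMPLE_SPANS = [
    (24.0, "Annual Report"),
    (18.0, "Introduction"),
    (14.0, "Background"),
    (11.0, "This report covers the year."),
    (18.0, "Results"),
    (11.0, "Revenue grew <5%."),
]

# HTML tag used to render each role.
TAGS = {"title": "h1", "heading": "h2", "subheading": "h3", "content": "p"}


def classify(spans):
    """Assign a role to each span from its font-size rank (assumed heuristic).

    The smallest size is always treated as body content; the largest
    remaining sizes become title, heading, and subheading in that order.
    """
    sizes = sorted({size for size, _ in spans}, reverse=True)
    size_to_role = dict(zip(sizes[:-1], ["title", "heading", "subheading"]))
    return [(size_to_role.get(size, "content"), text) for size, text in spans]


def to_html(rows):
    """Render (role, text) rows as a minimal HTML document, escaping the text."""
    body = "\n".join(
        f"<{TAGS[role]}>{html.escape(text)}</{TAGS[role]}>" for role, text in rows
    )
    return f"<html><body>\n{body}\n</body></html>"


def to_csv(rows):
    """Serialize (role, text) rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["role", "text"])
    writer.writerows(rows)
    return buf.getvalue()


rows = classify(SAMPLE_SPANS)
html_doc = to_html(rows)
csv_text = to_csv(rows)
```

Keeping classification separate from rendering means the same rows feed both the HTML and the CSV output, so the two stay consistent. A real parser would likely need more signals than font size alone, such as boldness or position on the page.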

Files: README.md, pdfParser.py

SurekhaSuresh/PDF-Parser
