pdf-extract-api-digitalocean

This project implements a simulated Optical Character Recognition (OCR) service that extracts text from PDF files uploaded by users. Built with Node.js using Express, Multer, and pdf-parse, it is designed to be easy to set up and to integrate into other systems that need PDF text extraction.

Features

PDF Text Extraction: Lets users upload PDF files and extracts the readable text from them.
File Upload Management: Uses Multer to handle multipart/form-data file uploads, with configurable storage options.
Error Handling: Reports failures to the client with an HTTP error status and a descriptive message, keeping the server stable.

Dependencies

Node.js: The script runs in a Node.js environment.
express: Web framework for Node.js.
multer: Middleware for handling multipart/form-data, used for uploading files.
pdf-parse: Library to parse and extract text from PDF files.
fs.promises: The promise-based API of the built-in Node.js File System (fs) module, used for file operations.
path: Built-in Node.js module with utilities for handling and transforming file paths.

Installing Node.js

Before installing, make sure Node.js and npm (Node Package Manager) are installed on your system. npm is included with Node.js, which you can download from the official Node.js website (https://nodejs.org/).

Installing pdf-extract-api-digitalocean

To install and use pdf-extract-api-digitalocean, follow these steps:

Clone the Repository: Begin by cloning the pdf-extract-api-digitalocean repository to your local machine:

git clone https://github.com/samestrin/pdf-extract-api-digitalocean/

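Install Dependencies: Navigate to the project's root directory and install the required packages:

cd pdf-extract-api-digitalocean
npm install

Start the Server: Optionally, set the PORT environment variable to choose the port on which the server listens (the default is 3000), then run:

npm start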
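Reading PORT with a fallback to the documented default of 3000 can be sketched as follows. This is a minimal sketch assuming the server reads process.env.PORT at startup; resolvePort is a hypothetical name, not taken from the project's source, and rejecting out-of-range values is a design choice of this sketch.

```javascript
// Hypothetical helper: the project's actual startup code may differ.
// Returns the port to listen on, falling back to the documented default of 3000.
function resolvePort(env = process.env) {
  const raw = env.PORT;
  if (raw === undefined || raw === "") {
    return 3000; // documented default
  }
  const port = Number(raw);
  // Reject non-numeric or out-of-range values instead of listening on a surprise port.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid PORT value: ${raw}`);
  }
  return port;
}
```

On macOS or Linux shells, PORT=8080 npm start sets the variable for that single command.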

Endpoints

Extract

Endpoint: /extract
Method: POST

Extract text from a PDF file.

Parameters

file: The PDF file to extract text from, sent as a multipart/form-data field named file.
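Because the file field is expected to hold a PDF, a server could reject other uploads with 400 Bad Request before parsing them. The sketch below assumes Multer's memory storage, which exposes the upload as file.buffer alongside file.mimetype; looksLikePdf is a hypothetical helper, and the project itself may validate uploads differently. Since the client controls the reported mimetype, the sketch also checks that the content starts with the %PDF- header that PDF files begin with.

```javascript
// Hypothetical validation helper; not taken from the project's source.
// Every PDF file starts with the bytes "%PDF-".
const PDF_MAGIC = Buffer.from("%PDF-", "latin1");

// `file` is assumed to look like a Multer memory-storage upload: { mimetype, buffer, ... }.
function looksLikePdf(file) {
  if (!file || !Buffer.isBuffer(file.buffer)) {
    return false; // no upload, or disk storage (no in-memory buffer to inspect)
  }
  const typeOk = file.mimetype === "application/pdf";
  // Compare the first bytes against the PDF header; shorter buffers never match.
  const magicOk = file.buffer.subarray(0, PDF_MAGIC.length).equals(PDF_MAGIC);
  return typeOk && magicOk;
}
```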

Example Usage

Use a tool like Postman or curl to make a request:

curl -F "file=@path_to_pdf_file.pdf" http://localhost:[PORT]/extract

The server will process the uploaded file and return the extracted text in JSON format.
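A Node.js client can make the same request programmatically. This sketch assumes Node.js 18 or later, where fetch, FormData, and Blob are available as globals; buildExtractRequest and extractText are hypothetical names. Because the README does not document the fields of the JSON response, extractText simply returns the parsed body without assuming a particular shape.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");

// Hypothetical helper: builds the URL and fetch options for POST /extract.
// The PDF is sent as the multipart/form-data field "file", matching the curl example.
function buildExtractRequest(baseUrl, pdfBuffer, filename) {
  const form = new FormData();
  form.append("file", new Blob([pdfBuffer], { type: "application/pdf" }), filename);
  return {
    url: new URL("/extract", baseUrl).toString(),
    // fetch sets the multipart Content-Type header (with boundary) automatically.
    init: { method: "POST", body: form },
  };
}

// Hypothetical helper: reads a local PDF, uploads it, and returns the parsed JSON body.
async function extractText(baseUrl, filePath) {
  const pdfBuffer = await fs.readFile(filePath);
  const { url, init } = buildExtractRequest(baseUrl, pdfBuffer, path.basename(filePath));
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Extraction failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

For example, extractText("http://localhost:3000", "sample.pdf").then(console.log) would upload a local sample.pdf to a server running on the default port.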

Error Handling

When a request fails, the API returns an error response with one of the following HTTP status codes:

400 Bad Request: Invalid request parameters.
500 Internal Server Error: Unexpected server error.
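The mapping from failures to these two status codes can be sketched as a small function that an error handler could call. This sketch assumes a hypothetical BadRequestError class for problems in the client's request and a { error: message } JSON body; neither is confirmed by the project's source.

```javascript
// Hypothetical error type for problems caused by the client's request.
class BadRequestError extends Error {
  constructor(message) {
    super(message);
    this.name = "BadRequestError";
  }
}

// Maps an error to a status code and JSON body (the body shape is an assumption).
function toErrorResponse(err) {
  if (err instanceof BadRequestError) {
    // 400: the client can fix the request, so the message is safe to return.
    return { status: 400, body: { error: err.message } };
  }
  // 500: unexpected failure; avoid leaking internal details to the client.
  return { status: 500, body: { error: "Internal Server Error" } };
}
```

In an Express error-handling middleware, which takes the four arguments (err, req, res, next), the result could be sent with res.status(status).json(body).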

Contribute

Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.
