Receives a video (mp4), uses a Tika service to OCR its frames, and sends its audio to an ASR/transcriber service.

rmazzine/VideoTranscriptionOCR

Video Transcription and OCR

This service tries to retrieve all text content from a video. The visual part is handled by splitting the video into image frames and running OCR on each frame (using Tika). The audio part is handled by another service (such as AudioTranscriberAPI) that receives the audio and returns its text transcription.
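The two stages described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual code: it assumes frames are extracted with the `ffmpeg` command-line tool, that the Tika server exposes its standard `PUT /tika` endpoint with OCR enabled, and all function names here are hypothetical.

```python
import urllib.request
from typing import List


def ffmpeg_frame_command(video_path: str, out_dir: str, fps: float = 1.0) -> List[str]:
    """Build an ffmpeg command that saves `fps` frames per second as PNG files.

    Assumption: ffmpeg is installed; the README does not say which tool is used.
    """
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",
        f"{out_dir}/frame_%04d.png",
    ]


def tika_ocr_request(image_bytes: bytes, tika_url: str) -> urllib.request.Request:
    """Build a PUT request asking a Tika server for the plain text of one frame."""
    return urllib.request.Request(
        f"{tika_url.rstrip('/')}/tika",
        data=image_bytes,
        method="PUT",
        headers={"Accept": "text/plain", "Content-Type": "image/png"},
    )


def merge_frame_texts(texts: List[str]) -> str:
    """Join per-frame OCR results, dropping empty ones and consecutive repeats."""
    merged: List[str] = []
    for text in texts:
        text = text.strip()
        if text and (not merged or merged[-1] != text):
            merged.append(text)
    return "\n".join(merged)
```

Consecutive frames often show the same text, which is why the sketch collapses repeated OCR results before joining them. The command list can be passed to `subprocess.run`, and the request object to `urllib.request.urlopen`.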

How to run

You will need to set the environment variables described below. After that, follow the steps:

Development env

Install packages

pip install -r requirements

Go to Flask API folder

cd ./flaskapp

Start the Flask server (e.g. on http://localhost:3680)

flask run -h localhost -p 3680

Create docker image

To create a docker image, build it with:

docker build -t videotranscriptionocr .

Then run it, forwarding the required port:

docker run -p 3680:3680 \
  -e TIKA_SERVER="TIKA_SERVER_HOST" \
  -e TRANSCRIBE_SERVER="TRANSCRIPTION_SERVER_HOST" \
  --network="host" videotranscriptionocr

How to use

For testing, it's recommended to use an API tool like Postman.

In Headers: Include the key Content-Type with value application/json, as the base64 video data is sent in JSON format.

In Body: Create a JSON object whose data key holds the base64-encoded mp4 data, for example:

{
  "data": "BASE64DATA"
}

Finally, in the URL field, select the POST method and send the JSON to the following address: http://localhost:3680/extract_video
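The same request can also be sent from a script instead of Postman. The sketch below uses only the Python standard library; the function names are hypothetical, and the port is assumed to be 3680, following the `flask run` and `docker run` commands in this README.

```python
import base64
import json
import urllib.request

# Assumed address: port taken from the `flask run` / `docker run` commands.
API_URL = "http://localhost:3680/extract_video"


def build_extract_request(video_bytes: bytes, url: str = API_URL) -> urllib.request.Request:
    """Wrap raw mp4 bytes in the {"data": <base64>} JSON body the API expects."""
    payload = {"data": base64.b64encode(video_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_video(video_path: str, url: str = API_URL) -> dict:
    """Read an mp4 file, POST it to the service, and return the parsed JSON reply."""
    with open(video_path, "rb") as f:
        request = build_extract_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `extract_video("sample.mp4")` would send the file, where `sample.mp4` stands for any local mp4.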

If successful, it will return a JSON with code 200 and the following data:

{
  "code": 200,
  "data": {
    "video_ocr": "OCR_TEXT_EXTRACTED_FROM_VIDEO",
    "audio_transcription": "TRANSCRIPTED_TEXT_FROM_VIDEO_AUDIO",
    "audiob64": "BASE64_AUDIO_DATA"
  } 
}
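A client would typically check the code field and decode the base64 audio before using it. The following is a minimal sketch based only on the response shape shown above; `unpack_response` is a hypothetical helper, and the audio container format is not stated in this README, so the bytes are returned as-is.

```python
import base64
from typing import Tuple


def unpack_response(body: dict) -> Tuple[str, str, bytes]:
    """Return (OCR text, audio transcription, raw audio bytes) from a reply."""
    if body.get("code") != 200:
        raise RuntimeError(f"extract_video failed with code {body.get('code')}")
    data = body["data"]
    # audiob64 holds the extracted audio track as base64 text.
    audio_bytes = base64.b64decode(data["audiob64"])
    return data["video_ocr"], data["audio_transcription"], audio_bytes
```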

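Environment Variables

TIKA_SERVER: host of the Tika server used for OCR on the video frames.

TRANSCRIBE_SERVER: host of the audio transcription service.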
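The `docker run` command in this README passes TIKA_SERVER and TRANSCRIBE_SERVER to the container. One way an application could read and validate them at startup is sketched below; `load_settings` is a hypothetical helper, not the repository's code.

```python
import os
from typing import Mapping

# Variable names taken from the `docker run` command in this README.
REQUIRED_VARIABLES = ("TIKA_SERVER", "TRANSCRIBE_SERVER")


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required variables, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARIABLES}
```

Failing at startup gives a clearer error than a failed request to an undefined host later on.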
