
Video Analysis (VIA) Framework

This repository contains the Video Analysis (VIA) Framework, a collection of Google Cloud services that you can use to transcribe video.

(Figure: Video Analysis Framework architecture)

The repository also contains an extended version of the Video Analysis (VIA) Framework, which adds components including Elasticsearch and a web interface that you can use to search for words and phrases within your videos.

(Figure: Extended Video Analysis Framework architecture)

The framework can:

  • Process video files uploaded to Cloud Storage.
  • Enrich the processed video files with the Google Cloud Video Intelligence API.
  • Write the enriched data to BigQuery.
  • With the extended version, add the enriched data to an Elasticsearch index and provide a user interface to search for words and phrases.

The life of a video file in the VIA Framework:

  1. A video file is uploaded to Cloud Storage.
  2. The Cloud Function is triggered on object creation (the google.storage.object.finalize event).
  3. The Cloud Function sends a long-running job request to the Video Intelligence API.
  4. The Video Intelligence API starts processing the video file.
  5. The Cloud Function then sends the job ID from the Video Intelligence API, with additional metadata, to Cloud Pub/Sub.
  6. The Cloud Dataflow job enriches the data.
  7. Cloud Dataflow then writes the data to BigQuery.
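
For orientation, here is a minimal Python sketch of steps 3 and 5. The actual Cloud Function in via-longrun-job-func is written in Node.js, and the feature choice and message shape here are illustrative assumptions, not the repo's exact code:

import json

from google.cloud import pubsub_v1, videointelligence


def on_video_uploaded(bucket: str, name: str, project: str, topic: str) -> None:
    """Start a long-running Video Intelligence job and publish its ID to Pub/Sub."""
    video_client = videointelligence.VideoIntelligenceServiceClient()

    # Request speech transcription for the newly uploaded file (illustrative feature choice).
    context = videointelligence.VideoContext(
        speech_transcription_config=videointelligence.SpeechTranscriptionConfig(
            language_code="en-US"
        )
    )
    operation = video_client.annotate_video(
        request={
            "features": [videointelligence.Feature.SPEECH_TRANSCRIPTION],
            "input_uri": f"gs://{bucket}/{name}",
            "video_context": context,
        }
    )

    # Publish the long-running job ID plus file metadata for the Dataflow job to pick up.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project, topic)
    message = {"job_id": operation.operation.name, "file": f"gs://{bucket}/{name}"}
    publisher.publish(topic_path, data=json.dumps(message).encode("utf-8"))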

Extended version: the life of a video file in the VIA Framework:

Scroll to the bottom for instructions on how to install the extended version.

  1. A video file is uploaded to Cloud Storage.
  2. The Cloud Function is triggered on object creation (the google.storage.object.finalize event).
  3. The Cloud Function sends a long-running job request to the Video Intelligence API.
  4. The Video Intelligence API starts processing the video file.
  5. The Cloud Function then sends the job ID from the Video Intelligence API, with additional metadata, to Cloud Pub/Sub.
  6. The Cloud Dataflow job enriches the data.
  7. Cloud Dataflow then writes the data to BigQuery.
  8. The pipeline then writes the enriched data to an Elasticsearch index (a sketch follows this list).
  9. The data is now ready to be searched with Elasticsearch.
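
As a sketch of step 8, the extended pipeline can index each enriched record with the elasticsearch 7.x Python client. The credentials and index name below are illustrative assumptions, not the repo's exact values:

import apache_beam as beam
from elasticsearch import Elasticsearch


class IndexToElasticsearch(beam.DoFn):
    """Index each enriched record into an Elasticsearch index."""

    def __init__(self, hosts, index):
        self.hosts = hosts
        self.index = index
        self.client = None

    def setup(self):
        # One client per worker; the credentials are illustrative placeholders.
        self.client = Elasticsearch(self.hosts, http_auth=("elastic", "YOUR_PASSWORD"))

    def process(self, record):
        self.client.index(index=self.index, body=record)
        yield record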

How to install the Video Analysis Framework

  1. Install the Google Cloud SDK.

  2. Create a storage bucket for the Dataflow staging files:

gsutil mb gs://[BUCKET_NAME]/

  3. Through the Google Cloud Console, create a folder named tmp in the newly created bucket for the Dataflow staging files.

  4. Create a storage bucket for uploaded video files:

gsutil mb gs://[BUCKET_NAME]/

  5. Create a BigQuery dataset:

bq mk [YOUR_BIG_QUERY_DATASET_NAME]

  6. Create a Cloud Pub/Sub topic:

gcloud pubsub topics create [YOUR_TOPIC_NAME]

  7. Enable the Cloud Dataflow API:

gcloud services enable dataflow.googleapis.com

  8. Enable the Cloud Video Intelligence API:

gcloud services enable videointelligence.googleapis.com

  9. Deploy the Google Cloud Function.
  • In the cloned repo, go to the via-longrun-job-func directory and deploy the following Cloud Function:

gcloud functions deploy viaLongRunJobFunc --region=us-central1 --stage-bucket=[YOUR_UPLOADED_VIDEO_FILES_BUCKET_NAME] --runtime=nodejs8 --trigger-event=google.storage.object.finalize --trigger-resource=[YOUR_UPLOADED_VIDEO_FILES_BUCKET_NAME]

  10. Deploy the Cloud Dataflow pipeline.
  • Confirm your Python version with python3 --version (the pipeline was developed with Python 3.7.8).
  • In the cloned repo, go to the via-longrun-job-dataflow directory. Run the commands below to set up the environment:

# macOS/Linux
python3 -m venv env
source env/bin/activate
pip3 install 'apache-beam[gcp]'
pip3 install dateparser

  • The Dataflow job will create the BigQuery table you specify in the parameters.
  • Deploy the pipeline with the command below; it might take a few minutes to complete.

python3 vialongrunjobdataflow.py --project=[YOUR_PROJECT_ID] --input_topic=projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME] --runner=DataflowRunner --temp_location=gs://[YOUR_DATAFLOW_STAGING_BUCKET]/tmp --output_bigquery=[DATASET NAME].[TABLE] --requirements_file="requirements.txt" --region=[GOOGLE_CLOUD_REGION]
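
For reference, the deployed pipeline follows the standard streaming Beam shape: read the Pub/Sub messages, enrich them with the Video Intelligence results, and write rows to BigQuery. A minimal sketch, assuming an illustrative message format and schema (this is not the repo's exact code):

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(input_topic: str, output_table: str) -> None:
    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=input_topic)
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            # The real pipeline polls the Video Intelligence job named in the
            # message and attaches its transcription results; this placeholder
            # just tags the record.
            | "Enrich" >> beam.Map(lambda record: {**record, "enriched": True})
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                output_table,
                schema="file:STRING,enriched:BOOLEAN",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )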

How to install the Extended version of the Video Analysis Framework

The extended VIA Framework requires a working Elasticsearch installation. For more information, see Managed Elasticsearch on Google Cloud.

  1. Install the Google Cloud SDK.

  2. Create a storage bucket for the Dataflow staging files:

gsutil mb gs://[BUCKET_NAME]/

  3. Through the Google Cloud Console, create a folder named tmp in the newly created bucket for the Dataflow staging files.

  4. Create a storage bucket for uploaded video files:

gsutil mb gs://[BUCKET_NAME]/

  5. Create a BigQuery dataset:

bq mk [YOUR_BIG_QUERY_DATASET_NAME]

  6. Create a Cloud Pub/Sub topic:

gcloud pubsub topics create [YOUR_TOPIC_NAME]

  7. Enable the Cloud Dataflow API:

gcloud services enable dataflow.googleapis.com

  8. Enable the Cloud Video Intelligence API:

gcloud services enable videointelligence.googleapis.com

  9. Deploy the Google Cloud Function.
  • In the cloned repo, go to the via-longrun-job-func directory and deploy the following Cloud Function:

gcloud functions deploy viaLongRunJobFunc --region=us-central1 --stage-bucket=[YOUR_UPLOADED_VIDEO_FILES_BUCKET_NAME] --runtime=nodejs8 --trigger-event=google.storage.object.finalize --trigger-resource=[YOUR_UPLOADED_VIDEO_FILES_BUCKET_NAME]

  10. Deploy the Cloud Dataflow pipeline.
  • Confirm your Python version with python3 --version (the pipeline was developed with Python 3.7.8).
  • In the cloned repo, go to the via-longrun-job-dataflow-extended directory.
  • Edit the pipeline to include your Elasticsearch settings on line 100 (see the sketch after this step).
  • Run the commands below to set up the environment:

# macOS/Linux
python3 -m venv env
source env/bin/activate
pip3 install 'apache-beam[gcp]'
pip3 install dateparser
pip3 install elasticsearch

  • The Dataflow job will create the BigQuery table you specify in the parameters.
  • Deploy the pipeline with the command below; it might take a few minutes to complete.

python3 viaextendedlongrunjobdataflow.py --project=[YOUR_PROJECT_ID] --input_topic=projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME] --runner=DataflowRunner --temp_location=gs://[YOUR_DATAFLOW_STAGING_BUCKET]/tmp --output_bigquery=[DATASET NAME].[TABLE] --requirements_file="requirements.txt" --region=[GOOGLE_CLOUD_REGION]
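
The Elasticsearch settings referenced in step 10 might look roughly like this; the endpoint and credentials below are placeholders for your own deployment:

from elasticsearch import Elasticsearch

# Hypothetical connection settings; substitute your own endpoint and credentials.
es_client = Elasticsearch(
    ["https://YOUR_DEPLOYMENT.es.us-central1.gcp.cloud.es.io:9243"],
    http_auth=("elastic", "YOUR_PASSWORD"),
)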
  11. Deploy the search interface.
  • In the cloned repo, go to the via-web/src directory. Edit the Settings.js file to include your Elasticsearch parameters.
  • Run the commands below in the via-web directory to deploy the search interface:

npm run build
gcloud app deploy

  12. The search interface requires Google Cloud Identity-Aware Proxy (IAP); enable IAP for the deployed App Engine service.

  13. Browse to the newly created App Engine service URL.

Notes

  • To search for a phrase, enter your text in quotes, for example: "cloud functions"

(Figure: Video Analysis phrase search)

  • To search for multiple words, enter the words separated by spaces, for example: cloud functions

(Figure: Video Analysis word search)
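
These two styles correspond naturally to Elasticsearch phrase and word matching. An illustrative query pair with the Python client follows; the index and field names are assumptions, not necessarily the interface's exact implementation:

from elasticsearch import Elasticsearch

es = Elasticsearch(["https://YOUR_ELASTICSEARCH_HOST:9243"])

# Quoted input: match the exact phrase.
es.search(index="videos", body={"query": {"match_phrase": {"transcript": "cloud functions"}}})

# Space-separated input: match any of the words.
es.search(index="videos", body={"query": {"match": {"transcript": "cloud functions"}}})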

This is not an officially supported Google product
