Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*
Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC)
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Source code of the paper titled *Attentive Visual Semantic Specialized Network for Video Captioning*
A video content description method for generating natural-language descriptions of unconstrained videos.
A Video.js 7 middleware that uses browser speech synthesis to speak descriptions contained in a description text track
A simple attention-based deep learning model that answers questions about a given video by returning the most relevant video intervals as answers.
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian (Bahasa Indonesia).
FrVD: French Video Description dataset
A tool for visualizing FrVD metadata synchronized with the corresponding videos.