A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions. (Python; updated Jul 31, 2024)
Pressure Testing Large Video-Language Models (LVLM): multimodal retrieval from LVLMs at any video length to measure accuracy
Official repo of "VideoGUI: A Benchmark for GUI Automation from Instructional Videos"
A multi-modal video caption dataset with richer annotations
[NeurIPS2022] Egocentric Video-Language Pretraining
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
[CVPR2022] Official Implementation of ReferFormer
A Video Chat Agent with Temporal Prior
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
A repository of Video Language papers, code and datasets.
VLG: General Video Recognition with Web Textual Knowledge (https://arxiv.org/abs/2212.01638)
Official implementation of the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos"
Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
The official GitHub page for the survey paper "Self-Supervised learning for Videos: A survey"
Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"
A survey on video and language understanding.
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)