A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Updated May 25, 2024 - Python
Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019
Fully-Convolutional Point Networks for Large-Scale Point Clouds
CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings)
A Base TensorFlow Project for Medical Report Generation
A Tennis dataset and models for event detection & commentary generation
Python code for handling the Clotho dataset.
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Sample app to display live captioning to a WebRTC video session with the Deepgram API.
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
Medical image captioning using OpenAI's CLIP
Audio captioning baseline system for DCASE 2020 challenge.
Toolkit for supporting the EBU-TT Live specification
S2VT (seq2seq) video captioning with Bahdanau & Luong attention implementation in TensorFlow
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Python program to generate memes.
Indonesian Image Captioning using Attention-based Semantic Compositional Networks
Audio Captioning datasets for PyTorch.
A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering