# image-captioning

Here are 340 public repositories matching this topic...

OpenGVLab / InternGPT

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)

Updated Aug 20, 2024
Python

sgrvinod / a-PyTorch-Tutorial-to-Image-Captioning

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

computer-vision pytorch image-captioning show-attend-and-tell attention-mechanism encoder-decoder pytorch-tutorial mscoco

Updated Jul 28, 2022
Python

OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

prompt chinese image-captioning pretrained-models visual-question-answering multimodal text-to-image-synthesis vision-language pretraining referring-expression-comprehension prompt-tuning

Updated Apr 24, 2024
Python

ttengwang / Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything

image-captioning controllable-image-captioning controllable-generation chatgpt segment-anything

Updated Aug 29, 2023
Python

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

vqa image-captioning language-model multi-task-learning vision-and-language multi-modal-learning vision-language-model

Updated Jan 17, 2024
Python

microsoft / Oscar

Oscar and VinVL

vqa image-captioning oscar vision-and-language pre-training image-text-search vinvl

Updated Aug 28, 2023
Python

YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

image-captioning video-captioning visual-question-answering vision-and-language cross-modal-retrieval pretraining tden

Updated Feb 27, 2023
Python

ruotianluo / self-critical.pytorch

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.

image-captioning

Updated Oct 5, 2023
Python

jhc13 / taggui

Tag manager and captioner for image datasets

image-captioning image-tagging tag-manager pyside6 stable-diffusion llava cogvlm florence-2

Updated Aug 4, 2024
Python

SkalskiP / awesome-foundation-and-multimodal-models

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

nlp computer-vision image-captioning clip blip multimodal zero-shot-detection foundational-models llava segment-anything open-vocabulary-detection open-vocabulary-segmentation grounding-dino

Updated Feb 29, 2024
Python

kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations

model-zoo image-captioning pretrained-models coco-dataset cvpr2021

Updated Jan 1, 2024
Python

kuanghuei / SCAN

PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

computer-vision deep-learning neural-network pytorch image-captioning cross-modal visual-semantic

Updated May 18, 2023
Python

aimagelab / meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020

pytorch transformer image-captioning captioning-images visual-semantic caption-generation cvpr2020

Updated Dec 21, 2022
Python

subho406 / OmniNet

Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

nlp machine-learning deep-learning neural-network artificial-intelligence transformer image-captioning video-recognition multimodal-learning multitask-learning

Updated Oct 31, 2020
Python

ufal / neuralmonkey

An open-source tool for sequence learning in NLP built on TensorFlow.

python nlp deep-learning tensorflow gpu machine-translation neural-networks image-captioning neural-machine-translation sequence-to-sequence mt nmt encoder-decoder

Updated Apr 28, 2020
Python

gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

image-captioning nodes vlm custom-nodes img2text llm mllm llava comfyui siglip phi15 joytag img2sfx

Updated Sep 24, 2024
Python

husthuaan / AoANet

Code for paper "Attention on Attention for Image Captioning". ICCV 2019

image-captioning attention-mechanism iccv2019

Updated May 2, 2021
Python

scopeInfinity / Video2Description

Video to Text: Natural language description generator for a given video. [Video Captioning]

deep-neural-networks video-processing image-captioning cnn-keras audio-processing lstm-neural-networks video-captioning video-to-text

Updated May 3, 2022
Python

sethuiyer / Image-to-Image-Search

A reverse image search engine powered by Elasticsearch and TensorFlow

search-engine elasticsearch deep-learning image-captioning

Updated Apr 3, 2021
Python

krasserm / fairseq-image-captioning

Transformer-based image captioning extension for pytorch/fairseq

pytorch transformer image-captioning fairseq

Updated Dec 18, 2020
Python
