mllm

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

image-captioning nodes vlm custom-nodes img2text llm mllm llava comfyui siglip phi15 joytag img2sfx

Updated Jul 31, 2024
Python

X-PLUG / Youku-mPLUG

Star

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

benchmark video dataset chinese youku multimodal video-retrieval video-question-answering multimodal-pretraining mllm multimodal-large-language-models

Updated Jan 8, 2024
Python

X-PLUG / mPLUG-2

Star

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

video vqa image-retrieval multimodal video-retrieval video-question-answering foundation-models multimodal-pretraining mllm mplug

Updated Jul 21, 2023
Python

baaivision / EVE

Star

EVE: Encoder-Free Vision-Language Models

clip vlm instruction-following large-language-models llm mllm multimodal-large-language-models vision-language-models encoder-free-vlm

Updated Jul 20, 2024
Python

TIGER-AI-Lab / Mantis

Star

Official code for Paper "Mantis: Multi-Image Instruction Tuning"

language video vision mantis vlm multimodal lmm fuyu mllm llava-llama3 multi-image-understanding

Updated Aug 7, 2024
Python

sterzhang / image-textualization

Star

Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions

dense-captioning text-image mllm

Updated Jul 30, 2024
Python

FoundationVision / GenerateU

Star

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

open-world object-detection multimodality open-vocabulary mllm open-vocabulary-detection

Updated Mar 25, 2024
Python

CircleRadon / TokenPacker

Star

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

connector lmm mllm token-reduction visual-projector tokenpacker

Updated Jul 26, 2024
Python

360CVGroup / SEEChat

Star

Multimodal chatbot with computer vision capabilities integrated

chatbot gpt4 mllm

Updated May 17, 2024
Python

baaivision / DenseFusion

Star

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

vlm image-descriptions visual-perception mllm multimodal-large-language-models vision-language-models

Updated Jul 31, 2024
Python

Improve this page

Add a description, image, and links to the mllm topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the mllm topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mllm

Here are 36 public repositories matching this topic...

microsoft / unilm

X-PLUG / MobileAgent

InternLM / InternLM-XComposer

cambrian-mllm / cambrian

X-PLUG / mPLUG-DocOwl

BAAI-DCAI / Bunny

CircleRadon / Osprey

BradyFU / Woodpecker

FoundationVision / Groma

dvlab-research / LLMGA

gokayfem / ComfyUI_VLM_nodes

X-PLUG / Youku-mPLUG

X-PLUG / mPLUG-2

baaivision / EVE

TIGER-AI-Lab / Mantis

sterzhang / image-textualization

FoundationVision / GenerateU

CircleRadon / TokenPacker

360CVGroup / SEEChat

baaivision / DenseFusion

Improve this page

Add this topic to your repo