Official code for Paper "Mantis: Multi-Image Instruction Tuning"
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust)
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
This is the official implementation (code and data) of the paper "MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?"
MOSSBench: A webpage for an oversensitivity benchmark
EVE: Encoder-Free Vision-Language Models from BAAI
Personal project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-style MLLM on an RTX 3090/4090 with 24GB.
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
Composition of Multimodal Language Models From Scratch
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family