Awesome-LVLM-Hallucination

Even though large vision-language models have shown impressive capabilities, particularly in zero-shot inference, they still struggle with hallucination: the generation of text containing information that is not present in the visual input. A large body of research is tackling this problem in its various forms, such as hallucinated objects, inaccurate attributes and relationships, and unfaithful descriptions. Possible causes include language priors, insufficient visual context, and biases and misinformation in the training data, among others.

This repository provides an organized list of state-of-the-art research papers, relevant code, and brief descriptions related to hallucination in Large Vision-Language Models (LVLMs), also known as Multimodal Large Language Models (MLLMs).

The main intention of this project is to provide a single place where research on hallucination in LVLMs can be accessed in a structured way. If you know of relevant work that is missing, please contribute by opening an issue. I am looking forward to fruitful discussion and learning!



Evaluation-Benchmark

  1. CHAIR: Object Hallucination in Image Captioning (EMNLP 2018) Star
    • Introduces the problem of object hallucination in the MSCOCO image captioning task
    • CHAIR metrics [built upon the 80 unique MSCOCO object categories] (a minimal sketch of the metric is given after this list)
  2. POPE: Evaluating Object Hallucination in Large Vision-Language Models (EMNLP 2023) Star
    • Object existence hallucination [Yes/No]
    • Random, Popular and Adversarial settings on the MSCOCO dataset
  3. MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models (23 June, 2023) Star
    • MME benchmark covers the evaluation of MLLM's perception and cognition abilities
    • Perception (Coarse-Grained): 4; Perception (Fine-Grained): 5; Perception (OCR): 1; Cognition (Reasoning): 4; [Total 14 subtasks]
    • Answer in Yes/No format for easy evaluation & 30 advanced MLLMs are benchmarked
  4. M-HalDetect: Detecting and Preventing Hallucinations in Large Vision Language Models (AAAI 2024) Star
    • Hallucination detection dataset with fine-grained annotations [accurate, inaccurate and analysis]
    • Fine-grained Direct Preference Optimization (FDPO) technique and reward model dataset
    • High correlation of reward model score with human evaluation
  5. HaELM: Evaluation and Analysis of Hallucination in Large Vision-Language Models (29 August, 2023) Star
    • Discusses LVLMs' tendency to respond 'Yes' to judgement-type queries
    • Use of ChatGPT to collect hallucination data via iterative prompt modification
    • Open-source LLM trained over this dataset for evaluation of LVLM's response
    • Evaluation results across various LVLMs, generation lengths and top-k sampling values
  6. CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning (NeurIPS 2023 Workshop) Static Badge
    • Automatic construction of question-answer pairs [Yes/No] based on datasets with caption annotations using ChatGPT, plus an automatic evaluation pipeline
    • Contrastive instruction tuning (CIT) with factual and contrastive QA pairs and Chain-of-Thought (CoT) justifications
  7. MMHAL-BENCH: Aligning Large Multimodal Models with Factually Augmented RLHF (25 September, 2023) Star
    • Introduced novel algorithm called Factually Augmented RLHF (Fact-RLHF) to alleviate the reward hacking phenomenon in RLHF
    • Developed evaluation benchmark MMHAL-BENCH with a special focus on penalizing hallucinations
    • Trained an LVLM with RLHF (LLaVA-RLHF) which shows improved multimodal alignment
  8. LRV (GAVIE): Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning (29 September, 2023) Star
    • LRV-Instruction - positive and negative robust instruction tuning dataset with 400k visual instructions (16 tasks)
    • Negative instruction semantics: (a) Nonexistent Object Manipulation (b) Existent Object Manipulation (c) Knowledge Manipulation
    • GPT4-Assisted Visual Instruction Evaluation (GAVIE)
  9. NOPE: Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models (09 October, 2023) Static Badge
    • VQA diagnostic benchmark to measure object hallucination using 'Negative Prompt'-based questions
    • LLM-based generation of a 29.5k synthetic negative-pronoun (none, no one, nobody, nowhere, neither) dataset
    • Finding: VLMs tend to hallucinate more on data with higher lexical diversity, more scene-relevant objects (co-occurrence) and larger answer scopes
  10. HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models (CVPR 2024) Star
    • Language Hallucination + Visual Illusion: 1129 VQA pairs over 346 images in total
    • Covers topics such as food, math, geometry, statistics, geography, sports, cartoons, famous illusions, movies, memes, etc., and formats such as logos, posters, figures, charts, tables, maps, consecutive images, etc.
  11. FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models (02 November, 2023) Star
    • Reference-free and fine-grained evaluation metric
      1. Recognizer: an LLM is used to identify descriptive content in the LVLM's prediction
      2. Decomposer: an LLM is used to generate atomic facts based on the recognizer's output
      3. Verifier: a Visual Entailment Model (e.g. OFA) is used to verify the atomic facts against the input image
  12. Bingo: Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges (07 November, 2023) Star
    • Total 308 Images and 370 QA Pairs
    • Bias category: Region, OCR and Factual
    • Interference category: Image-to-Image and Text-to-Image
  13. AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation (13 November, 2023) Star
    • LLM free evaluation of hallucination using AMBER benchmark
    • Evaluation of hallucination for generative and discriminative task using AMBERSCORE metric (covers existence, attributes and relation types of hallucination)
    • Includes hallucinatory target objects (more likely to be imagined by LVLMs)
  14. RAH-Bench: Mitigating Hallucination in Visual Language Models with Visual Supervision (27 November, 2023) Static Badge
    • Introduce fine-grained vision instruction dataset named RAI-30K (built upon panoptic scene graph dataset (PSG))
    • RAH-BENCH vision hallucination evaluation benchmark (3 types: Categorical, Relation and Attribute Hallucination)
    • False Positive Rates as evaluation metric
  15. Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models (03 December, 2023) Star
    • Proposed a novel test-bed to evaluate IT-LVLMs (Instruction Tuning Large Vision and Language models) on core computer vision tasks
    • Observed poor performance of IT-LVLMs with multiple failure cases in visual grounding
    • Identifies problems with IT-LVLMs like generation of hallucinatory events and sensitivity to the input query
  16. CCEval: HallE-Switch: Controlling Object Hallucination in Large Vision Language Models (03 December, 2023) Star
    • Suggest an approach to control object existence hallucination in detailed captions of LVLM
    • Introduced CCEval which is a GPT-4 assisted evaluation method for detailed captioning (Metrics: CHAIR(i&s), Coverage, Average Length, Average Objects)
    • Detailed investigation of LVLM components that might influence hallucination, such as alignment of the language decoder, volume of instruction data, resolution of the input image, and so on
    • Introduced controlling parameters over the LLM (HallE-Control) to condition the inference of objects
  17. FGHE: Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites (04 December, 2023) Star
    • Dealing with fine-grained object hallucination with ReCaption framework
    • Two-stage framework: 1) caption generation with the help of ChatGPT, 2) finetuning LVLMs on the generated captions
    • Introduced the Fine-Grained Object Hallucination Evaluation (FGHE) benchmark, which is similar to POPE (50 manually annotated images with 200 binary questions of type multi-object, attribute and behaviour)
  18. OpenCHAIR: Mitigating Open-Vocabulary Caption Hallucinations (06 December, 2023) Star
    • soon
  19. CorrelationQA: The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs (06 February, 2024) Star
    • soon
  20. ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling (09 February, 2024) Static Badge
    • soon
  21. VQAv2-IDK: Visually Dehallucinative Instruction Generation: Know What You Don’t Know (15 February, 2024) Star
    • soon
  22. MHaluBench: Unified Hallucination Detection for Multimodal Large Language Models (20 February, 2024) Star
    • soon
  23. MAD-Bench: How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts (20 February, 2024) Static Badge
    • soon
  24. VHTest: Visual Hallucinations of Multi-modal Large Language Models (22 February, 2024) Star
    • soon
  25. Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models (24 February, 2024) Static Badge
    • soon
  26. Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective (03 March, 2024) Static Badge
    • soon
  27. EvalDial: Mitigating Dialogue Hallucination for Large Multi-modal Models via Adversarial Instruction Tuning (15 March, 2024) Static Badge
    • soon
  28. IVL-Hallu: PhD: A Prompted Visual Hallucination Evaluation Dataset (17 March, 2024) Star
    • soon
  29. Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models (29 March, 2024) Star
    • soon
  30. ALOHa: A New Measure for Hallucination in Captioning Models (3 April, 2024) Static Badge
    • soon
  31. VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (22 April, 2024) Star
    • soon
  32. THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models (08 May, 2024) Static Badge
    • soon
  33. MRHal-Bench: Automated Multi-level Preference for MLLMs (18 May, 2024) Static Badge
    • soon
  34. VLind-Bench: Measuring Language Priors in Large Vision-Language Models (13 June, 2024) Static Badge
    • soon
  35. MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era (13 June, 2024) Star
    • soon
  36. Med-HallMark: Detecting and Evaluating Medical Hallucinations in Large Vision Language Models (14 June, 2024) Static Badge
    • Medical field hallucination benchmark
    • MediHall Score - evaluation metric
  37. AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models (16 June, 2024) Static Badge
    • soon
  38. MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models (17 June, 2024) Star
    • soon
  39. CHAIR-MEN: Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? (20 June, 2024) Static Badge
    • soon
  40. R-BENCH: Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models (24 June, 2024) (ICML2024) Star
    • Introduce an evaluation benchmark to tackle relation type of hallucination
    • soon
  41. HQH: Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models (24 June, 2024) Star
    • Propose a framework called Hallucination benchmark Quality Measurement (HQM) to assess the quality of existing hallucination benchmarks
    • soon
  42. VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models (24 June, 2024) Star
    • soon
  43. MMHalSnowball: Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (30 June, 2024) Star
    • soon
  44. MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context (03 July, 2024) Star
    • soon
  45. ROPE: Multi-Object Hallucination in Vision-Language Models (08 July, 2024) Star
    • Deals with multi-object hallucinations and their cause
    • Introduce Recognition-based Object Probing Evaluation (ROPE) for assessing multi-object hallucination
    • In-depth analysis of hallucinatory behaviors
  46. BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models (18 July, 2024) (ECCV 2024) Star
    • Proposed a hallucination evaluation benchmark called BEfore-After (BEAF)
    • New metrics introduced: True Understanding (TU), IGnorance (IG), StuBbornness (SB), and InDecision (ID)
  47. HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning (22 July, 2024) (ECCV 2024) Star
    • Introduced a novel VQA dataset for VLM evaluation
    • soon
  48. MMINSTRUCT: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity (22 July, 2024) Star
    • Introduced high-quality and diverse visual instruction tuning dataset
    • Claims SOTA performance of MMINSTRUCT-finetuned LLaVA-1.5 on 10 out of 12 well-known benchmarks
  49. Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs (02 August, 2024) Star
    • Constructed a hallucination evaluation benchmark with perturbed inputs covering 7 different perturbation scenarios
    • 12 SOTA MLLMs are benchmarked
  50. Up to Date (01st August) and SOTA research work loading...

Note: 'soon' will be replaced with brief description!
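For readers new to these metrics, below is a minimal, illustrative sketch of CHAIR-style scoring (entry 1 above): CHAIR_i is the fraction of mentioned object instances that are hallucinated, and CHAIR_s is the fraction of captions containing at least one hallucinated object. This is not the official implementation; object extraction here is naive word matching over a toy vocabulary, whereas the original metric relies on the 80 MSCOCO object categories together with curated synonym lists.

```python
# Hedged, minimal sketch of CHAIR-style scoring (illustrative only).

def chair_scores(captions, gt_objects_per_image, vocab):
    """captions: list of caption strings
    gt_objects_per_image: list of sets of ground-truth object names per image
    vocab: set of object names the metric knows about (e.g. MSCOCO classes)"""
    hallucinated_mentions = 0
    total_mentions = 0
    captions_with_hallucination = 0

    for caption, gt_objects in zip(captions, gt_objects_per_image):
        words = set(caption.lower().replace(".", "").split())
        mentioned = vocab & words                # objects the caption talks about
        hallucinated = mentioned - gt_objects    # mentioned but not in the image

        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        captions_with_hallucination += bool(hallucinated)

    chair_i = hallucinated_mentions / max(total_mentions, 1)
    chair_s = captions_with_hallucination / max(len(captions), 1)
    return chair_i, chair_s


if __name__ == "__main__":
    vocab = {"dog", "cat", "frisbee", "car"}
    captions = ["A dog catching a frisbee.", "A cat sleeping on a car."]
    gt = [{"dog", "frisbee"}, {"cat"}]           # second caption hallucinates "car"
    print(chair_scores(captions, gt, vocab))     # -> (0.25, 0.5)
```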

Detection

  1. FDPO - Reward Model: Detecting and Preventing Hallucinations in Large Vision Language Models (AAAI 2024) Star
    • M-HalDetect - Hallucination detection dataset with fine-grained annotations [accurate, inaccurate and analysis]
    • Fine-grained Direct Preference Optimization (FDPO) technique and reward model trained on introduced dataset
    • High correlation of reward model score with human evaluation
  2. HaELM: Evaluation and Analysis of Hallucination in Large Vision-Language Models (29 August, 2023) Star
    • Discusses LVLMs' tendency to respond 'Yes' to judgement-type queries
    • Use of ChatGPT to collect hallucination data via iterative prompt modification
    • Open-source LLM trained over this dataset for evaluation of LVLM's response
    • Evaluation results across various LVLMs, generation lengths and top-k sampling values
  3. HallE-Switch: Controlling Object Hallucination in Large Vision Language Models (3 October, 2023) Star
    • Suggest an approach to control object existence hallucination in detailed captions of LVLM
    • Introduced CCEval which is a GPT-4 assisted evaluation method for detailed captioning (Metrics: CHAIR(i&s), Coverage, Average Length, Average Objects)
    • Detailed investigation of LVLM components that might influence hallucination, such as alignment of the language decoder, volume of instruction data, resolution of the input image, and so on
    • Introduced controlling parameters over the LLM (HallE-Control) to condition the inference of objects
  4. HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (22 November, 2023) Star
    • Investigates hallucination toxicity in already existing visual instruction dataset
    • Proposed HalluciDoctor method for automatic elimination of such toxicity
    • Generation of more counterfactual instruction data with help of HalluciDoctor to improve LVLMs' resistance to hallucination
  5. LogicCheckGPT: Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models (18 February, 2024) Star
    • Postprocessing of LVLM output descriptions
    • 5-step logical loop procedure (a hedged skeleton is sketched after this list):
      • Object extraction, object-to-attribute inquiring, attribute-to-object inquiring, logic closed-loop check, and hallucination detection and mitigation
    • Experimental analysis on the POPE and MME benchmarks
  6. UNIHD: Unified Hallucination Detection for Multimodal Large Language Models (20 February, 2024) Star
    • Introduce a meta evaluation benchmark called MHALUBENCH
    • Introduce a framework named UNIHD which detects modality-conflicting hallucinations at various levels such as object, attribute, and scene-text, as well as fact-conflicting hallucinations
  7. Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback (22 April, 2024) Static Badge
    • Use of GPT-4/GPT-4V to generate fine-grained feedback for hallucination detection and mitigation (via supervised finetuning (SFT) of the LVLM)
    • Propose an automatic pipeline for preference dataset construction
    • Hallucination Severity-Aware Direct Preference Optimization (HSA-DPO) is introduced for mitigation of LVLM hallucination
  8. MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification (29 May, 2024) Static Badge
    • Really cool approach
    • Lightweight method for hallucination detection
  9. Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions (11 June, 2024) Star
    • soon
  10. MediHallDetector: Detecting and Evaluating Medical Hallucinations in Large Vision Language Models (14 June, 2024) Static Badge
    • Medical field hallucination detection
    • soon
  11. Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification (02 July, 2024) Static Badge
    • soon
  12. Up to Date (01st August) and SOTA research work loading...

Note: 'soon' will be replaced with brief description!
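To make the logical-closed-loop idea from entry 5 above more concrete, here is a hedged skeleton of such a procedure. Every model call is a stub to be replaced with a real LVLM, and the prompts and the consistency rule are illustrative assumptions rather than the paper's exact formulation.

```python
# Hedged skeleton of a logical-closed-loop hallucination check (illustrative).

def query_lvlm(image, prompt):
    """Stub: replace with a real LVLM call; raises until one is plugged in."""
    raise NotImplementedError

def extract_objects(image, description):
    # Step 1: ask which objects the description mentions (illustrative prompt).
    answer = query_lvlm(image, f"List, comma-separated, the objects mentioned in: '{description}'")
    return [obj.strip() for obj in answer.split(",") if obj.strip()]

def object_to_attribute(image, obj):
    # Step 2: ask for the attributes of each mentioned object.
    return query_lvlm(image, f"Describe the {obj} in the image.")

def attribute_to_object(image, attribute_text):
    # Step 3: ask which object the attribute description points back to.
    return query_lvlm(image, f"Which object in the image is described by: '{attribute_text}'?")

def logical_loop_check(image, description):
    # Steps 4-5: flag an object as likely hallucinated when the
    # attribute-to-object answer does not loop back to the original object.
    flagged = []
    for obj in extract_objects(image, description):
        attributes = object_to_attribute(image, obj)
        roundtrip = attribute_to_object(image, attributes)
        if obj.lower() not in roundtrip.lower():
            flagged.append(obj)
    return flagged
```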

Mitigation

  1. ObjMLM: Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training (10 February 2023) Star
    • Deals with object hallucination problem of VLMs
    • Discuss the influence of various Vision Language Pretraining (VLP) objectives (ITM, ITC and ICLM) and image encoding methods (region-based, grid-based, and patch-based) on object hallucination
    • Introduce novel VLP objective ObjMLM to mitigate object hallucination
  2. MMCoT: Multimodal Chain-of-Thought Reasoning in Language Models (17 February 2023) Star
    • Two stage framework by finetuning language models to perform Multimodal chain-of-thoughts (CoT) which incorporates language (text) and vision (images) modalities
    • Claims state-of-the-art performance of model under 1 billion parameters on ScienceQA benchmark
    • Multimodal-CoT has the merits of mitigating hallucination and enhancing convergence speed
  3. LRV-GAVIE: Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning (26 June, 2023) Star
    • LRV-Instruction - positive and negative robust instruction tuning dataset with 400k visual instructions (16 tasks)
    • Negative instruction semantics: (a) Nonexistent Object Manipulation (b) Existent Object Manipulation (c) Knowledge Manipulation
    • GPT4-Assisted Visual Instruction Evaluation (GAVIE)
  4. LLaVA-RLHF: Aligning Large Multimodal Models with Factually Augmented RLHF (25 September, 2023) Star
    • Introduced novel algorithm called Factually Augmented RLHF (Fact-RLHF) to alleviate the reward hacking phenomenon in RLHF
    • Developed evaluation benchmark MMHAL-BENCH with a special focus on penalizing hallucinations
    • Trained an LVLM with RLHF (LLaVA-RLHF) which shows improved multimodal alignment
  5. LURE: Analyzing and Mitigating Object Hallucination in Large Vision-Language Models (01 October, 2023) Star
    • Introduced LURE framework which is lightweight and compatible post-hoc approach for rectifying object hallucination in LVLMs
    • Statistical analysis of object co-occurrence, object uncertainty and object position in generated descriptions, which may correlate with object hallucination
    • Uncertain objects are replaced with placeholder tokens while training LURE and during inference (for revision)
    • Really popular method
  6. HallE-Switch: Controlling Object Hallucination in Large Vision Language Models (3 October, 2023) Star
    • Suggest an approach to control object existence hallucination in detailed captions of LVLM
    • Introduced CCEval which is a GPT-4 assisted evaluation method for detailed captioning (Metrics: CHAIR(i&s), Coverage, Average Length, Average Objects)
    • Detailed investigation of LVLM components that might influence hallucination, such as alignment of the language decoder, volume of instruction data, resolution of the input image, and so on
    • Introduced controlling parameters over the LLM (HallE-Control) to condition the inference of objects
  7. Woodpecker: Hallucination Correction for Multimodal Large Language Models (24 October, 2023) Star
    • Really popular method
    • Training-free, post-hoc method to mitigate hallucination (but computationally expensive!!)
    • 5-step framework (a rough skeleton is sketched after this list):
      1. Key concept extraction from LVLM's output
      2. Formulation of questions based on key concepts
      3. Visual Knowledge validation (use of open-source object detector + pretrained VQA model)
      4. Visual claim generation (use of fixed sentence templates + QA-to-claim model)
      5. Hallucination Correction (use LLM to correct LVLM's response)
  8. VOLCANO: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision (14 November, 2023) Star
    • soon
  9. HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (22 November, 2023) Star
    • Investigates hallucination toxicity in already existing visual instruction dataset
    • Proposed HalluciDoctor method for automatic elimination of such toxicity
    • Generation of more counterfactual instruction data with help of HalluciDoctor to improve LVLMs' resistance to hallucination
  10. RAH-Bench: Mitigating Hallucination in Visual Language Models with Visual Supervision (27 November, 2023) Static Badge
    • soon
  11. HA-DPO: Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization (28 November, 2023) Star
    • soon
  12. VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding (28 November, 2023) Star
    • Decoding strategy
    • soon
  13. OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation (CVPR 2024) Star
    • soon
  14. FGHE: Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites (04 December, 2023) Star
    • soon
  15. RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback (01 December, 2023) Star
    • fine-grained refined DPO!
    • soon
  16. MOCHa: Mitigating Open-Vocabulary Caption Hallucinations (06 December 2023) Star
    • soon
  17. HACL: Hallucination Augmented Contrastive Learning for Multimodal Large Language Model (12 December 2023) Star
    • soon
  18. SILKIE: Preference Distillation for Large Visual Language Models (17 December, 2023) Star
    • propose VLFeedback dataset for DPO
    • soon
  19. KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning (23 January, 2024) Static Badge
    • soon
  20. Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study (31 January, 2024) Star
    • soon
  21. ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling (09 February, 2024) Static Badge
    • soon
  22. SKIP \N: A Simple Method to Reduce Hallucination in Large Vision-Language Models (12 February, 2024) Star
    • soon
  23. MARINE: Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance (13 February, 2024) Static Badge
    • soon
  24. IDK-Instructions: Visually Dehallucinative Instruction Generation: Know What You Don’t Know (15 February, 2024) Star
    • soon
  25. EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models (15 February, 2024) Static Badge
    • soon
  26. LogicCheckGPT: Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models (18 february, 2024) Star
    • soon
  27. POVID: Aligning Modalities in Vision Large Language Models via Preference Fine-tuning (18 february, 2024) Star
    • soon
  28. Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (22 February, 2024) Star
    • decoding strategy
    • soon
  29. CGD: Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding (23 February, 2024) Static Badge
    • decoding strategy
    • soon
  30. IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding (28 February, 2024) Static Badge
    • decoding strategy
    • soon
  31. HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding (01 March, 2024) Star
    • Decoding strategy to tackle object hallucination
    • soon
  32. Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective (03 March, 2024) Static Badge
    • number hallucination
    • soon
  33. AIT: Mitigating Dialogue Hallucination for Large Multi-modal Models via Adversarial Instruction Tuning (15 March, 2024) Static Badge
    • soon
  34. DVP: What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models (20 March, 2024) Static Badge
    • soon
  35. M3ID: Multi-Modal Hallucination Control by Visual Information Grounding (20 March, 2024) Static Badge
    • decoding strategy
    • soon
  36. PENSIEVE: Retrospect-then-Compare Mitigates Visual Hallucination (21 March, 2024) Star
    • decoding strategy
    • soon
  37. ESREAL: Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models (26 March, 2024) Static Badge
    • soon
  38. ICD: Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding (27 March, 2024) Static Badge
    • decoding strategy
    • soon
  39. FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback (07 April, 2024) Static Badge
    • soon
  40. Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning (16 April, 2024) Static Badge
    • soon
  41. FACT: Teaching MLLMs with Faithful, Concise and Transferable Rationales (17 April, 2024) Static Badge
    • soon
  42. TVP: Exploring the Transferability of Visual Prompting for Multimodal Large Language Models (17 April, 2024) Star
    • soon
  43. TextSquare: Scaling up Text-Centric Visual Instruction Tuning (19 April, 2024) Static Badge
    • soon
  44. HSA-DPO: Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback (22 April, 2024) Static Badge
    • Use of GPT-4/GPT-4V to generate fine-grained feedback for hallucination detection and mitigation (via supervised finetuning (SFT) of the LVLM)
    • Propose an automatic pipeline for preference dataset construction
    • Hallucination Severity-Aware Direct Preference Optimization (HSA-DPO) is introduced for mitigation of LVLM hallucination
  45. Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation (30 April - CVPR 2024) Static Badge
    • soon
  46. CSR: Calibrated Self-Rewarding Vision Language Models (23 May, 2024) Star
    • soon
  47. HIO: Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization (24 May, 2024) Static Badge
    • soon
  48. VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap (24 May, 2024) Static Badge
    • soon
  49. RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness (27 May, 2024) Star
    • soon
  50. AvisC: Don’t Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models (28 May, 2024) Star
    • decoding strategy
  51. RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs (28 May, 2024) Star
    • soon
  52. HALVA: Mitigating Object Hallucination via Data Augmented Contrastive Tuning (28 May, 2024) Static Badge
    • Data-augmented contrastive tuning approach
    • will publish code soon
  53. NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models (30 May, 2024) Star
    • soon
  54. CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Model (04 June, 2024) Star
    • soon
  55. mDPO: Conditional Preference Optimization for Multimodal Large Language Models (17 June, 2024) Static Badge
    • soon
  56. DBD: Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning? (18 June, 2024) Static Badge
    • Introduce novel decoding technique called Differentiated Beam Decoding (DBD)
    • soon
  57. AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention (18 June, 2024) Star
    • Introduce AGLA, a training-free and plug-and-play decoding framework
    • soon
  58. Residual Visual Decoding: Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (30 June, 2024) Star
    • decoding method
    • Soon
  59. BDHS: Understanding Alignment in Multimodal LLMs: A Comprehensive Study (02 July, 2024) Static Badge
    • soon
  60. REVERIE: Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models (16 July, 2024) (ECCV 2024) Star
    • Introduced novel reflective instruction tuning to incorporate rationales into visual instruction tuning
    • Proposed large-scale instruction tuning dataset called REVERIE
  61. PAI: Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs (31 July, 2024) (ECCV 2024) Star
    • soon
  62. MHR: Mitigating Multilingual Hallucination in Large Vision-Language Models (01 August, 2024) Star
    • soon
  63. ARA: Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation (01 August, 2024) Static Badge
    • RAG for LVLMs for mitigating hallucination
    • soon
  64. Up to Date (01st August) and SOTA research work loading...

Note: 'soon' will be replaced with brief description!
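As a rough illustration of a training-free, post-hoc correction pipeline in the spirit of Woodpecker (entry 7 above), here is a skeleton of the five stages. All model calls (LLM, open-vocabulary detector, VQA model) are stubs, and the prompts are assumptions for illustration only, not the paper's actual templates.

```python
# Hedged skeleton of a Woodpecker-style post-hoc correction pipeline (illustrative).

def call_llm(prompt):
    """Stub: replace with a real LLM call."""
    raise NotImplementedError

def run_object_detector(image, concepts):
    """Stub: replace with an open-vocabulary object detector."""
    raise NotImplementedError

def run_vqa(image, question):
    """Stub: replace with a pretrained VQA model."""
    raise NotImplementedError

def correct_response(image, lvlm_response):
    # 1. Key concept extraction from the LVLM's output.
    raw = call_llm(f"List, comma-separated, the key objects mentioned in: {lvlm_response}")
    concepts = [c.strip() for c in raw.split(",") if c.strip()]

    # 2. Formulate verification questions about those concepts.
    questions = [f"Is there a {c} in the image? What does it look like?" for c in concepts]

    # 3. Visual knowledge validation with a detector and a VQA model.
    detections = run_object_detector(image, concepts)
    answers = [run_vqa(image, q) for q in questions]

    # 4. Turn the validated evidence into explicit visual claims.
    claims = [f"{q} -> {a}" for q, a in zip(questions, answers)]

    # 5. Ask an LLM to rewrite the response so that it agrees with the claims.
    prompt = ("Rewrite the response so it is consistent with the visual evidence.\n"
              f"Evidence: {claims}\nDetections: {detections}\nResponse: {lvlm_response}")
    return call_llm(prompt)
```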

Survey

  1. Deep Learning Approaches on Image Captioning: A Review (22 August, 2023)
  2. A Survey on Hallucination in Large Vision-Language Models (1 February, 2024)
  3. Visual Hallucination: Definition, Quantification, and Prescriptive Remediations (26 March, 2024)
  4. Hallucination of Multimodal Large Language Models: A Survey (29 April, 2024) Star
  5. Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Survey (20 May, 2024)
  6. Up to Date (01st August) and SOTA research work loading...