
LEMMA

An effective and explainable way to detect multimodal misinformation with LVLM and external knowledge augmentation, incorporating the intuition and reasoning capability inside the LVLM.

License: MIT


Publication

This is the official repository for LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation.

Framework

(Figure: overview of the LEMMA framework.)

Get Started

Install Dependency

pip install -r requirements.txt

Chrome Driver

Open Chrome and check its version at chrome://settings/help. Download the chromedriver that matches your Chrome version from https://googlechromelabs.github.io/chrome-for-testing/#stable and place it under the root directory of this project.
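
As a quick sanity check that the driver is discoverable, the sketch below (assuming the project uses Selenium and that the binary is named chromedriver in the project root; adjust the path, e.g. chromedriver.exe on Windows) launches a browser session and prints a page title:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Selenium at the chromedriver binary placed in the project root.
# The binary name/path is an assumption; adjust it if yours differs.
driver = webdriver.Chrome(service=Service("./chromedriver"))
driver.get("https://www.example.com")
print(driver.title)  # prints "Example Domain" if the driver works
driver.quit()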

OpenAI API Key

Please register for an API key at https://platform.openai.com/api-keys, then set the environment variable. For Linux, run:

export OPENAI_API_KEY=<Your own API Key>

For Windows, run:

$env:OPENAI_API_KEY = "<Your own API Key>"
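
To confirm the key is visible to Python, a minimal sketch (assuming the openai package, v1.0 or later, from requirements.txt); the official client reads OPENAI_API_KEY from the environment automatically:

import os
from openai import OpenAI

# Fail early if the environment variable was not exported.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"

# The client picks up the key from the environment; no argument needed.
client = OpenAI()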

Example Run

Run the example input:

python lemma.py --input_file_name data/example_input.json --use_cache
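
To see what the input format looks like before preparing your own data, a minimal sketch that loads and pretty-prints the bundled example (the schema itself is defined by data/example_input.json, not assumed here):

import json

with open("data/example_input.json") as f:
    data = json.load(f)

# The file may hold a list of items or a single object; print one entry.
first = data[0] if isinstance(data, list) else data
print(json.dumps(first, indent=2, ensure_ascii=False))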

Run Twitter Dataset

python lemma.py --input_file_name data/twitter/twitter.json --use_cache

Dataset

To assess LEMMA, we mainly evaluate its performance on two representative datasets in the field.

  • Twitter (Ma et al., 2017) collects multimedia tweets from the Twitter platform. Posts in the dataset contain textual tweets, image/video attachments, and additional social context information. For our task, we retain only the image-text pairs as testing samples.
  • Fakeddit (Nakamura et al., 2019) is designed for fine-grained fake news detection. The dataset is curated from multiple subreddits of the Reddit platform, where each post includes textual sentences, images, and social context information. The 2-way categorization of this dataset labels each post as real or fake.

Baselines

| Models                 | Twitter | Fakeddit |
|------------------------|---------|----------|
| Direct (LLaVA)         | 0.605   | 0.663    |
| CoT (LLaVA)            | 0.468   | 0.673    |
| Direct (InstructBLIP)  | 0.494   | 0.726    |
| CoT (InstructBLIP)     | 0.455   | 0.610    |
| Direct (GPT-4)         | 0.637   | 0.677    |
| CoT (GPT-4)            | 0.667   | 0.691    |
| FacTool (GPT-4)        | 0.548   | 0.506    |
| Direct (GPT-4V)        | 0.757   | 0.734    |
| CoT (GPT-4V)           | 0.678   | 0.754    |
| LEMMA (our model)      | 0.824   | 0.828    |

Citation

To cite this work, please use the following BibTeX entry:

@article{xuan2024lemma,
  title={LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation},
  author={Xuan, Keyang and Yi, Li and Yang, Fan and Wu, Ruochen and Fung, Yi R and Ji, Heng},
  journal={arXiv preprint arXiv:2402.11943},
  year={2024}
}
