
Multimodal Visual Question Answering on the CLEVR dataset, and analyzing the performance of general-purpose vision models on downstream VQA tasks in a zero-shot setting.


Exploring and Building Visual Question Answering Systems using CLEVR and EasyVQA

Research Report: Exploring and Building Visual Question Answering Systems using CLEVR and EasyVQA

Example 1:

(example image)

Example 2:

(example image)

Please follow the instructions below to run our code:

  1. Download our best-performing model checkpoint from here and place it in:

     models/
    
  2. Download the CLEVR dataset from here (we used CLEVR v1.0 Main, not CoGenT) and place the data in the following layout (a sanity-check sketch follows these instructions):

     CLEVR_v1.0/
     	images/
     	questions/
    

    If you would rather start with a simpler, smaller dataset, you can try EasyVQA, which has only 13 answer classes; the code and process remain the same (a loading sketch using the easy-vqa package also follows these instructions).

  3. Run multimodel-clevr-public.py, which is in the project root folder (make sure all the requirements are installed first):

     python multimodel-clevr-public.py
    
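Before running the script on CLEVR, it can help to sanity-check that the data landed in the layout shown in step 2. Below is a minimal loading sketch; the file and field names follow the official CLEVR v1.0 release (the val split is just an example):

    import json
    import os

    ROOT = "CLEVR_v1.0"

    # The official release names its question files
    # CLEVR_train_questions.json, CLEVR_val_questions.json, CLEVR_test_questions.json.
    with open(os.path.join(ROOT, "questions", "CLEVR_val_questions.json")) as f:
        questions = json.load(f)["questions"]

    sample = questions[0]
    # Images live under images/<split>/ and are referenced by "image_filename".
    image_path = os.path.join(ROOT, "images", "val", sample["image_filename"])
    print(sample["question"], "->", sample.get("answer"))  # test split has no answers
    print("image exists:", os.path.exists(image_path))

If the last line prints False, the images/ and questions/ folders are probably not siblings under CLEVR_v1.0/ as shown above.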

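If you take the EasyVQA route instead, the dataset ships as a pip package rather than a manual download. A short sketch, assuming the easy-vqa package from PyPI (these accessor functions come from that package, not from this repository):

    # pip install easy-vqa
    from easy_vqa import get_answers, get_train_image_paths, get_train_questions

    questions, answers, image_ids = get_train_questions()
    image_paths = get_train_image_paths()  # maps image id -> PNG file path

    print(len(get_answers()), "answer classes")  # EasyVQA has 13
    print(questions[0], "->", answers[0])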