Questions repetition in test datasets (augmented) #5

anette123 · 2021-04-06T16:34:41Z

Hi team, I have noticed a lot of repetition among the questions in the augmented test datasets. In particular, I am looking at the synonym_generalization task, which reads data from /data_agumented/CLEVR/questions/synonym_generalization/i/, where data_augmented is the data I downloaded from http://vcml.csail.mit.edu/data/dataset_augmentation.tgz, as per the instructions. I can see the following:

File /questions/synonym_generalization/0/test_questions.json contains 60000 questions, which are repetitions of the following 9 distinct questions:

       ['Is small a synonym of sphere?', 'Is shiny a synonym of sphere?',
       'Is shiny a synonym of small?', 'Is sphere a synonym of small?',
       'Is small a synonym of shiny?', 'Is sphere a synonym of shiny?',
       'Is small a synonym of small?', 'Is shiny a synonym of shiny?',
       'Is sphere a synonym of sphere?']

File /questions/synonym_generalization/1/test_questions.json contains 60000 questions, which are repetitions of the following 9 distinct questions:

       ['Is cube a synonym of cube?', 'Is metal a synonym of cube?',
       'Is ball a synonym of cube?', 'Is metal a synonym of ball?',
       'Is cube a synonym of ball?', 'Is ball a synonym of ball?',
       'Is metal a synonym of metal?', 'Is ball a synonym of metal?',
       'Is cube a synonym of metal?']

File /questions/synonym_generalization/2/test_questions.json contains 60000 questions, all of which are repetitions of a single question:

     ['Is metallic a synonym of metallic?']

File /questions/synonym_generalization/3/test_questions.json contains 60000 questions, which are repetitions of the following 16 distinct questions:

      ['Is metal a synonym of shiny?', 'Is shiny a synonym of shiny?',
       'Is ball a synonym of large?', 'Is metal a synonym of large?',
       'Is shiny a synonym of large?', 'Is ball a synonym of ball?',
       'Is ball a synonym of shiny?', 'Is large a synonym of shiny?',
       'Is large a synonym of large?', 'Is shiny a synonym of ball?',
       'Is large a synonym of ball?', 'Is metal a synonym of ball?',
       'Is metal a synonym of metal?', 'Is shiny a synonym of metal?',
       'Is ball a synonym of metal?', 'Is large a synonym of metal?']
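For reference, counts like the ones above can be reproduced with a short script along these lines. This is only a sketch: it assumes each test_questions.json is either a plain list of question records or a dict with a "questions" list (the layout of the original CLEVR question files), and that every record has a "question" text field. The root path in the script is a hypothetical local path and should point to wherever the archive was extracted.

```python
import json
from collections import Counter
from pathlib import Path


def extract_records(data):
    """Return the list of question records from parsed JSON.

    Assumption: the file is either a plain list of question dicts or a dict
    with a "questions" key holding that list (the original CLEVR layout).
    """
    if isinstance(data, dict):
        return data["questions"]
    return data


def load_questions(path):
    """Read a question file and return its list of records."""
    with open(path) as f:
        return extract_records(json.load(f))


def summarize_repetition(records):
    """Return (total number of records, Counter of distinct question texts)."""
    counts = Counter(r["question"] for r in records)
    return len(records), counts


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the archive was extracted.
    root = Path("data_augmented/CLEVR/questions/synonym_generalization")
    splits = sorted(root.iterdir()) if root.is_dir() else []
    for split in splits:
        path = split / "test_questions.json"
        if not path.is_file():
            continue
        total, counts = summarize_repetition(load_questions(path))
        print(f"{path}: {total} questions, {len(counts)} distinct")
        # List each distinct question with how many times it occurs.
        for question, n in counts.most_common():
            print(f"    {n:6d}  {question}")
```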

Is this behaviour expected? I could not find any difference between the repeated questions other than 'question_index'. I would really appreciate your help with this.
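To check which fields actually differ between repeated questions, a sketch like the one below could be used. It groups records by their "question" text and reports every field whose value varies within at least one group. The record layout and the demo records are assumptions for illustration only; they are not taken from the dataset.

```python
import json
from collections import defaultdict


def varying_fields(records, key="question"):
    """Group records by `key` and return the set of fields whose values
    differ within at least one group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    varying = set()
    for group in groups.values():
        # Consider every field that appears in any record of the group.
        fields = set().union(*(r.keys() for r in group))
        for field in fields:
            # Serialize values so unhashable types (lists, dicts) compare safely;
            # a field missing from some records serializes as "null".
            values = {json.dumps(r.get(field), sort_keys=True) for r in group}
            if len(values) > 1:
                varying.add(field)
    return varying


if __name__ == "__main__":
    # Tiny hypothetical records, not taken from the dataset.
    demo = [
        {"question": "Is ball a synonym of ball?", "question_index": 0},
        {"question": "Is ball a synonym of ball?", "question_index": 1},
    ]
    print(sorted(varying_fields(demo)))  # ['question_index']
```

Run on the records of a real test_questions.json, a result of only `['question_index']` would match the observation above.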
