Hi team, I have noticed that the test splits of the augmented data contain heavily repeated questions. In particular, I am looking at the `synonym_generalization` task, which reads data from `/data_augmented/CLEVR/questions/synonym_generalization/i/`, where `data_augmented` is the directory extracted from http://vcml.csail.mit.edu/data/dataset_augmentation.tgz, downloaded as per the instructions. I see the following:
File `/questions/synonym_generalization/0/test_questions.json` contains 60000 questions, all of which are repetitions of the following 9 distinct questions:
['Is small a synonym of sphere?', 'Is shiny a synonym of sphere?',
'Is shiny a synonym of small?', 'Is sphere a synonym of small?',
'Is small a synonym of shiny?', 'Is sphere a synonym of shiny?',
'Is small a synonym of small?', 'Is shiny a synonym of shiny?',
'Is sphere a synonym of sphere?']
File `/questions/synonym_generalization/1/test_questions.json` contains 60000 questions, all of which are repetitions of the following 9 distinct questions:
['Is cube a synonym of cube?', 'Is metal a synonym of cube?',
'Is ball a synonym of cube?', 'Is metal a synonym of ball?',
'Is cube a synonym of ball?', 'Is ball a synonym of ball?',
'Is metal a synonym of metal?', 'Is ball a synonym of metal?',
'Is cube a synonym of metal?']
File `/questions/synonym_generalization/2/test_questions.json` contains 60000 questions, all of which are repetitions of a single question:
['Is metallic a synonym of metallic?']
File `/questions/synonym_generalization/3/test_questions.json` contains 60000 questions, all of which are repetitions of the following 16 distinct questions:
['Is metal a synonym of shiny?', 'Is shiny a synonym of shiny?',
'Is ball a synonym of large?', 'Is metal a synonym of large?',
'Is shiny a synonym of large?', 'Is ball a synonym of ball?',
'Is ball a synonym of shiny?', 'Is large a synonym of shiny?',
'Is large a synonym of large?', 'Is shiny a synonym of ball?',
'Is large a synonym of ball?', 'Is metal a synonym of ball?',
'Is metal a synonym of metal?', 'Is shiny a synonym of metal?',
'Is ball a synonym of metal?', 'Is large a synonym of metal?']
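For reference, counts like the ones above can be reproduced with a short script along these lines. It is only a sketch: it assumes each `test_questions.json` follows the original CLEVR layout (a top-level `"questions"` list whose records carry a `"question"` string) and that the archive was extracted to a local `data_augmented/` directory. The helper names `load_questions` and `summarize` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_questions(path):
    """Load question records from a CLEVR-style JSON file.

    Assumes either a top-level {"questions": [...]} object, as in the original
    CLEVR release, or a bare list of records.
    """
    with open(path) as f:
        data = json.load(f)
    return data["questions"] if isinstance(data, dict) else data


def summarize(records):
    """Return (total records, number of distinct question strings, per-string counts)."""
    counts = Counter(r["question"] for r in records)
    return len(records), len(counts), counts


if __name__ == "__main__":
    # Assumed extraction location; adjust to your local copy of the archive.
    root = Path("data_augmented/CLEVR/questions/synonym_generalization")
    if not root.is_dir():
        print(f"{root} not found")
    else:
        # One subdirectory per split: 0, 1, 2, 3, ...
        for split_dir in sorted(root.iterdir()):
            path = split_dir / "test_questions.json"
            if not path.is_file():
                continue
            total, distinct, counts = summarize(load_questions(path))
            print(f"{path}: {total} questions, {distinct} distinct")
            for question, n in counts.most_common():
                print(f"  {n:6d}  {question}")
```

Using `Counter` keeps the per-question frequencies, so it also shows whether the repetitions are roughly uniform across the distinct questions or dominated by one of them.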
Is this behaviour expected? Apart from `question_index`, I could not find any field that differs between these repeated entries. I would really appreciate your help with this.
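If the repeated entries really differ only in `question_index`, dropping that field should collapse each file into exactly as many groups as it has distinct question strings (9, 9, 1 and 16 above). A sketch of that check, assuming the same CLEVR-style record layout; `group_ignoring` and the script name in the usage comment are hypothetical:

```python
import json
import sys
from collections import defaultdict


def group_ignoring(records, ignore=("question_index",)):
    """Group records that become identical once the `ignore` fields are dropped.

    Returns a dict mapping a canonical JSON encoding of the remaining fields
    to the list of `question_index` values that share it.
    """
    groups = defaultdict(list)
    for r in records:
        rest = {k: v for k, v in r.items() if k not in ignore}
        # sort_keys makes the encoding independent of field order.
        key = json.dumps(rest, sort_keys=True)
        groups[key].append(r.get("question_index"))
    return dict(groups)


if __name__ == "__main__":
    # Usage (hypothetical script name): python group_ignoring.py path/to/test_questions.json
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            data = json.load(f)
        records = data["questions"] if isinstance(data, dict) else data
        groups = group_ignoring(records)
        distinct = len({r["question"] for r in records})
        print(f"{len(records)} records -> {len(groups)} groups "
              f"({distinct} distinct question strings)")
```

If the number of groups is larger than the number of distinct question strings, some other field (for example an image index) also varies between the repeats.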