MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond

Authors: Duy-Kien Nguyen, Vedanuj Goswami, Xinlei Chen

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a series of experiments to validate the effectiveness of MoVie. By default, we use Adam (Kingma & Ba, 2015) optimizer, with batch size 128 and base learning rate 1e-4; momentum 0.9 and 0.98. We start training by linearly warming up learning rate from 2.5e-5 for 3 epochs (Yu et al., 2019). The rate is decayed by 0.1 after 10 epochs and we finish training after 13 epochs.
Researcher Affiliation | Industry | Duy-Kien Nguyen, Vedanuj Goswami, Xinlei Chen, Facebook AI Research (FAIR)
Pseudocode | No | The paper provides architectural diagrams in Figure 2, but no explicit pseudocode or algorithm blocks.
Open Source Code | No | Code will be made available.
Open Datasets | Yes | Two datasets are used for counting with question queries. First is HowMany-QA (Trott et al., 2018)... Extending HowMany-QA, the TallyQA (Acharya et al., 2019) dataset... Results on COCO (Lin et al., 2014) are summarized in Tab. 3... Finally, to explore the capability of our model beyond counting, we evaluate MoVie on the CLEVR dataset (Johnson et al., 2017)... we also initiate an exploration of MoVie on the recent natural-image reasoning dataset, GQA (Hudson & Manning, 2019a).
Dataset Splits | Yes | First is HowMany-QA (Trott et al., 2018), where the train set questions are extracted from VQA 2.0 train and Visual Genome (VG) (Krishna et al., 2017). The val and test sets are taken from the VQA 2.0 val set. Extending HowMany-QA, the TallyQA (Acharya et al., 2019) dataset augments the train set by adding synthetic counting questions automatically generated from COCO annotations. They also split the test set into two parts: test-simple and test-complex... We train all models on VQA 2.0 train and report the breakdown scores on val.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models. It only mentions the general software environment: "We use Pytorch to implement our model on a modular framework for vision and language multimodal research from Facebook AI Research (FAIR)."
Software Dependencies | No | The paper mentions using "Pytorch" and a "modular framework" but does not specify exact version numbers for these or any other software dependencies required for reproducibility. (Appendix A)
Experiment Setup | Yes | By default, we use Adam (Kingma & Ba, 2015) optimizer, with batch size 128 and base learning rate 1e-4; momentum 0.9 and 0.98. We start training by linearly warming up learning rate from 2.5e-5 for 3 epochs (Yu et al., 2019). The rate is decayed by 0.1 after 10 epochs and we finish training after 13 epochs.
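Since the code is not released, the quoted experiment setup is the main handle for reproduction. The following is a minimal PyTorch sketch of that schedule: Adam with batch size 128, base learning rate 1e-4, a 3-epoch linear warmup from 2.5e-5, a 10x decay after epoch 10, and 13 epochs total. It assumes that "momentum 0.9 and 0.98" refers to Adam's (beta1, beta2) and that the warmup interpolates linearly between the two stated rates; the model and training loop are hypothetical placeholders, not the authors' implementation.

```python
# Sketch of the quoted optimization schedule, assuming PyTorch.
# The model below is a hypothetical stand-in for the MoVie network.
import torch

model = torch.nn.Linear(2048, 1)  # placeholder module

base_lr = 1e-4        # base learning rate
warmup_lr = 2.5e-5    # warmup starting learning rate
warmup_epochs = 3     # linear warmup length
decay_epoch = 10      # decay the rate by 0.1 after this epoch
total_epochs = 13     # training length
batch_size = 128

# "momentum 0.9 and 0.98" interpreted here as Adam's (beta1, beta2).
optimizer = torch.optim.Adam(model.parameters(), lr=base_lr, betas=(0.9, 0.98))

def lr_at_epoch(epoch: int) -> float:
    """Linear warmup from 2.5e-5 to 1e-4 over 3 epochs, constant until epoch 10, then x0.1."""
    if epoch < warmup_epochs:
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    if epoch < decay_epoch:
        return base_lr
    return base_lr * 0.1

for epoch in range(total_epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... iterate over the training set in batches of 128, compute the loss,
    #     then call loss.backward() and optimizer.step() here ...
```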