Structure-Aware Multimodal Sequential Learning for Visual Dialog
Authors: Young-Jin Kim, Min-Jun Kim, Kyunghwan An, Jinwoo Ahn, Jaeseok Kim, Yu-Jung Heo, Du-Seong Chang, Eun-Sol Kim
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For experiments, we achieved a new state-of-the-art performance on three visual dialog datasets, including the most challenging one, COMET. |
| Researcher Affiliation | Collaboration | 1 Department of Artificial Intelligence Application, Hanyang University, South Korea... 3 KT Corporation |
| Pseudocode | No | The paper describes the algorithm and architecture but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We mainly evaluate our algorithm on the most challenging visual dialog dataset, COMET (Kottur et al. 2022) and use two commonly used visual dialog datasets: VisDial v1.0 (VisDial) (Das et al. 2017), and MNIST Dialog (Seo et al. 2017). |
| Dataset Splits | Yes | VisDial: As presented in Table 5, our model exhibits remarkable results on overall metrics, compared to the baselines, on the VisDial v1.0 validation set. |
| Hardware Specification | Yes | It takes 5 hours for 20 epoch training with 64 batch size on a 4-A100 machine. |
| Software Dependencies | No | The paper mentions pretrained models like ViT-base and Flan-T5-base, and the AdamW optimizer, but does not provide specific version numbers for software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | It takes 5 hours for 20 epoch training with 64 batch size on a 4-A100 machine. [...] We use the AdamW optimizer (Loshchilov and Hutter 2017) with β1 = 0.9, β2 = 0.98, and weight decay of 0.05. We use a piecewise linear scheduler with a linear warmup of 2K steps starting from a learning rate of 1e-4 and a peak learning rate of 1e-3. (A sketch of this configuration appears below the table.) |
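
Since no code is released, the following is a minimal PyTorch sketch of the reported optimizer and schedule only. PyTorch itself, the placeholder model, and `total_steps` are assumptions (the paper names no framework and reports 20 epochs rather than a step count); the betas, weight decay, warmup length, and learning rates are the values quoted above.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Hypothetical stand-in for the paper's model, which is not publicly released.
model = torch.nn.Linear(768, 768)

# Values quoted from the paper: AdamW with beta1 = 0.9, beta2 = 0.98,
# weight decay = 0.05; linear warmup over 2K steps from 1e-4 to a peak of 1e-3.
base_lr = 1e-4
peak_lr = 1e-3
warmup_steps = 2_000
total_steps = 20_000  # assumed: the paper reports 20 epochs, not total steps

optimizer = AdamW(model.parameters(), lr=peak_lr,
                  betas=(0.9, 0.98), weight_decay=0.05)

def piecewise_linear(step: int) -> float:
    """Multiplier on peak_lr: ramp linearly from base_lr to peak_lr over
    warmup_steps, then decay linearly to zero at total_steps."""
    if step < warmup_steps:
        lr = base_lr + (step / warmup_steps) * (peak_lr - base_lr)
        return lr / peak_lr
    return max(total_steps - step, 0) / (total_steps - warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda=piecewise_linear)

# One dummy optimization step to show the scheduler being advanced per step.
loss = model(torch.randn(4, 768)).pow(2).mean()
loss.backward()
optimizer.step()
scheduler.step()
print(scheduler.get_last_lr())  # learning rate after the first warmup step
```

Expressing the schedule as a single `LambdaLR` multiplier keeps both the warmup ramp and the post-warmup decay in one place; whether the paper decays to zero after warmup is not stated, so the decay branch here is an assumption.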