SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

Authors: Yuxing Long, Binyuan Hui, Fulong Ye, Yanyang Li, Zhuoxin Han, Caixia Yuan, Yongbin Li, Xiaojie Wang

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results verify SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both the SIMMC 1.0 and SIMMC 2.0 datasets.
Researcher Affiliation | Academia | Yuxing Long¹, Binyuan Hui², Fulong Ye¹, Yanyang Li², Zhuoxin Han¹, Caixia Yuan¹, Yongbin Li², Xiaojie Wang¹*. ¹ Beijing University of Posts and Telecommunications, Beijing, China; ² Independent Researcher. {longyuxing, fulong ye, hanzhuoxin, yuancx, xjwang}@bupt.edu.cn, lyb821@gmail.com
Pseudocode | Yes | Algorithm 1: QA Pair Generation. (An illustrative sketch of such a routine appears after this table.)
Open Source Code | Yes | We release our code and data at the GitHub repository LYX0501/SPRING.
Open Datasets | Yes | To evaluate the performance of the proposed model, we first conduct experiments on the widely-used situated multimodal dialogue datasets SIMMC 1.0 and SIMMC 2.0. The SIMMC 2.0 dataset contains 7.2k fashion dialogs and 4k furniture dialogs. There are around 290 digital assets for fashion and 110 assets for furniture, which are rearranged within seed scenes to generate 160 different scenes. The SIMMC 1.0 dataset includes 6.6k fashion dialogs and 6.4k furniture dialogs. (A data-loading sketch appears after this table.)
Dataset Splits | No | The paper mentions training on the 'SIMMC train set' and evaluating on the 'dev-test split' but does not explicitly provide the percentages or specific sizes for all train/validation/test splits needed for full reproduction.
Hardware Specification | Yes | During pretraining, our model is trained for 4 epochs with a batch size of 8 on 8 Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using specific models (Transformer, OFA) and optimizers (Adam) but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | During pretraining, our model is trained for 4 epochs with a batch size of 8 on 8 Tesla V100 GPUs. Adam (Kingma and Ba 2015) is adopted as the optimizer with a 4e-4 learning rate, and the dropout rate is set to 0.2 to prevent over-fitting. During the fine-tuning stage, we train for 60 epochs on the SIMMC train set with a learning rate of 4e-5 and a batch size of 16. (A configuration sketch restating these values appears after this table.)
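
Algorithm 1 (QA Pair Generation) is only named in the table above. The snippet below is a purely illustrative Python sketch of how question-answer pairs could be generated from a layout graph of object nodes and spatial-relation edges; the graph schema, question templates, and helper names (ObjectNode, LayoutGraph, generate_qa_pairs) are assumptions for illustration and do not reproduce the paper's Algorithm 1.

```python
# Illustrative sketch only: generating QA pairs from a layout graph.
# The node/edge schema and question templates are assumptions, not the paper's Algorithm 1.
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    obj_id: int
    category: str                                    # e.g. "jacket", "shelf"
    attributes: dict = field(default_factory=dict)   # e.g. {"color": "red"}

@dataclass
class LayoutGraph:
    nodes: list   # list of ObjectNode
    edges: list   # (src_id, relation, dst_id), e.g. (1, "on", 2)

def generate_qa_pairs(graph: LayoutGraph):
    """Walk the graph and emit (question, answer) string pairs."""
    by_id = {n.obj_id: n for n in graph.nodes}
    qa_pairs = []
    # Attribute questions from node attributes.
    for node in graph.nodes:
        for attr, value in node.attributes.items():
            qa_pairs.append(
                (f"What is the {attr} of object {node.obj_id} ({node.category})?", str(value))
            )
    # Relation questions from spatial edges.
    for src, relation, dst in graph.edges:
        qa_pairs.append(
            (f"Which object is {relation} object {dst} ({by_id[dst].category})?",
             f"Object {src} ({by_id[src].category})")
        )
    return qa_pairs

# Example usage with a two-object scene.
graph = LayoutGraph(
    nodes=[ObjectNode(1, "jacket", {"color": "red"}), ObjectNode(2, "shelf", {"color": "white"})],
    edges=[(1, "on", 2)],
)
for q, a in generate_qa_pairs(graph):
    print(q, "->", a)
```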
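
The SIMMC corpora quoted in the Open Datasets row are released as JSON dialogue files. The sketch below tallies dialogues per domain, assuming the publicly released SIMMC 2.0 layout (a top-level "dialogue_data" list whose entries carry a "domain" field); the file name and JSON keys are assumptions about that release, not details taken from the paper.

```python
# Minimal sketch: tallying SIMMC dialogues per domain.
# The file name and JSON keys ("dialogue_data", "domain") are assumptions
# about the publicly released SIMMC 2.0 format.
import json
from collections import Counter

def count_dialogues(path: str) -> Counter:
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Each entry in "dialogue_data" is one dialogue annotated with its domain.
    return Counter(d.get("domain", "unknown") for d in data.get("dialogue_data", []))

if __name__ == "__main__":
    counts = count_dialogues("simmc2_dials_dstc10_train.json")  # hypothetical file path
    print(counts)  # e.g. Counter({"fashion": ..., "furniture": ...})
```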
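
The hyperparameters quoted in the Hardware Specification and Experiment Setup rows can be collected into one place. The sketch below restates them in PyTorch-style code; the numeric values come from the quoted setup, while the stand-in model and optimizer wiring (build_optimizer, the Linear placeholder instead of the OFA-based SPRING model) are illustrative assumptions.

```python
# Hyperparameters as reported in the paper; the surrounding wiring is illustrative.
import torch

PRETRAIN_CFG = dict(epochs=4,  batch_size=8,  lr=4e-4, dropout=0.2, gpus=8)  # 8x Tesla V100
FINETUNE_CFG = dict(epochs=60, batch_size=16, lr=4e-5)                       # SIMMC train set;
                                                                             # fine-tuning dropout not stated

def build_optimizer(model: torch.nn.Module, cfg: dict) -> torch.optim.Adam:
    # Adam (Kingma and Ba 2015) with the stage-specific learning rate.
    return torch.optim.Adam(model.parameters(), lr=cfg["lr"])

# Example: a stand-in module in place of the OFA-based SPRING model.
model = torch.nn.Linear(10, 10)
optimizer = build_optimizer(model, PRETRAIN_CFG)
```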