SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
Authors: Yuxing Long, Binyuan Hui, Fulong Ye, Yanyang Li, Zhuoxin Han, Caixia Yuan, Yongbin Li, Xiaojie Wang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results verify SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both the SIMMC 1.0 and SIMMC 2.0 datasets. |
| Researcher Affiliation | Academia | Yuxing Long (1), Binyuan Hui (2), Fulong Ye (1), Yanyang Li (2), Zhuoxin Han (1), Caixia Yuan (1), Yongbin Li (2), Xiaojie Wang (1)* — (1) Beijing University of Posts and Telecommunications, Beijing, China; (2) Independent Researcher. {longyuxing, fulong_ye, hanzhuoxin, yuancx, xjwang}@bupt.edu.cn, lyb821@gmail.com |
| Pseudocode | Yes | Algorithm 1: QA Pair Generation (a hedged sketch of this procedure appears after this table) |
| Open Source Code | Yes | We release our code and data at the GitHub repository LYX0501/SPRING. |
| Open Datasets | Yes | To evaluate the performance of the proposed model, we first conduct experiments on the widely-used situated multimodal dialogue datasets SIMMC 1.0 and SIMMC 2.0. The SIMMC 2.0 dataset contains 7.2k fashion dialogs and 4k furniture dialogs. There are around 290 digital assets for fashion and 110 assets for furniture, which are rearranged within seed scenes to generate 160 different scenes. The SIMMC 1.0 dataset includes 6.6k fashion dialogs and 6.4k furniture dialogs. |
| Dataset Splits | No | The paper mentions training on the 'SIMMC train set' and evaluating on the 'dev-test split' but does not explicitly provide the percentages or specific sizes for all train/validation/test splits needed for full reproduction. |
| Hardware Specification | Yes | During pretraining, our model is trained for 4 epochs with a batch size of 8 on 8 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using specific models (Transformer, OFA) and optimizers (Adam) but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | During pretraining, our model is trained for 4 epochs with a batch size of 8 on 8 Tesla V100 GPUs. Adam (Kingma and Ba 2015) is adopted as the optimizer with a learning rate of 4e-4. Besides, the dropout rate is set to 0.2 to prevent over-fitting. During the fine-tuning stage, we train 60 epochs on the SIMMC train set with a learning rate of 4e-5 and a batch size of 16 (see the configuration sketch after this table). |
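For concreteness, here is a minimal sketch of what a QA-pair-generation routine over an incremental layout graph could look like. The node/edge schema, function names, and question templates below are illustrative assumptions, not the paper's actual Algorithm 1, which operates on the layout graphs SPRING extracts from SIMMC scenes.

```python
# Hedged sketch of QA pair generation from a layout graph.
# The schema and templates are assumptions, not the authors' code.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class LayoutNode:
    object_id: str
    category: str            # e.g. "jacket", "table"
    attributes: Dict[str, str]  # e.g. {"color": "red"}

@dataclass
class LayoutEdge:
    src: str                 # object_id of the source node
    dst: str                 # object_id of the target node
    relation: str            # spatial relation, e.g. "left of"

def generate_qa_pairs(nodes: List[LayoutNode],
                      edges: List[LayoutEdge]) -> List[Tuple[str, str]]:
    """Turn layout-graph nodes and edges into (question, answer) pairs."""
    qa_pairs: List[Tuple[str, str]] = []
    # Attribute questions from nodes (template is an assumption).
    for node in nodes:
        for attr, value in node.attributes.items():
            question = f"What is the {attr} of the {node.category} {node.object_id}?"
            qa_pairs.append((question, value))
    # Spatial-relation questions from edges (template is an assumption).
    for edge in edges:
        question = f"Where is object {edge.src} relative to object {edge.dst}?"
        qa_pairs.append((question, edge.relation))
    return qa_pairs

# Example usage:
nodes = [LayoutNode("O1", "jacket", {"color": "red"}),
         LayoutNode("O2", "shelf", {"color": "brown"})]
edges = [LayoutEdge("O1", "O2", "on top of")]
print(generate_qa_pairs(nodes, edges))
```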
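The quoted hyperparameters can also be collected into a runnable configuration sketch. Only the numeric values (epochs, batch sizes, learning rates, dropout) come from the paper; the stand-in model and helper function are assumptions, and the 8-GPU distributed setup is omitted.

```python
# Hedged reconstruction of the reported training configuration.
import torch

def make_optimizer(model: torch.nn.Module, lr: float) -> torch.optim.Adam:
    # Adam (Kingma and Ba 2015), as stated in the paper.
    return torch.optim.Adam(model.parameters(), lr=lr)

# Pretraining: 4 epochs, batch size 8, lr 4e-4, dropout 0.2;
# run on 8 Tesla V100 GPUs in the paper (multi-GPU logic omitted here).
PRETRAIN_CFG = dict(epochs=4, batch_size=8, lr=4e-4, dropout=0.2)

# Fine-tuning on the SIMMC train set: 60 epochs, batch size 16, lr 4e-5.
FINETUNE_CFG = dict(epochs=60, batch_size=16, lr=4e-5)

model = torch.nn.Linear(16, 16)  # stand-in for the actual SPRING model
optimizer = make_optimizer(model, PRETRAIN_CFG["lr"])
```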