Synergistic Dual Spatial-aware Generation of Image-to-text and Text-to-image

Authors: Yu Zhao, Hao Fei, Xiangtai Li, Libo Qin, Jiayi Ji, Hongyuan Zhu, Meishan Zhang, Min Zhang, Jianguo Wei

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "On the visual spatial understanding dataset VSD, our system outperforms the mainstream T2I and I2T methods significantly. Further in-depth analysis reveals how our dual learning strategy advances." |
| Researcher Affiliation | Collaboration | Yu Zhao (Tianjin University); Hao Fei (National University of Singapore); Xiangtai Li (ByteDance); Libo Qin (Central South University); Jiayi Ji (National University of Singapore); Hongyuan Zhu (I2R & CFAR, A*STAR); Meishan Zhang (Harbin Institute of Technology, Shenzhen); Min Zhang (Harbin Institute of Technology, Shenzhen); Jianguo Wei (Tianjin University) |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found. |
| Open Source Code | Yes | "We will open source at Github." |
| Open Datasets | Yes | "To demonstrate the capability of our proposed method for both ST2I and SI2T generation, we conduct experiments on the VSD [95, 97] dataset, which is constructed for visual spatial understanding. ... We follow [72] to take the 3D datasets Matterport3D (MP3D) [7], 3DSSG [78], and CURB-SG [27] ..." |
| Dataset Splits | No | The paper mentions using the VSD dataset for experiments and training on "aligned 3DSG-Image-Text data," but does not specify the exact train/validation/test splits (e.g., percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper reports training time (e.g., "20 hours" for overall training) and model parameters in Table 8, but does not specify the hardware used (e.g., GPU/CPU models or memory amounts). |
| Software Dependencies | No | "We use the pre-trained VQ-VAE of VQ-GAN [14]... For the text encoder, we adopt the CLIP model... We adopt the pre-trained GPT-2 [59]... We optimize the framework using AdamW [50]... We follow the default settings of DGAE [54]..." While the paper lists software components, it generally lacks explicit version numbers. The checklist mentions Python 3.8, but versions of the key libraries are not given. |
| Experiment Setup | Yes | "We optimize the framework using AdamW [50] with β1 = 0.9 and β2 = 0.98. The learning rate is set to 5e-5 after 10,000 warmup iterations in the final dual tuning." |
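The reported optimization setup (AdamW with β1 = 0.9, β2 = 0.98, and a learning rate of 5e-5 reached after 10,000 warmup iterations) can be sketched as a warmup schedule. Note the linear ramp and the constant rate after warmup are assumptions for illustration; the paper states only the post-warmup learning rate and the warmup length:

```python
def lr_at(step: int, base_lr: float = 5e-5, warmup_steps: int = 10_000) -> float:
    """Learning rate at a given optimizer step.

    Linear warmup to base_lr over warmup_steps, then constant.
    The linear shape and the constant post-warmup behavior are
    assumptions; the paper only states lr = 5e-5 after 10,000
    warmup iterations.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# AdamW hyperparameters as reported: beta1 = 0.9, beta2 = 0.98.
# With PyTorch this would look like, e.g.:
#   torch.optim.AdamW(model.parameters(), lr=lr_at(0), betas=(0.9, 0.98))

print(f"{lr_at(0):.1e}")       # first step, deep in warmup
print(f"{lr_at(20_000):.1e}")  # after warmup: 5.0e-05
```

Such a schedule is typically wrapped in a framework scheduler (e.g., `torch.optim.lr_scheduler.LambdaLR`) rather than applied by hand.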