Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Authors: An Dinh Vuong, Minh Nhat VU, Toan Nguyen, Baoru Huang, Dzung Nguyen, Thieu Vo, Anh Nguyen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The intensive experiment results illustrate that our method outperforms state-of-the-art benchmarks and enables natural scene editing applications. |
| Researcher Affiliation | Collaboration | An Dinh Vuong, FSOFT AI Center, Vietnam; Minh Nhat Vu, TU Wien and AIT GmbH, Austria; Toan Tien Nguyen, FSOFT AI Center, Vietnam; Baoru Huang, Imperial College London, UK; Dzung Nguyen, FSOFT AI Center, Vietnam; Thieu Vo, Ton Duc Thang University, Vietnam; Anh Nguyen, University of Liverpool, UK |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and dataset can be accessed at https://lang-scene-synth.github.io/. |
| Open Datasets | Yes | We contribute PRO-teXt, an extension of PROXD (Hassan et al., 2019) and PROXE (Zhang et al., 2020)... We utilize 143/17 interactions of HUMANISE (Wang et al., 2022) to train/test. |
| Dataset Splits | No | The paper specifies train/test splits for the datasets (e.g., "180/20 interactions for training/testing") but does not explicitly mention a separate validation split or how it was handled. |
| Hardware Specification | Yes | All experiments are trained on an NVIDIA GeForce 3090 Ti with 1000 epochs within two days. |
| Software Dependencies | No | The paper mentions various software components and backbones (e.g., CLIP, BERT, PointNet++, DGCNN, POSA, P2R-Net) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | All experiments are trained on an NVIDIA GeForce 3090 Ti with 1000 epochs within two days (from main text). Appendix C, Table 7 also lists hyperparameters: N = 1024, M = 8, D_CLIP of (iii) = 512, D_BERT of (iii) = 768, d_text of (iii) = 128, d_v of (iv) = 32, d_F of (v) = 128, d_time of (vii) = 32, 12 attention layers, 8 attention heads. |
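For convenience, the Appendix C (Table 7) hyperparameters quoted above can be collected into a single configuration sketch. The dictionary key names below are our own descriptive labels, not the paper's; the paper only gives the symbols noted in the comments.

```python
# Hyperparameters reported in Appendix C, Table 7 of the paper.
# Key names are illustrative; the paper labels them only by symbol.
TABLE7_HPARAMS = {
    "num_points_N": 1024,        # N
    "num_objects_M": 8,          # M
    "clip_embed_dim": 512,       # D_CLIP of (iii)
    "bert_embed_dim": 768,       # D_BERT of (iii)
    "text_feature_dim": 128,     # d_text of (iii)
    "v_feature_dim": 32,         # d_v of (iv)
    "f_feature_dim": 128,        # d_F of (v)
    "time_embed_dim": 32,        # d_time of (vii)
    "num_attention_layers": 12,
    "num_attention_heads": 8,
}
```

Such a dictionary could seed an experiment config when attempting a reproduction, though the paper itself does not specify this structure.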