Using Left and Right Brains Together: Towards Vision and Language Planning

Authors: Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, Jianguo Zhang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4. Experiments, 4.2. Results, 4.3. Ablation Study
Researcher Affiliation | Collaboration | Jun Cen (1,2,3)*, Chenfei Wu (2)*, Xiao Liu (2), Shengming Yin (2), Yixuan Pei (4), Jinglong Yang (1,5), Qifeng Chen (3), Nan Duan (2), Jianguo Zhang (1,6). Affiliations: 1 Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology; 2 Microsoft Research Asia; 3 The Hong Kong University of Science and Technology; 4 Xi'an Jiaotong University; 5 City University of Hong Kong; 6 Peng Cheng Lab, Shenzhen, China.
Pseudocode | No | The paper describes the framework with text and diagrams but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing their code or a link to a code repository for their method.
Open Datasets | Yes | Datasets. We evaluate our VLP on various scenarios, covering the open-domain scenario (STAR (Wu et al., 2021) and NExT-QA (Xiao et al., 2021)), autonomous driving scenario (BDD-X (Kim et al., 2018)), and robotics operation scenario (BAIR (Ebert et al., 2017)).
Dataset Splits | No | The paper mentions training on BDD-X and BAIR datasets and following training details of ADAPT (Jin et al., 2023), but it does not explicitly provide the train/validation/test splits within its text.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions various software components such as ChatGPT, Stable Video Diffusion, and LLaVA, but does not provide specific version numbers for these dependencies.
Experiment Setup | No | The paper states it conducts zero-shot experiments without finetuning LLaVA and follows the training details of ADAPT (Jin et al., 2023) for other models, but it does not explicitly provide specific hyperparameters or detailed training configurations within its own text.