GenSim: Generating Robotic Simulation Tasks via Large Language Models
Authors: Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, Xiaolong Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose to automatically generate rich simulation environments and expert demonstrations by exploiting a large language model's (LLM) grounding and coding ability. Our approach, dubbed GenSim, has two modes: goal-directed generation, wherein a target task is given to the LLM and the LLM proposes a task curriculum to solve the target task, and exploratory generation, wherein the LLM bootstraps from previous tasks and iteratively proposes novel tasks that would be helpful in solving more complex tasks. We use GPT4 to expand the existing benchmark by ten times to over 100 tasks, on which we conduct supervised finetuning and evaluate several LLMs including finetuned GPTs and Code Llama on code generation for robotic simulation tasks. (A minimal sketch of the two generation modes follows the table.) |
| Researcher Affiliation | Academia | Lirui Wang¹, Yiyang Ling²˒³*, Zhecheng Yuan⁴*, Mohit Shridhar⁵, Chen Bao⁶, Yuzhe Qin², Bailin Wang¹, Huazhe Xu⁴, Xiaolong Wang² (¹MIT CSAIL, ²UC San Diego, ³Shanghai Jiao Tong University, ⁴Tsinghua University, ⁵University of Washington, ⁶CMU) |
| Pseudocode | No | The paper provides figures illustrating workflows and Python code examples in the appendix, but it does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | See our project website (https://liruiw.github.io/gensim), demo (https://huggingface.co/spaces/Gen-Sim/Gen-Sim), and code (https://github.com/liruiw/GenSim) for more details. |
| Open Datasets | Yes | By initializing the task library with 10 human-curated tasks (Shridhar et al., 2022), we use GenSim to scale it up and generate over 100 tasks (Figure 1). |
| Dataset Splits | No | The paper states "The result is averaged over two different task splits." but does not provide specific details on the percentages or exact composition of these splits for training, validation, and test sets, which are necessary for full reproducibility. |
| Hardware Specification | Yes | For Code-LLaMA experiments (Rozière et al., 2023), we use the open-source Code-LLaMA-Instruct 7B and Code-LLaMA-Instruct 13B models. We use Huggingface transformers for LoRA (Hu et al., 2021) finetuning with quantization and parameter-efficient finetuning for 10 epochs on 2 V100 GPUs. (A hedged LoRA finetuning sketch follows the table.) |
| Software Dependencies | No | The paper mentions using "OpenAI's finetuning API," "Huggingface transformers," and "LoRA," but does not provide specific version numbers for these software components, which are necessary for a reproducible description of ancillary software. |
| Experiment Setup | Yes | The input observation is derived from top-down RGB-D images captured via a RealSense camera. We downscale the original image dimensions from 640×480 to 320×160 for policy input. To enhance the transition from simulation to reality, we incorporate data augmentation techniques such as color jittering. Training on the GPT4-generated dataset with batch size = 1 takes around 4 hours to finish 3000 iterations. (A preprocessing sketch follows the table.) |
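
The abstract row above describes GenSim's goal-directed and exploratory generation modes. The sketch below illustrates how two such loops might be wired around an LLM and a growing task library; the prompt wording and the helpers `query_llm` and `passes_simulation_checks` are illustrative assumptions, not the authors' actual prompts or pipeline code.

```python
# Minimal sketch of GenSim's two generation modes around a task library.
# `query_llm` and `passes_simulation_checks` are placeholders; the prompt
# wording is an illustrative assumption, not the authors' actual prompts.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to GPT-4 or a finetuned Code Llama model."""
    raise NotImplementedError

def passes_simulation_checks(task_code: str) -> bool:
    """Placeholder: run the generated task in simulation and verify that
    scripted expert demonstrations can actually solve it."""
    raise NotImplementedError

def exploratory_generation(task_library: list[str], n_rounds: int) -> list[str]:
    """Bootstrap from existing tasks and iteratively propose novel ones."""
    for _ in range(n_rounds):
        prompt = ("Existing simulation tasks:\n\n" + "\n\n".join(task_library)
                  + "\n\nPropose one novel, more complex task as Python code.")
        candidate = query_llm(prompt)
        if passes_simulation_checks(candidate):
            task_library.append(candidate)  # accepted tasks grow the library
    return task_library

def goal_directed_generation(task_library: list[str], target_task: str) -> list[str]:
    """Ask the LLM for a curriculum of intermediate tasks leading to the target."""
    prompt = ("Target task: " + target_task
              + "\n\nReference tasks:\n\n" + "\n\n".join(task_library)
              + "\n\nPropose a curriculum of tasks (Python code, separated by '# ---').")
    curriculum = query_llm(prompt).split("# ---")
    return [code for code in curriculum if passes_simulation_checks(code)]
```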
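
The hardware row mentions quantized, parameter-efficient LoRA finetuning of Code-LLaMA-Instruct with Huggingface transformers for 10 epochs on 2 V100 GPUs. Below is a minimal sketch of such a setup with the `peft` library; the checkpoint id, LoRA rank/alpha, target modules, batch size, learning rate, and the `gensim_tasks.jsonl` data file are assumptions made for illustration, with only the 10-epoch schedule taken from the paper.

```python
import datasets
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "codellama/CodeLlama-7b-Instruct-hf"  # assumed HF checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Load the base model quantized to 8-bit so it fits on V100-class GPUs.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters to the attention projections (illustrative rank/targets).
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

# Hypothetical JSONL file of task-description -> task-code training examples.
train_data = datasets.load_dataset("json", data_files="gensim_tasks.jsonl")["train"]
train_data = train_data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048))

trainer = Trainer(
    model=model,
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=TrainingArguments(output_dir="codellama-gensim-lora",
                           num_train_epochs=10,            # from the paper
                           per_device_train_batch_size=1,  # assumed
                           learning_rate=2e-4,             # assumed
                           logging_steps=50),
)
trainer.train()
```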
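
The experiment-setup row reports downscaling 640×480 RGB-D observations to 320×160 and applying color jitter for sim-to-real transfer. The following preprocessing sketch applies those numbers; the jitter magnitudes, interpolation choice, and RGB+depth channel stacking are assumptions not specified in the paper.

```python
import numpy as np
import torch
import torchvision.transforms.functional as TF

def preprocess_obs(rgb: np.ndarray, depth: np.ndarray, train: bool = True) -> torch.Tensor:
    """rgb: (480, 640, 3) uint8, depth: (480, 640) float32 -> (4, 160, 320) float tensor."""
    rgb_t = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0  # (3, 480, 640)
    depth_t = torch.from_numpy(depth).unsqueeze(0).float()          # (1, 480, 640)

    # Downscale to the 320x160 policy-input resolution (H=160, W=320).
    rgb_t = TF.resize(rgb_t, [160, 320], antialias=True)
    depth_t = TF.resize(depth_t, [160, 320], antialias=True)

    if train:
        # Color jitter on RGB only; depth is left untouched (assumed magnitudes).
        rgb_t = TF.adjust_brightness(rgb_t, 1.0 + 0.2 * (torch.rand(1).item() - 0.5))
        rgb_t = TF.adjust_contrast(rgb_t, 1.0 + 0.2 * (torch.rand(1).item() - 0.5))
        rgb_t = TF.adjust_hue(rgb_t, 0.05 * (torch.rand(1).item() - 0.5))

    return torch.cat([rgb_t, depth_t], dim=0)  # (4, 160, 320)
```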