Pre-Trained Multi-Goal Transformers with Prompt Optimization for Efficient Online Adaptation
Authors: Haoqi Yuan, Yuhui Fu, Feiyang Xie, Zongqing Lu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MGPO across diverse domains, including maze navigation, the robotic simulation environment Kitchen, and the open-world game Crafter. Our results demonstrate that MGPO significantly surpasses prior methods in terms of sample efficiency, online adaptation performance, robustness, and interpretability. In this section, we present experimental results obtained across various domains to evaluate the efficacy of MGPO. |
| Researcher Affiliation | Academia | 1School of Computer Science, Peking University 2Yuanpei College, Peking University 3Beijing Academy of Artificial Intelligence |
| Pseudocode | Yes | Our prompt optimization method is detailed in Algorithm 1 in Appendix E.1, where the implementations of UCB and BPE are also provided. Algorithm 1: Prompt Optimization in MGPO-UCB and MGPO-BPE. (An illustrative UCB-style sketch of this prompt-selection loop is given after the table.) |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have released the code. |
| Open Datasets | Yes | Kitchen [16] is a robotic environment... We verified that the Kitchen datasets provided in D4RL [13] do not contain diverse transitions between the five subtasks. To collect trajectories completing different sets of subtasks, we use PPO [46] to train a policy for each subtask using a shaped reward function and varied initial states from the Kitchen-mixed-v0 dataset [13]. Crafter [18]: A simplified benchmark of the open-world game Minecraft... The dataset is collected using policies from AD [23]. (A short snippet for loading the D4RL Kitchen dataset follows the table.) |
| Dataset Splits | No | The paper describes training and testing but does not explicitly provide details about a separate validation dataset split with specific percentages or sample counts for hyperparameter tuning or early stopping, beyond the online adaptation phase. |
| Hardware Specification | Yes | All models are trained on a lab machine with a single NVIDIA RTX 4090 GPU and Intel i9 CPUs. |
| Software Dependencies | No | The paper mentions the use of "GPT-2" as a backbone and describes the model architecture, but it does not specify software dependencies like programming language, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries with their specific version numbers. |
| Experiment Setup | Yes | Table 6: Hyperparameters used in pre-training for all environments: Embedding dimension 128; Number of layers 3; Number of attention heads 1; Activation ReLU; Batch size 64; Learning rate 1e-4; Learning rate decay weight 1e-4; Dropout 0.1; Warmup steps 10000. In Maze Runner, we sample prompts from the agent's locations in the whole trajectory. To augment the diversity of task goals and trajectory lengths, we truncate the trajectory at a random timestep h for each sampled trajectory and use o_h to represent its task goal. (A hedged configuration sketch reflecting these hyperparameters follows the table.) |
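
The pseudocode row cites Algorithm 1 (MGPO-UCB / MGPO-BPE) but the report does not reproduce it. Purely as an illustration of the UCB variant, the sketch below treats a fixed pool of candidate prompts as bandit arms scored by mean episodic return plus a UCB exploration bonus; the names `UCBPromptSelector` and `run_episode_with_prompt` are hypothetical and are not taken from the paper's released code.

```python
import math


class UCBPromptSelector:
    """UCB-style bandit over a fixed pool of candidate prompts.

    Illustrative sketch only; MGPO's Algorithm 1 may structure
    prompt optimization differently (e.g., the BPE variant).
    """

    def __init__(self, num_prompts, exploration_coef=1.0):
        self.counts = [0] * num_prompts          # episodes run with each prompt
        self.mean_returns = [0.0] * num_prompts  # running mean episodic return
        self.c = exploration_coef

    def select(self):
        # Try every prompt once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        scores = [
            m + self.c * math.sqrt(math.log(total) / n)
            for m, n in zip(self.mean_returns, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, prompt_idx, episode_return):
        # Incremental update of the chosen prompt's mean return.
        self.counts[prompt_idx] += 1
        n = self.counts[prompt_idx]
        self.mean_returns[prompt_idx] += (episode_return - self.mean_returns[prompt_idx]) / n


# Hypothetical online-adaptation loop (environment interaction omitted):
# selector = UCBPromptSelector(num_prompts=len(candidate_prompts))
# for _ in range(num_episodes):
#     i = selector.select()
#     ret = run_episode_with_prompt(candidate_prompts[i])  # roll out the pre-trained transformer
#     selector.update(i, ret)
```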
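The datasets row notes that the authors inspected the D4RL Kitchen data before collecting their own trajectories with PPO. For readers who want to repeat that inspection, a minimal loading snippet is below; it assumes the standard `d4rl` package with the older `gym` API, neither of which the paper explicitly names.

```python
import gym
import d4rl  # noqa: F401  (registers kitchen-mixed-v0 and other D4RL environments)

# Load the offline Kitchen dataset referenced in the quote above.
env = gym.make("kitchen-mixed-v0")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals', ...
print(dataset["observations"].shape, dataset["actions"].shape)
```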
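The paper reports backbone and optimization hyperparameters (Table 6) but, as noted in the software-dependencies row, not the framework. Purely as an assumption, the sketch below instantiates a GPT-2 backbone with those numbers via PyTorch and Hugging Face `transformers`, reads "Learning rate decay weight 1e-4" as AdamW weight decay, and picks an arbitrary context length since none is reported.

```python
# Assumed PyTorch / Hugging Face instantiation; the paper does not name its framework.
import torch
from transformers import GPT2Config, GPT2Model

config = GPT2Config(
    vocab_size=1,              # placeholder: continuous embeddings are fed, not tokens
    n_positions=1024,          # context length, not reported in Table 6 (assumption)
    n_embd=128,                # embedding dimension
    n_layer=3,                 # number of layers
    n_head=1,                  # number of attention heads
    activation_function="relu",
    resid_pdrop=0.1,           # dropout 0.1 applied to residual, embedding, and attention
    embd_pdrop=0.1,
    attn_pdrop=0.1,
)
backbone = GPT2Model(config)

# "Learning rate decay weight 1e-4" is interpreted here as AdamW weight decay (assumption);
# the reported batch size of 64 would be set in the training DataLoader.
optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4, weight_decay=1e-4)

# Linear warmup over the reported 10,000 steps, constant afterwards.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min((step + 1) / 10_000, 1.0)
)
```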