reproducibilityindex.ai

Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback

Authors: Marcel Torne Villasevil, Max Balsells I Pamies, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, we show that HuGE learns to successfully accomplish long-horizon tasks, and tasks with large combinatorial exploration spaces through little human supervision. To demonstrate these experimentally, we test on several goal-reaching domains in simulation, shown in 4, in the MuJoCo [54] and PyBullet [17] simulators where we compare against state-of-the-art baselines. Furthermore, we show the benefits of our method by learning policies directly on a real-world LoCoBot robot.
Researcher Affiliation	Academia	1 Massachusetts Institute of Technology, 2 Harvard University, 3 University of Washington; {marcelto,taochen,pulkitag}@mit.edu, {balsells,avinwang,samedh,abhgupta}@cs.washington.edu
Pseudocode	Yes	Algorithm 1 HuGE: Guided Exploration with Human Feedback. Algorithm 2 Policy Exploration.
Open Source Code	Yes	Project website at https://human-guided-exploration.github.io/HuGE/. The code is available at github.com/Improbable-AI/human-guided-exploration
Open Datasets	No	The paper describes several goal-reaching domains in simulation (MuJoCo, PyBullet) and real-world robot tasks. It also mentions collecting 'crowdsourced pilot data' for experiments, but does not provide access information (link, DOI, formal citation) for a pre-existing or released public dataset used for training.
Dataset Splits	No	The paper does not explicitly provide details about dataset splits for training, validation, or testing, such as percentages, absolute counts, or references to standard predefined splits.
Hardware Specification	Yes	For training the models and running the experiments, we had access to several workstations with one GeForce RTX 2080 Ti or one GeForce RTX 3090.
Software Dependencies	No	The paper mentions 'Optimize Adam' and 'Pybullet, a python module for physics simulation', but does not specify version numbers for Python, deep learning frameworks (e.g., PyTorch, TensorFlow), or other key software libraries.
Experiment Setup	Yes	The details of the parameters with which the results have been obtained will be disclosed in this section. In particular, Table G.4 depicts the parameters used for the different benchmarks, while Table G.3 contains the hyperparameter configuration used for the different algorithms.