Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback
Authors: Marcel Torne Villasevil, Max Balsells I Pamies, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we show that HuGE learns to successfully accomplish long-horizon tasks, and tasks with large combinatorial exploration spaces, through little human supervision. To demonstrate these experimentally, we test on several goal-reaching domains in simulation, shown in Figure 4, in the MuJoCo [54] and PyBullet [17] simulators, where we compare against state-of-the-art baselines. Furthermore, we show the benefits of our method by learning policies directly on a real-world LoCoBot robot. |
| Researcher Affiliation | Academia | Massachusetts Institute of Technology; Harvard University; University of Washington |
| Pseudocode | Yes | Algorithm 1 (HuGE: Guided Exploration with Human Feedback); Algorithm 2 (Policy Exploration). |
| Open Source Code | Yes | Project website at https://human-guided-exploration.github.io/HuGE/. The code is available at github.com/Improbable-AI/human-guided-exploration |
| Open Datasets | No | The paper describes several goal-reaching domains in simulation (MuJoCo, PyBullet) and real-world robot tasks. It also mentions collecting 'crowdsourced pilot data' for experiments, but does not provide access information (link, DOI, formal citation) for a pre-existing or released public dataset used for training. |
| Dataset Splits | No | The paper does not explicitly provide details about dataset splits for training, validation, or testing, such as percentages, absolute counts, or references to standard predefined splits. |
| Hardware Specification | Yes | For training the models and running the experiments, we had access to several workstations, each with one GeForce RTX 2080 Ti or one GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions 'Optimize Adam' and 'Pybullet, a python module for physics simulation', but does not specify version numbers for Python, deep learning frameworks (e.g., PyTorch, TensorFlow), or other key software libraries. |
| Experiment Setup | Yes | The details of the parameters with which the results have been obtained will be disclosed in this section. In particular, Table G.4 depicts the parameters used for the different benchmarks, while Table G.3 contains the hyperparameter configuration used for the different algorithms. |