Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback
Authors: Marcel Torne Villasevil, Max Balsells I Pamies, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we show that HuGE learns to successfully accomplish long-horizon tasks, and tasks with large combinatorial exploration spaces, through little human supervision. To demonstrate this experimentally, we test on several goal-reaching domains in simulation, shown in Figure 4, in the MuJoCo [54] and PyBullet [17] simulators, where we compare against state-of-the-art baselines. Furthermore, we show the benefits of our method by learning policies directly on a real-world LoCoBot robot. |
| Researcher Affiliation | Academia | ¹Massachusetts Institute of Technology, ²Harvard University, ³University of Washington; {marcelto,taochen,pulkitag}@mit.edu, {balsells,avinwang,samedh,abhgupta}@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1: HuGE: Guided Exploration with Human Feedback. Algorithm 2: Policy Exploration. |
| Open Source Code | Yes | Project website at https://human-guided-exploration.github.io/HuGE/. The code is available at github.com/Improbable-AI/human-guided-exploration |
| Open Datasets | No | The paper describes several goal-reaching domains in simulation (MuJoCo, PyBullet) and real-world robot tasks. It also mentions collecting 'crowdsourced pilot data' for experiments, but does not provide access information (link, DOI, formal citation) for a pre-existing or released public dataset used for training. |
| Dataset Splits | No | The paper does not explicitly provide details about dataset splits for training, validation, or testing, such as percentages, absolute counts, or references to standard predefined splits. |
| Hardware Specification | Yes | For training the models and running the experiments, we had access to several workstations with one GeForce RTX 2080 Ti or one GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions 'Optimize Adam' and 'Pybullet, a python module for physics simulation', but does not specify version numbers for Python, deep learning frameworks (e.g., PyTorch, TensorFlow), or other key software libraries. |
| Experiment Setup | Yes | The details of the parameters with which the results have been obtained will be disclosed in this section. In particular, Table G.4 depicts the parameters used for the different benchmarks, while Table G.3 contains the hyperparameter configuration used for the different algorithms. |