Discovering and Achieving Goals via World Models
Authors: Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | LEXA substantially outperforms previous approaches to unsupervised goal reaching, both on prior benchmarks and on a new challenging benchmark with 40 test tasks spanning four robotic manipulation and locomotion domains. We evaluate LEXA on prior benchmarks used by Skew-Fit [37], DISCERN [50], and Plan2Explore [43] in Section 3.3. Since these benchmarks are largely saturated, we also introduce a new challenging benchmark shown in Figure 2. We evaluate LEXA on this benchmark in Section 3.2. Ablation of different components: We ablate components of LEXA on the Robo Bins environment in Figure 8. |
| Researcher Affiliation | Academia | Russell Mendonca* Carnegie Mellon University Oleh Rybkin* University of Pennsylvania Kostas Daniilidis University of Pennsylvania Danijar Hafner University of Toronto Deepak Pathak Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: Latent Explorer Achiever (LEXA) 1: initialize: World model M, Replay buffer D, Explorer πe(at \| zt), Achiever πg(at \| zt, g) 2: while exploring do 3: Train M on D 4: Train πe in imagination of M to maximize exploration rewards Σ_t r^e_t(z_t) 5: Train πg in imagination of M to maximize Σ_t r^g_t(z_t, g) for images g ∼ D 6: (Optional) Train d(z_i, z_j) to predict distances j − i on the imagination data from the last step 7: Deploy πe in the environment to explore and grow D 8: Deploy πg in the environment to achieve a goal image g ∼ D to grow D 9: end while 10: while evaluating do 11: given: Evaluation goal g 12: Deploy πg in the world to reach g 13: end while |
| Open Source Code | No | The paper provides a project page ("Project page: https://orybkin.github.io/lexa/") but does not explicitly state that source code for the method is available at this link or in supplementary materials, which is required for a positive classification under the strict criteria for direct code access. |
| Open Datasets | Yes | Our new benchmark defines goal images for a diverse set of four existing environments as follows: Robo Yoga: We use the walker and quadruped domains of the DeepMind Control Suite [48] to define the Robo Yoga benchmark, consisting of 12 goal images that correspond to different body poses for each of the two environments, such as lying down, standing up, and balancing. Robo Bins: Based on Meta-World [53], we create a scene with a Sawyer robotic arm, two bins, and two blocks of different colors. Robo Kitchen: The last benchmark involves the challenging kitchen environment from [22], where a Franka robot can interact with various objects including a burner, light switch, sliding cabinet, hinge cabinet, microwave, or kettle. |
| Dataset Splits | No | The paper describes its training process and the use of a replay buffer, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages, sample counts, or specific split files) required for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. It only mentions the use of robotic manipulation and locomotion environments. |
| Software Dependencies | No | The paper does not specify the version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA, or other libraries/solvers) used in their experiments. |
| Experiment Setup | No | The paper mentions using the "Adam optimizer" and the "Dreamer algorithm [24]" for training, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed system-level training configurations. While it refers to an "imagination horizon", no specific value is given. |
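The pseudocode quoted in the table (Algorithm 1) alternates between training a world model, optimizing an explorer and a goal-conditioned achiever in imagination, and deploying both policies to grow the replay buffer. A minimal Python sketch of that control flow is below; all callables (`env_step`, `train_world_model`, `train_explorer`, `train_achiever`) are hypothetical placeholders standing in for the world-model and imagination-training machinery, not the authors' implementation:

```python
import random


class ReplayBuffer:
    """Stores trajectories (lists of observations) collected in the real environment."""

    def __init__(self):
        self.episodes = []

    def add(self, episode):
        self.episodes.append(episode)

    def sample_goal(self):
        # Goal images are sampled from previously seen observations (g ~ D).
        episode = random.choice(self.episodes)
        return random.choice(episode)


def lexa_training_loop(env_step, train_world_model, train_explorer,
                       train_achiever, explorer, achiever, num_iters=3):
    """Sketch of LEXA's alternating explore/achieve loop (Algorithm 1).

    `env_step(policy, goal)` is assumed to roll out a policy in the real
    environment and return a trajectory; the `train_*` callables stand in
    for world-model fitting and imagination-based policy optimization.
    """
    buffer = ReplayBuffer()
    # Seed the buffer with one initial (e.g. random-policy) episode.
    buffer.add(env_step(None, None))
    for _ in range(num_iters):
        train_world_model(buffer)             # 3: train M on D
        train_explorer(buffer)                # 4: maximize exploration reward in imagination
        goal = buffer.sample_goal()
        train_achiever(buffer, goal)          # 5: maximize goal reward for g ~ D
        buffer.add(env_step(explorer, None))  # 7: explore to grow D
        buffer.add(env_step(achiever, goal))  # 8: practice reaching g ~ D
    return buffer
```

At evaluation time only the achiever is deployed, conditioned on the held-out goal image, which matches lines 10–13 of the quoted algorithm.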