ELDEN: Exploration via Local Dependencies
Authors: Zizhao Wang, Jiaheng Hu, Peter Stone, Roberto Martín-Martín
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of ELDEN on four different domains with complex dependencies, ranging from 2D grid worlds to 3D robotic tasks. In all domains, ELDEN correctly identifies local dependencies and learns successful policies, significantly outperforming previous state-of-the-art exploration methods. |
| Researcher Affiliation | Collaboration | Zizhao Wang (University of Texas at Austin, zizhao.wang@utexas.edu); Jiaheng Hu (University of Texas at Austin, jhu@cs.utexas.edu); Peter Stone (University of Texas at Austin and Sony AI, pstone@cs.utexas.edu); Roberto Martín-Martín (University of Texas at Austin, robertomm@cs.utexas.edu) |
| Pseudocode | Yes | Algorithm 1 Training of ELDEN (on-policy) |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for ELDEN is publicly available. |
| Open Datasets | Yes | We evaluate ELDEN in four simulated environments with different objects that have complex and chained dependencies: (1) CARWASH, (2) THAWING, (3) 2D MINECRAFT, and (4) KITCHEN. Both CARWASH and THAWING are long-horizon household tasks in a discrete gridworld from the Mini-BEHAVIOR Benchmark [12]. 2D MINECRAFT is an environment modified from the one used by Andreas et al. [1], where the agent needs to master a complex technology tree to finish the task. KITCHEN is a continuous robot table-top manipulation domain implemented in Robosuite [36]. |
| Dataset Splits | No | The paper mentions training dynamics models on 'pre-collected transition data' and evaluating them on 'unseen episodes', but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts for the main RL task. |
| Hardware Specification | Yes | The experiments were conducted on machines of the following configurations: (1) Nvidia 2080 Ti GPU with AMD Ryzen Threadripper 3970X 32-Core Processor; (2) Nvidia A40 GPU with Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz; (3) Nvidia A100 GPU with Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz |
| Software Dependencies | No | The paper mentions software components such as PPO, the Adam optimizer, and specific environments (e.g., Mini-BEHAVIOR, Robosuite), but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | The hyperparameters used for evaluating local dependency detection of each method are provided in Table 3. Unless specified otherwise, the parameters are shared across all environments. During policy learning, all methods share the same PPO and training hyperparameters, provided in Table 4. |