Active Exploration for Inverse Reinforcement Learning
Authors: David Lindner, Andreas Krause, Giorgia Ramponi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate AceIRL in simulations and find that it significantly outperforms more naive exploration strategies. |
| Researcher Affiliation | Academia | David Lindner Department of Computer Science ETH Zurich david.lindner@inf.ethz.ch Andreas Krause Department of Computer Science ETH Zurich krausea@ethz.ch Giorgia Ramponi ETH AI Center giorgia.ramponi@ai.ethz.ch |
| Pseudocode | Yes | Algorithm 1: AceIRL algorithm for IRL in an unknown environment. |
| Open Source Code | Yes | We provide code to reproduce our experiments at https://github.com/lasgroup/aceirl. |
| Open Datasets | No | The paper describes using simulated environments (Four Paths, Random MDPs, Double Chain, Chain, Gridworld), some of which are based on prior work (Kaufmann et al., 2021; Metelli et al., 2021). However, it does not provide concrete access information (e.g., links, DOIs, or specific citations to publicly available static datasets) for these simulated environments as 'datasets'. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits. The experiments are conducted in simulated environments based on sample complexity and episodes, rather than fixed dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper cites external tools such as CVXPY and conic optimization libraries in its references, but it does not provide version numbers for any software dependencies used in the implementation (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | No | The paper describes the simulated environments and high-level algorithmic components, but the main text does not provide specific experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or training configurations. |