NeoRL: Efficient Exploration for Nonepisodic RL
Authors: Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate NEORL on the Pendulum-v1 and MountainCar environment from the OpenAI gym benchmark suite (Brockman et al., 2016), Cartpole, Reacher, and Swimmer from the DeepMind control suite (Tassa et al., 2018), the racecar simulator from Kabzan et al. (2020), and a soft robotic arm from Tekinalp et al. (2024). ... In Figure 1 we report the normalized average cost and cumulative regret of NEORL, NEMEAN, NEPETS, and NETS. |
| Researcher Affiliation | Academia | Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause; ETH Zurich, Switzerland |
| Pseudocode | Yes | Algorithm 1 NEORL: NONEPISODIC OPTIMISTIC RL; Algorithm 2 Practical NEORL |
| Open Source Code | Yes | The code for our experiments is available online: https://github.com/lasgroup/opax/tree/neorl |
| Open Datasets | Yes | We evaluate NEORL on the Pendulum-v1 and MountainCar environment from the OpenAI gym benchmark suite (Brockman et al., 2016), Cartpole, Reacher, and Swimmer from the DeepMind control suite (Tassa et al., 2018), the racecar simulator from Kabzan et al. (2020), and a soft robotic arm from Tekinalp et al. (2024). |
| Dataset Splits | No | The paper discusses training parameters like 'Batch size' for model dynamics but does not explicitly provide information on validation dataset splits. |
| Hardware Specification | Yes | All our experiments [ran] within 1-8 hours on a GPU (NVIDIA GeForce RTX 2080 Ti). |
| Software Dependencies | No | The paper mentions software components like 'OpenAI gym benchmark suite', 'DeepMind control suite', and 'iCEM optimizer', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Table 2: Hyperparameters for results in Section 4. Header groups: Environment; iCEM parameters; Model training parameters. Columns: Number of samples, Number of elites, Optimizer steps, H_MPC, Particles, Number of ensembles, Network architecture, Learning rate, Batch size, Number of epochs, H, Action Repeat. |
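
The iCEM columns listed for Table 2 (number of samples, number of elites, optimizer steps, planning horizon) are easier to read with a concrete planner in mind. The sketch below is a plain cross-entropy method, not the authors' implementation: it omits iCEM's refinements (such as colored noise and elite reuse), and the function names, the toy cost, and all default values are hypothetical placeholders rather than the paper's settings.

```python
import numpy as np


def cem_plan(cost_fn, horizon, action_dim, num_samples=500, num_elites=50,
             optimizer_steps=10, seed=0):
    """Plain CEM over open-loop action sequences (illustrative, not iCEM)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(optimizer_steps):
        # Sample candidate action sequences around the current Gaussian.
        noise = rng.standard_normal((num_samples, horizon, action_dim))
        samples = np.clip(mean + std * noise, -1.0, 1.0)
        costs = np.array([cost_fn(s) for s in samples])
        # Keep the lowest-cost sequences and refit the Gaussian to them.
        elites = samples[np.argsort(costs)[:num_elites]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6
    return mean


def toy_cost(actions):
    """Hypothetical cost: squared distance of every action from 0.5."""
    return float(np.sum((actions - 0.5) ** 2))


plan = cem_plan(toy_cost, horizon=5, action_dim=1)
```

In a model-based setting, `cost_fn` would roll the candidate action sequence through the learned dynamics model instead of scoring it directly; that part is left out here because the listing above does not describe it.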