NeoRL: Efficient Exploration for Nonepisodic RL

Authors: Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate NEORL on the Pendulum-v1 and Mountain Car environments from the OpenAI Gym benchmark suite (Brockman et al., 2016), Cartpole, Reacher, and Swimmer from the DeepMind Control Suite (Tassa et al., 2018), the racecar simulator from Kabzan et al. (2020), and a soft robotic arm from Tekinalp et al. (2024). ... In Figure 1 we report the normalized average cost and cumulative regret of NEORL, NEMEAN, NEPETS, and NETS.
Researcher Affiliation | Academia | Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause; ETH Zurich, Switzerland
Pseudocode | Yes | Algorithm 1 NEORL: NONEPISODIC OPTIMISTIC RL; Algorithm 2 Practical NEORL
Open Source Code | Yes | The code for our experiments is available online. https://github.com/lasgroup/opax/tree/neorl
Open Datasets | Yes | We evaluate NEORL on the Pendulum-v1 and Mountain Car environments from the OpenAI Gym benchmark suite (Brockman et al., 2016), Cartpole, Reacher, and Swimmer from the DeepMind Control Suite (Tassa et al., 2018), the racecar simulator from Kabzan et al. (2020), and a soft robotic arm from Tekinalp et al. (2024).
Dataset Splits | No | The paper discusses training parameters such as the batch size for the dynamics model, but does not explicitly describe train/validation dataset splits.
Hardware Specification | Yes | All our experiments run within 1-8 hours on a GPU (NVIDIA GeForce RTX 2080 Ti).
Software Dependencies | No | The paper mentions software components such as the OpenAI Gym benchmark suite, the DeepMind Control Suite, and the iCEM optimizer, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | Table 2: Hyperparameters for results in Section 4. Columns: Environment; iCEM parameters (Number of samples, Number of elites, Optimizer steps, H_MPC, Particles); Model training parameters (Number of ensembles, Network architecture, Learning rate, Batch size, Number of epochs); H; Action Repeat.
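The flattened Table 2 headers suggest a per-environment configuration with two parameter groups (iCEM planner and model training) plus a horizon and action repeat. A minimal Python sketch of that grouping follows; all names and values here are placeholders for illustration only, not the paper's actual settings.

```python
# Hypothetical per-environment hyperparameter layout mirroring the Table 2
# column groups. Values are arbitrary placeholders, NOT the paper's numbers.
example_config = {
    "environment": "Pendulum-v1",
    "icem": {                       # iCEM planner parameters
        "num_samples": 500,
        "num_elites": 50,
        "optimizer_steps": 10,
        "h_mpc": 20,                # MPC planning horizon
        "particles": 5,
    },
    "model_training": {             # dynamics-model training parameters
        "num_ensembles": 5,
        "network_architecture": [256, 256],
        "learning_rate": 1e-3,
        "batch_size": 256,
        "num_epochs": 50,
    },
    "horizon": 200,                 # H
    "action_repeat": 1,
}

# Sanity-check that every Table 2 column group is present.
assert set(example_config) == {
    "environment", "icem", "model_training", "horizon", "action_repeat"
}
```

Organizing the hyperparameters this way makes the two column groups from the table explicit and keeps per-environment settings in one self-describing structure.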