NeoRL: Efficient Exploration for Nonepisodic RL

Authors: Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate NEORL on the Pendulum-v1 and Mountain Car environments from the OpenAI Gym benchmark suite (Brockman et al., 2016), Cartpole, Reacher, and Swimmer from the DeepMind Control Suite (Tassa et al., 2018), the racecar simulator from Kabzan et al. (2020), and a soft robotic arm from Tekinalp et al. (2024). ... In Figure 1 we report the normalized average cost and cumulative regret of NEORL, NEMEAN, NEPETS, and NETS.
Researcher Affiliation | Academia | Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause; ETH Zurich, Switzerland
Pseudocode | Yes | Algorithm 1 NEORL: NONEPISODIC OPTIMISTIC RL; Algorithm 2 Practical NEORL
Open Source Code | Yes | The code for our experiments is available online. https://github.com/lasgroup/opax/tree/neorl
Open Datasets | Yes | We evaluate NEORL on the Pendulum-v1 and Mountain Car environments from the OpenAI Gym benchmark suite (Brockman et al., 2016), Cartpole, Reacher, and Swimmer from the DeepMind Control Suite (Tassa et al., 2018), the racecar simulator from Kabzan et al. (2020), and a soft robotic arm from Tekinalp et al. (2024).
Dataset Splits | No | The paper discusses training parameters such as the batch size for the dynamics model, but does not explicitly describe train/validation dataset splits.
Hardware Specification | Yes | All our experiments run within 1-8 hours on a GPU (NVIDIA GeForce RTX 2080 Ti).
Software Dependencies | No | The paper mentions software components such as the OpenAI Gym benchmark suite, the DeepMind Control Suite, and the iCEM optimizer, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | Table 2: Hyperparameters for results in Section 4. Columns: Environment; iCEM parameters (Number of samples, Number of elites, Optimizer steps, H_MPC, Particles); Model training parameters (Number of ensembles, Network architecture, Learning rate, Batch size, Number of epochs); H; Action Repeat.
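The flattened Table 2 headers suggest a per-environment configuration with two parameter groups (iCEM planner and model training) plus a horizon and action repeat. A minimal Python sketch of that grouping follows; all names and values here are placeholders for illustration only, not the paper's actual settings.

```python
# Hypothetical per-environment hyperparameter layout mirroring the Table 2
# column groups. Values are arbitrary placeholders, NOT the paper's numbers.
example_config = {
    "environment": "Pendulum-v1",
    "icem": {                       # iCEM planner parameters
        "num_samples": 500,
        "num_elites": 50,
        "optimizer_steps": 10,
        "h_mpc": 20,                # MPC planning horizon
        "particles": 5,
    },
    "model_training": {             # dynamics-model training parameters
        "num_ensembles": 5,
        "network_architecture": [256, 256],
        "learning_rate": 1e-3,
        "batch_size": 256,
        "num_epochs": 50,
    },
    "horizon": 200,                 # H
    "action_repeat": 1,
}

# Sanity-check that every Table 2 column group is present.
assert set(example_config) == {
    "environment", "icem", "model_training", "horizon", "action_repeat"
}
```

Organizing the hyperparameters this way makes the two column groups from the table explicit and keeps per-environment settings in one self-describing structure.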