Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

NeoRL: Efficient Exploration for Nonepisodic RL

Authors: Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate NEORL on the Pendulum-v1 and Mountain Car environment from the OpenAI gym benchmark suite (Brockman et al., 2016), Cartpole, Reacher, and Swimmer from the DeepMind control suite (Tassa et al., 2018), the racecar simulator from Kabzan et al. (2020), and a soft robotic arm from Tekinalp et al. (2024). ... In Figure 1 we report the normalized average cost and cumulative regret of NEORL, NEMEAN, NEPETS, and NETS.
Researcher Affiliation | Academia | Bhavya Sukhija, Lenart Treven, Florian Dörfler, Stelian Coros, Andreas Krause; ETH Zurich, Switzerland
Pseudocode | Yes | Algorithm 1 NEORL: NONEPISODIC OPTIMISTIC RL; Algorithm 2 Practical NEORL
Open Source Code | Yes | The code for our experiments is available online. https://github.com/lasgroup/opax/tree/neorl
Open Datasets | Yes | We evaluate NEORL on the Pendulum-v1 and Mountain Car environment from the OpenAI gym benchmark suite (Brockman et al., 2016), Cartpole, Reacher, and Swimmer from the DeepMind control suite (Tassa et al., 2018), the racecar simulator from Kabzan et al. (2020), and a soft robotic arm from Tekinalp et al. (2024).
Dataset Splits | No | The paper discusses training parameters like 'Batch size' for model dynamics but does not explicitly provide information on validation dataset splits.
Hardware Specification | Yes | All our experiments within 1-8 hours on a GPU (NVIDIA GeForce RTX 2080 Ti).
Software Dependencies | No | The paper mentions software components like 'OpenAI gym benchmark suite', 'DeepMind control suite', and 'iCEM optimizer', but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | Table 2: Hyperparameters for results in Section 4. Column groups: iCEM parameters, Model training parameters. Columns: Environment, Number of samples, Number of elites, Optimizer steps, H_MPC, Particles, Number of ensembles, Network architecture, Learning rate, Batch size, Number of epochs, H, Action Repeat.
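
The Research Type entry quotes the paper as reporting average cost and cumulative regret. As a rough illustration of how these two quantities relate, the sketch below assumes regret is measured as the running sum of the gap between the cost incurred at each step and a reference optimal average cost. The function names, the cost values, and the reference value are hypothetical and are not taken from the paper or its code.

```python
from itertools import accumulate


def average_cost(costs):
    """Mean per-step cost over a single nonepisodic run."""
    return sum(costs) / len(costs)


def cumulative_regret(costs, optimal_average_cost):
    """Running sum of (incurred cost - reference optimal average cost).

    Assumption: regret is measured against a known reference value;
    in practice that value has to be estimated or supplied.
    """
    return list(accumulate(c - optimal_average_cost for c in costs))


# Hypothetical per-step costs from one run, and a hypothetical reference cost.
costs = [3.0, 2.0, 1.5, 1.0]
regret = cumulative_regret(costs, optimal_average_cost=1.0)
print(average_cost(costs))
print(regret)
```

Under this definition, a regret curve that flattens out indicates that the per-step cost has approached the reference value.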