Random Latent Exploration for Deep Reinforcement Learning
Authors: Srinath V. Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the practical effectiveness of RLE, we evaluate it on the challenging ATARI and ISAACGYM benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches, including action-noise and randomized value function exploration. |
| Researcher Affiliation | Academia | 1Massachusetts Institute of Technology. Correspondence to: Srinath Mahankali <srinathm@mit.edu>. |
| Pseudocode | Yes | Algorithm 1 Random Latent Exploration (RLE). Algorithm 2 Detailed Pseudocode for Random Latent Exploration (RLE). |
| Open Source Code | Yes | Project website: https://srinathm1359.github.io/random-latent-exploration |
| Open Datasets | Yes | We evaluate our method in the well-known ATARI benchmark (Bellemare et al., 2013). ... ISAACGYM, a popular continuous control deep RL benchmark (Makoviychuk et al., 2021). ... toy experiments on the FOURROOM environment (Sutton et al., 1999). |
| Dataset Splits | No | The paper specifies training durations ('train each agent with 5 different random seeds for 40 million frames') and evaluates with aggregate metrics such as the Interquartile Mean (IQM) and Probability of Improvement (POI), with confidence intervals estimated via bootstrapping (a minimal sketch of these computations appears below the table). However, it does not describe explicit training/test/validation dataset splits in the traditional supervised-learning sense; in reinforcement learning benchmarks, the same environment is typically used for both training and evaluation, without a distinct validation split. |
| Hardware Specification | No | The paper acknowledges 'MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources' but does not specify GPU models, CPU models, or any other hardware details used for the experiments. |
| Software Dependencies | No | The paper states, 'We implement our method on top of the popular RL algorithm, Proximal Policy Optimization (PPO) (Schulman et al., 2017)' and mentions using the 'cleanrl codebase (Huang et al., 2022)'. However, it does not provide version numbers for CleanRL, Python, PyTorch, or any other software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | The hyperparameters and implementation details of all the algorithms and PPO are deferred to Appendix B. ... Table 2. Hyperparameters and network architectures for FOURROOM experiments. ... Table 3. Hyperparameters and network architectures for ATARI experiments. ... Table 4. Hyperparameters and network architectures for Isaac Gym experiments. |
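
The 'Dataset Splits' row above references IQM, POI, and bootstrapped confidence intervals. The sketch below shows one common way such aggregate metrics can be computed; it assumes per-method scores are stored as a runs × tasks NumPy array, and the function names, resampling scheme, and toy data are illustrative rather than taken from the paper or its codebase.

```python
# Minimal sketch (not from the paper): IQM and Probability of Improvement (POI)
# with percentile-bootstrap confidence intervals over runs.
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)

def iqm(scores: np.ndarray) -> float:
    """Interquartile mean: mean of the middle 50% of all run/task scores."""
    return trim_mean(scores.ravel(), proportiontocut=0.25)

def prob_of_improvement(scores_x: np.ndarray, scores_y: np.ndarray) -> float:
    """P(X > Y): fraction of run pairs, per task, where method X beats method Y."""
    # scores_* have shape (num_runs, num_tasks); broadcast to compare all run pairs.
    greater = (scores_x[:, None, :] > scores_y[None, :, :]).mean()
    ties = (scores_x[:, None, :] == scores_y[None, :, :]).mean()
    return greater + 0.5 * ties  # count ties as half a win

def bootstrap_ci(metric_fn, scores: np.ndarray, n_boot: int = 2000, alpha: float = 0.05):
    """Percentile bootstrap over runs (rows) for a scalar metric of one score array."""
    n_runs = scores.shape[0]
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_runs, size=n_runs)  # resample runs with replacement
        stats.append(metric_fn(scores[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

if __name__ == "__main__":
    # Hypothetical normalized scores: 5 seeds (runs) x 8 tasks for two methods.
    rle = rng.normal(1.0, 0.2, size=(5, 8))
    ppo = rng.normal(0.8, 0.2, size=(5, 8))
    print("RLE IQM:", iqm(rle), "95% CI:", bootstrap_ci(iqm, rle))
    print("P(RLE > PPO):", prob_of_improvement(rle, ppo))
```

The definitions follow the standard usage of these aggregate metrics for deep RL evaluation; the bootstrap here resamples whole runs, which is one reasonable choice when only a handful of seeds per task are available.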