Random Latent Exploration for Deep Reinforcement Learning
Authors: Srinath V. Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the practical effectiveness of RLE, we evaluate it on the challenging ATARI and ISAACGYM benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches, including action-noise and randomized value function exploration. |
| Researcher Affiliation | Academia | 1Massachusetts Institute of Technology. Correspondence to: Srinath Mahankali <srinathm@mit.edu>. |
| Pseudocode | Yes | Algorithm 1 Random Latent Exploration (RLE). Algorithm 2 Detailed Pseudocode for Random Latent Exploration (RLE). |
| Open Source Code | Yes | Project website: https://srinathm1359.github.io/random-latent-exploration |
| Open Datasets | Yes | We evaluate our method in the well-known ATARI benchmark (Bellemare et al., 2013). ... ISAACGYM, a popular continuous control deep RL benchmark (Makoviychuk et al., 2021). ... toy experiments on the FOURROOM environment (Sutton et al., 1999). |
| Dataset Splits | No | The paper specifies training durations ('train each agent with 5 different random seeds for 40 million frames') and evaluates with aggregate metrics such as the Interquartile Mean (IQM) and Probability of Improvement (POI), with confidence intervals estimated via bootstrapping (a minimal sketch of these computations appears below the table). However, it does not describe explicit training/test/validation dataset splits in the traditional supervised-learning sense; in reinforcement learning benchmarks, the same environment is typically used for both training and evaluation, without a distinct validation split. |
| Hardware Specification | No | The paper acknowledges 'MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources' but does not specify GPU models, CPU models, or any other hardware details used for the experiments. |
| Software Dependencies | No | The paper states, 'We implement our method on top of the popular RL algorithm, Proximal Policy Optimization (PPO) (Schulman et al., 2017)' and mentions using the 'cleanrl codebase (Huang et al., 2022)'. However, it does not provide version numbers for CleanRL, Python, PyTorch, or any other software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | The hyperparameters and implementation details of all the algorithms and PPO are deferred to Appendix B. ... Table 2. Hyperparameters and network architectures for FOURROOM experiments. ... Table 3. Hyperparameters and network architectures for ATARI experiments. ... Table 4. Hyperparameters and network architectures for Isaac Gym experiments. |
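
The 'Dataset Splits' row above references IQM, POI, and bootstrapped confidence intervals. The sketch below shows one common way such aggregate metrics can be computed; it assumes per-method scores are stored as a runs × tasks NumPy array, and the function names, resampling scheme, and toy data are illustrative rather than taken from the paper or its codebase.

```python
# Minimal sketch (not from the paper): IQM and Probability of Improvement (POI)
# with percentile-bootstrap confidence intervals over runs.
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)

def iqm(scores: np.ndarray) -> float:
    """Interquartile mean: mean of the middle 50% of all run/task scores."""
    return trim_mean(scores.ravel(), proportiontocut=0.25)

def prob_of_improvement(scores_x: np.ndarray, scores_y: np.ndarray) -> float:
    """P(X > Y): fraction of run pairs, per task, where method X beats method Y."""
    # scores_* have shape (num_runs, num_tasks); broadcast to compare all run pairs.
    greater = (scores_x[:, None, :] > scores_y[None, :, :]).mean()
    ties = (scores_x[:, None, :] == scores_y[None, :, :]).mean()
    return greater + 0.5 * ties  # count ties as half a win

def bootstrap_ci(metric_fn, scores: np.ndarray, n_boot: int = 2000, alpha: float = 0.05):
    """Percentile bootstrap over runs (rows) for a scalar metric of one score array."""
    n_runs = scores.shape[0]
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_runs, size=n_runs)  # resample runs with replacement
        stats.append(metric_fn(scores[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

if __name__ == "__main__":
    # Hypothetical normalized scores: 5 seeds (runs) x 8 tasks for two methods.
    rle = rng.normal(1.0, 0.2, size=(5, 8))
    ppo = rng.normal(0.8, 0.2, size=(5, 8))
    print("RLE IQM:", iqm(rle), "95% CI:", bootstrap_ci(iqm, rle))
    print("P(RLE > PPO):", prob_of_improvement(rle, ppo))
```

The definitions follow the standard usage of these aggregate metrics for deep RL evaluation; the bootstrap here resamples whole runs, which is one reasonable choice when only a handful of seeds per task are available.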