Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Understanding the Evolution of Linear Regions in Deep Reinforcement Learning
Authors: Setareh Cohan, Nam Hee Kim, David Rolnick, Michiel van de Panne
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We seek to understand how observed region counts and their densities evolve during deep reinforcement learning using empirical results that span a range of continuous control tasks and policy network dimensions. |
| Researcher Affiliation | Academia | Setareh Cohan, Department of Computer Science, University of British Columbia; Nam Hee Kim, Department of Computer Science, Aalto University; David Rolnick, School of Computer Science, McGill University; Michiel van de Panne, Department of Computer Science, University of British Columbia |
| Pseudocode | No | The paper describes the region counting method in prose but does not include a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Our code is available at https://github.com/setarehc/deep_rl_regions. |
| Open Datasets | Yes | We conduct our experiments on four continuous control tasks including HalfCheetah-v2, Walker-v2, Ant-v2, and Swimmer-v2 environments from the OpenAI Gym benchmark suite [Brockman et al., 2016]. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits with percentages, sample counts, or references to predefined static splits, as data is generated through interaction with RL environments. |
| Hardware Specification | No | The paper mentions 'computational resources provided by Compute Canada' and states that 'Our experiments require modest compute resources,' but it does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions using 'Stable-Baselines3 implementations of the PPO algorithm' but does not provide specific version numbers for Stable-Baselines3 or other software dependencies. |
| Experiment Setup | Yes | We train 18 policy network configurations with N ∈ {32, 48, 64, 96, 128, 192} neurons, widths w ∈ {8, 16, 32, 64}, and depths d ∈ {1, 2, 3, 4}. We use a fixed value function network structure of (64, 64) in all of our experiments. We adopt the network initialization and hyperparameters of PPO from Stable-Baselines3 [Raffin et al., 2021] and train our policy networks on 2M samples (i.e., 2M timesteps in the environment). |
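The paper describes its region-counting method in prose only, without pseudocode. As a rough illustration of the general idea (not the authors' exact procedure), the linear regions of a ReLU network crossed by a 1-D slice of input space can be approximated by sampling points along a segment and counting changes in the activation pattern. All function names below are illustrative, and the dense sampling is a simplification of exact region-boundary methods.

```python
import numpy as np

def activation_patterns(weights, biases, points):
    """Return the ReLU activation pattern (tuple of on/off bits across
    all hidden units) for each input point."""
    patterns = []
    for x in points:
        h, bits = x, []
        for W, b in zip(weights, biases):
            pre = W @ h + b          # pre-activations of this layer
            bits.extend(pre > 0)     # which units are active
            h = np.maximum(pre, 0)   # ReLU
        patterns.append(tuple(bits))
    return patterns

def count_regions_along_line(weights, biases, x0, x1, n_samples=10_000):
    """Approximate the number of distinct linear regions crossed by the
    segment x0 -> x1: one more than the number of activation-pattern
    changes between consecutive sample points."""
    ts = np.linspace(0.0, 1.0, n_samples)
    points = [x0 + t * (x1 - x0) for t in ts]
    patterns = activation_patterns(weights, biases, points)
    changes = sum(p != q for p, q in zip(patterns, patterns[1:]))
    return changes + 1
```

For example, a single hidden layer with units `x` and `1 - x` has breakpoints at x = 0 and x = 1, so a segment from -2 to 2 crosses three regions. Dense sampling can undercount when two boundaries fall between adjacent samples, which is why exact methods track region boundaries analytically.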