Understanding the Evolution of Linear Regions in Deep Reinforcement Learning
Authors: Setareh Cohan, Nam Hee Kim, David Rolnick, Michiel van de Panne
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We seek to understand how observed region counts and their densities evolve during deep reinforcement learning using empirical results that span a range of continuous control tasks and policy network dimensions. |
| Researcher Affiliation | Academia | Setareh Cohan, Department of Computer Science, University of British Columbia (setarehc@cs.ubc.ca); Nam Hee Kim, Department of Computer Science, Aalto University (namhee.kim@aalto.fi); David Rolnick, School of Computer Science, McGill University (drolnick@cs.mcgill.ca); Michiel van de Panne, Department of Computer Science, University of British Columbia (van@cs.ubc.ca) |
| Pseudocode | No | The paper describes the region counting method in prose but does not include a structured pseudocode or algorithm block (an illustrative counting sketch follows this table). |
| Open Source Code | Yes | Our code is available at https://github.com/setarehc/deep_rl_regions. |
| Open Datasets | Yes | We conduct our experiments on four continuous control tasks including the HalfCheetah-v2, Walker2d-v2, Ant-v2, and Swimmer-v2 environments from the OpenAI Gym benchmark suite [Brockman et al., 2016]. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits with percentages, sample counts, or references to predefined static splits, as data is generated through interaction with RL environments. |
| Hardware Specification | No | The paper mentions 'computational resources provided by Compute Canada' and states that 'Our experiments require modest compute resources,' but it does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions using 'Stable-Baselines3 implementations of the PPO algorithm' but does not provide specific version numbers for Stable-Baselines3 or other software dependencies. |
| Experiment Setup | Yes | We train 18 policy network configurations with N ∈ {32, 48, 64, 96, 128, 192} neurons, widths w ∈ {8, 16, 32, 64}, and depths d ∈ {1, 2, 3, 4}. We use a fixed value function network structure of (64, 64) in all of our experiments. We adopt the network initialization and hyperparameters of PPO from Stable-Baselines3 [Raffin et al., 2021] and train our policy networks on 2M samples (i.e. 2M timesteps in the environment). (A hedged configuration sketch follows this table.) |
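
Since the paper gives its region-counting procedure only in prose, the following is a minimal illustrative sketch, not the authors' implementation. It approximates the number of linear regions a ReLU network crosses along a 1-D line segment in input space by densely sampling activation patterns and counting changes between consecutive samples; dense sampling can undercount very short regions, and both function names below are hypothetical.

```python
import numpy as np

def relu_mlp_pattern(weights, biases, x):
    """Concatenated ReLU activation pattern (one boolean per hidden unit) at input x."""
    pattern = []
    h = x
    for W, b in zip(weights, biases):
        z = W @ h + b
        pattern.append(z > 0)       # which units are active in this layer
        h = np.maximum(z, 0.0)      # ReLU forward pass
    return np.concatenate(pattern)

def count_regions_on_line(weights, biases, x0, x1, n_samples=10_000):
    """Approximate count of linear regions along the segment from x0 to x1:
    each change in the activation pattern marks a region boundary crossing."""
    regions = 0
    prev = None
    for t in np.linspace(0.0, 1.0, n_samples):
        pat = relu_mlp_pattern(weights, biases, (1 - t) * x0 + t * x1)
        if prev is None or not np.array_equal(pat, prev):
            regions += 1
        prev = pat
    return regions

# Toy usage: a random 2-hidden-layer ReLU network on a 17-D input
# (17 is the observation dimension of HalfCheetah-v2).
rng = np.random.default_rng(0)
dims = [17, 32, 32]  # input dim, then hidden-layer widths
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]
x0, x1 = rng.standard_normal(17), rng.standard_normal(17)
print(count_regions_on_line(weights, biases, x0, x1))
```

Only the hidden layers contribute to the activation pattern; the linear output layer does not create region boundaries, so it is omitted here.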
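Based only on the setup quoted in the table, here is a hedged sketch of how one of the 18 policy configurations might be assembled with Stable-Baselines3's PPO. The `net_arch=dict(pi=..., vf=...)` form assumes a recent SB3 release (older versions wrapped it in a list), and the ReLU activation is an assumption consistent with analyzing linear regions of a piecewise-linear policy, not a detail quoted from the paper.

```python
import gym
import torch.nn as nn
from stable_baselines3 import PPO

width, depth = 32, 2  # one (width, depth) pair from the sweep quoted above

# HalfCheetah-v2 requires mujoco-py; any of the four listed tasks works the same way.
env = gym.make("HalfCheetah-v2")

model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(
        # Policy net: `depth` hidden layers of `width` units each;
        # value net fixed at (64, 64) as stated in the setup.
        net_arch=dict(pi=[width] * depth, vf=[64, 64]),
        # Assumption: ReLU activations, so the policy is piecewise linear
        # and region counting applies (SB3's default is Tanh).
        activation_fn=nn.ReLU,
    ),
    verbose=1,
)
model.learn(total_timesteps=2_000_000)  # "2M samples" per the quoted setup
```

All other hyperparameters are left at the Stable-Baselines3 PPO defaults, matching the statement that the paper adopts SB3's initialization and hyperparameters.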