Understanding the Evolution of Linear Regions in Deep Reinforcement Learning

Authors: Setareh Cohan, Nam Hee Kim, David Rolnick, Michiel van de Panne

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We seek to understand how observed region counts and their densities evolve during deep reinforcement learning using empirical results that span a range of continuous control tasks and policy network dimensions.
Researcher Affiliation | Academia | Setareh Cohan, Department of Computer Science, University of British Columbia (setarehc@cs.ubc.ca); Nam Hee Kim, Department of Computer Science, Aalto University (namhee.kim@aalto.fi); David Rolnick, School of Computer Science, McGill University (drolnick@cs.mcgill.ca); Michiel van de Panne, Department of Computer Science, University of British Columbia (van@cs.ubc.ca)
Pseudocode | No | The paper describes the region-counting method in prose but does not include structured pseudocode or an algorithm block (an illustrative counting sketch appears below this table).
Open Source Code | Yes | Our code is available at https://github.com/setarehc/deep_rl_regions.
Open Datasets | Yes | We conduct our experiments on four continuous control tasks including the HalfCheetah-v2, Walker-v2, Ant-v2, and Swimmer-v2 environments from the OpenAI Gym benchmark suite [Brockman et al., 2016] (see the environment sketch below this table).
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits with percentages, sample counts, or references to predefined static splits, as data is generated through interaction with RL environments.
Hardware Specification | No | The paper mentions 'computational resources provided by Compute Canada' and states that 'Our experiments require modest compute resources,' but it does not provide specific hardware details such as GPU/CPU models or memory.
Software Dependencies | No | The paper mentions using 'Stable-Baselines3 implementations of the PPO algorithm' but does not provide specific version numbers for Stable-Baselines3 or other software dependencies.
Experiment Setup | Yes | We train 18 policy network configurations with N ∈ {32, 48, 64, 96, 128, 192} neurons, widths w ∈ {8, 16, 32, 64}, and depths d ∈ {1, 2, 3, 4}. We use a fixed value function network structure of (64, 64) in all of our experiments. We adopt the network initialization and hyperparameters of PPO from Stable-Baselines3 [Raffin et al., 2021] and train our policy networks on 2M samples (i.e. 2M timesteps in the environment). (A hedged training-setup sketch follows this table.)
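
Since the paper gives its region-counting procedure only in prose, the following is a minimal, purely illustrative sketch of one common way to estimate how many linear regions a small ReLU network crosses along a straight line between two inputs: sample the segment densely and count changes in the layer-wise activation pattern. The random weights, segment endpoints, sampling density, and the 17-dimensional observation size (HalfCheetah-v2) are assumptions for illustration, not the authors' implementation.

```python
# Illustrative only: approximate the number of linear regions a ReLU MLP crosses
# along a straight line between two inputs by densely sampling the segment and
# counting changes in the binary activation pattern of its hidden units.
import numpy as np

rng = np.random.default_rng(0)

def random_relu_mlp(in_dim, widths):
    """Random weights for a ReLU MLP with the given hidden widths (hypothetical stand-in for a policy net)."""
    dims = [in_dim] + list(widths)
    return [(rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i]),
             rng.standard_normal(dims[i + 1])) for i in range(len(widths))]

def activation_pattern(layers, x):
    """Binary on/off pattern of every hidden ReLU unit at input x."""
    pattern, h = [], x
    for W, b in layers:
        z = W @ h + b
        pattern.append(z > 0)
        h = np.maximum(z, 0.0)
    return np.concatenate(pattern)

def regions_along_segment(layers, x0, x1, n_samples=10_000):
    """Count distinct consecutive activation patterns along the segment from x0 to x1."""
    prev, count = None, 0
    for t in np.linspace(0.0, 1.0, n_samples):
        pat = activation_pattern(layers, (1 - t) * x0 + t * x1)
        if prev is None or not np.array_equal(pat, prev):
            count += 1
        prev = pat
    return count

if __name__ == "__main__":
    obs_dim = 17                                   # e.g. HalfCheetah-v2 observation size
    layers = random_relu_mlp(obs_dim, widths=(64, 64))
    x0, x1 = rng.standard_normal(obs_dim), rng.standard_normal(obs_dim)
    print("regions crossed:", regions_along_segment(layers, x0, x1))
```

Dense sampling can miss very thin regions; an exact count would instead locate every unit's hyperplane crossing along the segment, but the sampling version is enough to convey what a "region count along a transect" measures.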
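For the Open Datasets row, the sketch below shows how the four named MuJoCo tasks could be instantiated with the legacy gym API (gym 0.21-style with mujoco-py; later gym/gymnasium releases rename or retire the v2 IDs). The registered ID corresponding to "Walker-v2" in the excerpt is assumed to be Walker2d-v2.

```python
# Hedged sketch: instantiating the four continuous control tasks named in the table.
# Assumes the legacy gym 0.21-style API with mujoco-py installed for the v2 MuJoCo tasks.
import gym

# "Walker-v2" in the paper excerpt is assumed to correspond to gym's registered Walker2d-v2.
ENV_IDS = ["HalfCheetah-v2", "Walker2d-v2", "Ant-v2", "Swimmer-v2"]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    print(env_id, "obs:", env.observation_space.shape, "act:", env.action_space.shape)
    env.close()
```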
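Finally, the Experiment Setup row fixes the policy width/depth grid, the (64, 64) value network, and the 2M-timestep budget, while the Software Dependencies row notes that no library versions are pinned. The sketch below shows how one such configuration could be written against Stable-Baselines3's PPO under a 1.x-style API; the specific (width, depth) pair, the ReLU activation, and the saved file name are assumptions, not values confirmed by the paper excerpt.

```python
# Hedged sketch: one assumed (width, depth) policy configuration trained with
# Stable-Baselines3 PPO. Assumes an SB3 1.x-style API (versions are not pinned by the paper).
import gym
import torch
from stable_baselines3 import PPO

width, depth = 64, 2                      # assumption: one (w, d) pair from the grid in the table
env = gym.make("HalfCheetah-v2")          # requires mujoco-py for the v2 MuJoCo tasks

policy_kwargs = dict(
    # Assumption: ReLU activations, since linear regions are defined for piecewise-linear networks.
    activation_fn=torch.nn.ReLU,
    # Separate policy/value architectures: `depth` hidden layers of `width` units for the policy,
    # and the fixed (64, 64) value function stated in the paper.
    net_arch=[dict(pi=[width] * depth, vf=[64, 64])],
)

model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=2_000_000)    # "2M samples (i.e. 2M timesteps in the environment)"
model.save("ppo_halfcheetah_w64_d2")      # hypothetical output name
```

PPO hyperparameters not listed in the excerpt are left at the library defaults here, which is consistent with the excerpt's statement that the authors adopt Stable-Baselines3's initialization and hyperparameters.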