Steady State Analysis of Episodic Reinforcement Learning

Authors: Huang Bojun

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, the paper also proposes and experimentally validates a perturbation method that facilitates rapid steady-state convergence in real-world RL tasks." |
| Researcher Affiliation | Industry | Huang Bojun, Rakuten Institute of Technology, Tokyo, Japan (bojhuang@gmail.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides mathematical derivations and proofs, but no algorithm listings. |
| Open Source Code | No | The paper includes no explicit statement or link indicating that its own source code for the described methodology is open source or available. It cites external benchmarks such as PyBullet Gymperium, but not its own implementation. |
| Open Datasets | Yes | "We examined the impact of the changing AEL to the quality of policy gradient estimation in the Hopper environment of the Robo School Bullet benchmark [1]." ... [1] PyBullet Gymperium. URL: https://github.com/benelot/pybullet-gym |
| Dataset Splits | No | The paper describes a reinforcement learning setup in which data is generated through continuous interaction with an environment rather than drawn from pre-defined static splits, so no training/validation/test splits are mentioned. |
| Hardware Specification | No | The paper mentions running experiments but provides no details about the hardware used (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions using the "Robo School Bullet benchmark [1]" but does not give version numbers for this or any other software dependency, library, or programming language. |
| Experiment Setup | Yes | For SSPG with AEL: learning rate = 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8. For SSPG w/o AEL: learning rate = 0.01 (10x faster). ... The paper "used data from the single step at t = 3 AEL across all rollouts to make each policy update." |
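The beta1/beta2/epsilon values quoted in the Experiment Setup row match Adam's standard defaults, so only the learning rate distinguishes the two configurations. As a hedged sketch (the paper does not name its optimizer implementation or framework; the function below is illustrative, not the authors' code), a single Adam update with those values can be written in plain Python:

```python
import math

# Hyperparameters quoted for SSPG with AEL; beta1/beta2/epsilon
# coincide with Adam's standard defaults.
LR, BETA1, BETA2, EPS = 0.001, 0.9, 0.999, 1e-8

def adam_step(theta, grad, m, v, t, lr=LR):
    """One Adam update for a single scalar parameter.

    theta: parameter value; grad: gradient estimate;
    m, v: first/second moment accumulators; t: 1-based step count.
    """
    m = BETA1 * m + (1 - BETA1) * grad          # update first moment
    v = BETA2 * v + (1 - BETA2) * grad ** 2     # update second moment
    m_hat = m / (1 - BETA1 ** t)                # bias correction
    v_hat = v / (1 - BETA2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + EPS)
    return theta, m, v

# The "SSPG w/o AEL" variant differs only in the learning rate (0.01).
theta, m, v = adam_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

With bias correction at t = 1, the first step moves the parameter by almost exactly the learning rate (since m_hat/sqrt(v_hat) ≈ 1 for any nonzero gradient), which is the usual Adam behavior.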