Steady State Analysis of Episodic Reinforcement Learning

Authors: Huang Bojun

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, the paper also proposes and experimentally validates a perturbation method that facilitates rapid steady-state convergence in real-world RL tasks." |
| Researcher Affiliation | Industry | Huang Bojun, Rakuten Institute of Technology, Tokyo, Japan (bojhuang@gmail.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides mathematical derivations and proofs, but no algorithm listings. |
| Open Source Code | No | The paper includes no explicit statement or link indicating that its own source code for the described methodology is open source or available. It cites external benchmarks such as PyBullet Gymperium, but not its own implementation. |
| Open Datasets | Yes | "We examined the impact of the changing AEL to the quality of policy gradient estimation in the Hopper environment of the Robo School Bullet benchmark [1]." ... [1] PyBullet Gymperium. URL: https://github.com/benelot/pybullet-gym |
| Dataset Splits | No | The paper describes a reinforcement learning setup in which data is generated through continuous interaction with an environment rather than drawn from pre-defined static splits, so no training/validation/test splits are mentioned. |
| Hardware Specification | No | The paper mentions running experiments but provides no details about the hardware used (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions using the "Robo School Bullet benchmark [1]" but does not give version numbers for this or any other software dependency, library, or programming language. |
| Experiment Setup | Yes | For SSPG with AEL: learning rate = 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8. For SSPG w/o AEL: learning rate = 0.01 (10x faster). ... The paper "used data from the single step at t = 3 AEL across all rollouts to make each policy update." |
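The beta1/beta2/epsilon values quoted in the Experiment Setup row match Adam's standard defaults, so only the learning rate distinguishes the two configurations. As a hedged sketch (the paper does not name its optimizer implementation or framework; the function below is illustrative, not the authors' code), a single Adam update with those values can be written in plain Python:

```python
import math

# Hyperparameters quoted for SSPG with AEL; beta1/beta2/epsilon
# coincide with Adam's standard defaults.
LR, BETA1, BETA2, EPS = 0.001, 0.9, 0.999, 1e-8

def adam_step(theta, grad, m, v, t, lr=LR):
    """One Adam update for a single scalar parameter.

    theta: parameter value; grad: gradient estimate;
    m, v: first/second moment accumulators; t: 1-based step count.
    """
    m = BETA1 * m + (1 - BETA1) * grad          # update first moment
    v = BETA2 * v + (1 - BETA2) * grad ** 2     # update second moment
    m_hat = m / (1 - BETA1 ** t)                # bias correction
    v_hat = v / (1 - BETA2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + EPS)
    return theta, m, v

# The "SSPG w/o AEL" variant differs only in the learning rate (0.01).
theta, m, v = adam_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

With bias correction at t = 1, the first step moves the parameter by almost exactly the learning rate (since m_hat/sqrt(v_hat) ≈ 1 for any nonzero gradient), which is the usual Adam behavior.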