Steady State Analysis of Episodic Reinforcement Learning
Authors: Huang Bojun
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the paper also proposes and experimentally validates a perturbation method that facilitates rapid steady-state convergence in real-world RL tasks. |
| Researcher Affiliation | Industry | Huang Bojun Rakuten Institute of Technology, Tokyo, Japan bojhuang@gmail.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides mathematical derivations and proofs but no algorithm listings. |
| Open Source Code | No | The paper includes no explicit statement or link indicating that its own source code for the described methodology is open-source or available. It cites external benchmarks like PyBullet Gymperium, but not its own implementation. |
| Open Datasets | Yes | We examined the impact of the changing AEL to the quality of policy gradient estimation in the Hopper environment of the Robo School Bullet benchmark [1]. ... [1] Pybullet gymperium. URL https://github.com/benelot/pybullet-gym. |
| Dataset Splits | No | The paper describes a reinforcement learning setup where data is generated through continuous interaction with an environment, rather than using pre-defined static dataset splits for training, validation, and testing. Therefore, no explicit validation dataset splits are mentioned. |
| Hardware Specification | No | The paper mentions running experiments but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory specifications) used for these experiments. |
| Software Dependencies | No | The paper mentions using 'Robo School Bullet benchmark [1]' but does not provide specific version numbers for this or any other software dependencies, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | For SSPG with AEL: learning rate = 0.001, beta1=0.9, beta2=0.999, epsilon=1e-8. For SSPG w/o AEL: learning rate = 0.01 (10x faster)... used data from the single step at t = 3 AEL across all rollouts to make each policy update. |
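The optimizer settings quoted above (learning rate = 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8) are the standard Adam hyperparameters. As a point of reference, a minimal sketch of a single Adam update with those values is shown below; the function and parameter names are illustrative, since the paper does not release its implementation.

```python
def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    Defaults match the hyperparameters the paper reports for
    SSPG with AEL; for SSPG without AEL the reported learning
    rate is 0.01 instead.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction, t >= 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v
```

With the default learning rate of 0.001, the first step moves the parameter by roughly the learning rate (since the bias-corrected ratio is close to 1 for any nonzero gradient), which is the usual Adam behavior the reported settings imply.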