Tempo Adaptation in Non-stationary Reinforcement Learning
Authors: Hyunin Lee, Yuhao Ding, Jongmin Lee, Ming Jin, Javad Lavaei, Somayeh Sojoudi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation on various high-dimensional non-stationary environments shows that the ProST framework achieves a higher online return at suboptimal {t}_{1:K} than the existing methods. We evaluate ProST-G with four baselines in three MuJoCo environments, each with five different non-stationary speeds and two non-stationary datasets. |
| Researcher Affiliation | Academia | UC Berkeley, Berkeley, CA 94709; Virginia Tech, Blacksburg, VA 24061 |
| Pseudocode | Yes | We elaborate on the above procedure in Algorithm 1 given in Appendix F.1. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | No | The paper mentions using “three MuJoCo environments” and “real data A and B” but does not provide specific links, DOIs, repository names, or formal citations with authors/years for public access to these environments or datasets. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the “soft actor-critic (SAC) algorithm [23]” but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | The experiments are performed over five different policy training times t^π ∈ {1, 2, 3, 4, 5}, aligned with SAC's numbers of gradient steps G ∈ {38, 76, 114, 152, 190}, under a fixed environment speed. We generate o_k = sin(2πk/37), which satisfies Assumption 1 (see Appendix E.1). The shaded areas of Figures 3 (a), (b), and (c) are the 95% confidence areas across three different noise bounds of 0.01, 0.02, and 0.03 in o_k. We elaborate on the training details in Appendix E.2. (A minimal sketch of this setup is given after the table.) |
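
For concreteness, the quoted experiment setup can be illustrated with a short Python sketch. This is not the authors' code: the additive uniform noise model, the episode count `K`, the random seed, and the helper name `nonstationary_schedule` are assumptions made here for illustration; only the sine schedule o_k = sin(2πk/37), the gradient-step settings {38, 76, 114, 152, 190}, and the noise bounds {0.01, 0.02, 0.03} are taken from the quoted text.

```python
import numpy as np

# Minimal sketch of the non-stationary parameter schedule quoted above.
# Assumption (not stated in the excerpt): the noise is additive and drawn
# uniformly within the stated bound; episodes are indexed k = 1..K.

def nonstationary_schedule(K, noise_bound=0.01, period=37, seed=0):
    """Return o_k = sin(2*pi*k / period) + bounded uniform noise for k = 1..K."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, K + 1)
    base = np.sin(2 * np.pi * k / period)
    noise = rng.uniform(-noise_bound, noise_bound, size=K)
    return base + noise

# Settings quoted in the experiment-setup row: gradient steps per policy
# training time and the three noise bounds behind the 95% confidence areas.
gradient_steps = {1: 38, 2: 76, 3: 114, 4: 152, 5: 190}
for bound in (0.01, 0.02, 0.03):
    o = nonstationary_schedule(K=100, noise_bound=bound)
    print(f"noise bound {bound}: first three o_k = {np.round(o[:3], 3)}")
```

How the noise bound actually enters o_k is not specified in the excerpt, so the additive uniform form above is only one plausible reading.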