Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
Authors: Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an empirical study on various queuing networks and traffic signal control problems and find that our approach performs competitively against strong baselines with knowledge of the transition dynamics. Our code is available here: https://github.com/Badger-RL/STOP. |
| Researcher Affiliation | Academia | University of Wisconsin-Madison, USA. Correspondence to: Brahma S. Pavse <EMAIL>. |
| Pseudocode | Yes | In Appendix B, we include the pseudo-code. |
| Open Source Code | Yes | Our code is available here: https://github.com/Badger-RL/STOP. |
| Open Datasets | Yes | For the N-model network, the authors state: |
| Dataset Splits | No | The paper focuses on an online reinforcement learning setting, which does not involve explicit train/validation/test dataset splits. Performance is evaluated continuously over interaction time-steps. |
| Hardware Specification | Yes | For all experiments, we used the following compute infrastructure: distributed cluster on the HTCondor framework; Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz; disk space: 5GB. |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | We set the rollout buffer length to 200 and keep all other hyperparameters for STOP and the baseline the same (Huang et al., 2022). |