Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

Authors: Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conduct an empirical study on various queuing networks and traffic signal control problems and find that our approach performs competitively against strong baselines with knowledge of the transition dynamics. Our code is available here: https://github.com/Badger-RL/STOP |
| Researcher Affiliation | Academia | University of Wisconsin-Madison, USA. Correspondence to: Brahma S. Pavse <pavse@wisc.edu>. |
| Pseudocode | Yes | In Appendix B, we include the pseudo-code. |
| Open Source Code | Yes | Our code is available here: https://github.com/Badger-RL/STOP |
| Open Datasets | Yes | For the N-model network, the authors state: |
| Dataset Splits | No | The paper focuses on an online reinforcement learning setting, which does not involve explicit train/validation/test dataset splits. Performance is evaluated continuously over interaction time-steps. |
| Hardware Specification | Yes | For all experiments, we used the following compute infrastructure: distributed cluster on the HTCondor framework; Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz; disk space: 5GB. |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | We set the rollout buffer length to 200 and keep all other hyperparameters for STOP and the baseline the same (Huang et al., 2022). |
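To make the Experiment Setup row concrete, below is a minimal sketch of a CleanRL-style (Huang et al., 2022) hyperparameter block with the rollout buffer length set to 200. This is not the authors' code: the field names follow CleanRL's PPO conventions, and every value other than the rollout buffer length is an assumption; the actual configuration is in the STOP repository linked above.

```python
# Minimal sketch (not the authors' configuration): a CleanRL-style argument
# block illustrating "rollout buffer length = 200". All values except
# num_steps are placeholders/assumptions.
from dataclasses import dataclass


@dataclass
class Args:
    env_id: str = "n-model-network"   # placeholder environment name (assumption)
    total_timesteps: int = 1_000_000  # assumption
    learning_rate: float = 3e-4       # assumption
    num_envs: int = 1                 # assumption
    num_steps: int = 200              # rollout buffer length, as stated in the paper
    gamma: float = 0.99               # assumption
    gae_lambda: float = 0.95          # assumption
    update_epochs: int = 4            # assumption


if __name__ == "__main__":
    args = Args()
    # With one environment, each policy update is computed from 200 transitions.
    batch_size = args.num_envs * args.num_steps
    print(f"transitions per update: {batch_size}")
```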