Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
Authors: Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an empirical study on various queuing networks and traffic signal control problems and find that our approach performs competitively against strong baselines with knowledge of the transition dynamics. Our code is available here: https://github.com/Badger-RL/STOP |
| Researcher Affiliation | Academia | University of Wisconsin–Madison, USA. Correspondence to: Brahma S. Pavse <pavse@wisc.edu>. |
| Pseudocode | Yes | In Appendix B, we include the pseudo-code. |
| Open Source Code | Yes | Our code is available here: https://github.com/Badger-RL/STOP |
| Open Datasets | Yes | For the N-model network, the authors state: |
| Dataset Splits | No | The paper focuses on an online reinforcement learning setting, which does not involve explicit train/validation/test dataset splits. Performance is evaluated continuously over interaction time-steps. |
| Hardware Specification | Yes | For all experiments, we used the following compute infrastructure: distributed cluster on the HTCondor framework; Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz; disk space: 5GB |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | We set the rollout buffer length to 200 and keep all other hyperparameters for STOP and the baseline the same (Huang et al., 2022). |
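
The Experiment Setup row reports only one concrete hyperparameter (a rollout buffer length of 200), with all other settings kept identical to the CleanRL PPO defaults (Huang et al., 2022) for both STOP and the baseline. Below is a minimal, hedged sketch of how such a configuration might be collected in code; apart from the rollout buffer length, every field name and default value is an assumed placeholder and is not taken from the paper or its repository.

```python
# Minimal sketch of an experiment configuration, assuming a PPO-style setup.
# Only rollout_buffer_length = 200 is stated in the paper; the remaining
# fields are hypothetical stand-ins for the CleanRL PPO defaults that the
# paper says are left unchanged for STOP and the baseline.
from dataclasses import dataclass


@dataclass
class PPOExperimentConfig:
    rollout_buffer_length: int = 200  # stated in the paper
    # Assumed placeholders for the hyperparameters kept at their defaults.
    learning_rate: float = 3e-4
    discount_gamma: float = 0.99
    num_minibatches: int = 4
    update_epochs: int = 4


if __name__ == "__main__":
    # Instantiate the shared configuration used for both STOP and the baseline.
    config = PPOExperimentConfig()
    print(config)
```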