Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
Authors: Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an empirical study on various queuing networks and traffic signal control problems and find that our approach performs competitively against strong baselines with knowledge of the transition dynamics. Our code is available here: https://github.com/Badger-RL/STOP. |
| Researcher Affiliation | Academia | University of Wisconsin-Madison, USA. Correspondence to: Brahma S. Pavse <EMAIL>. |
| Pseudocode | Yes | In Appendix B, we include the pseudo-code. |
| Open Source Code | Yes | Our code is available here: https://github.com/Badger-RL/STOP. |
| Open Datasets | Yes | For the N-model network, the authors state: |
| Dataset Splits | No | The paper focuses on an online reinforcement learning setting, which does not involve explicit train/validation/test dataset splits. Performance is evaluated continuously over interaction time-steps. |
| Hardware Specification | Yes | For all experiments, we used the following compute infrastructure: distributed cluster on the HTCondor framework; Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz; disk space: 5GB. |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | We set the rollout buffer length to 200 and keep all other hyperparameters for STOP and the baseline the same (Huang et al., 2022). |