Anytime-Competitive Reinforcement Learning with Policy Prior
Authors: Jianyi Yang, Pengfei Li, Tongxin Li, Adam Wierman, Shaolei Ren
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL. We experiment with the application of resource management for carbon-aware computing [49] to empirically show the benefits of ACRL. Figure 1(a) shows how the regret changes over the first 500 episodes. Figure 1(b) shows the regret under different λ and b, demonstrating the trade-off between reward optimization and the satisfaction of anytime competitive constraints. Figure 1(c) shows the probability that RL and constrained RL violate the anytime competitive constraints. |
| Researcher Affiliation | Academia | Jianyi Yang, UC Riverside, Riverside, CA, USA (jyang239@ucr.edu); Pengfei Li, UC Riverside, Riverside, CA, USA (pli081@ucr.edu); Tongxin Li, CUHK Shenzhen, Shenzhen, Guangdong, China (litongxin@cuhk.edu.cn); Adam Wierman, Caltech, Pasadena, CA, USA (adamw@caltech.edu); Shaolei Ren, UC Riverside, Riverside, CA, USA (shaolei@ucr.edu) |
| Pseudocode | Yes | Algorithm 1: Anytime-Competitive Decision-making (ACD); Algorithm 2: Anytime-Competitive Reinforcement Learning (ACRL) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for its methodology. |
| Open Datasets | Yes | We experiment with the application of resource management for carbon-aware computing [49]... The concrete settings can be found in Appendix A. Appendix A: Experiment Setup ...electricity price data from California ISO [47] and carbon intensity data provided by WattTime. [47] California Independent System Operator. California renewable datasets. https://www.caiso.com/Pages/default.aspx, 2023. |
| Dataset Splits | No | The paper mentions using datasets for experiments but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper discusses applications in cloud workload scheduling and datacenters, implying computational resources. However, it does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper describes its algorithms but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The concrete settings can be found in Appendix A. Appendix A: Experiment Setup... The time horizon is set to $H = 24$. For a time step $h$, the state $x_h$ = (carbon price, electricity price). The action $a_h$ represents the workload scheduling decision... The reward $r_h$ is the sum of revenues... The cost $c_h$ is the computing latency. ...The workload processing rate is set to 2... |
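
The Research Type row reports how often RL and constrained RL violate the anytime competitive constraints (Figure 1(c)), with trade-off parameters λ and b. The Python sketch here checks whether one episode's cost trajectory violates such a constraint at any step, and it estimates a violation rate over many episodes. It assumes a hypothetical constraint form: for every step h, the cumulative cost up to h must not exceed (1 + λ) times the policy prior's cumulative cost up to h, plus b. The paper gives the exact definition. The function names and the synthetic episodes are illustrative only.

```python
import random
from itertools import accumulate


def violates_anytime_constraint(costs, prior_costs, lam, b):
    """Return True if the constraint fails at ANY step h.

    Hypothetical constraint form (an assumption, not taken from the paper):
        sum_{i<=h} c_i <= (1 + lam) * sum_{i<=h} c_i^prior + b   for every h.
    """
    for cum, cum_prior in zip(accumulate(costs), accumulate(prior_costs)):
        if cum > (1 + lam) * cum_prior + b:
            return True
    return False


def violation_probability(episodes, lam, b):
    """Fraction of (costs, prior_costs) episodes that violate the constraint at some step."""
    flags = [violates_anytime_constraint(c, p, lam, b) for c, p in episodes]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    rng = random.Random(0)
    H = 24
    # Synthetic episodes: an unconstrained policy whose per-step cost is noisy around the prior's.
    episodes = []
    for _ in range(1000):
        prior = [rng.uniform(0.5, 1.5) for _ in range(H)]
        policy = [p * rng.uniform(0.7, 1.5) for p in prior]
        episodes.append((policy, prior))
    for lam, b in [(0.0, 0.0), (0.2, 1.0), (0.5, 2.0)]:
        print(f"lambda={lam}, b={b}: violation rate {violation_probability(episodes, lam, b):.3f}")
```

The check is "anytime" because it inspects every prefix of the episode. For example, costs `[3, 0]` against prior costs `[1, 2]` with λ = 0 and b = 0 violate the constraint at the first step, even though the episode totals are equal.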
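
The Pseudocode row lists Algorithm 1 (ACD). ACD keeps a learned policy's actions consistent with the anytime competitive constraint relative to a policy prior. The Python sketch here is not the paper's ACD. It is a simplified, myopic safeguard in the same spirit, and it rests on several assumptions:

- scalar actions;
- exogenous states, so the prior action's cost can be evaluated at the same state;
- nonnegative costs;
- λ ≥ 0 and b ≥ 0;
- the hypothetical constraint form: cumulative cost ≤ (1 + λ) × prior cumulative cost + b.

At each step the safeguard slides the ML action toward the prior action along a grid until the step-h constraint holds. Under these assumptions the prior action is always feasible whenever the constraint held at the previous step. `safeguarded_action`, `run_episode`, and `cost_fn` are hypothetical names.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: cost_fn(h, action) -> nonnegative cost at step h.
CostFn = Callable[[int, float], float]


def safeguarded_action(
    h: int,
    ml_action: float,
    prior_action: float,
    cost_fn: CostFn,
    cum_cost: float,
    cum_prior_cost: float,
    lam: float,
    b: float,
    grid: int = 101,
) -> float:
    """Blend the ML action toward the prior action until the step-h constraint holds.

    Assumed constraint: cum_cost + c_h(a) <= (1 + lam) * (cum_prior_cost + c_h(prior)) + b.
    theta = 1 keeps the ML action; theta = 0 falls back to the prior action.
    """
    budget = (1 + lam) * (cum_prior_cost + cost_fn(h, prior_action)) + b
    for k in range(grid - 1, -1, -1):  # try the largest blending weight first
        theta = k / (grid - 1)
        action = prior_action + theta * (ml_action - prior_action)
        if cum_cost + cost_fn(h, action) <= budget:
            return action
    return prior_action  # fallback; theta = 0 is feasible under the stated assumptions


def run_episode(
    ml_actions: List[float],
    prior_actions: List[float],
    cost_fn: CostFn,
    lam: float,
    b: float,
) -> Tuple[List[float], float, float]:
    """Roll out one episode with the safeguard; returns (actions, total cost, prior total cost)."""
    cum_cost = cum_prior = 0.0
    taken = []
    for h, (a_ml, a_prior) in enumerate(zip(ml_actions, prior_actions)):
        a = safeguarded_action(h, a_ml, a_prior, cost_fn, cum_cost, cum_prior, lam, b)
        cum_cost += cost_fn(h, a)
        cum_prior += cost_fn(h, a_prior)
        taken.append(a)
    return taken, cum_cost, cum_prior
```

A grid search is used instead of bisection because the cost need not be monotone along the segment between the two actions. With a quadratic cost `lambda h, a: a * a`, ML actions of 2.0, prior actions of 1.0, λ = 0.5, and b = 0, the first safeguarded action is 1.22. The rollout stays within the assumed constraint at every step.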
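
The Experiment Setup row describes an episodic problem with these elements:

- horizon H = 24;
- state $x_h$ = (carbon price, electricity price);
- workload-scheduling actions;
- revenue-based rewards;
- latency costs;
- a workload processing rate of 2.

The Python toy environment here is a hypothetical sketch of that structure only. The row elides the actual models, so several pieces are assumptions: the price and arrival sequences, the revenue-minus-energy-cost reward, the backlog-as-latency cost, and the name `ToyCarbonEnv`.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

H = 24                 # time horizon, from Appendix A
PROCESSING_RATE = 2.0  # workload processing rate, from Appendix A


@dataclass
class ToyCarbonEnv:
    """Hypothetical toy environment; every model below except H and the rate is an assumption."""
    carbon_prices: List[float]
    electricity_prices: List[float]
    arrivals: List[float]           # hypothetical workload arriving at each step
    revenue_per_unit: float = 3.0   # hypothetical revenue per unit of processed workload
    queue: float = 0.0
    h: int = 0

    def state(self) -> Tuple[float, float]:
        # x_h = (carbon price, electricity price), as described in Appendix A
        return (self.carbon_prices[self.h], self.electricity_prices[self.h])

    def step(self, action: float) -> Tuple[float, float, bool]:
        carbon, price = self.state()
        self.queue += self.arrivals[self.h]
        # Process at most the processing rate and at most what is queued.
        processed = min(max(action, 0.0), PROCESSING_RATE, self.queue)
        self.queue -= processed
        reward = processed * (self.revenue_per_unit - price - carbon)  # hypothetical reward
        cost = self.queue  # hypothetical latency proxy: backlog left waiting
        self.h += 1
        return reward, cost, self.h >= H


if __name__ == "__main__":
    rng = random.Random(0)
    env = ToyCarbonEnv(
        carbon_prices=[rng.uniform(0.0, 0.5) for _ in range(H)],
        electricity_prices=[rng.uniform(0.2, 1.0) for _ in range(H)],
        arrivals=[rng.uniform(0.0, 3.0) for _ in range(H)],
    )
    total_reward = total_cost = 0.0
    done = False
    while not done:
        r, c, done = env.step(PROCESSING_RATE)  # greedy: process as much as possible
        total_reward += r
        total_cost += c
    print(f"reward={total_reward:.2f}, cost={total_cost:.2f}")
```

The greedy rollout is only a usage example. A constrained agent would trade reward against the latency cost, and it would compare that cost with a policy prior's cost.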