Anytime-Competitive Reinforcement Learning with Policy Prior

Authors: Jianyi Yang, Pengfei Li, Tongxin Li, Adam Wierman, Shaolei Ren

NeurIPS 2023

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL. We experiment with the application of resource management for carbon-aware computing [49] to empirically show the benefits of ACRL. Figure 1(a) shows how the regret changes over the first 500 episodes. Figure 1(b) shows the regret under different λ and b, demonstrating the trade-off between reward optimization and satisfaction of the anytime competitive constraints. Figure 1(c) shows the probability that RL and constrained RL violate the anytime competitive constraints.
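The sketch below makes the three quantities in Figure 1 concrete. It shows how cumulative regret and the constraint-violation probability could be computed from logged episodes. It assumes the anytime competitive constraint has the form sum_{i<=h} c_i(π) <= (1 + λ) · sum_{i<=h} c_i(π†) + b for every step h, where π† is the policy prior. That form is an assumption and should be checked against the paper's definition. All function names and the synthetic costs are hypothetical and are not taken from the paper.

```python
import random
from itertools import accumulate


def cumulative_regret(optimal_returns, achieved_returns):
    """Running sum of per-episode regret (optimal return minus achieved return)."""
    gaps = [opt - got for opt, got in zip(optimal_returns, achieved_returns)]
    return list(accumulate(gaps))


def satisfies_anytime_constraint(policy_costs, prior_costs, lam, b):
    """Return True if the cumulative policy cost stays within
    (1 + lam) * cumulative prior cost + b at EVERY step of the episode.

    ASSUMPTION: this is one reading of the anytime competitive constraint;
    verify the exact form against the paper before relying on it.
    """
    policy_total = 0.0
    prior_total = 0.0
    for c_pi, c_prior in zip(policy_costs, prior_costs):
        policy_total += c_pi
        prior_total += c_prior
        if policy_total > (1 + lam) * prior_total + b:
            return False  # violated at this step, so the whole episode violates
    return True


def violation_rate(episodes, lam, b):
    """Fraction of (policy_costs, prior_costs) episodes that violate the constraint."""
    if not episodes:
        raise ValueError("need at least one episode")
    violations = sum(
        not satisfies_anytime_constraint(pi, prior, lam, b) for pi, prior in episodes
    )
    return violations / len(episodes)


if __name__ == "__main__":
    rng = random.Random(0)
    H = 24
    # Synthetic, made-up costs: the prior pays 1.0 per step; the learned policy is noisier.
    episodes = [
        ([rng.uniform(0.5, 1.6) for _ in range(H)], [1.0] * H) for _ in range(1000)
    ]
    for lam, b in [(0.0, 0.0), (0.2, 1.0), (0.5, 2.0)]:
        rate = violation_rate(episodes, lam, b)
        print(f"lambda={lam}, b={b}: violation rate {rate:.3f}")
```

Larger λ or b loosens the constraint, so the violation rate falls. This mirrors the trade-off that Figure 1(b) describes between reward and constraint satisfaction.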
Researcher Affiliation | Academia | Jianyi Yang (UC Riverside, Riverside, CA, USA; jyang239@ucr.edu); Pengfei Li (UC Riverside, Riverside, CA, USA; pli081@ucr.edu); Tongxin Li (CUHK Shenzhen, Shenzhen, Guangdong, China; litongxin@cuhk.edu.cn); Adam Wierman (Caltech, Pasadena, CA, USA; adamw@caltech.edu); Shaolei Ren (UC Riverside, Riverside, CA, USA; shaolei@ucr.edu)
Pseudocode | Yes | Algorithm 1: Anytime-Competitive Decision-making (ACD); Algorithm 2: Anytime-Competitive Reinforcement Learning (ACRL)
Open Source Code | No | The paper does not state that source code will be released and does not link to a code repository for its methodology.
Open Datasets | Yes | We experiment with the application of resource management for carbon-aware computing [49]... The concrete settings can be found in Appendix A. Appendix A: Experiment Setup ...electricity price data from California ISO [47] and carbon intensity data provided by WattTime. [47] California Independent System Operator. California renewable datasets. https://www.caiso.com/Pages/default.aspx, 2023.
Dataset Splits | No | The paper mentions using datasets for experiments but does not specify any training, validation, or test splits.
Hardware Specification | No | The paper discusses applications in cloud workload scheduling and datacenters, but it does not specify the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper describes its algorithms but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | The concrete settings can be found in Appendix A. Appendix A: Experiment Setup... The time horizon is set to H = 24. For a time step h, the state x_h = (carbon price, electricity price). The action a_h represents the workload scheduling decision... The reward r_h is the sum of revenues... The cost c_h is the computing latency. ...The workload processing rate is set to 2...
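The following is a minimal sketch of the episodic structure described in Appendix A. It uses the stated values H = 24, states x_h = (carbon price, electricity price), a per-step action a_h, a summed reward, and a per-step cost. The `rollout` helper, the policy, and the reward and latency functions in the demo are hypothetical placeholders, not the paper's models or data. The per-step cost list is the quantity an anytime constraint would be checked against.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # (carbon price, electricity price), as in Appendix A

H = 24                  # time horizon stated in Appendix A
PROCESSING_RATE = 2.0   # workload processing rate stated in Appendix A


@dataclass
class EpisodeResult:
    total_reward: float  # sum of per-step rewards r_h over the episode
    costs: List[float]   # per-step costs c_h (latency), kept for anytime checks


def rollout(
    states: Sequence[State],
    policy: Callable[[int, State], float],
    reward_fn: Callable[[State, float], float],
    cost_fn: Callable[[State, float], float],
) -> EpisodeResult:
    """Run one H-step episode over a given trace of (carbon, electricity) prices."""
    if len(states) != H:
        raise ValueError(f"expected {H} states, got {len(states)}")
    total_reward = 0.0
    costs: List[float] = []
    for h, x_h in enumerate(states):
        a_h = policy(h, x_h)               # scheduling decision at step h
        total_reward += reward_fn(x_h, a_h)
        costs.append(cost_fn(x_h, a_h))
    return EpisodeResult(total_reward, costs)


# --- Toy placeholders below are hypothetical, NOT the paper's definitions. ---
def toy_policy(h: int, x: State) -> float:
    # Run at full rate when electricity is cheap, otherwise at half rate.
    return PROCESSING_RATE if x[1] < 0.4 else 0.5 * PROCESSING_RATE


def toy_reward(x: State, a: float) -> float:
    # Made-up revenue: processed work times a price-dependent margin.
    return a * (1.0 - x[0] - x[1])


def toy_latency(x: State, a: float) -> float:
    # Made-up latency proxy: slower processing means higher latency (a > 0 here).
    return 1.0 / a


if __name__ == "__main__":
    trace = [(0.3 + 0.01 * h, 0.1 + 0.02 * h) for h in range(H)]  # synthetic prices
    result = rollout(trace, toy_policy, toy_reward, toy_latency)
    print(result.total_reward, sum(result.costs))
```

The reward, cost, and policy are passed in as functions, so this loop stays agnostic to how Appendix A actually defines revenue and latency. Those definitions would have to be filled in from the paper itself.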