RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Sabrina Yang, Gang Wang, Xinyu Xing
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, Northwestern University, Evanston, Illinois, USA; (2) Presentation High School, San Jose, California, USA; (3) Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA. |
| Pseudocode | Yes | Algorithm 1: Training the Mask Network. Algorithm 2: Refining the DRL Agent. |
| Open Source Code | Yes | The source code of RICE can be found at https://github.com/chengzelei/RICE |
| Open Datasets | Yes | We evaluate the performance of RICE using four MuJoCo games and four DRL-based real-world applications, including cryptocurrency mining (Bar-Zur et al., 2023), autonomous cyber defense (Cage Challenge 2) (CAGE, 2022), autonomous driving (Li et al., 2022), and malware mutation (Raff et al., 2017). |
| Dataset Splits | No | The paper describes training and testing procedures and dataset usage for experiments, but it does not specify explicit validation dataset splits (e.g., percentages or sample counts for validation sets) to reproduce the data partitioning. |
| Hardware Specification | Yes | We train the agents on a server with 8 NVIDIA A100 GPUs for all the learning algorithms. |
| Software Dependencies | Yes | We implement the proposed method using PyTorch (Paszke et al., 2019). We implement our method in four selected MuJoCo games based on Stable-Baselines3 (Raffin et al., 2021). |
| Experiment Setup | Yes | Table 3. Hyper-parameter choices in Experiment I-V. Selfish represents Selfish Mining. Cage represents Cage Challenge 2. Auto represents Autonomous Driving. Malware represents Malware Mutation. Hyper-parameter values, in column order: p: 0.25, 0.25, 0.50, 0.50, 0.25, 0.50, 0.25, 0.50; λ: 0.001, 0.01, 0.001, 0.01, 0.001, 0.01, 0.01, 0.01; α: 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001. |