RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Sabrina Yang, Gang Wang, Xinyu Xing

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.
Researcher Affiliation | Academia | (1) Department of Computer Science, Northwestern University, Evanston, Illinois, USA; (2) Presentation High School, San Jose, California, USA; (3) Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Pseudocode | Yes | Algorithm 1: Training the Mask Network. Algorithm 2: Refining the DRL Agent.
Open Source Code | Yes | The source code of RICE can be found in https://github.com/chengzelei/RICE
Open Datasets | Yes | We evaluate the performance of RICE using four MuJoCo games and four DRL-based real-world applications, including cryptocurrency mining (Bar-Zur et al., 2023), autonomous cyber defense (Cage Challenge 2) (CAGE, 2022), autonomous driving (Li et al., 2022), and malware mutation (Raff et al., 2017).
Dataset Splits | No | The paper describes training and testing procedures and dataset usage for experiments, but it does not specify explicit validation dataset splits (e.g., percentages or sample counts for validation sets) to reproduce the data partitioning.
Hardware Specification | Yes | We train the agents on a server with 8 NVIDIA A100 GPUs for all the learning algorithms.
Software Dependencies | Yes | We implement the proposed method using PyTorch (Paszke et al., 2019). We implement our method in four selected MuJoCo games based on Stable-Baselines3 (Raffin et al., 2021).
Experiment Setup | Yes | Table 3. Hyper-parameter choices in Experiment I-V. Selfish represents Selfish Mining. Cage represents Cage Challenge 2. Auto represents Autonomous Driving. Malware represents Malware Mutation. Hyper-parameter values, in the order listed: p = 0.25, 0.25, 0.50, 0.50, 0.25, 0.50, 0.25, 0.50; λ = 0.001, 0.01, 0.001, 0.01, 0.001, 0.01, 0.01, 0.01; α = 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001.
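
The Table 3 values quoted in the Experiment Setup row can be held in a small configuration structure when reproducing the runs. The following is a minimal sketch, not the authors' code: the names `TABLE3_ROWS`, `HyperParams`, and `load_settings` are hypothetical. Each row has eight values, which presumably corresponds to the four MuJoCo games plus the four applications, but the column headers are not preserved here, so the sketch keeps the values in their listed order and does not attach them to named environments.

```python
from dataclasses import dataclass

# Values copied in the order they appear in the Table 3 excerpt.
# Assumption: each position is one experimental setting; the excerpt
# does not say which environment each position belongs to.
TABLE3_ROWS = {
    "p": [0.25, 0.25, 0.50, 0.50, 0.25, 0.50, 0.25, 0.50],
    "lambda": [0.001, 0.01, 0.001, 0.01, 0.001, 0.01, 0.01, 0.01],
    "alpha": [0.0001] * 8,
}


@dataclass(frozen=True)
class HyperParams:
    """Hypothetical container for one column of Table 3."""
    p: float
    lam: float  # "lambda" is a reserved word in Python
    alpha: float


def load_settings(rows):
    """Turn the row-wise table into one HyperParams per column."""
    lengths = {len(values) for values in rows.values()}
    if len(lengths) != 1:
        raise ValueError("all hyper-parameter rows must have the same length")
    return [
        HyperParams(p, lam, alpha)
        for p, lam, alpha in zip(rows["p"], rows["lambda"], rows["alpha"])
    ]


SETTINGS = load_settings(TABLE3_ROWS)
```

Keeping the settings positional avoids guessing a column-to-environment mapping; once the original table headers are consulted, the list can be turned into a dictionary keyed by environment name.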