reproducibilityindex.ai

RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Sabrina Yang, Gang Wang, Xinyu Xing

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.
Researcher Affiliation	Academia	(1) Department of Computer Science, Northwestern University, Evanston, Illinois, USA; (2) Presentation High School, San Jose, California, USA; (3) Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Pseudocode	Yes	Algorithm 1 Training the Mask Network. Algorithm 2 Refining the DRL Agent.
Open Source Code	Yes	The source code of RICE can be found in https://github.com/chengzelei/RICE
Open Datasets	Yes	We evaluate the performance of RICE using four MuJoCo games and four DRL-based real-world applications, including cryptocurrency mining (Bar-Zur et al., 2023), autonomous cyber defense (Cage Challenge 2) (CAGE, 2022), autonomous driving (Li et al., 2022), and malware mutation (Raff et al., 2017).
Dataset Splits	No	The paper describes training and testing procedures and dataset usage for experiments, but it does not specify explicit validation dataset splits (e.g., percentages or sample counts for validation sets) to reproduce the data partitioning.
Hardware Specification	Yes	We train the agents on a server with 8 NVIDIA A100 GPUs for all the learning algorithms.
Software Dependencies	Yes	We implement the proposed method using PyTorch (Paszke et al., 2019). We implement our method in four selected MuJoCo games based on Stable-Baselines3 (Raffin et al., 2021).
Experiment Setup	Yes	Table 3. Hyper-parameter choices in Experiment I-V. Selfish represents Selfish Mining. Cage represents Cage Challenge 2. Auto represents Autonomous Driving. Malware represents Malware Mutation. Hyper-parameter values, in table column order: p = 0.25, 0.25, 0.50, 0.50, 0.25, 0.50, 0.25, 0.50; λ = 0.001, 0.01, 0.001, 0.01, 0.001, 0.01, 0.01, 0.01; α = 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001.
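
The Table 3 values quoted in the Experiment Setup row can be kept as one record per table column for reuse in a reproduction script. The following is a minimal Python sketch, assuming the eight values listed for each hyper-parameter share the same column order. The column headers did not survive in the quoted text, so the columns are identified only by position and are not mapped to environment names. The names `HyperParams` and `TABLE3` are illustrative and are not taken from the RICE code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParams:
    """One column of Table 3 (hypothetical container, not from the RICE repo)."""
    p: float
    lam: float    # the table's lambda
    alpha: float  # the table's alpha


# Values copied from the quoted Table 3 row, in the order they appear.
P = [0.25, 0.25, 0.50, 0.50, 0.25, 0.50, 0.25, 0.50]
LAMBDA = [0.001, 0.01, 0.001, 0.01, 0.001, 0.01, 0.01, 0.01]
ALPHA = [0.0001] * 8

# Assumption: position i in each list refers to the same table column.
TABLE3 = [HyperParams(p, lam, a) for p, lam, a in zip(P, LAMBDA, ALPHA)]

for index, hp in enumerate(TABLE3, start=1):
    print(f"column {index}: p={hp.p}, lambda={hp.lam}, alpha={hp.alpha}")
```

Keeping the columns positional avoids guessing which environment each value belongs to; the mapping should be taken from Table 3 in the paper itself.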