Robust Deep Reinforcement Learning through Bootstrapped Opportunistic Curriculum

Authors: Junlin Wu, Yevgeniy Vorobeychik

ICML 2022

Reproducibility assessment. Each entry below lists the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental
"We evaluate the efficacy of the proposed BCL framework in boosting robustness of DQN-style approaches with minimal reduction in nominal (non-adversarial) reward through extensive experiments on the Pong, Freeway, Bank Heist, and Road Runner OpenAI domains."
Researcher Affiliation: Academia
"Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA."
Pseudocode: Yes

Algorithm 1: BCL algorithm
Input: ϵ, K, K_min, V(ϵ), f_{θ_0}
  f_θ ← f_{θ_0}                                   // initialization
  {ϵ_i}_{i=1}^{L} ← Curriculum(ϵ)                 // create curriculum
  (i, ϵ_best) ← Choose_Next(f_θ, {ϵ_i}, 0, V(ϵ))
  while ϵ_best < ϵ do
      for k = 1, ..., K do
          f_{θ_k} ← Train(f_θ, ϵ_i)
          V_k ← Eval(f_{θ_k}, ϵ_i)
          if k ≥ K_min and V_k ≥ V(ϵ_i) then
              break
          end if
      end for
      // Find the best model among training results
      k* ← argmax_{k ∈ [K]} V_k
      f_θ ← f_{θ_{k*}}
      (i, ϵ_best) ← Choose_Next(f_θ, {ϵ_i}, i, V(ϵ))
  end while
  return f_θ
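The control flow above can be expressed as a short Python sketch. This is a minimal, hedged rendering: the helpers make_curriculum, train_at_eps, eval_at_eps, and choose_next are hypothetical stand-ins for the paper's Curriculum, Train, Eval, and Choose_Next subroutines, not the authors' implementation (see their repository for that).

import copy

def bcl(model, eps, K, K_min, V, make_curriculum, train_at_eps, eval_at_eps, choose_next):
    # Increasing perturbation budgets eps_1 < ... < eps_L = eps form the curriculum.
    curriculum = make_curriculum(eps)
    i, eps_best = choose_next(model, curriculum, 0, V)
    while eps_best < eps:
        candidates = []
        for k in range(1, K + 1):
            # Bootstrap: each attempt starts from the current best model.
            cand = train_at_eps(copy.deepcopy(model), curriculum[i])
            score = eval_at_eps(cand, curriculum[i])
            candidates.append((score, cand))
            if k >= K_min and score >= V(curriculum[i]):
                break  # opportunistic early stop once the target value is met
        # Keep the best of the (up to) K training attempts.
        _, model = max(candidates, key=lambda t: t[0])
        i, eps_best = choose_next(model, curriculum, i, V)
    return model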
Open Source Code: Yes
"Our code is available at: https://github.com/jlwu002/BCL."
Open Datasets: Yes
"We evaluate the proposed approach using four Atari-2600 games from the OpenAI Gym (Bellemare et al., 2013): Pong, Freeway, Bank Heist, and Road Runner."
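As a concrete illustration, the four games can be instantiated through the standard Gym API. The environment IDs below are an assumption (the NoFrameskip-v4 variants commonly used in DQN work); the quoted passage does not state the exact IDs.

import gym  # assumes gym with the Atari extras (ale-py/atari-py) installed

# Hypothetical environment IDs; "NoFrameskip-v4" is the usual choice for DQN.
GAMES = ["PongNoFrameskip-v4", "FreewayNoFrameskip-v4",
         "BankHeistNoFrameskip-v4", "RoadRunnerNoFrameskip-v4"]

for name in GAMES:
    env = gym.make(name)
    obs = env.reset()  # classic (pre-0.26) Gym API: reset() returns the observation
    print(name, env.action_space, obs.shape)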
Dataset Splits: No
The paper mentions training for "4.5 million frames" and evaluating over "20 test episodes", but does not explicitly specify a distinct validation split or its size/percentage for model tuning during training.
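For context, evaluation over a fixed number of test episodes is typically a loop like the following sketch, assuming the classic Gym step API and a hypothetical policy callable; the paper's own evaluation harness may differ.

import numpy as np

def evaluate(env, policy, n_episodes=20):
    # Average undiscounted episode return over n_episodes test episodes.
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))  # classic 4-tuple step API
            total += reward
        returns.append(total)
    return float(np.mean(returns))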
Hardware Specification: Yes
"Each adversarial training run takes 10 hours for Fruit Bot (with RI-FGSM), and 34 hours for Jumper (with 10-step PGD) on a single GeForce RTX 2080 Ti GPU."
Software Dependencies: No
The paper mentions software components such as DQN, PPO, and the Adam optimizer, but does not provide version numbers for these or for other dependencies such as Python, PyTorch, or CUDA.
Experiment Setup: Yes

Table 1. DQN-specific hyperparameters (AT runs)
Parameter             Value
Discount factor (γ)   0.99
Buffer size           50,000
Replay initial        256
Batch size            128
Optimizer             Adam
Learning rate         0.000125
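These values translate directly into a training configuration. The dataclass below is a hedged sketch; its name and field names are illustrative and not taken from the authors' code.

from dataclasses import dataclass

@dataclass
class DQNConfig:
    gamma: float = 0.99            # discount factor (γ)
    buffer_size: int = 50_000      # replay buffer capacity
    replay_initial: int = 256      # transitions collected before updates begin
    batch_size: int = 128
    optimizer: str = "adam"
    learning_rate: float = 0.000125

config = DQNConfig()  # defaults reproduce the Table 1 AT-run settings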