Robust Deep Reinforcement Learning through Bootstrapped Opportunistic Curriculum

Authors: Junlin Wu, Yevgeniy Vorobeychik

ICML 2022

Reproducibility assessment. Each entry below lists the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental
"We evaluate the efficacy of the proposed BCL framework in boosting robustness of DQN-style approaches with minimal reduction in nominal (non-adversarial) reward through extensive experiments on the Pong, Freeway, Bank Heist, and Road Runner OpenAI domains."
Researcher Affiliation: Academia
"Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA."
Pseudocode: Yes

Algorithm 1: BCL algorithm
Input: ϵ, K, K_min, V(ϵ), f_{θ_0}
  f_θ ← f_{θ_0}                                   // initialization
  {ϵ_i}_{i=1}^{L} ← Curriculum(ϵ)                 // create curriculum
  (i, ϵ_best) ← Choose_Next(f_θ, {ϵ_i}, 0, V(ϵ))
  while ϵ_best < ϵ do
      for k = 1, ..., K do
          f_{θ_k} ← Train(f_θ, ϵ_i)
          V_k ← Eval(f_{θ_k}, ϵ_i)
          if k ≥ K_min and V_k ≥ V(ϵ_i) then
              break
          end if
      end for
      // Find the best model among training results
      k* ← argmax_{k ∈ [K]} V_k
      f_θ ← f_{θ_{k*}}
      (i, ϵ_best) ← Choose_Next(f_θ, {ϵ_i}, i, V(ϵ))
  end while
  return f_θ
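The control flow above can be expressed as a short Python sketch. This is a minimal, hedged rendering: the helpers make_curriculum, train_at_eps, eval_at_eps, and choose_next are hypothetical stand-ins for the paper's Curriculum, Train, Eval, and Choose_Next subroutines, not the authors' implementation (see their repository for that).

import copy

def bcl(model, eps, K, K_min, V, make_curriculum, train_at_eps, eval_at_eps, choose_next):
    # Increasing perturbation budgets eps_1 < ... < eps_L = eps form the curriculum.
    curriculum = make_curriculum(eps)
    i, eps_best = choose_next(model, curriculum, 0, V)
    while eps_best < eps:
        candidates = []
        for k in range(1, K + 1):
            # Bootstrap: each attempt starts from the current best model.
            cand = train_at_eps(copy.deepcopy(model), curriculum[i])
            score = eval_at_eps(cand, curriculum[i])
            candidates.append((score, cand))
            if k >= K_min and score >= V(curriculum[i]):
                break  # opportunistic early stop once the target value is met
        # Keep the best of the (up to) K training attempts.
        _, model = max(candidates, key=lambda t: t[0])
        i, eps_best = choose_next(model, curriculum, i, V)
    return model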
Open Source Code: Yes
"Our code is available at: https://github.com/jlwu002/BCL."
Open Datasets: Yes
"We evaluate the proposed approach using four Atari-2600 games from the OpenAI Gym (Bellemare et al., 2013): Pong, Freeway, Bank Heist, and Road Runner."
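As a concrete illustration, the four games can be instantiated through the standard Gym API. The environment IDs below are an assumption (the NoFrameskip-v4 variants commonly used in DQN work); the quoted passage does not state the exact IDs.

import gym  # assumes gym with the Atari extras (ale-py/atari-py) installed

# Hypothetical environment IDs; "NoFrameskip-v4" is the usual choice for DQN.
GAMES = ["PongNoFrameskip-v4", "FreewayNoFrameskip-v4",
         "BankHeistNoFrameskip-v4", "RoadRunnerNoFrameskip-v4"]

for name in GAMES:
    env = gym.make(name)
    obs = env.reset()  # classic (pre-0.26) Gym API: reset() returns the observation
    print(name, env.action_space, obs.shape)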
Dataset Splits: No
The paper mentions training for "4.5 million frames" and evaluating over "20 test episodes", but does not explicitly specify a distinct validation split or its size/percentage for model tuning during training.
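For context, evaluation over a fixed number of test episodes is typically a loop like the following sketch, assuming the classic Gym step API and a hypothetical policy callable; the paper's own evaluation harness may differ.

import numpy as np

def evaluate(env, policy, n_episodes=20):
    # Average undiscounted episode return over n_episodes test episodes.
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))  # classic 4-tuple step API
            total += reward
        returns.append(total)
    return float(np.mean(returns))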
Hardware Specification: Yes
"Each adversarial training run takes 10 hours for Fruit Bot (with RI-FGSM), and 34 hours for Jumper (with 10-step PGD) on a single GeForce RTX 2080 Ti GPU."
Software Dependencies: No
The paper mentions software components such as DQN, PPO, and the Adam optimizer, but does not provide version numbers for these or for other dependencies such as Python, PyTorch, or CUDA.
Experiment Setup: Yes

Table 1. DQN-specific hyperparameters (AT runs)
Parameter             Value
Discount factor (γ)   0.99
Buffer size           50,000
Replay initial        256
Batch size            128
Optimizer             Adam
Learning rate         0.000125
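These values translate directly into a training configuration. The dataclass below is a hedged sketch; its name and field names are illustrative and not taken from the authors' code.

from dataclasses import dataclass

@dataclass
class DQNConfig:
    gamma: float = 0.99            # discount factor (γ)
    buffer_size: int = 50_000      # replay buffer capacity
    replay_initial: int = 256      # transitions collected before updates begin
    batch_size: int = 128
    optimizer: str = "adam"
    learning_rate: float = 0.000125

config = DQNConfig()  # defaults reproduce the Table 1 AT-run settings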