Robust Deep Reinforcement Learning through Bootstrapped Opportunistic Curriculum
Authors: Junlin Wu, Yevgeniy Vorobeychik
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the efficacy of the proposed BCL framework in boosting robustness of DQN-style approaches with minimal reduction in nominal (non-adversarial) reward through extensive experiments on the Pong, Freeway, Bank Heist, and Road Runner OpenAI domains. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA. |
| Pseudocode | Yes | Algorithm 1 (BCL). Input: ϵ, K, Kmin, V(ϵ), fθ0. fθ ← fθ0 // initialization; {ϵi} (i = 1, …, L) ← Curriculum(ϵ) // create curriculum; (i, ϵbest) ← Choose Next(fθ, {ϵi}, 0, V(ϵ)); while ϵbest < ϵ: for k = 1, …, K: fθk ← Train(fθ, ϵi); Vk ← Eval(fθk, ϵi); break if k ≥ Kmin and Vk ≥ V(ϵi); end for; k* ← argmax over k ∈ [K] of Vk // find the best model among training results; fθ ← fθk*; (i, ϵbest) ← Choose Next(fθ, {ϵi}, i, V(ϵ)); end while; return fθ. (A hedged Python sketch of this loop appears after this table.) |
| Open Source Code | Yes | Our code is available at: https://github.com/jlwu002/BCL. |
| Open Datasets | Yes | We evaluate the proposed approach using four Atari-2600 games from the OpenAI Gym (Bellemare et al., 2013): Pong, Freeway, Bank Heist, and Road Runner. |
| Dataset Splits | No | The paper mentions training for "4.5 million frames" and evaluating over "20 test episodes", but does not explicitly specify a distinct validation dataset split or its size/percentage for model tuning during training. |
| Hardware Specification | Yes | Each adversarial training run takes 10 hours for Fruit Bot (with RI-FGSM), and 34 hours for Jumper (with 10-step PGD) on a single GeForce RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions components such as DQN, PPO, and the Adam optimizer, but does not provide specific version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 1 (DQN-specific hyperparameters, AT runs): discount factor (γ) = 0.99, buffer size = 50000, replay initial = 256, batch size = 128, optimizer = Adam, learning rate = 0.000125. |
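
For readability, the curriculum loop quoted in the Pseudocode row can be rendered as a minimal Python sketch. This is an illustration of Algorithm 1 only: the helper callables (`curriculum`, `choose_next`, `train`, `evaluate`) and all parameter names are assumptions and do not mirror the released code at https://github.com/jlwu002/BCL.

```python
def bcl(f_theta0, epsilon, K, K_min, V, curriculum, choose_next, train, evaluate):
    """Hedged sketch of the BCL loop (Algorithm 1); helper callables are assumed.

    f_theta0    -- initial model (e.g., a nominally trained DQN)
    epsilon     -- target perturbation radius
    K, K_min    -- max / min number of bootstrapped training attempts per stage
    V           -- dict mapping each curriculum radius eps_i to its target value V(eps_i)
    curriculum  -- callable returning the curriculum [eps_1, ..., eps_L] for epsilon
    choose_next -- callable that opportunistically picks the next curriculum index
    train / evaluate -- one round of adversarial training and its robustness evaluation
    """
    f_theta = f_theta0                                # initialization
    eps_list = curriculum(epsilon)                    # create curriculum {eps_i}
    i, eps_best = choose_next(f_theta, eps_list, 0, V)

    while eps_best < epsilon:
        candidates = []                               # bootstrapped training attempts
        for k in range(1, K + 1):
            f_k = train(f_theta, eps_list[i])         # train at the current radius
            v_k = evaluate(f_k, eps_list[i])
            candidates.append((v_k, f_k))
            if k >= K_min and v_k >= V[eps_list[i]]:  # early stop once target value is met
                break
        # keep the best model among the training results of this stage
        _, f_theta = max(candidates, key=lambda c: c[0])
        i, eps_best = choose_next(f_theta, eps_list, i, V)

    return f_theta
```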
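Similarly, the DQN hyperparameters reported for the adversarial-training runs could be gathered into a single configuration object. The key names below are assumptions for illustration; only the values come from the quoted Table 1.

```python
# Hedged illustration: key names are assumptions; values are from Table 1 (AT runs).
DQN_AT_HYPERPARAMETERS = {
    "discount_factor": 0.99,   # gamma
    "buffer_size": 50_000,     # replay buffer capacity
    "replay_initial": 256,     # transitions collected before learning starts
    "batch_size": 128,
    "optimizer": "Adam",
    "learning_rate": 1.25e-4,  # 0.000125
}
```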