Efficient Scheduling of Data Augmentation for Deep Reinforcement Learning
Authors: Byungchan Ko, Jungseul Ok
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiment Train and test tasks. We use the OpenAI Procgen benchmark of 16 video games [5], where a main character tries to achieve a specific goal, e.g., finding the exit (Maze) or collecting coins (Coinrun), while avoiding enemies given a 2D map. At each time t, visual observation ot is given as an image of size 64×64. A train or test task is to achieve a high score on a set of environments configured by game and mode, where a mode describes predefined sets of levels (e.g., complexity of map) and backgrounds. Cobbe et al. [5] provide easy mode for each game, consisting of 200 levels and a certain set of backgrounds. ... All results in the main paper are averaged over five runs. (An illustrative environment-setup sketch follows the table.) |
| Researcher Affiliation | Collaboration | Byungchan Ko NALBI kbc@nalbi.ai Jungseul Ok GSAI, POSTECH jungseul@postech.ac.kr This work was done while Byungchan Ko studied in GSAI, POSTECH. |
| Pseudocode | Yes | Algorithm 1 In DA. Require: N, I, ϕ, S, T. 1: Initialize θ close to origin. 2: for n = 1, 2, ..., N do 3: // RL training 4: Store sampled transitions to D; 5: Optimize RL objective L_PPO(θ) with D; 6: // Distillation 7: if n ∈ [S, T] and mod(n − 1, I) = 0 then 8: Store θ_old ← θ; 9: Minimize L_DA(θ) for D, θ_old and ϕ; 10: end if 11: end for (A hedged Python sketch of this loop follows the table.) |
| Open Source Code | Yes | https://github.com/kbc6723/es-da |
| Open Datasets | Yes | We use the OpenAI Procgen benchmark of 16 video games [5] |
| Dataset Splits | No | We simplify easy mode and train agents in easybg mode, of which the only difference from easy mode [5] is showing only a single background. ... Then, we evaluate generalization capabilities using two modes: test-bg and test-lv, which contain unseen backgrounds and levels, respectively, in addition to easybg mode that we use for training. The paper describes training on 'easybg' mode and evaluating on 'test-bg' and 'test-lv' modes, which serve as test sets for generalization. It does not explicitly mention a separate 'validation' data split for hyperparameter tuning or model selection in the traditional supervised learning sense. |
| Hardware Specification | No | The main paper text does not specify hardware details such as GPU/CPU models or specific compute resources used for experiments. It states in the ethics checklist, 'We explain about training time in the supplementary material,' implying these details are not in the main body. |
| Software Dependencies | No | The paper mentions using 'Proximal Policy Optimization (PPO) [27] as a baseline' but does not specify any software versions for PPO, other libraries, or programming languages used. |
| Experiment Setup | No | The paper describes the experimental setup in terms of methods (In DA, Ex DA, UCB-Ex DA), tasks, and augmentations. It refers to hyperparameters such as N, I, S, T, and M in its algorithms and notes that 'c is the UCB exploration coefficient', but explicitly states, 'We refer to the supplementary material for the hyperparameter choice,' indicating that specific numerical values are not provided in the main text. |
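
As referenced in the Research Type row above, the experiments use the OpenAI Procgen benchmark in easy mode with 200 training levels. The snippet below is a minimal sketch of how such a training environment can be constructed with the public `procgen` package; the paper's custom single-background "easybg" mode is only approximated here via the standard `use_backgrounds` flag, which is an assumption, not the authors' configuration.

```python
# Minimal Procgen setup sketch (requires: pip install gym procgen).
# This is illustrative only; the paper's "easybg" mode is approximated
# by disabling backgrounds, which is an assumption.
import gym

train_env = gym.make(
    "procgen:procgen-coinrun-v0",  # Coinrun, one of the 16 Procgen games [5]
    num_levels=200,                # easy mode: 200 training levels
    start_level=0,
    distribution_mode="easy",
    use_backgrounds=False,         # assumption: single plain background, akin to "easybg"
)
obs = train_env.reset()            # 64x64x3 visual observation o_t
```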
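The Pseudocode row quotes Algorithm 1 (In DA), which interleaves PPO updates with periodic policy distillation under data augmentation during steps n ∈ [S, T]. Below is a minimal, self-contained sketch of that schedule, assuming PyTorch; `Policy`, `augment`, `ppo_update`, and `distillation_loss` are illustrative placeholders, not the authors' code (the official implementation is at https://github.com/kbc6723/es-da).

```python
# Hedged sketch of the In DA schedule: PPO training with periodic
# distillation from the frozen pre-distillation policy (theta_old) to an
# augmentation-regularized copy. All component names are placeholders.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class Policy(nn.Module):
    """Tiny stand-in for the Procgen CNN policy (outputs action logits)."""
    def __init__(self, n_actions: int = 15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def augment(obs: torch.Tensor) -> torch.Tensor:
    # Placeholder for the augmentation phi (e.g., random crop, color jitter).
    return obs + 0.01 * torch.randn_like(obs)


def ppo_update(policy: Policy, batch: torch.Tensor) -> None:
    # Placeholder for one PPO optimization step (L_PPO) on transitions D.
    pass


def distillation_loss(policy, policy_old, obs):
    """Stand-in for L_DA: match the policy on augmented observations to theta_old."""
    with torch.no_grad():
        target = F.softmax(policy_old(obs), dim=-1)
    log_p = F.log_softmax(policy(augment(obs)), dim=-1)
    return F.kl_div(log_p, target, reduction="batchmean")


def train(N=100, I=10, S=20, T=80, batch_size=32):
    policy = Policy()
    opt = torch.optim.Adam(policy.parameters(), lr=5e-4)
    for n in range(1, N + 1):
        # Stand-in for sampled transitions D (random observations here).
        obs = torch.rand(batch_size, 3, 64, 64)
        ppo_update(policy, obs)                    # RL training phase
        if S <= n <= T and (n - 1) % I == 0:       # distillation phase on schedule
            policy_old = copy.deepcopy(policy)     # store theta_old <- theta
            for _ in range(5):
                loss = distillation_loss(policy, policy_old, obs)
                opt.zero_grad()
                loss.backward()
                opt.step()


if __name__ == "__main__":
    train(N=5, I=2, S=2, T=4)
```

The sketch only illustrates the scheduling logic (distill every I-th iteration within [S, T]); the actual PPO objective, augmentation set, and distillation loss follow the paper and its repository.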