Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation
Authors: Nicklas Hansen, Hao Su, Xiaolong Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive empirical evaluation of image-based RL using both ConvNets and Vision Transformers (ViT) on a family of benchmarks based on DeepMind Control Suite, as well as in robotic manipulation tasks. Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals. |
| Researcher Affiliation | Academia | Nicklas Hansen¹, Hao Su¹, Xiaolong Wang¹; ¹University of California, San Diego; nihansen@ucsd.edu, {haosu,xiw012}@eng.ucsd.edu |
| Pseudocode | Yes | Algorithm 1: Generic SVEA off-policy algorithm (highlighting in the paper distinguishes naïve augmentation from the authors' modifications). A hedged sketch of the objective follows this table. |
| Open Source Code | Yes | Website and code are available at: https://nicklashansen.github.io/SVEA. |
| Open Datasets | Yes | We perform extensive empirical evaluation on the DeepMind Control Suite [64] and extensions of it, including the DMControl Generalization Benchmark [21] and the Distracting Control Suite [60], as well as a set of robotic manipulation tasks. |
| Dataset Splits | No | The paper states that methods are "trained for 500k frames and evaluated on all 5 tasks from DMControl-GB", but it does not specify explicit train/validation/test data splits (e.g., percentages or counts) within these benchmarks or for their custom robotic manipulation tasks. |
| Hardware Specification | No | The paper mentions running experiments and computational costs but does not provide specific details about the hardware used (e.g., CPU or GPU models, memory). |
| Software Dependencies | No | The paper states, "We use Adam [32] as our optimizer, with a learning rate of 1e-3, beta=(0.9, 0.999), and no weight decay." However, it does not specify the versions of the software frameworks (e.g., PyTorch, TensorFlow) or specific libraries used. |
| Experiment Setup | Yes | We use a batch size of 256 for ConvNet experiments and 128 for ViT experiments. We use Adam [32] as our optimizer, with a learning rate of 1e-3, beta=(0.9, 0.999), and no weight decay. (A configuration sketch follows this table.) |
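
As a reading aid for the Pseudocode row, below is a minimal sketch of the SVEA objective described in Algorithm 1. It assumes PyTorch (the paper does not specify a framework), and all names (`svea_critic_loss`, `critic`, `critic_target`, `actor`, `augment`, `alpha`, `beta`) are illustrative placeholders rather than the authors' API. The entropy term of the underlying SAC objective is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def svea_critic_loss(critic, critic_target, actor, augment,
                     obs, action, reward, next_obs, discount,
                     alpha=0.5, beta=0.5):
    """Sketch of the SVEA critic loss: the Bellman target is computed from
    clean (unaugmented) observations only, and both the clean and the
    augmented view of the current observation regress toward that target."""
    with torch.no_grad():
        # Target from the unaugmented next observation (SAC entropy term
        # omitted for brevity).
        next_action = actor(next_obs)
        target = reward + discount * critic_target(next_obs, next_action)

    q_clean = critic(obs, action)          # unaugmented data stream
    q_aug = critic(augment(obs), action)   # augmented data stream

    # Weighted sum over the two streams; the paper defaults to
    # alpha = beta = 0.5.
    return alpha * F.mse_loss(q_clean, target) + beta * F.mse_loss(q_aug, target)
```

The design choice this sketch captures is that data augmentation is applied only to the current observation, never to the target computation, which is what stabilizes Q-learning under augmentation.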
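The quoted Experiment Setup values map directly onto a standard Adam configuration. A minimal sketch, again assuming PyTorch (the paper does not state the framework); `model` is a placeholder network standing in for the actual encoder and critic:

```python
import torch
import torch.nn as nn

# Placeholder network; the paper's actual architecture is a ConvNet or ViT
# encoder followed by critic/actor heads.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,              # learning rate reported in the paper
    betas=(0.9, 0.999),   # beta values reported in the paper
    weight_decay=0.0,     # no weight decay
)

BATCH_SIZE_CONVNET = 256  # batch size for ConvNet experiments
BATCH_SIZE_VIT = 128      # batch size for ViT experiments
```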