Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

Authors: Nicklas Hansen, Hao Su, Xiaolong Wang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive empirical evaluation of image-based RL using both ConvNets and Vision Transformers (ViT) on a family of benchmarks based on DeepMind Control Suite, as well as in robotic manipulation tasks. Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals.
Researcher Affiliation | Academia | Nicklas Hansen1, Hao Su1, Xiaolong Wang1; 1University of California, San Diego; nihansen@ucsd.edu, {haosu,xiw012}@eng.ucsd.edu
Pseudocode | Yes | Algorithm 1: Generic SVEA off-policy algorithm (color-coded lines distinguish naïve augmentation from our modifications). A sketch of the corresponding update appears after this table.
Open Source Code | Yes | Website and code are available at: https://nicklashansen.github.io/SVEA.
Open Datasets | Yes | We perform extensive empirical evaluation on the DeepMind Control Suite [64] and extensions of it, including the DMControl Generalization Benchmark [21] and the Distracting Control Suite [60], as well as a set of robotic manipulation tasks.
Dataset Splits | No | The paper states that methods are "trained for 500k frames and evaluated on all 5 tasks from DMControl-GB", but it does not specify explicit train/validation/test splits (e.g., percentages or counts) for these benchmarks or for its custom robotic manipulation tasks.
Hardware Specification | No | The paper mentions running experiments and computational costs, but does not provide specific details about the hardware used (e.g., CPU or GPU models, memory).
Software Dependencies | No | The paper states, "We use Adam [32] as our optimizer, with a learning rate of 1e-3, beta=(0.9, 0.999), and no weight decay." However, it does not specify the versions of the software frameworks (e.g., PyTorch, TensorFlow) or specific libraries used.
Experiment Setup | Yes | We use a batch size of 256 for ConvNet experiments and 128 for ViT experiments. We use Adam [32] as our optimizer, with a learning rate of 1e-3, beta=(0.9, 0.999), and no weight decay. (A configuration sketch follows below.)