Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation
Authors: Nicklas Hansen, Hao Su, Xiaolong Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive empirical evaluation of image-based RL using both ConvNets and Vision Transformers (ViT) on a family of benchmarks based on DeepMind Control Suite, as well as in robotic manipulation tasks. Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals. |
| Researcher Affiliation | Academia | Nicklas Hansen¹, Hao Su¹, Xiaolong Wang¹; ¹University of California, San Diego; nihansen@ucsd.edu, {haosu,xiw012}@eng.ucsd.edu |
| Pseudocode | Yes | Algorithm 1: Generic SVEA off-policy algorithm (highlighting in the paper distinguishes naïve augmentation from the authors' modifications). A hedged sketch of the objective follows this table. |
| Open Source Code | Yes | Website and code are available at: https://nicklashansen.github.io/SVEA. |
| Open Datasets | Yes | We perform extensive empirical evaluation on the DeepMind Control Suite [64] and extensions of it, including the DMControl Generalization Benchmark [21] and the Distracting Control Suite [60], as well as a set of robotic manipulation tasks. |
| Dataset Splits | No | The paper states that methods are "trained for 500k frames and evaluated on all 5 tasks from DMControl-GB", but it does not specify explicit train/validation/test data splits (e.g., percentages or counts) within these benchmarks or for their custom robotic manipulation tasks. |
| Hardware Specification | No | The paper mentions running experiments and computational costs but does not provide specific details about the hardware used (e.g., CPU or GPU models, memory). |
| Software Dependencies | No | The paper states, "We use Adam [32] as our optimizer, with a learning rate of 1e-3, beta=(0.9, 0.999), and no weight decay." However, it does not specify the versions of the software frameworks (e.g., PyTorch, TensorFlow) or specific libraries used. |
| Experiment Setup | Yes | We use a batch size of 256 for ConvNet experiments and 128 for ViT experiments. We use Adam [32] as our optimizer, with a learning rate of 1e-3, beta=(0.9, 0.999), and no weight decay. (A configuration sketch follows this table.) |
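
As a reading aid for the Pseudocode row, below is a minimal sketch of the SVEA objective described in Algorithm 1. It assumes PyTorch (the paper does not specify a framework), and all names (`svea_critic_loss`, `critic`, `critic_target`, `actor`, `augment`, `alpha`, `beta`) are illustrative placeholders rather than the authors' API. The entropy term of the underlying SAC objective is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def svea_critic_loss(critic, critic_target, actor, augment,
                     obs, action, reward, next_obs, discount,
                     alpha=0.5, beta=0.5):
    """Sketch of the SVEA critic loss: the Bellman target is computed from
    clean (unaugmented) observations only, and both the clean and the
    augmented view of the current observation regress toward that target."""
    with torch.no_grad():
        # Target from the unaugmented next observation (SAC entropy term
        # omitted for brevity).
        next_action = actor(next_obs)
        target = reward + discount * critic_target(next_obs, next_action)

    q_clean = critic(obs, action)          # unaugmented data stream
    q_aug = critic(augment(obs), action)   # augmented data stream

    # Weighted sum over the two streams; the paper defaults to
    # alpha = beta = 0.5.
    return alpha * F.mse_loss(q_clean, target) + beta * F.mse_loss(q_aug, target)
```

The design choice this sketch captures is that data augmentation is applied only to the current observation, never to the target computation, which is what stabilizes Q-learning under augmentation.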
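The quoted Experiment Setup values map directly onto a standard Adam configuration. A minimal sketch, again assuming PyTorch (the paper does not state the framework); `model` is a placeholder network standing in for the actual encoder and critic:

```python
import torch
import torch.nn as nn

# Placeholder network; the paper's actual architecture is a ConvNet or ViT
# encoder followed by critic/actor heads.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,              # learning rate reported in the paper
    betas=(0.9, 0.999),   # beta values reported in the paper
    weight_decay=0.0,     # no weight decay
)

BATCH_SIZE_CONVNET = 256  # batch size for ConvNet experiments
BATCH_SIZE_VIT = 128      # batch size for ViT experiments
```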