Behavior From the Void: Unsupervised Active Pre-Training

Authors: Hao Liu, Pieter Abbeel

NeurIPS 2021

Reproducibility assessment: each entry below gives the variable, the assessed result, and the LLM's supporting response.

Research Type: Experimental
LLM response: We empirically evaluate APT by exposing task-specific reward after a long unsupervised pre-training phase. In Atari games, APT achieves human-level performance on 12 games and obtains highly competitive performance compared to canonical fully supervised RL algorithms. On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult to train from scratch.

Researcher Affiliation: Academia
LLM response: Hao Liu (UC Berkeley, hao.liu@cs.berkeley.edu); Pieter Abbeel (UC Berkeley, pabbeel@cs.berkeley.edu)

Pseudocode: Yes
LLM response: Algorithm 1: Training APT
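
The Algorithm 1 referenced above trains an RL agent on a particle-based entropy reward computed in a learned representation space. Below is a minimal PyTorch sketch of that reward, assuming a batch-wise k-nearest-neighbor approximation; `k` and `c` are illustrative values, and `apt_intrinsic_reward` is a hypothetical helper name, not the paper's code.

```python
import torch

def apt_intrinsic_reward(z: torch.Tensor, k: int = 12, c: float = 1.0) -> torch.Tensor:
    """Particle-based entropy reward in the spirit of APT's Algorithm 1.

    z: (B, D) encoder outputs for a sampled minibatch of transitions.
    Each particle is rewarded in proportion to its distance from its k
    nearest neighbors in the batch, so novel regions of representation
    space yield higher reward.
    """
    dists = torch.cdist(z, z)                         # (B, B) pairwise distances
    knn, _ = dists.topk(k + 1, dim=1, largest=False)  # k+1 smallest, incl. self
    knn = knn[:, 1:]                                  # drop the zero self-distance
    return torch.log(c + knn.mean(dim=1))             # r(z) = log(c + mean_k ||z - z_j||)
```

During pre-training this intrinsic reward stands in for the withheld task reward; the agent is then fine-tuned once the task-specific reward is exposed.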

Open Source Code: No
LLM response: The paper does not include an explicit statement about releasing the source code for the described methodology or provide a link to a code repository for APT.

Open Datasets: Yes
LLM response: We test APT in the DeepMind Control Suite [DMControl; 58] and the Atari suite [9].
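
Both benchmarks are publicly available. A minimal loading sketch, assuming the standard dm_control and gym packages; the task names are illustrative, not the paper's full evaluation set:

```python
from dm_control import suite  # DeepMind Control Suite
import gym                    # Atari via the Arcade Learning Environment

dmc_env = suite.load(domain_name="cheetah", task_name="run")
atari_env = gym.make("BreakoutNoFrameskip-v4")
```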

Dataset Splits: No
LLM response: The paper mentions a 'pre-training phase' and a 'testing period' but does not explicitly describe how validation data was split or used for hyperparameter tuning. It uses a replay buffer for training but does not define a distinct validation set or its properties.

Hardware Specification: No
LLM response: The paper mentions 'GPU-based data augmentations' but does not provide specific details on the CPU or GPU models, memory, or any other hardware used to run the experiments.

Software Dependencies: No
LLM response: Kornia [50] is used for efficient GPU-based data augmentations. Our model is implemented in NumPy [21] and PyTorch [45]. No version numbers are specified for these libraries.
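
Although versions are unspecified, the augmentation pipeline follows DrQ, which pads each frame and randomly crops it back to size. A minimal Kornia sketch, assuming DrQ's 84x84 frames and a 4-pixel replicate pad (both values are assumptions carried over from DrQ, not stated in this excerpt):

```python
import torch
import kornia.augmentation as K

# DrQ-style random shift: replicate-pad, then randomly crop back to 84x84.
random_shift = K.RandomCrop((84, 84), padding=4, padding_mode="replicate")

obs = torch.rand(32, 9, 84, 84)  # batch of stacked frames (shape illustrative)
aug_obs = random_shift(obs)
```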

Experiment Setup: Yes
LLM response: For our DeepMind Control Suite and Atari games experiments, we largely follow DrQ, except we perform two gradient steps per environment step instead of one. Following DrQ, the representation encoder fθ(·) is implemented as a convolutional residual network followed by a fully-connected layer, a LayerNorm, and a Tanh non-linearity. We decrease the output dimension of the fully-connected layer after the convnet from 50 to 15. We find it helps to use spectral normalization [39] to normalize the weights and to use ELU [15] as the non-linearity between convolutional layers.
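
A minimal PyTorch sketch of the encoder described above: a spectrally normalized convolutional residual trunk with ELU activations, followed by a fully-connected layer, a LayerNorm, and a Tanh, with the stated 15-dimensional output. The depths, channel widths, and strides here are assumptions; the quoted setup does not specify them.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class ResidualBlock(nn.Module):
    """Two spectrally normalized 3x3 convs with ELU, plus a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(channels, channels, 3, padding=1)),
            nn.ELU(),
            spectral_norm(nn.Conv2d(channels, channels, 3, padding=1)),
        )
        self.act = nn.ELU()

    def forward(self, x):
        return self.act(x + self.net(x))

class Encoder(nn.Module):
    """Conv residual trunk -> fully-connected layer -> LayerNorm -> Tanh."""
    def __init__(self, in_channels: int = 9, channels: int = 32,
                 out_dim: int = 15, img_size: int = 84):
        super().__init__()
        self.trunk = nn.Sequential(
            spectral_norm(nn.Conv2d(in_channels, channels, 3, stride=2, padding=1)),
            nn.ELU(),
            ResidualBlock(channels),
            spectral_norm(nn.Conv2d(channels, channels, 3, stride=2, padding=1)),
            nn.ELU(),
            ResidualBlock(channels),
        )
        with torch.no_grad():  # infer the flattened conv output size
            flat = self.trunk(torch.zeros(1, in_channels, img_size, img_size)).numel()
        self.head = nn.Sequential(
            nn.Linear(flat, out_dim),  # output dim reduced from 50 to 15 per the paper
            nn.LayerNorm(out_dim),
            nn.Tanh(),
        )

    def forward(self, x):
        return self.head(self.trunk(x).flatten(1))
```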