Behavior From the Void: Unsupervised Active Pre-Training
Authors: Hao Liu, Pieter Abbeel
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate APT by exposing task-specific reward after a long unsupervised pre-training phase. In Atari games, APT achieves human-level performance on 12 games and obtains highly competitive performance compared to canonical fully supervised RL algorithms. On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult to train from scratch. |
| Researcher Affiliation | Academia | Hao Liu, UC Berkeley, hao.liu@cs.berkeley.edu; Pieter Abbeel, UC Berkeley, pabbeel@cs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1: Training APT (a hedged sketch of the intrinsic objective it optimizes appears after the table) |
| Open Source Code | No | The paper does not include an explicit statement about releasing the source code for the described methodology or provide a link to a code repository for APT. |
| Open Datasets | Yes | We test APT in DeepMind Control Suite [DMControl; 58] and the Atari suite [9]. |
| Dataset Splits | No | The paper mentions 'pre-training phase' and 'testing period' but does not explicitly provide details on how validation data was split or used for hyperparameter tuning. It uses a replay buffer for training but doesn't define a distinct validation set or its properties. |
| Hardware Specification | No | The paper mentions 'GPU-based data augmentations' but does not provide specific details on CPU, GPU models, memory, or any other hardware used for running the experiments. |
| Software Dependencies | No | Kornia [50] is used for efficient GPU-based data augmentations. Our model is implemented in NumPy [21] and PyTorch [45]. No version numbers are specified for these libraries. |
| Experiment Setup | Yes | For our DeepMind Control Suite and Atari games experiments, we largely follow DrQ, except we perform two gradient steps per environment step instead of one. Following DrQ, the representation encoder fθ(·) is implemented by a convolutional residual network followed by a fully-connected layer, a LayerNorm and a Tanh non-linearity. We decrease the output dimension of the fully-connected layer after the convnet from 50 to 15. We find it helps to use spectral normalization [39] to normalize the weights and use ELU [15] as the non-linearity in between convolutional layers. (A minimal encoder sketch follows below.) |
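For the Pseudocode row above: APT's pre-training objective is a particle-based estimate of state entropy in a learned representation space. The snippet below is a minimal, hypothetical PyTorch sketch of such a k-nearest-neighbour intrinsic reward; the neighbour count `k`, the constant `c`, and the exact distance aggregation are illustrative assumptions, not verified hyperparameters from the paper.

```python
import torch


def apt_intrinsic_reward(z, k=10, c=1.0):
    # z: (B, D) encoded states sampled from the replay buffer.
    # Particle-based entropy sketch: reward each state by the log of the
    # average distance to its k nearest neighbours in representation space,
    # so states in sparsely visited regions receive higher intrinsic reward.
    dists = torch.cdist(z, z)                          # (B, B) pairwise distances
    knn, _ = dists.topk(k + 1, dim=1, largest=False)   # includes the self-distance 0
    knn = knn[:, 1:]                                   # drop the self-distance
    return torch.log(c + knn.mean(dim=1))              # (B,) intrinsic reward
```

A natural usage is to compute the neighbours within each sampled replay minibatch, which keeps the reward computation batched on GPU and avoids maintaining a separate memory of visited states.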
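For the Experiment Setup row: below is a minimal PyTorch sketch of the described encoder (spectral-normalized convolutions with ELU activations, then a fully-connected layer down to 15 dimensions, LayerNorm and Tanh). The plain DrQ-style convolutional stack stands in for the residual network, and the 84x84 input size and channel counts are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class APTEncoder(nn.Module):
    # Sketch: spectral-normalized conv stack with ELU activations, then a
    # fully-connected layer, LayerNorm and Tanh, as listed in the setup row.
    # The exact conv layout (a residual network in the paper) is simplified.
    def __init__(self, in_channels=9, feature_dim=15, image_size=84):
        super().__init__()
        self.convnet = nn.Sequential(
            spectral_norm(nn.Conv2d(in_channels, 32, 3, stride=2)), nn.ELU(),
            spectral_norm(nn.Conv2d(32, 32, 3, stride=1)), nn.ELU(),
            spectral_norm(nn.Conv2d(32, 32, 3, stride=1)), nn.ELU(),
            spectral_norm(nn.Conv2d(32, 32, 3, stride=1)), nn.ELU(),
            nn.Flatten(),
        )
        # Infer the flattened conv output size from a dummy forward pass.
        with torch.no_grad():
            n_flat = self.convnet(
                torch.zeros(1, in_channels, image_size, image_size)
            ).shape[1]
        self.head = nn.Sequential(
            nn.Linear(n_flat, feature_dim),  # output reduced from 50 to 15 in the paper
            nn.LayerNorm(feature_dim),
            nn.Tanh(),
        )

    def forward(self, obs):
        # obs: (B, C, H, W) pixel observations, assumed already normalized.
        return self.head(self.convnet(obs))
```

The low-dimensional, Tanh-bounded output is the representation space in which the nearest-neighbour reward sketched above would be computed.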