Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

Authors: Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.
Researcher Affiliation | Collaboration | 1 McGill University, 2 Université de Montréal, 3 Mila Québec AI Institute, 4 University of Oxford, 5 Google Research, Brain Team
Pseudocode | Yes | Algorithm 1 gives pseudo-code for the method as implemented with a fixed replay memory.
Open Source Code | Yes | We have released a reference implementation along with notebooks demonstrating how to download and use our pre-trained representations at: https://github.com/google-research/google-research/tree/master/pvn.
Open Datasets | Yes | During the representation pre-training phase, we use transition data from offline Atari datasets in RL Unplugged (Agarwal et al., 2020; Gulcehre et al., 2020).
Dataset Splits | No | The paper mentions 'pre-training phase,' 'online RL phase,' and 'evaluation' but does not explicitly provide details about specific training/validation/test dataset splits, such as percentages or sample counts for data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper lists several software packages and libraries (e.g., Jax, Flax, Optax, Numpy, Pandas, Matplotlib, Seaborn) with citations, but it does not provide specific version numbers for these software dependencies, which are necessary for reproducing the ancillary software environment.
Experiment Setup | Yes | In the tables below we report all relevant hyperparameter choices for both our offline pre-training phase and online learning phase. Table 1: PVN Hyperparameters. Table 2: Online Hyperparameters.