Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Authors: Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function. |
| Researcher Affiliation | Collaboration | 1 McGill University, 2 Université de Montréal, 3 Mila – Québec AI Institute, 4 University of Oxford, 5 Google Research, Brain Team |
| Pseudocode | Yes | Algorithm 1 gives pseudo-code for the method as implemented with a fixed replay memory. |
| Open Source Code | Yes | We have released a reference implementation along with notebooks demonstrating how to download and use our pre-trained representations at: https://github.com/google-research/google-research/tree/master/pvn. |
| Open Datasets | Yes | During the representation pre-training phase, we use transition data from offline Atari datasets in RL Unplugged (Agarwal et al., 2020; Gulcehre et al., 2020). |
| Dataset Splits | No | The paper mentions 'pre-training phase,' 'online RL phase,' and 'evaluation' but does not explicitly provide details about specific training/validation/test dataset splits, such as percentages or sample counts for data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper lists several software packages and libraries (e.g., Jax, Flax, Optax, Numpy, Pandas, Matplotlib, Seaborn) with citations, but it does not provide specific version numbers for these dependencies, which are needed to reproduce the software environment. |
| Experiment Setup | Yes | In the tables below we report all relevant hyperparameter choices for both our offline pre-training phase, and online learning phase. Table 1: PVN Hyperparameters. Table 2: Online Hyperparameters. |