Unsupervised State Representation Learning in Atari
Authors: Ankesh Anand, Evan Racah, Sherjil Ozair, Yoshua Bengio, Marc-Alexandre Côté, R Devon Hjelm
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a new benchmark based on Atari 2600 games where we evaluate representations based on how well they capture the ground truth state variables. ... Finally, we compare our technique with other state-of-the-art generative and contrastive representation learning methods. |
| Researcher Affiliation | Collaboration | Ankesh Anand (Mila, Université de Montréal; Microsoft Research); Evan Racah (Mila, Université de Montréal); Sherjil Ozair (Mila, Université de Montréal); Yoshua Bengio (Mila, Université de Montréal); Marc-Alexandre Côté (Microsoft Research); R Devon Hjelm (Microsoft Research; Mila, Université de Montréal) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code associated with this work is available at https://github.com/mila-iqia/atari-representation-learning |
| Open Datasets | Yes | To systematically evaluate the ability of different representation learning methods at capturing the true underlying factors of variation, we propose a benchmark based on Atari 2600 games using the Arcade Learning Environment [ALE, 28]. ... We make this available with an easy-to-use gym wrapper, which returns this information with no change to existing code using gym interfaces. (See the wrapper usage sketch below the table.) |
| Dataset Splits | Yes | We train each linear probe with 35,000 frames and use 5,000 and 10,000 frames each for validation and test respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions "PyTorch [79] and Weights&Biases" but does not specify version numbers for these software components. |
| Experiment Setup | Yes | For all our experiments, we used ε = 0.2. All methods use the same base encoder architecture, which is the CNN from [73], but adapted for the full 160×210 Atari frame size. To ensure a fair comparison, we use a representation size of 256 for each method. We train each linear probe with 35,000 frames and use 5,000 and 10,000 frames each for validation and test respectively. We use early stopping and a learning rate scheduler based on plateaus in the validation loss. (See the probe-training sketch below the table.) |
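
The gym wrapper quoted in the Open Datasets row can be exercised in a few lines. The sketch below follows the README of https://github.com/mila-iqia/atari-representation-learning: the wrapper name `AtariARIWrapper`, the `info["labels"]` field, and the older four-tuple Gym step API are taken from that documentation and may have changed in later releases.

```python
import gym
from atariari.benchmark.wrapper import AtariARIWrapper

# Wrap a standard Gym Atari environment. The wrapper adds ground-truth
# state variables (read from the emulator RAM) to the `info` dict and
# otherwise leaves the Gym interface unchanged.
env = AtariARIWrapper(gym.make("MsPacmanNoFrameskip-v4"))
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())

# info["labels"] maps state-variable names (e.g. player x/y position)
# to their ground-truth values for the current frame.
print(info["labels"])
```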
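
The probe protocol quoted in the Experiment Setup row (linear probes on frozen 256-d representations; 35,000/5,000/10,000 train/validation/test frames; early stopping and a plateau-based learning-rate schedule) can be sketched as follows. This is a minimal illustration, not the authors' code: the full-batch updates, Adam optimizer, learning rate, and patience values are assumptions made for brevity.

```python
import torch
import torch.nn as nn

def train_linear_probe(train_x, train_y, val_x, val_y, num_classes,
                       max_epochs=200, patience=10):
    """Fit a linear probe on frozen features (e.g. 256-d, per the paper).

    train_x/val_x: float tensors of shape (N, 256); train_y/val_y: long labels.
    """
    probe = nn.Linear(train_x.shape[1], num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=3e-4)  # lr is an assumption
    # Drop the learning rate when the validation loss plateaus, matching
    # the scheduler described in the quoted setup.
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, patience=5)
    loss_fn = nn.CrossEntropyLoss()

    best_val, epochs_without_improvement = float("inf"), 0
    for _ in range(max_epochs):
        probe.train()
        opt.zero_grad()
        loss_fn(probe(train_x), train_y).backward()
        opt.step()

        probe.eval()
        with torch.no_grad():
            val_loss = loss_fn(probe(val_x), val_y).item()
        sched.step(val_loss)

        # Early stopping on the validation loss.
        if val_loss < best_val:
            best_val, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return probe
```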