Combined Reinforcement Learning via Abstract Representations

Authors: Vincent Francois-Lavet, Yoshua Bengio, Doina Precup, Joelle Pineau (pp. 3582-3589)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In the experimental section, we show for two contrasting domains that the CRAR agent is able to build an interpretable low-dimensional representation of the task and that it can use it for efficient planning. We also show that the CRAR agent leads to effective multi-task generalization and that it can efficiently be used for transfer learning."
Researcher Affiliation | Collaboration | Vincent Francois-Lavet (McGill University, Mila) vincent.francois-lavet@mcgill.ca; Doina Precup (McGill University, Mila, DeepMind) dprecup@cs.mcgill.ca; Yoshua Bengio (Universite de Montreal, Mila) yoshua.bengio@mila.quebec; Joelle Pineau (McGill University, Mila, Facebook AI Research) jpineau@cs.mcgill.ca
Pseudocode | No | No explicit pseudocode or algorithm block labeled as such was found in the paper.
Open Source Code | Yes | The source code for all experiments is available at https://github.com/VinF/deer/
Open Datasets | No | The paper describes datasets generated for its experiments (e.g., "5000 transitions obtained with a purely random policy" for the labyrinth task, and "2 * 10^5 steps" for the meta-learning labyrinths), but it does not provide concrete access information (link, DOI, or formal citation) indicating that these datasets are publicly available.
Dataset Splits | No | The paper mentions a "training set" and evaluation "at test time" but does not provide specific details on dataset splits (e.g., percentages, exact counts, or citations to predefined splits) for training, validation, and testing. It refers to data gathered offline on a training set, followed by evaluation on new tasks drawn from the same distribution.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or cloud computing instance specifications) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper mentions general techniques such as RMSprop for optimization and DQN-style algorithms, but does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | Details and hyper-parameters, along with an ablation study, are provided in Appendix B: α = 5 × 10⁻⁴, β = 0.2, with α decreased by 10% every 2000 training steps.
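The stated schedule (α decreased by 10% every 2000 training steps) can be sketched as a piecewise-constant decay. This is an illustrative reading, not code from the paper; the function name `learning_rate` and its keyword arguments are assumptions chosen to mirror the described hyper-parameters.

```python
def learning_rate(step: int,
                  alpha0: float = 5e-4,
                  decay: float = 0.90,
                  interval: int = 2000) -> float:
    """Piecewise-constant decay: the initial rate alpha0 is multiplied
    by 0.9 (i.e. reduced by 10%) after every `interval` training steps,
    matching the schedule described in the paper's Appendix B."""
    return alpha0 * decay ** (step // interval)

# The rate holds at 5e-4 for steps 0-1999, then drops to 4.5e-4 at step 2000.
```

Under this reading, the rate after `k` full intervals is simply `alpha0 * 0.9**k`; whether the paper's decay is applied multiplicatively like this or in some other form is not specified beyond the 10%-per-2000-steps description.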