Combined Reinforcement Learning via Abstract Representations
Authors: Vincent François-Lavet, Yoshua Bengio, Doina Precup, Joelle Pineau (pp. 3582-3589)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experimental section, we show for two contrasting domains that the CRAR agent is able to build an interpretable low-dimensional representation of the task and that it can use it for efficient planning. We also show that the CRAR agent leads to effective multi-task generalization and that it can efficiently be used for transfer learning. |
| Researcher Affiliation | Collaboration | Vincent François-Lavet (McGill University, Mila) vincent.francois-lavet@mcgill.ca; Doina Precup (McGill University, Mila, DeepMind) dprecup@cs.mcgill.ca; Yoshua Bengio (Université de Montréal, Mila) yoshua.bengio@mila.quebec; Joelle Pineau (McGill University, Mila, Facebook AI Research) jpineau@cs.mcgill.ca |
| Pseudocode | No | No explicit pseudocode or algorithm block labeled as such was found in the paper. |
| Open Source Code | Yes | The source code for all experiments is available at https://github.com/VinF/deer/ |
| Open Datasets | No | The paper describes datasets it generated for its experiments (e.g., “5000 transitions obtained with a purely random policy” for Labyrinth task, and “2 * 10^5 steps” for meta-learning labyrinths), but it does not provide concrete access information (link, DOI, formal citation) for these datasets to be publicly available. |
| Dataset Splits | No | The paper mentions “training set” and evaluation “at test time” but does not provide specific details on dataset splits (e.g., percentages, exact counts, or citations to predefined splits) for training, validation, and testing. It refers to data gathered offline on a training set and then evaluates on new tasks from the distribution. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or cloud computing instance specifications) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions general techniques like “RMSprop” for optimization and “DQN” algorithms, but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | Details and hyper-parameters, along with an ablation study, are provided in Appendix B: α = 5·10⁻⁴, β = 0.2, with α decreased by 10% every 2000 training steps. |
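
The learning-rate schedule quoted from Appendix B (α = 5·10⁻⁴, reduced by 10% every 2000 training steps) can be sketched as below; this is a minimal illustration, not the paper's code, and the step-wise (piecewise-constant) interpretation of the decay is an assumption.

```python
def learning_rate(step: int, alpha0: float = 5e-4,
                  decay: float = 0.9, interval: int = 2000) -> float:
    """Learning rate after `step` training steps, assuming the paper's
    '10% decrease every 2000 steps' is applied as a step-wise decay.
    Names and signature are illustrative, not from the paper."""
    return alpha0 * decay ** (step // interval)

# Initial rate, then after each full 2000-step interval.
print(learning_rate(0))     # 5e-4
print(learning_rate(1999))  # still 5e-4 (first interval not yet complete)
print(learning_rate(2000))  # 4.5e-4
```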