Self-Attentional Credit Assignment for Transfer in Reinforcement Learning
Authors: Johan Ferret, Raphael Marinier, Matthieu Geist, Olivier Pietquin
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we aim to answer the following questions: can SECRET improve the sample efficiency of learning for RL agents? Does it generalize and/or transfer? How does it compare to transfer baselines? Is the credit assigned by SECRET interpretable? |
| Researcher Affiliation | Industry | Johan Ferret, Raphaël Marinier, Matthieu Geist and Olivier Pietquin, Google Research, Brain Team, {jferret, raphaelm, mfgeist, pietquin}@google.com |
| Pseudocode | No | The paper describes the methodology in prose and mathematical formulations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing the code for the described methodology or provide a link to a code repository. |
| Open Datasets | Yes | We use the keys doors puzzle 3D environment from DMLab [Beattie et al., 2016] |
| Dataset Splits | No | The paper discusses training on a 'source distribution' and evaluating on 'target environments' or 'held-out environments' but does not provide specific numerical train/validation/test splits or detailed splitting methodologies. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions general algorithms and architectures such as 'Q-learning', 'PPO', 'Transformer decoder', and 'convolutional layers', but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We use Q-learning [Watkins and Dayan, 1992] (tabular, with a learning rate of 0.1 and ϵ = 0.1) for experiments in Triggers except for out-of-domain transfer to environments with modified dynamics where we use DQN [Mnih et al., 2015]. We use PPO [Schulman et al., 2017] for in-domain experiments in DMLab, with identical hyperparameters as in Episodic Curiosity [Savinov et al., 2019], whose code is open-source... In Triggers experiments, we use 128 units per dense layer, 32 convolutional filters and a single convolutional layer to process partial states. We use a dropout rate of 0.1 after dense layers, a dropout rate of 0.2 in the self-attention mechanism and in the normalization blocks of the Transformer. Class weights in the loss function are set to w(1) = w(−1) = 0.499, w(0) = 0.002. In DMLab experiments, we use 16 convolutional filters and two convolutional layers to process partial states, and otherwise identical hyperparameters. |
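
The experiment-setup row quotes the hyperparameters reported in the paper. As a reading aid, below is a minimal, hypothetical Python sketch that collects those reported values into configuration dictionaries and shows one plausible form of the class-weighted loss over the reward classes {−1, 0, +1}. The names (`TRIGGERS_CONFIG`, `DMLAB_CONFIG`, `weighted_nll`) and the normalization of the loss are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical configuration sketch; values are the hyperparameters quoted
# from the paper, names and structure are illustrative assumptions.
import numpy as np

TRIGGERS_CONFIG = {
    # Tabular Q-learning used for most Triggers experiments.
    "q_learning": {"learning_rate": 0.1, "epsilon": 0.1},
    # Credit-assignment (Transformer) model.
    "dense_units": 128,
    "conv_filters": 32,
    "conv_layers": 1,
    "dropout_dense": 0.1,
    "dropout_attention": 0.2,  # also used in the Transformer normalization blocks
    # Class weights for the reward-prediction loss, classes -1, 0, +1.
    "class_weights": {-1: 0.499, 0: 0.002, 1: 0.499},
}

# DMLab experiments reportedly differ only in the convolutional stack.
DMLAB_CONFIG = dict(TRIGGERS_CONFIG, conv_filters=16, conv_layers=2)


def weighted_nll(log_probs: np.ndarray, labels: np.ndarray, weights: dict) -> float:
    """Class-weighted negative log-likelihood (an assumed form of the loss).

    log_probs: array of shape (T, 3), log-probabilities for classes (-1, 0, +1).
    labels:    array of shape (T,), values in {-1, 0, +1}.
    weights:   per-class weights, e.g. TRIGGERS_CONFIG["class_weights"].
    """
    w = np.array([weights[int(c)] for c in labels])
    idx = labels + 1  # map {-1, 0, 1} -> column indices {0, 1, 2}
    picked = log_probs[np.arange(len(labels)), idx]
    return float(-(w * picked).sum() / w.sum())
```

The heavy down-weighting of the zero-reward class (0.002 versus 0.499 for the ±1 classes) reflects that most timesteps carry no reward, so an unweighted loss would be dominated by the zero class.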