Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

Authors: Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Although our method can in general be applied to a wide range of problems, we use Atari games as a testing environment to demonstrate these methods. In the following experiments, we validate the Actor-Mimic method by demonstrating its effectiveness at both multitask and transfer learning in the Arcade Learning Environment (ALE).
Researcher Affiliation | Academia | Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; {eparisotto,jimmy,rsalakhu}@cs.toronto.edu
Pseudocode | No | The paper describes the Actor-Mimic method through textual explanations and mathematical formulations, but it does not present a dedicated pseudocode block or algorithm listing (an illustrative sketch of the policy regression objective follows the table).
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Although our method can in general be applied to a wide range of problems, we use Atari games as a testing environment to demonstrate these methods. The Arcade Learning Environment (ALE) (Bellemare et al., 2013)
Dataset Splits | No | The paper describes training and testing epochs for evaluation and uses a replay memory, but it does not specify explicit train/validation/test dataset splits (e.g., in percentages or sample counts) for reproducibility.
Hardware Specification | Yes | Processing 5 million frames with the large model is equivalent to around 4 days of compute time on a NVIDIA GTX Titan.
Software Dependencies | No | All of our Actor-Mimic Networks (AMNs) were trained using the Adam (Kingma & Ba, 2015) optimization algorithm. For the experiments using the DQN algorithm, we optimize the networks with RMSProp. The paper names the Adam and RMSProp optimization algorithms but does not list the software frameworks or library versions used to implement them.
Experiment Setup | Yes | For the transfer experiments with the feature regression objective, we set the scaling parameter β to 0.01 and the feature prediction network f_i was set to a linear projection from the AMN features to the i-th expert features. For the policy regression objective, we use a softmax temperature of 1 in all cases. Additionally, during training for all AMNs we use an ϵ-greedy policy with ϵ set to a constant 0.1. For AMNs we use a per-game 100,000 frame replay memory. We use the full 1,000,000 frame replay memory when training any DQN.
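
As the Pseudocode row notes, the paper specifies Actor-Mimic through equations rather than an algorithm listing. The following is a minimal NumPy sketch of the policy regression objective described in the paper: the cross-entropy between each expert DQN's Boltzmann policy (a softmax over its Q-values at temperature tau) and the AMN's policy. Function and variable names (expert_q, amn_logits, tau) are illustrative choices, not the authors' code.

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        z = x - x.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def policy_regression_loss(expert_q, amn_logits, tau=1.0):
        # expert_q:   (batch, num_actions) Q-values from the expert DQN for one game.
        # amn_logits: (batch, num_actions) pre-softmax outputs of the Actor-Mimic network.
        # Target policy: softmax over the expert's Q-values at temperature tau
        # (the paper reports a temperature of 1 in all cases).
        target = softmax(expert_q / tau)
        log_pred = np.log(softmax(amn_logits) + 1e-12)
        # Cross-entropy between the expert policy and the AMN policy, averaged over the batch.
        return -(target * log_pred).sum(axis=-1).mean()

Per the quotes above, in training this loss would be computed on states drawn from the per-game replay memory and minimized with Adam; the feature regression term (scaled by β) would be added for the transfer experiments.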
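
The Experiment Setup row quotes the main hyperparameters. Collected into a single configuration sketch for readability (key names are illustrative, not taken from any released code):

    AMN_TRAINING_CONFIG = {
        "feature_regression_beta": 0.01,    # scaling on the feature regression term
        "feature_predictor": "linear",      # f_i: linear projection from AMN features to expert features
        "policy_softmax_temperature": 1.0,  # temperature for the policy regression target
        "epsilon_greedy": 0.1,              # constant epsilon during AMN training
        "replay_frames_per_game": 100_000,  # per-game AMN replay memory
        "optimizer": "Adam",                # AMNs trained with Adam
    }

    DQN_TRAINING_CONFIG = {
        "replay_frames": 1_000_000,         # full replay memory when training any DQN
        "optimizer": "RMSProp",             # expert DQNs optimized with RMSProp
    }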
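
The Open Datasets row points to the Arcade Learning Environment. Below is a minimal sketch of loading Atari games through the present-day Gymnasium/ale-py bindings; the authors interfaced with the ALE directly, and the game list here is only an illustrative subset, not the paper's evaluation set.

    import gymnasium as gym

    # Requires gymnasium[atari] and the Atari ROMs; these environment IDs use the
    # current ALE namespace, not whatever the original 2016 codebase used.
    GAMES = ["ALE/Pong-v5", "ALE/Breakout-v5", "ALE/Seaquest-v5"]
    envs = {name: gym.make(name) for name in GAMES}

    obs, info = envs["ALE/Pong-v5"].reset(seed=0)
    obs, reward, terminated, truncated, info = envs["ALE/Pong-v5"].step(0)  # action 0 = NOOP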