Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

Authors: Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, Yoshua Bengio

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we briefly outline the tasks that we used to evaluate our proposed method and direct the reader to the appendix for the complete details of each task along with the hyperparameters used for the model. We designed experiments to address the following questions: a) Learning primitives: Can an ensemble of primitives be learned over a distribution of tasks? b) Transfer Learning using primitives: Can the learned primitives be transferred to unseen/unsolvable sparse environments? c) Comparison to centralized methods: How does our method compare to approaches where the primitives are trained using an explicit meta-controller, in a centralized way?
Researcher Affiliation | Collaboration | 1 Mila, University of Montreal; 2 Facebook AI Research (work done while the author was at Mila, University of Montreal); 3 University of California, Berkeley.
Pseudocode | No | The paper describes the objective function and mechanisms in text, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper refers to third-party code for baselines (e.g., 'https://github.com/jeanharb/option_critic', 'https://github.com/openai/mlsh') but does not provide a link or explicit statement about the availability of their own source code for the proposed method.
Open Datasets | Yes | Four Room Maze: We consider the Four-rooms gridworld environment (Sutton et al., 1999c)...
Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits with specific percentages or sample counts for the environments used.
Hardware Specification | No | The authors acknowledge Google Cloud credits were used ('The authors are very grateful to Google for giving Google Cloud credits used in this project.'), but no specific hardware models (GPU/CPU) or detailed cloud instance specifications are provided.
Software Dependencies | Yes | All the models (proposed as well as the baselines) are implemented in Pytorch 1.1 (Paszke et al., 2017) unless stated otherwise.
Experiment Setup | Yes | Table 1 lists the different hyperparameters for the MiniGrid tasks: learning algorithm: A2C (Wu et al., 2017); optimizer: RMSProp (Tieleman & Hinton, 2012); learning rate: 7e-4; batch size: 64; discount: 0.99; lambda (for GAE (Schulman et al., 2015)): 0.95; entropy coefficient: 1e-2; loss coefficient: 0.5; maximum gradient norm: 0.5.
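As a reading aid, the following is a minimal sketch of how the Table 1 hyperparameters could be wired into a PyTorch RMSProp setup. This is not the authors' code (which is not publicly released); the network architecture, input/output sizes, and the `update` helper are illustrative placeholders, not the paper's primitive ensemble. Only the numeric values come from Table 1.

```python
import torch
import torch.nn as nn

# Hyperparameter values quoted from Table 1 (MiniGrid tasks).
config = {
    "learning_rate": 7e-4,
    "batch_size": 64,
    "discount": 0.99,
    "gae_lambda": 0.95,
    "entropy_coef": 1e-2,
    "value_loss_coef": 0.5,
    "max_grad_norm": 0.5,
}

# Placeholder policy network; the paper's information-constrained primitive
# ensemble is NOT reproduced here, and the layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(147, 64), nn.Tanh(), nn.Linear(64, 8))

# A2C trained with RMSProp (Tieleman & Hinton, 2012), as listed in Table 1.
optimizer = torch.optim.RMSprop(model.parameters(), lr=config["learning_rate"])

def update(loss: torch.Tensor) -> None:
    """One optimization step with the gradient-norm clipping reported in Table 1."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), config["max_grad_norm"])
    optimizer.step()
```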