Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

Authors: Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, Yoshua Bengio

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we briefly outline the tasks that we used to evaluate our proposed method and direct the reader to the appendix for the complete details of each task along with the hyperparameters used for the model. We designed experiments to address the following questions: a) Learning primitives: Can an ensemble of primitives be learned over a distribution of tasks? b) Transfer Learning using primitives: Can the learned primitives be transferred to unseen/unsolvable sparse environments? c) Comparison to centralized methods: How does our method compare to approaches where the primitives are trained using an explicit meta-controller, in a centralized way?
Researcher Affiliation | Collaboration | 1 Mila, University of Montreal; 2 Facebook AI Research (work done while the author was at Mila, University of Montreal); 3 University of California, Berkeley.
Pseudocode | No | The paper describes the objective function and mechanisms in text, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper refers to third-party code for baselines (e.g., 'https://github.com/jeanharb/option_critic', 'https://github.com/openai/mlsh') but does not provide a link or explicit statement about the availability of their own source code for the proposed method.
Open Datasets | Yes | Four Room Maze: We consider the Four-rooms gridworld environment (Sutton et al., 1999c)...
Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits with specific percentages or sample counts for the environments used.
Hardware Specification | No | The authors acknowledge Google Cloud credits were used ('The authors are very grateful to Google for giving Google Cloud credits used in this project.'), but no specific hardware models (GPU/CPU) or detailed cloud instance specifications are provided.
Software Dependencies | Yes | All the models (proposed as well as the baselines) are implemented in Pytorch 1.1 (Paszke et al., 2017) unless stated otherwise.
Experiment Setup | Yes | Table 1 lists the different hyperparameters for the MiniGrid tasks: learning algorithm: A2C (Wu et al., 2017); optimizer: RMSProp (Tieleman & Hinton, 2012); learning rate: 7e-4; batch size: 64; discount: 0.99; lambda (for GAE (Schulman et al., 2015)): 0.95; entropy coefficient: 1e-2; loss coefficient: 0.5; maximum gradient norm: 0.5.
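As a reading aid, the following is a minimal sketch of how the Table 1 hyperparameters could be wired into a PyTorch RMSProp setup. This is not the authors' code (which is not publicly released); the network architecture, input/output sizes, and the `update` helper are illustrative placeholders, not the paper's primitive ensemble. Only the numeric values come from Table 1.

```python
import torch
import torch.nn as nn

# Hyperparameter values quoted from Table 1 (MiniGrid tasks).
config = {
    "learning_rate": 7e-4,
    "batch_size": 64,
    "discount": 0.99,
    "gae_lambda": 0.95,
    "entropy_coef": 1e-2,
    "value_loss_coef": 0.5,
    "max_grad_norm": 0.5,
}

# Placeholder policy network; the paper's information-constrained primitive
# ensemble is NOT reproduced here, and the layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(147, 64), nn.Tanh(), nn.Linear(64, 8))

# A2C trained with RMSProp (Tieleman & Hinton, 2012), as listed in Table 1.
optimizer = torch.optim.RMSprop(model.parameters(), lr=config["learning_rate"])

def update(loss: torch.Tensor) -> None:
    """One optimization step with the gradient-norm clipping reported in Table 1."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), config["max_grad_norm"])
    optimizer.step()
```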