Modular Multitask Reinforcement Learning with Policy Sketches

Authors: Jacob Andreas, Dan Klein, Sergey Levine

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the effectiveness of our approach on a maze navigation game and a 2-D Minecraft-inspired crafting game. Both games feature extremely sparse rewards that can be obtained only after completing a number of high-level subgoals (e.g. escaping from a sequence of locked rooms or collecting and combining various ingredients in the proper order). Experiments illustrate two main advantages of our approach.
Researcher Affiliation | Academia | Jacob Andreas, Dan Klein, and Sergey Levine, Computer Science Division, University of California, Berkeley. {jda,klein,svlevine}@eecs.berkeley.edu
Pseudocode | Yes | Algorithm 1 DO-STEP(Π, curriculum) [...] Algorithm 2 TRAIN-POLICIES()
Open Source Code | Yes | We have released our code at http://github.com/jacobandreas/psketch.
Open Datasets | No | The paper describes a custom-built 'maze environment' and 'crafting environment' for its experiments. It does not provide access information (links, citations, or repository names) for publicly available or open datasets.
Dataset Splits | No | The paper describes the use of training and test sets but does not explicitly describe a separate validation split or its characteristics.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions that RMSPROP was used but does not provide version numbers for the software components, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | In all our experiments, we implement each subpolicy as a multilayer perceptron with ReLU nonlinearities and a hidden layer with 128 hidden units, and each critic as a linear function of the current state. [...] The gradient steps given in lines 8 and 9 of Algorithm 1 are implemented using RMSPROP (Tieleman, 2012) with a step size of 0.001 and gradient clipping to a unit norm. We take the batch size parameter D in Algorithm 1 to be 2000, and set γ = 0.9 in both environments. For curriculum learning, the improvement threshold r_good is set to 0.8.
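The "Experiment Setup" row above fully specifies the reported network architecture and optimizer hyperparameters. Below is a minimal sketch of that configuration, assuming PyTorch for illustration (the authors' released psketch code is a separate implementation and may be organized differently); the names Subpolicy, LinearCritic, make_optimizer, and clipped_step are hypothetical and not taken from the paper or its repository.

```python
# Hypothetical sketch of the reported setup (not the authors' released code).
import torch
import torch.nn as nn

GAMMA = 0.9           # discount factor used in both environments
BATCH_SIZE_D = 2000   # batch size parameter D in Algorithm 1
R_GOOD = 0.8          # curriculum improvement threshold r_good
GRAD_CLIP_NORM = 1.0  # gradients clipped to unit norm

class Subpolicy(nn.Module):
    """One subpolicy: a multilayer perceptron with a single 128-unit ReLU hidden layer."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state):
        # Returns action logits; a softmax over these gives the action distribution.
        return self.net(state)

class LinearCritic(nn.Module):
    """Critic: a linear function of the current state."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.value = nn.Linear(state_dim, 1)

    def forward(self, state):
        return self.value(state)

def make_optimizer(module: nn.Module):
    # RMSProp with step size 0.001, as reported in the paper.
    return torch.optim.RMSprop(module.parameters(), lr=1e-3)

def clipped_step(module: nn.Module, optimizer, loss):
    # One gradient step with the reported clipping to unit norm.
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(module.parameters(), GRAD_CLIP_NORM)
    optimizer.step()
```

In the paper's curriculum scheme, the threshold r_good = 0.8 governs when training moves on to tasks with longer sketches; it appears above only as a named constant so that all of the reported hyperparameters are collected in one place.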