Modular Multitask Reinforcement Learning with Policy Sketches
Authors: Jacob Andreas, Dan Klein, Sergey Levine
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of our approach on a maze navigation game and a 2-D Minecraft-inspired crafting game. Both games feature extremely sparse rewards that can be obtained only after completing a number of high-level subgoals (e.g. escaping from a sequence of locked rooms or collecting and combining various ingredients in the proper order). Experiments illustrate two main advantages of our approach. |
| Researcher Affiliation | Academia | Jacob Andreas, Dan Klein, and Sergey Levine Computer Science Division University of California, Berkeley {jda,klein,svlevine}@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 DO-STEP(Π, curriculum) [...] Algorithm 2 TRAIN-POLICIES() (a paraphrased sketch of this curriculum loop appears after the table) |
| Open Source Code | Yes | We have released our code at http://github.com/jacobandreas/psketch. |
| Open Datasets | No | The paper describes custom-built 'maze environment' and 'crafting environment' for its experiments. It does not provide access information (links, citations, or repository names) for publicly available or open datasets. |
| Dataset Splits | No | The paper describes the use of training and test sets but does not explicitly mention a separate 'validation' dataset split or its characteristics for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions that RMSPROP was used but does not provide specific version numbers for other key software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In all our experiments, we implement each subpolicy as a multilayer perceptron with ReLU nonlinearities and a hidden layer with 128 hidden units, and each critic as a linear function of the current state. [...] The gradient steps given in lines 8 and 9 of Algorithm 1 are implemented using RMSPROP (Tieleman, 2012) with a step size of 0.001 and gradient clipping to a unit norm. We take the batch size parameter D in Algorithm 1 to be 2000, and set γ = 0.9 in both environments. For curriculum learning, the improvement threshold r_good is set to 0.8. (An illustrative configuration sketch follows the table.) |
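
The pseudocode quoted in the table is elided, so the following minimal Python sketch paraphrases the curriculum logic those algorithms describe as we understand it from the paper: tasks with low estimated reward are sampled more often, and longer sketches are only admitted once every active task clears the improvement threshold r_good. Function names, task names, and sketch lengths here are illustrative stand-ins rather than the authors' implementation, and the per-batch actor-critic update (the quoted DO-STEP) is reduced to a callback.

```python
def make_curriculum(est_reward, active_tasks):
    """Task-sampling weights: tasks whose estimated reward is low are
    sampled more often (weight 1 - estimated reward, renormalized)."""
    weights = [max(1.0 - est_reward[t], 1e-6) for t in active_tasks]
    total = sum(weights)
    return {t: w / total for t, w in zip(active_tasks, weights)}


def train_policies(tasks, sketch_len, est_reward, do_step, r_good=0.8):
    """Outer curriculum loop in the spirit of the quoted TRAIN-POLICIES():
    start with short sketches and admit longer ones only after every
    active task's estimated reward clears r_good. `do_step` stands in for
    the quoted DO-STEP (collect a batch of rollouts under the curriculum,
    apply actor-critic updates) and must refresh `est_reward`."""
    for max_len in range(1, max(sketch_len.values()) + 1):
        active = [t for t in tasks if sketch_len[t] <= max_len]
        if not active:
            continue
        while min(est_reward[t] for t in active) < r_good:
            do_step(make_curriculum(est_reward, active))


# Toy usage with a fake do_step that just nudges estimates upward.
if __name__ == "__main__":
    tasks = ["make[plank]", "make[bed]"]
    sketch_len = {"make[plank]": 2, "make[bed]": 3}
    est_reward = {t: 0.0 for t in tasks}

    def fake_do_step(curriculum):
        for t in curriculum:
            est_reward[t] = min(1.0, est_reward[t] + 0.1)

    train_policies(tasks, sketch_len, est_reward, fake_do_step)
    print(est_reward)  # both tasks end at or above the 0.8 threshold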
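```

For the "Experiment Setup" row, the snippet below is an illustrative PyTorch reading of the reported configuration. The framework choice, class names, and placeholder dimensions are ours (the paper names no framework or versions); only the reported numbers come from the quote: one 128-unit ReLU hidden layer per subpolicy, a linear critic, RMSProp with step size 0.001, gradient clipping to a unit norm, batch size D = 2000, γ = 0.9, and r_good = 0.8.

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 64, 5  # placeholder sizes; environment-dependent


class SubPolicy(nn.Module):
    """A subpolicy: multilayer perceptron with one 128-unit ReLU hidden layer."""
    def __init__(self, state_dim=STATE_DIM, num_actions=NUM_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state):
        # Log-probabilities over primitive actions.
        return torch.log_softmax(self.net(state), dim=-1)


class Critic(nn.Module):
    """A per-task critic: a linear function of the current state."""
    def __init__(self, state_dim=STATE_DIM):
        super().__init__()
        self.value = nn.Linear(state_dim, 1)

    def forward(self, state):
        return self.value(state).squeeze(-1)


# Reported optimizer and training constants.
policy, critic = SubPolicy(), Critic()
optimizer = torch.optim.RMSprop(
    list(policy.parameters()) + list(critic.parameters()), lr=1e-3)

BATCH_SIZE_D = 2000  # rollouts collected per gradient step
GAMMA = 0.9          # discount factor
R_GOOD = 0.8         # curriculum improvement threshold

# Before each optimizer.step(), gradients would be clipped to a unit norm:
# torch.nn.utils.clip_grad_norm_(
#     list(policy.parameters()) + list(critic.parameters()), max_norm=1.0)
```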