Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Modular Multitask Reinforcement Learning with Policy Sketches
Authors: Jacob Andreas, Dan Klein, Sergey Levine
ICLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of our approach on a maze navigation game and a 2-D Minecraft-inspired crafting game. Both games feature extremely sparse rewards that can be obtained only after completing a number of high-level subgoals (e.g. escaping from a sequence of locked rooms or collecting and combining various ingredients in the proper order). Experiments illustrate two main advantages of our approach. |
| Researcher Affiliation | Academia | Jacob Andreas, Dan Klein, and Sergey Levine Computer Science Division University of California, Berkeley EMAIL |
| Pseudocode | Yes | Algorithm 1 DO-STEP(Π, curriculum) [...] Algorithm 2 TRAIN-POLICIES() |
| Open Source Code | Yes | We have released our code at http://github.com/jacobandreas/psketch. |
| Open Datasets | No | The paper describes custom-built 'maze environment' and 'crafting environment' for its experiments. It does not provide access information (links, citations, or repository names) for publicly available or open datasets. |
| Dataset Splits | No | The paper describes the use of training and test sets but does not explicitly mention a separate 'validation' dataset split or its characteristics for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions that RMSPROP was used but does not provide specific version numbers for other key software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In all our experiments, we implement each subpolicy as a multilayer perceptron with ReLU nonlinearities and a hidden layer with 128 hidden units, and each critic as a linear function of the current state. [...] The gradient steps given in lines 8 and 9 of Algorithm 1 are implemented using RMSPROP (Tieleman, 2012) with a step size of 0.001 and gradient clipping to a unit norm. We take the batch size parameter D in Algorithm 1 to be 2000, and set γ = 0.9 in both environments. For curriculum learning, the improvement threshold r_good is set to 0.8. |
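
The hyperparameters quoted in the Experiment Setup row can be collected into a small sketch like the one below. This is not the authors' released code: the function names, the RMSProp decay rate (`RMS_DECAY`), and the epsilon term (`RMS_EPS`) are assumptions made for illustration, since the quoted text gives only the step size, the clipping norm, the batch size D, the discount γ, and the curriculum threshold. The update is written as a generic descent step on a loss, which may differ in sign convention from the gradient steps in Algorithm 1.

```python
import math

# Values quoted in the Experiment Setup row.
HIDDEN_UNITS = 128      # width of each subpolicy's hidden layer
STEP_SIZE = 0.001       # RMSProp step size
BATCH_SIZE_D = 2000     # batch size parameter D in Algorithm 1
GAMMA = 0.9             # discount factor, both environments
R_GOOD = 0.8            # curriculum improvement threshold
MAX_GRAD_NORM = 1.0     # "gradient clipping to a unit norm"

# Assumed values: NOT stated in the quoted text.
RMS_DECAY = 0.9
RMS_EPS = 1e-8


def discounted_returns(rewards, gamma=GAMMA):
    """Discounted reward-to-go for every step of a single episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def clip_to_unit_norm(grad, max_norm=MAX_GRAD_NORM):
    """Rescale grad so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def rmsprop_step(params, grad, mean_sq,
                 step_size=STEP_SIZE, decay=RMS_DECAY, eps=RMS_EPS):
    """One clipped RMSProp descent update; returns (new_params, new_mean_sq)."""
    grad = clip_to_unit_norm(grad)
    # Running average of squared gradients, one entry per parameter.
    new_mean_sq = [decay * m + (1.0 - decay) * g * g
                   for m, g in zip(mean_sq, grad)]
    new_params = [p - step_size * g / (math.sqrt(m) + eps)
                  for p, g, m in zip(params, grad, new_mean_sq)]
    return new_params, new_mean_sq
```

With sparse rewards of the kind the paper describes, `discounted_returns([0.0, 0.0, 1.0])` shows how γ = 0.9 propagates a single terminal reward back to earlier steps, each one scaled by another factor of 0.9.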