Modular Multitask Reinforcement Learning with Policy Sketches
Authors: Jacob Andreas, Dan Klein, Sergey Levine
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of our approach on a maze navigation game and a 2-D Minecraft-inspired crafting game. Both games feature extremely sparse rewards that can be obtained only after completing a number of high-level subgoals (e.g. escaping from a sequence of locked rooms or collecting and combining various ingredients in the proper order). Experiments illustrate two main advantages of our approach. |
| Researcher Affiliation | Academia | Jacob Andreas, Dan Klein, and Sergey Levine Computer Science Division University of California, Berkeley {jda,klein,svlevine}@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 DO-STEP(Π, curriculum) [...] Algorithm 2 TRAIN-POLICIES() (a paraphrased sketch of this curriculum loop appears after the table) |
| Open Source Code | Yes | We have released our code at http://github.com/jacobandreas/psketch. |
| Open Datasets | No | The paper describes custom-built 'maze environment' and 'crafting environment' for its experiments. It does not provide access information (links, citations, or repository names) for publicly available or open datasets. |
| Dataset Splits | No | The paper describes the use of training and test sets but does not explicitly mention a separate 'validation' dataset split or its characteristics for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions that RMSPROP was used but does not provide specific version numbers for other key software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In all our experiments, we implement each subpolicy as a multilayer perceptron with ReLU nonlinearities and a hidden layer with 128 hidden units, and each critic as a linear function of the current state. [...] The gradient steps given in lines 8 and 9 of Algorithm 1 are implemented using RMSPROP (Tieleman, 2012) with a step size of 0.001 and gradient clipping to a unit norm. We take the batch size parameter D in Algorithm 1 to be 2000, and set γ = 0.9 in both environments. For curriculum learning, the improvement threshold r_good is set to 0.8. (An illustrative configuration sketch follows the table.) |
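
The pseudocode quoted in the table is elided, so the following minimal Python sketch paraphrases the curriculum logic those algorithms describe as we understand it from the paper: tasks with low estimated reward are sampled more often, and longer sketches are only admitted once every active task clears the improvement threshold r_good. Function names, task names, and sketch lengths here are illustrative stand-ins rather than the authors' implementation, and the per-batch actor-critic update (the quoted DO-STEP) is reduced to a callback.

```python
def make_curriculum(est_reward, active_tasks):
    """Task-sampling weights: tasks whose estimated reward is low are
    sampled more often (weight 1 - estimated reward, renormalized)."""
    weights = [max(1.0 - est_reward[t], 1e-6) for t in active_tasks]
    total = sum(weights)
    return {t: w / total for t, w in zip(active_tasks, weights)}


def train_policies(tasks, sketch_len, est_reward, do_step, r_good=0.8):
    """Outer curriculum loop in the spirit of the quoted TRAIN-POLICIES():
    start with short sketches and admit longer ones only after every
    active task's estimated reward clears r_good. `do_step` stands in for
    the quoted DO-STEP (collect a batch of rollouts under the curriculum,
    apply actor-critic updates) and must refresh `est_reward`."""
    for max_len in range(1, max(sketch_len.values()) + 1):
        active = [t for t in tasks if sketch_len[t] <= max_len]
        if not active:
            continue
        while min(est_reward[t] for t in active) < r_good:
            do_step(make_curriculum(est_reward, active))


# Toy usage with a fake do_step that just nudges estimates upward.
if __name__ == "__main__":
    tasks = ["make[plank]", "make[bed]"]
    sketch_len = {"make[plank]": 2, "make[bed]": 3}
    est_reward = {t: 0.0 for t in tasks}

    def fake_do_step(curriculum):
        for t in curriculum:
            est_reward[t] = min(1.0, est_reward[t] + 0.1)

    train_policies(tasks, sketch_len, est_reward, fake_do_step)
    print(est_reward)  # both tasks end at or above the 0.8 threshold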
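```

For the "Experiment Setup" row, the snippet below is an illustrative PyTorch reading of the reported configuration. The framework choice, class names, and placeholder dimensions are ours (the paper names no framework or versions); only the reported numbers come from the quote: one 128-unit ReLU hidden layer per subpolicy, a linear critic, RMSProp with step size 0.001, gradient clipping to a unit norm, batch size D = 2000, γ = 0.9, and r_good = 0.8.

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 64, 5  # placeholder sizes; environment-dependent


class SubPolicy(nn.Module):
    """A subpolicy: multilayer perceptron with one 128-unit ReLU hidden layer."""
    def __init__(self, state_dim=STATE_DIM, num_actions=NUM_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state):
        # Log-probabilities over primitive actions.
        return torch.log_softmax(self.net(state), dim=-1)


class Critic(nn.Module):
    """A per-task critic: a linear function of the current state."""
    def __init__(self, state_dim=STATE_DIM):
        super().__init__()
        self.value = nn.Linear(state_dim, 1)

    def forward(self, state):
        return self.value(state).squeeze(-1)


# Reported optimizer and training constants.
policy, critic = SubPolicy(), Critic()
optimizer = torch.optim.RMSprop(
    list(policy.parameters()) + list(critic.parameters()), lr=1e-3)

BATCH_SIZE_D = 2000  # rollouts collected per gradient step
GAMMA = 0.9          # discount factor
R_GOOD = 0.8         # curriculum improvement threshold

# Before each optimizer.step(), gradients would be clipped to a unit norm:
# torch.nn.utils.clip_grad_norm_(
#     list(policy.parameters()) + list(critic.parameters()), max_norm=1.0)
```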