Modular Multitask Reinforcement Learning with Policy Sketches
Authors: Jacob Andreas, Dan Klein, Sergey Levine
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of our approach in three environments featuring both discrete and continuous control, and with sparse rewards that can be obtained only after completing a number of high-level subgoals. Experiments show that using our approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks. |
| Researcher Affiliation | Academia | Jacob Andreas, Dan Klein, Sergey Levine (University of California, Berkeley). |
| Pseudocode | Yes | Algorithm 1 TRAIN-STEP(Π, curriculum); Algorithm 2 TRAIN-LOOP(). A hedged skeleton of this loop appears below the table. |
| Open Source Code | Yes | We have released code at http://github.com/jacobandreas/psketch. |
| Open Datasets | No | The crafting environment (Figure 3a) is inspired by the popular game Minecraft, but is implemented in a discrete 2-D world. The maze environment (not pictured) corresponds closely to the light world described by Konidaris & Barto (2007). The cliff environment (Figure 3b) is intended to demonstrate the applicability of our approach to problems involving high-dimensional continuous control. The paper describes custom environments and does not provide specific access information (link, DOI, etc.) for any public or open dataset used for training. |
| Dataset Splits | No | The paper describes experimental environments and training procedures (TRAIN-STEP, TRAIN-LOOP) but does not provide explicit numerical or proportional train/validation/test dataset splits for reproducibility. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions implementation details like 'feedforward neural network with ReLU nonlinearities' and 'RMSPROP' but does not provide specific software dependencies or library version numbers (e.g., PyTorch version, TensorFlow version). |
| Experiment Setup | Yes | In all our experiments, we implement each subpolicy as a feedforward neural network with ReLU nonlinearities and a hidden layer with 128 hidden units, and each critic as a linear function of the current state. ... The gradient steps given in lines 8 and 9 of Algorithm 1 are implemented using RMSPROP (Tieleman, 2012) with a step size of 0.001 and gradient clipping to a unit norm. We take the batch size D in Algorithm 1 to be 2000, and set γ = 0.9 in both environments. For curriculum learning, the improvement threshold r_good is 0.8. A minimal configuration sketch based on these values appears below the table. |
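
For concreteness, the hyperparameters quoted in the Experiment Setup row can be assembled into a minimal configuration sketch. The snippet below assumes PyTorch, which the paper does not name (it reports only RMSPROP and the architecture); the state/action dimensions, subtask symbols, and helper functions are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 32, 5   # placeholder dimensions, not from the paper

# Each subpolicy: one hidden layer of 128 ReLU units (as reported in the paper).
def make_subpolicy():
    return nn.Sequential(
        nn.Linear(STATE_DIM, 128),
        nn.ReLU(),
        nn.Linear(128, NUM_ACTIONS),
    )

# Each critic: a linear function of the current state (scalar value estimate).
def make_critic():
    return nn.Linear(STATE_DIM, 1)

# Example subtask symbols / task names; the actual sets come from the sketches.
subpolicies = {b: make_subpolicy() for b in ["get wood", "use workbench"]}
critics = {t: make_critic() for t in ["make planks"]}

params = [p for m in list(subpolicies.values()) + list(critics.values())
          for p in m.parameters()]

# RMSProp with step size 0.001, as in the paper.
optimizer = torch.optim.RMSprop(params, lr=1e-3)

BATCH_SIZE = 2000   # batch size D from Algorithm 1
GAMMA = 0.9         # discount factor
R_GOOD = 0.8        # curriculum improvement threshold
```

A single update in this sketch would accumulate the actor and critic losses over a batch of D = 2000 samples, call `torch.nn.utils.clip_grad_norm_(params, 1.0)` to clip gradients to unit norm, and then call `optimizer.step()`, mirroring lines 8 and 9 of Algorithm 1.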
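The Pseudocode row references TRAIN-STEP and TRAIN-LOOP (Algorithms 1 and 2). The skeleton below is a hedged reconstruction of that curriculum loop as described in the paper: short-sketch tasks are trained first, tasks are sampled more often when their current reward is low, and the maximum sketch length is increased once average reward clears r_good = 0.8. All function names, data structures, and the averaging constant are assumptions made for illustration; the released code at http://github.com/jacobandreas/psketch is the authoritative reference.

```python
import random

def train_loop(tasks, train_step, r_good=0.8):
    """Hedged sketch of TRAIN-LOOP (Algorithm 2): grow a curriculum over
    sketch length, calling TRAIN-STEP (Algorithm 1) on the active tasks."""
    max_len = 1
    avg_reward = {t: 0.0 for t in tasks}             # running reward estimate per task
    while max_len <= max(len(t.sketch) for t in tasks):
        active = [t for t in tasks if len(t.sketch) <= max_len]

        # Sample tasks with weight increasing as their current reward decreases,
        # so harder tasks in the active set are visited more often (assumed scheme).
        weights = [1.0 - avg_reward[t] + 1e-3 for t in active]
        curriculum = random.choices(active, weights=weights, k=len(active))

        # TRAIN-STEP: collect rollouts and update subpolicies / critics;
        # here it is a caller-supplied function returning per-task rewards.
        new_rewards = train_step(curriculum)
        for t, r in new_rewards.items():
            avg_reward[t] = 0.9 * avg_reward[t] + 0.1 * r

        # Once every task at the current length is reliably solved,
        # lengthen the curriculum.
        if all(avg_reward[t] > r_good for t in active):
            max_len += 1
```

Here each task object is assumed to expose a `sketch` attribute (its sequence of subtask symbols), and `train_step` stands in for the actor-critic update of Algorithm 1.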