Modular Multitask Reinforcement Learning with Policy Sketches
Authors: Jacob Andreas, Dan Klein, Sergey Levine
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of our approach in three environments featuring both discrete and continuous control, and with sparse rewards that can be obtained only after completing a number of high-level subgoals. Experiments show that using our approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks. |
| Researcher Affiliation | Academia | Jacob Andreas, Dan Klein, Sergey Levine (University of California, Berkeley). |
| Pseudocode | Yes | Algorithm 1 TRAIN-STEP(Π, curriculum); Algorithm 2 TRAIN-LOOP(). A hedged skeleton of this loop appears below the table. |
| Open Source Code | Yes | We have released code at http://github.com/jacobandreas/psketch. |
| Open Datasets | No | The crafting environment (Figure 3a) is inspired by the popular game Minecraft, but is implemented in a discrete 2-D world. The maze environment (not pictured) corresponds closely to the light world described by Konidaris & Barto (2007). The cliff environment (Figure 3b) is intended to demonstrate the applicability of our approach to problems involving high-dimensional continuous control. The paper describes custom environments and does not provide specific access information (link, DOI, etc.) for any public or open dataset used for training. |
| Dataset Splits | No | The paper describes experimental environments and training procedures (TRAIN-STEP, TRAIN-LOOP) but does not provide explicit numerical or proportional train/validation/test dataset splits for reproducibility. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions implementation details like 'feedforward neural network with ReLU nonlinearities' and 'RMSPROP' but does not provide specific software dependencies or library version numbers (e.g., PyTorch version, TensorFlow version). |
| Experiment Setup | Yes | In all our experiments, we implement each subpolicy as a feedforward neural network with ReLU nonlinearities and a hidden layer with 128 hidden units, and each critic as a linear function of the current state. ... The gradient steps given in lines 8 and 9 of Algorithm 1 are implemented using RMSPROP (Tieleman, 2012) with a step size of 0.001 and gradient clipping to a unit norm. We take the batch size D in Algorithm 1 to be 2000, and set γ = 0.9 in both environments. For curriculum learning, the improvement threshold r_good is 0.8. A minimal configuration sketch based on these values appears below the table. |
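
For concreteness, the hyperparameters quoted in the Experiment Setup row can be assembled into a minimal configuration sketch. The snippet below assumes PyTorch, which the paper does not name (it reports only RMSPROP and the architecture); the state/action dimensions, subtask symbols, and helper functions are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 32, 5   # placeholder dimensions, not from the paper

# Each subpolicy: one hidden layer of 128 ReLU units (as reported in the paper).
def make_subpolicy():
    return nn.Sequential(
        nn.Linear(STATE_DIM, 128),
        nn.ReLU(),
        nn.Linear(128, NUM_ACTIONS),
    )

# Each critic: a linear function of the current state (scalar value estimate).
def make_critic():
    return nn.Linear(STATE_DIM, 1)

# Example subtask symbols / task names; the actual sets come from the sketches.
subpolicies = {b: make_subpolicy() for b in ["get wood", "use workbench"]}
critics = {t: make_critic() for t in ["make planks"]}

params = [p for m in list(subpolicies.values()) + list(critics.values())
          for p in m.parameters()]

# RMSProp with step size 0.001, as in the paper.
optimizer = torch.optim.RMSprop(params, lr=1e-3)

BATCH_SIZE = 2000   # batch size D from Algorithm 1
GAMMA = 0.9         # discount factor
R_GOOD = 0.8        # curriculum improvement threshold
```

A single update in this sketch would accumulate the actor and critic losses over a batch of D = 2000 samples, call `torch.nn.utils.clip_grad_norm_(params, 1.0)` to clip gradients to unit norm, and then call `optimizer.step()`, mirroring lines 8 and 9 of Algorithm 1.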
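The Pseudocode row references TRAIN-STEP and TRAIN-LOOP (Algorithms 1 and 2). The skeleton below is a hedged reconstruction of that curriculum loop as described in the paper: short-sketch tasks are trained first, tasks are sampled more often when their current reward is low, and the maximum sketch length is increased once average reward clears r_good = 0.8. All function names, data structures, and the averaging constant are assumptions made for illustration; the released code at http://github.com/jacobandreas/psketch is the authoritative reference.

```python
import random

def train_loop(tasks, train_step, r_good=0.8):
    """Hedged sketch of TRAIN-LOOP (Algorithm 2): grow a curriculum over
    sketch length, calling TRAIN-STEP (Algorithm 1) on the active tasks."""
    max_len = 1
    avg_reward = {t: 0.0 for t in tasks}             # running reward estimate per task
    while max_len <= max(len(t.sketch) for t in tasks):
        active = [t for t in tasks if len(t.sketch) <= max_len]

        # Sample tasks with weight increasing as their current reward decreases,
        # so harder tasks in the active set are visited more often (assumed scheme).
        weights = [1.0 - avg_reward[t] + 1e-3 for t in active]
        curriculum = random.choices(active, weights=weights, k=len(active))

        # TRAIN-STEP: collect rollouts and update subpolicies / critics;
        # here it is a caller-supplied function returning per-task rewards.
        new_rewards = train_step(curriculum)
        for t, r in new_rewards.items():
            avg_reward[t] = 0.9 * avg_reward[t] + 0.1 * r

        # Once every task at the current length is reliably solved,
        # lengthen the curriculum.
        if all(avg_reward[t] > r_good for t in active):
            max_len += 1
```

Here each task object is assumed to expose a `sketch` attribute (its sequence of subtask symbols), and `train_step` stands in for the actor-critic update of Algorithm 1.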