Learning Task Decomposition with Ordered Memory Policy Network

Authors: Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on Craft and Dial demonstrate that the model achieves higher task decomposition performance under both unsupervised and weakly supervised settings, compared with strong baselines.
Researcher Affiliation | Collaboration | Yuchen Lu & Yikang Shen (University of Montreal, Mila, Montreal, Canada); Siyuan Zhou (Peking University, Beijing, China); Aaron Courville (University of Montreal, Mila, CIFAR, Montreal, Canada); Joshua B. Tenenbaum (MIT BCS, CBMM, CSAIL, Cambridge, United States); Chuang Gan (MIT-IBM Watson AI Lab, Cambridge, United States)
Pseudocode | Yes | Algorithm 1: Data Collection with Gym API; Algorithm 2: Get boundary from a given threshold; Algorithm 3: Automatic threshold selection. (A hedged sketch of the thresholding step follows the table.)
Open Source Code | No | Project page: https://ordered-memory-rl.github.io/ (this link is to a project page, not explicitly a code repository for the methodology, and the paper does not contain a statement like "We release our code at...").
Open Datasets | Yes | For the discrete action space, the paper uses a grid-world environment called Craft adapted from Andreas et al. (2017). For the continuous action space, it uses a robotic setting called Dial (Shiarlis et al., 2018), with the demonstrations released by Gupta et al. (2019).
Dataset Splits | No | The paper describes training on generated demonstrations (e.g., 500 episodes each on Make Axe, Make Shears, and Make Bed for Craft; 1400 trajectories for Dial) and evaluation metrics such as F1 scores, but it does not specify explicit train/validation/test splits with percentages, absolute counts, or references to predefined splits needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU or CPU models, or detailed cloud/cluster resource specifications.
Software Dependencies | No | The paper mentions the Adam optimizer and implies the use of a deep learning framework (PyTorch is mentioned in the context of the TACO baseline implementation), but it does not provide version numbers for these or other key software components, libraries, or dependencies.
Experiment Setup | Yes | The number of memory slots is 3 in both Craft and Dial, and each memory has dimension 128. The model is trained with the Adam optimizer (β1 = 0.9, β2 = 0.999). The learning rate is 0.001 in Craft and 0.0005 in Dial. The BPTT length is 64 in both experiments, and gradients are clipped to L2 norm 0.2. (A configuration sketch follows the table.)