Learning Task Decomposition with Ordered Memory Policy Network
Authors: Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Craft and Dial demonstrate that our model can achieve higher task decomposition performance under both unsupervised and weakly supervised settings, comparing with strong baselines. |
| Researcher Affiliation | Collaboration | Yuchen Lu & Yikang Shen (University of Montreal, Mila, Montreal, Canada); Siyuan Zhou (Peking University, Beijing, China); Aaron Courville (University of Montreal, Mila, CIFAR, Montreal, Canada); Joshua B. Tenenbaum (MIT BCS, CBMM, CSAIL, Cambridge, United States); Chuang Gan (MIT-IBM Watson AI Lab, Cambridge, United States) |
| Pseudocode | Yes | Algorithm 1: Data Collection with Gym API; Algorithm 2: Get boundary from a given threshold; Algorithm 3: Automatic threshold selection |
| Open Source Code | No | Project page: https://ordered-memory-rl.github.io/ (this link points to a project page, not explicitly a code repository for the methodology, and the paper does not contain a statement such as 'We release our code at...'). |
| Open Datasets | Yes | For the discrete action space, we use a grid world environment called Craft adapted from Andreas et al. (2017). For the continuous action space, we have a robotic setting called Dial (Shiarlis et al., 2018). We use the demonstration released by Gupta et al. (2019). |
| Dataset Splits | No | The paper describes training on generated demonstrations (e.g., '500 episodes each on Make Axe, Make Shears and Make Bed' for Craft, '1400 trajectories' for Dial), and evaluation metrics like F1 scores, but it does not specify explicit train/validation/test dataset splits with percentages, absolute counts, or references to predefined splits needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU or CPU models, or detailed cloud/cluster resource specifications. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and implies the use of a deep learning framework ('pytorch' is mentioned in the context of TACO baseline implementation), but it does not provide specific version numbers for these or other key software components, libraries, or dependencies. |
| Experiment Setup | Yes | We set the number of slots to be 3 in both Craft and Dial, and each memory has dimension 128. We use Adam optimizer to train our model with β1 = 0.9, β2 = 0.999. The learning rate is 0.001 in Craft and 0.0005 in Dial. We set the length of BPTT to be 64 in both experiments. We clip the gradients with L2 norm 0.2. |
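
The "Pseudocode" row above lists Algorithm 2 (get boundary from a given threshold) and Algorithm 3 (automatic threshold selection). The following is a minimal Python sketch of the general idea of threshold-based boundary extraction; the function names, the per-timestep `signal` input, and the candidate-sweep selection heuristic are illustrative assumptions, not the paper's exact algorithms.

```python
import numpy as np

def get_boundaries(signal, threshold):
    """Return time steps where a per-step segmentation score exceeds a threshold.

    `signal` is assumed to be a 1-D array of per-timestep scores (e.g. a model's
    predicted sub-task boundary strength); this interface is hypothetical.
    """
    return [t for t, s in enumerate(signal) if s > threshold]

def sweep_thresholds(signal, candidates):
    """Evaluate a set of candidate thresholds and return the induced boundary
    sets. A stand-in for automatic selection: the paper's Algorithm 3 defines
    its own criterion for choosing among candidates."""
    return {c: get_boundaries(signal, c) for c in candidates}

# Example usage on a toy score sequence.
scores = np.array([0.1, 0.8, 0.2, 0.9, 0.1])
print(sweep_thresholds(scores, candidates=[0.5, 0.7]))
```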
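The "Experiment Setup" row quotes the training hyperparameters (3 memory slots of dimension 128, Adam with β1 = 0.9 and β2 = 0.999, learning rate 0.001 for Craft and 0.0005 for Dial, BPTT length 64, gradient clipping at L2 norm 0.2). Below is a minimal PyTorch sketch wiring those values into a training step; the `model` placeholder and the `compute_loss` callable are assumptions standing in for the OMPN architecture and its behavior-cloning objective, which the paper defines.

```python
import torch

# Placeholder module standing in for the OMPN policy network (3 slots x dim 128).
model = torch.nn.GRU(input_size=128, hidden_size=3 * 128)

# Hyperparameters quoted in the "Experiment Setup" row (Craft learning rate shown;
# Dial uses 0.0005).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
BPTT_LENGTH = 64   # length of the truncated back-propagation-through-time window
GRAD_CLIP = 0.2    # L2-norm gradient clipping threshold

def training_step(batch, compute_loss):
    """One optimization step over a BPTT_LENGTH-step window; `compute_loss`
    is a hypothetical stand-in for the paper's imitation objective."""
    optimizer.zero_grad()
    loss = compute_loss(model, batch)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    return loss.item()
```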