Composable Planning with Attributes
Authors: Amy Zhang, Sainbayar Sukhbaatar, Adam Lerer, Arthur Szlam, Rob Fergus
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate compositional planning in several environments. We first consider 3D block stacking, and show that we can compose single-action tasks seen during training to perform multi-step tasks. Second, we plan over multi-step policies in 2D grid world tasks. Finally, we see how our approach scales to a unit-building task in StarCraft. |
| Researcher Affiliation | Collaboration | ¹Facebook AI Research, New York, NY, USA; ²New York University, New York, NY, USA. |
| Pseudocode | Yes | Algorithm 1 Attribute Planner Training |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper uses environments such as MuJoCo, MazeBase, and StarCraft for its experiments, generating data within them ('Each training episode is initiated from a random initial state and lasts only one step'), but it does not provide concrete access information (link, DOI, or a specific citation with authors/year for a public dataset) for the data used. |
| Dataset Splits | No | The paper mentions training on '1 million examples' or '10,000 examples' for the attribute detector, but it does not specify explicit training/validation/test dataset splits (e.g., exact percentages or sample counts for each split). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions several environments and platforms such as MuJoCo, MazeBase, and StarCraft with citations, but it does not specify software dependencies (e.g., programming language, libraries, or frameworks) with version numbers. |
| Experiment Setup | Yes | Models are trained for a total of 30 million steps. AP uses 16 million steps for exploration and 14 million steps for training. ... During the final phase of training we simultaneously compute $\pi$ and $c_\pi$, so we use an exponentially decaying average of the success rate of $\pi$ to deal with its nonstationarity: $c_\pi(\rho_i, \rho_j) = \frac{\sum_{t=1}^{T} \gamma^{T-t} S_t(\rho_i, \rho_j)}{\sum_{t=1}^{T} \gamma^{T-t} A_t(\rho_i, \rho_j)}$, where $T$ is the number of training epochs, $A_t$ is the number of attempted transitions $(\rho_i, \rho_j)$ during epoch $t$, and $S_t$ is the number of successful transitions. A decay rate of $\gamma = 0.9$ is used. |
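
The decayed-average estimate quoted in the Experiment Setup row is straightforward to reproduce. Below is a minimal sketch in Python, assuming per-epoch counts $A_t$ and $S_t$ are kept as plain lists for a single transition pair; the function name and bookkeeping are illustrative, and only the formula and the decay rate $\gamma = 0.9$ come from the paper.

```python
def decayed_success_rate(attempts, successes, gamma=0.9):
    """Exponentially decaying average of a transition's success rate.

    Computes c(rho_i, rho_j) = sum_t gamma^(T-t) * S_t / sum_t gamma^(T-t) * A_t,
    where attempts[t-1] = A_t and successes[t-1] = S_t for epochs t = 1..T.
    The most recent epoch gets weight 1; older epochs decay geometrically,
    which discounts stale statistics as the policy pi changes during training.
    """
    T = len(attempts)
    weights = [gamma ** (T - t) for t in range(1, T + 1)]
    numer = sum(w * s for w, s in zip(weights, successes))
    denom = sum(w * a for w, a in zip(weights, attempts))
    return numer / denom if denom > 0 else 0.0


# Hypothetical counts for one (rho_i, rho_j) pair over T = 3 epochs:
attempts = [10, 8, 12]   # A_1, A_2, A_3
successes = [4, 6, 9]    # S_1, S_2, S_3
print(decayed_success_rate(attempts, successes))  # weights 0.81, 0.9, 1.0 -> ~0.646
```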