Composable Planning with Attributes

Authors: Amy Zhang, Sainbayar Sukhbaatar, Adam Lerer, Arthur Szlam, Rob Fergus

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate compositional planning in several environments. We first consider 3D block stacking, and show that we can compose single-action tasks seen during training to perform multi-step tasks. Second, we plan over multi-step policies in 2D grid-world tasks. Finally, we see how our approach scales to a unit-building task in StarCraft. |
| Researcher Affiliation | Collaboration | Facebook AI Research, New York, NY, USA; New York University, New York, NY, USA. |
| Pseudocode | Yes | Algorithm 1: Attribute Planner Training |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper uses environments such as MuJoCo, MazeBase, and StarCraft for its experiments, generating data within them ("Each training episode is initiated from a random initial state and lasts only one step"), but does not provide concrete access information (link, DOI, or a specific citation with authors and year for a public dataset) for the data used. |
| Dataset Splits | No | The paper mentions training on "1 million examples" or "10,000 examples" for the attribute detector, but it does not specify explicit training/validation/test dataset splits (e.g., exact percentages or sample counts for each split). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed machine specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions several environments and platforms such as MuJoCo, MazeBase, and StarCraft with citations, but it does not specify software dependencies (e.g., programming language, libraries, or frameworks) with version numbers. |
| Experiment Setup | Yes | Models are trained for a total of 30 million steps. AP uses 16 million steps for exploration and 14 million steps for training. ... During the final phase of training we simultaneously compute π and c_π, so we use an exponentially decaying average of the success rate of π to deal with its nonstationarity: c_π(ρ_i, ρ_j) = Σ_{t=1}^{T} γ^{T−t} S_t(ρ_i, ρ_j) / Σ_{t=1}^{T} γ^{T−t} A_t(ρ_i, ρ_j), where T is the number of training epochs, A_t is the number of attempted transitions (ρ_i, ρ_j) during epoch t, and S_t is the number of successful transitions. A decay rate of γ = 0.9 is used. |
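
To make the decayed success-rate estimate in the Experiment Setup quote concrete, here is a minimal Python sketch, assuming per-epoch attempt and success counts are available as arrays. The function name `decayed_success_rate` and all variable names are ours for illustration, not from the paper.

```python
import numpy as np

def decayed_success_rate(attempts, successes, gamma=0.9):
    """Exponentially decaying average of a policy's success rate.

    A sketch of c_pi(rho_i, rho_j) as quoted above (names are ours):
      attempts:  A_t(rho_i, rho_j), attempted transitions in epoch t = 1..T
      successes: S_t(rho_i, rho_j), successful transitions in epoch t
      gamma:     decay rate (the paper reports gamma = 0.9)
    """
    attempts = np.asarray(attempts, dtype=float)
    successes = np.asarray(successes, dtype=float)
    T = len(attempts)
    # Epoch t gets weight gamma^(T - t), so recent epochs dominate the
    # average; this is what handles the nonstationarity of pi as it trains.
    weights = gamma ** (T - np.arange(1, T + 1))
    denom = np.sum(weights * attempts)
    return np.sum(weights * successes) / denom if denom > 0 else 0.0

# Example: a policy that improves over epochs. The decayed estimate
# (~0.53) sits above the plain unweighted rate (0.50) because later,
# more successful epochs carry more weight.
print(decayed_success_rate(attempts=[10, 10, 10, 10],
                           successes=[2, 4, 6, 8]))
```

In the planner this quantity would presumably serve as the edge cost between attribute sets ρ_i and ρ_j; the sketch only illustrates the weighting scheme.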