Action Schema Networks: Generalised Policies With Deep Learning

Authors: Sam Toyer, Felipe Trevizan, Sylvie Thiébaux, Lexing Xie

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments, we show that ASNet's learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains."
Researcher Affiliation | Collaboration | Sam Toyer (1), Felipe Trevizan (1,2), Sylvie Thiébaux (1), Lexing Xie (1,3); (1) Research School of Computer Science, Australian National University; (2) Data61, CSIRO; (3) Data to Decisions CRC; first.last@anu.edu.au
Pseudocode | Yes | "Algorithm 1 describes a single epoch of exploration and supervised learning." (A hedged sketch of such an epoch appears after this table.)
Open Source Code | Yes | "Code and models for this work are available online." (https://github.com/qxcv/asnets)
Open Datasets | Yes | "We evaluate ASNets and the baselines on the following probabilistic planning domains: Cosa Nostra Pizza, Probabilistic Blocks World, Triangle Tire World (Little and Thiébaux 2007)." Code and models for this work are available online (https://github.com/qxcv/asnets).
Dataset Splits | No | The paper describes a set of training problems (Ptrain) and evaluates on larger test problems, but it does not explicitly specify a separate validation split, percentages, or sample counts.
Hardware Specification | Yes | "All ASNets were trained and evaluated on a virtual machine equipped with 62GB of memory and an x86-64 processor clocked at 2.3GHz. For training and evaluation, each ASNet was restricted to use a single, dedicated processor core, but resources were otherwise shared. The baseline planners were run in a cluster of x86-64 processors clocked at 2.6GHz and each planner again used only a single core."
Software Dependencies | No | The paper mentions an optimiser (Adam) and a nonlinearity (ELU), but it does not specify versions for any deep learning framework (e.g., TensorFlow, PyTorch), programming language (e.g., Python), or other software libraries required for replication.
Experiment Setup | Yes | "The hyperparameters for each ASNet were kept fixed across domains: three action layers and two proposition layers in each network, a hidden representation size of 16 for each internal action and proposition module, and an ELU (Clevert, Unterthiner, and Hochreiter 2016) as the nonlinearity f. The optimiser was configured with a learning rate of 0.0005 and a batch size of 128, and a hard limit of two hours (7200s) was placed on training. We also applied ℓ2 regularisation with a coefficient of 0.001 on all weights, and dropout on the outputs of each layer except the last with p = 0.25. Each epoch of training alternated between 25 rounds of exploration shared equally among all training problems, and 300 batches of network optimisation (i.e. Texplore = 25/|Ptrain| and Ttrain = 300). Sampled trajectory lengths are L = 300 for both training and evaluation." (These settings are gathered into a configuration sketch after this table.)
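
For readers trying to reproduce the training loop summarised in the Pseudocode and Experiment Setup rows, the epoch structure quoted above (exploration rounds shared equally among training problems, followed by a fixed number of supervised minibatch updates) can be sketched as follows. This is a minimal reconstruction from the quoted text, not the authors' Algorithm 1; the callables `rollout_fn`, `label_fn`, and `update_fn` are hypothetical stand-ins for the policy rollout, teacher labelling, and network update steps.

```python
import random

def run_epoch(rollout_fn, label_fn, update_fn, train_problems, replay_buffer,
              n_explore_rounds=25, n_train_batches=300,
              max_traj_len=300, batch_size=128):
    """One epoch of the alternating scheme described in the paper's text.

    rollout_fn(problem, max_len) -> list of states visited by the current policy
    label_fn(problem, states)    -> list of (state, teacher_action) training pairs
    update_fn(batch)             -> one supervised optimiser step on the batch
    All three callables are hypothetical stand-ins; they are not part of the
    authors' released code.
    """
    # Exploration: the budget of n_explore_rounds rollouts is shared equally
    # among the training problems (Texplore = 25 / |Ptrain| rounds per problem).
    rounds_per_problem = max(1, n_explore_rounds // len(train_problems))
    for problem in train_problems:
        for _ in range(rounds_per_problem):
            states = rollout_fn(problem, max_traj_len)
            replay_buffer.extend(label_fn(problem, states))

    # Supervised learning: Ttrain minibatch updates drawn from the buffer.
    for _ in range(n_train_batches):
        batch = random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
        update_fn(batch)
```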
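
The fixed hyperparameters quoted in the Experiment Setup row can likewise be collected into a single reference configuration. The dictionary and its key names below are our own summary; they do not correspond to the configuration files in the asnets repository.

```python
# Fixed ASNet hyperparameters quoted in the Experiment Setup row, gathered into
# one place for convenience. Key names are illustrative only.
ASNET_HYPERPARAMS = {
    "num_action_layers": 3,          # three action layers per network
    "num_proposition_layers": 2,     # two proposition layers per network
    "hidden_size": 16,               # hidden representation size per module
    "nonlinearity": "elu",           # ELU as the nonlinearity f
    "optimizer": "adam",             # Adam, per the Software Dependencies row
    "learning_rate": 5e-4,
    "batch_size": 128,
    "l2_coefficient": 1e-3,          # L2 regularisation on all weights
    "dropout_p": 0.25,               # on outputs of every layer except the last
    "training_time_limit_s": 7200,   # hard two-hour cap on training
    "explore_rounds_per_epoch": 25,  # shared equally among training problems
    "train_batches_per_epoch": 300,
    "max_trajectory_length": 300,    # L = 300 for both training and evaluation
}
```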