Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Action Schema Networks: Generalised Policies With Deep Learning
Authors: Sam Toyer, Felipe Trevizan, Sylvie Thiébaux, Lexing Xie
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show that ASNet's learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains. |
| Researcher Affiliation | Collaboration | Sam Toyer,1 Felipe Trevizan,1,2 Sylvie Thiébaux,1 Lexing Xie1,3 1 Research School of Computer Science, Australian National University 2 Data61, CSIRO 3 Data to Decisions CRC firstEMAIL |
| Pseudocode | Yes | Algorithm 1 describes a single epoch of exploration and supervised learning. |
| Open Source Code | Yes | Code and models for this work are available online. Footnote 1: https://github.com/qxcv/asnets |
| Open Datasets | Yes | We evaluate ASNets and the baselines on the following probabilistic planning domains: Cosa Nostra Pizza, Probabilistic Blocks World, Triangle Tire World (Little and Thiébaux 2007). Code and models for this work are available online. Footnote 1: https://github.com/qxcv/asnets |
| Dataset Splits | No | The paper describes a set of training problems (Ptrain) and tests on larger problems, but it does not explicitly specify a separate validation dataset split, percentages, or sample counts. |
| Hardware Specification | Yes | All ASNets were trained and evaluated on a virtual machine equipped with 62GB of memory and an x86-64 processor clocked at 2.3GHz. For training and evaluation, each ASNet was restricted to use a single, dedicated processor core, but resources were otherwise shared. The baseline planners were run in a cluster of x86-64 processors clocked at 2.6GHz and each planner again used only a single core. |
| Software Dependencies | No | The paper mentions optimizers (Adam) and non-linearities (ELU), but it does not specify versions for any deep learning frameworks (e.g., TensorFlow, PyTorch), programming languages (e.g., Python), or other specific software libraries required for replication. |
| Experiment Setup | Yes | The hyperparameters for each ASNet were kept fixed across domains: three action layers and two proposition layers in each network, a hidden representation size of 16 for each internal action and proposition module, and an ELU (Clevert, Unterthiner, and Hochreiter 2016) as the nonlinearity f. The optimiser was configured with a learning rate of 0.0005 and a batch size of 128, and a hard limit of two hours (7200s) was placed on training. We also applied ℓ2 regularisation with a coefficient of 0.001 on all weights, and dropout on the outputs of each layer except the last with p = 0.25. Each epoch of training alternated between 25 rounds of exploration shared equally among all training problems, and 300 batches of network optimisation (i.e. Texplore = 25/\|Ptrain\| and Ttrain = 300). Sampled trajectory lengths are L = 300 for both training and evaluation. |
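
The Experiment Setup row packs many values into one sentence, so it may help to see them as a single configuration object. The following is a minimal sketch, not the authors' code: the class name `ASNetSetup`, its field names, and both helper methods are hypothetical and chosen only for illustration. The numeric values are taken from the quoted text, and the optimiser name comes from the Software Dependencies row, which mentions Adam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ASNetSetup:
    """Values quoted in the Experiment Setup row; all names are hypothetical."""

    num_action_layers: int = 3          # "three action layers"
    num_proposition_layers: int = 2     # "two proposition layers"
    hidden_size: int = 16               # per action/proposition module
    nonlinearity: str = "elu"           # ELU nonlinearity f
    optimiser: str = "adam"             # named in the Software Dependencies row
    learning_rate: float = 0.0005
    batch_size: int = 128
    time_limit_s: int = 7200            # hard two-hour training limit
    l2_coeff: float = 0.001             # l2 regularisation on all weights
    dropout_p: float = 0.25             # every layer output except the last
    explore_rounds_per_epoch: int = 25  # shared among all training problems
    train_batches_per_epoch: int = 300  # T_train
    trajectory_length: int = 300        # L, for training and evaluation

    def explore_rounds_per_problem(self, num_train_problems: int) -> float:
        """T_explore = 25 / |P_train|, as stated in the quoted setup."""
        if num_train_problems <= 0:
            raise ValueError("need at least one training problem")
        return self.explore_rounds_per_epoch / num_train_problems

    def minibatch_samples_per_epoch(self) -> int:
        """Samples drawn per epoch of optimisation (batches x batch size)."""
        return self.train_batches_per_epoch * self.batch_size


if __name__ == "__main__":
    setup = ASNetSetup()
    # With a hypothetical training set of 5 problems, each one receives
    # 25 / 5 exploration rounds per epoch.
    print(setup.explore_rounds_per_problem(5))
    print(setup.minibatch_samples_per_epoch())
```

The training-set size of 5 in the usage lines is an illustrative assumption; the summary above does not state how many training problems each domain used.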