Action Schema Networks: Generalised Policies With Deep Learning
Authors: Sam Toyer, Felipe Trevizan, Sylvie Thiébaux, Lexing Xie
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show that ASNet's learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains. |
| Researcher Affiliation | Collaboration | Sam Toyer (1), Felipe Trevizan (1,2), Sylvie Thiébaux (1), Lexing Xie (1,3); (1) Research School of Computer Science, Australian National University; (2) Data61, CSIRO; (3) Data to Decisions CRC; first.last@anu.edu.au |
| Pseudocode | Yes | Algorithm 1 describes a single epoch of exploration and supervised learning. (A hedged Python sketch of such an epoch appears after this table.) |
| Open Source Code | Yes | Code and models for this work are available online: https://github.com/qxcv/asnets |
| Open Datasets | Yes | We evaluate ASNets and the baselines on the following probabilistic planning domains: Cosa Nostra Pizza, Probabilistic Blocks World, Triangle Tire World (Little and Thiébaux 2007). Code and models for this work are available online: https://github.com/qxcv/asnets |
| Dataset Splits | No | The paper describes a set of training problems (P_train) and tests on larger problems, but it does not explicitly specify a separate validation split, percentages, or sample counts. |
| Hardware Specification | Yes | All ASNets were trained and evaluated on a virtual machine equipped with 62GB of memory and an x86-64 processor clocked at 2.3GHz. For training and evaluation, each ASNet was restricted to use a single, dedicated processor core, but resources were otherwise shared. The baseline planners were run in a cluster of x86-64 processors clocked at 2.6GHz and each planner again used only a single core. |
| Software Dependencies | No | The paper mentions optimizers (Adam) and non-linearities (ELU), but it does not specify versions for any deep learning frameworks (e.g., TensorFlow, PyTorch), programming languages (e.g., Python), or other specific software libraries required for replication. |
| Experiment Setup | Yes | The hyperparameters for each ASNet were kept fixed across domains: three action layers and two proposition layers in each network, a hidden representation size of 16 for each internal action and proposition module, and an ELU (Clevert, Unterthiner, and Hochreiter 2016) as the nonlinearity f. The optimiser was configured with a learning rate of 0.0005 and a batch size of 128, and a hard limit of two hours (7200s) was placed on training. We also applied ℓ2 regularisation with a coefficient of 0.001 on all weights, and dropout on the outputs of each layer except the last with p = 0.25. Each epoch of training alternated between 25 rounds of exploration shared equally among all training problems, and 300 batches of network optimisation (i.e. T_explore = 25/\|P_train\| and T_train = 300). Sampled trajectory lengths are L = 300 for both training and evaluation. (These settings are collected in the configuration snippet after this table.) |
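The Pseudocode row above is easier to follow with a concrete skeleton. Below is a minimal Python sketch of one training epoch in the spirit of the paper's Algorithm 1, assuming hypothetical helpers `run_policy`, `teacher_plan`, `update_network`, and a replay `memory`; none of these names come from the ASNets codebase, and the structure is inferred only from the paper's description of alternating exploration and supervised learning.

```python
# Hypothetical sketch of a single ASNet training epoch, loosely following the
# paper's Algorithm 1. Helper callables are passed in explicitly because
# their real names and signatures are not given in this review.

def train_epoch(network, train_problems, memory,
                run_policy, teacher_plan, update_network,
                explore_rounds=25, train_batches=300, batch_size=128,
                max_traj_len=300):
    """One epoch: explore with the current policy, then learn from a teacher.

    explore_rounds is shared equally among the training problems, matching
    the paper's T_explore = 25 / |P_train|; train_batches matches T_train.
    """
    # Exploration phase: roll out the current policy and record the teacher
    # planner's recommended action for every state visited.
    rounds_per_problem = max(1, explore_rounds // len(train_problems))
    for problem in train_problems:
        for _ in range(rounds_per_problem):
            for state in run_policy(network, problem, max_len=max_traj_len):
                memory.add(state, teacher_plan(problem, state))

    # Supervised phase: minibatch updates towards the teacher's choices.
    for _ in range(train_batches):
        update_network(network, memory.sample(batch_size))
```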
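For quick reference, the fixed hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The key names below are our own invention; only the values are taken from the paper (the optimiser, Adam, is named in the Software Dependencies row).

```python
# Hyperparameters reported in the paper, gathered as a plain dictionary.
# Key names are illustrative; values are quoted from the experiment setup.
ASNET_HYPERPARAMS = {
    "num_action_layers": 3,
    "num_proposition_layers": 2,
    "hidden_size": 16,                 # per action/proposition module
    "nonlinearity": "elu",
    "optimiser": "adam",
    "learning_rate": 5e-4,
    "batch_size": 128,
    "l2_coefficient": 1e-3,
    "dropout_prob": 0.25,              # all layers except the last
    "explore_rounds_per_epoch": 25,    # shared across training problems
    "train_batches_per_epoch": 300,
    "trajectory_length": 300,          # L, for training and evaluation
    "training_time_limit_s": 7200,     # hard two-hour cap
}
```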