A Direct Approximation of AIXI Using Logical State Abstractions
Authors: Samuel Yang-Zhao, Tianyu Wang, Kee Siong Ng
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on controlling epidemics on large-scale contact networks validate the agent's performance. |
| Researcher Affiliation | Academia | Samuel Yang-Zhao, Australian National University, Canberra ACT 2601, samuel.yang-zhao@anu.edu.au; Tianyu Wang, Australian National University, Canberra ACT 2601, tianyu.wang2@anu.edu.au; Kee Siong Ng, Australian National University, Canberra ACT 2601, keesiong.ng@anu.edu.au |
| Pseudocode | No | The paper describes algorithms and processes but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | We use an email network dataset as the underlying contact network, licensed under a Creative Commons Attribution-Share Alike License, containing 1133 nodes and 5451 edges [44, 45]. |
| Dataset Splits | No | The paper does not explicitly specify dataset splits (e.g., training, validation, test percentages or counts) or cross-validation methods. |
| Hardware Specification | Yes | All experiments were performed on a 12-Core AMD Ryzen Threadripper 1920x processor and 32 gigabytes of memory. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | The transition model, observation model, and Action_Cost(a_t) are parametrised the same way across all experiments (see Table 1 in Appendix B). A Quarantine(i) action imparts a cost of 1 per node that is quarantined at the given time step. A Vaccinate(i, j) action imparts a lower cost of 0.5 per node. The parameters λ, 1, 2 are varied across experiments. We generate a set of 1489 predicate functions... The Φ-AIXI-CTW agent is trained in an online fashion. The agent explores with probability ε_t at each step t until ε_t < 0.03, after which the agent acts ε-greedily with exploration rate 0.03. RF-BDD was performed with a threshold value of 0.9 across all rewards. |
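
For anyone reconstructing the environment, a minimal sketch of loading the contact network from the Open Datasets row is below. The file name `email.edges`, the whitespace-separated edge-list format, and the use of networkx are all assumptions; the paper does not say how the network file is stored or parsed.

```python
# Minimal sketch: load the email contact network (1133 nodes, 5451 edges)
# cited in the Open Datasets row. The path "email.edges" and the plain
# edge-list format are assumptions; substitute the dataset's actual format.
import networkx as nx

def load_contact_network(path: str = "email.edges") -> nx.Graph:
    """Read an undirected whitespace-separated edge list into a graph."""
    return nx.read_edgelist(path, nodetype=int)

if __name__ == "__main__":
    g = load_contact_network()
    # The paper reports 1133 nodes and 5451 edges for this network.
    print(f"nodes={g.number_of_nodes()} edges={g.number_of_edges()}")
```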
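
The Experiment Setup row pins down the action costs and the exploration floor but not the decay schedule itself. The sketch below encodes only the quoted numbers; the 1/sqrt(t) decay, the encoding of actions as (kind, nodes) tuples, and all function names are illustrative assumptions, not the paper's implementation.

```python
import math
import random

# Values quoted in the Experiment Setup row.
QUARANTINE_COST = 1.0   # cost per quarantined node per time step
VACCINATE_COST = 0.5    # cost per vaccinated node per time step
EPSILON_FLOOR = 0.03    # exploration rate once the decayed schedule falls below it

def action_cost(action):
    """Cost of an action encoded (by assumption) as (kind, affected_nodes)."""
    kind, nodes = action
    if kind == "Quarantine":
        return QUARANTINE_COST * len(nodes)
    if kind == "Vaccinate":
        return VACCINATE_COST * len(nodes)
    return 0.0

def epsilon_at(t):
    """Exploration rate at step t. The paper only states that epsilon_t decays
    until it drops below 0.03; the 1/sqrt(t) schedule is an assumption."""
    return max(EPSILON_FLOOR, 1.0 / math.sqrt(max(t, 1)))

def choose_action(t, greedy_action, action_space):
    """Epsilon-greedy selection using the floored schedule above."""
    if random.random() < epsilon_at(t):
        return random.choice(action_space)  # explore
    return greedy_action                    # exploit
```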