A Direct Approximation of AIXI Using Logical State Abstractions

Authors: Samuel Yang-Zhao, Tianyu Wang, Kee Siong Ng

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on controlling epidemics on large-scale contact networks validate the agent's performance.
Researcher Affiliation | Academia | Samuel Yang-Zhao, Australian National University, Canberra ACT 2601, samuel.yang-zhao@anu.edu.au; Tianyu Wang, Australian National University, Canberra ACT 2601, tianyu.wang2@anu.edu.au; Kee Siong Ng, Australian National University, Canberra ACT 2601, keesiong.ng@anu.edu.au
Pseudocode | No | The paper describes algorithms and processes but does not include any pseudocode or algorithm blocks.
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]
Open Datasets | Yes | We use an email network dataset as the underlying contact network, licensed under a Creative Commons Attribution-Share Alike License, containing 1133 nodes and 5451 edges [44, 45].
Dataset Splits | No | The paper does not explicitly specify dataset splits (e.g., training, validation, test percentages or counts) or cross-validation methods.
Hardware Specification | Yes | All experiments were performed on a 12-core AMD Ryzen Threadripper 1920X processor and 32 gigabytes of memory.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | The transition model, observation model, and Action_Cost(a_t) are parametrised the same way across all experiments (see Table 1 in Appendix B). A Quarantine(i) action imparts a cost of 1 per node that is quarantined at the given time step. A Vaccinate(i, j) action imparts a lower cost of 0.5 per node. The parameters λ, ε1, ε2 are varied across experiments. We generate a set of 1489 predicate functions... The Φ-AIXI-CTW agent is trained in an online fashion. The agent explores with probability εt at each step t until εt < 0.03, after which the agent performs in an ε-greedy way with exploration rate 0.03. RF-BDD was performed with a threshold value of 0.9 across all rewards. (The exploration schedule is sketched below.)
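The exploration schedule described in the Experiment Setup row can be illustrated with a minimal sketch. The excerpt only states that the agent explores with probability εt at step t until εt falls below 0.03, after which it acts ε-greedily with a fixed rate of 0.03; the particular decay schedule (a 1/t decay here), the action set, and the `value_of` helper are assumptions made for illustration, not details taken from the paper.

```python
import random

EPSILON_FLOOR = 0.03  # fixed exploration rate once the decayed value drops below it


def exploration_rate(t, decay=lambda t: 1.0 / max(t, 1)):
    """Decayed exploration probability epsilon_t, floored at 0.03.

    The 1/t decay is an assumption: the excerpt does not state how epsilon_t
    is annealed, only that it eventually falls below 0.03.
    """
    return max(decay(t), EPSILON_FLOOR)


def epsilon_greedy_action(actions, value_of, t, rng=random):
    """Pick a random action with probability epsilon_t, otherwise the greedy one.

    `actions` is the list of available actions (e.g. Quarantine(i), Vaccinate(i, j))
    and `value_of` is a hypothetical action-value estimate from the agent's model.
    """
    if rng.random() < exploration_rate(t):
        return rng.choice(actions)
    return max(actions, key=value_of)
```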