Dynamic Knowledge Injection for AIXI Agents

Authors: Samuel Yang-Zhao, Kee Siong Ng, Marcus Hutter

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on epidemic control on contact networks validate the agent's practical utility. We validate the agent's performance empirically on multiple experimental domains, including the control of epidemics on large contact networks, and our results demonstrate that Dynamic Hedge AIXI is able to quickly adapt to new knowledge that improves its performance. Experiments, Experiment Setup: In the Dynamic Knowledge Injection setting, new knowledge arrives from a human operator in the form of new domain-specific predicates. A predicate environment model is then generated from these predicates for the agent to utilise. To display the adaptive behaviour of our agent, we consider the setting where better models are generated for the agent over time. We simulate the dynamic knowledge injection setting by maintaining two sets of predicates, I and U, representing the informative and uninformative predicates for each domain. Given p ∈ [0, 1], a new Φ-BCTW model of depth d is constructed by sampling p·d predicates from I and d − p·d predicates from U. The proportion p initially starts out small and increases over time. In all our experiments, we drop the Φ-BCTW model with the lowest weight and introduce a new Φ-BCTW model every 4K steps. The model is also pre-trained on the preceding 4K steps to ensure it does not perform too poorly when it is first introduced. The parameter p starts at p = 0.05 and increases by 0.05 every 4K steps. The full details of the experiment design are provided in the extended version of the paper (Yang-Zhao, Ng, and Hutter 2023). Figures 1 and 2 display the mean learning curves for Dynamic Hedge AIXI, U-Tree, PARSS and Hedge AIXI, with standard deviations computed over five random seeds.
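The injection schedule described above can be sketched in a few lines of Python. This is a minimal, hypothetical reconstruction of the paper's simulation protocol, not the authors' code: the rounding of p·d, the function names, and the default parameters other than p0 = 0.05, the 0.05 increment, and the 4K-step interval are assumptions.

```python
import random

def sample_predicates(informative, uninformative, p, d, rng):
    """Sample the predicate set for a new depth-d Phi-BCTW model:
    roughly p*d informative predicates and the remainder uninformative
    (rounding convention is an assumption)."""
    n_inf = round(p * d)
    preds = rng.sample(informative, n_inf)
    preds += rng.sample(uninformative, d - n_inf)
    return preds

def run_injection_schedule(informative, uninformative, d=8,
                           p0=0.05, dp=0.05, interval=4000,
                           total_steps=40000, seed=0):
    """Every `interval` steps, a new predicate model is generated with
    the current proportion p, which then increases by dp. (In the paper,
    the lowest-weight model is also dropped at each injection.)"""
    rng = random.Random(seed)
    p = p0
    schedule = []
    for step in range(0, total_steps, interval):
        preds = sample_predicates(informative, uninformative,
                                  min(p, 1.0), d, rng)
        schedule.append((step, round(p, 2), preds))
        p += dp
    return schedule
```

With the default (assumed) settings this yields ten injections at p = 0.05, 0.10, ..., 0.50, mirroring the "p starts at 0.05 and increases by 0.05 every 4K steps" schedule.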
Researcher Affiliation | Collaboration | Samuel Yang-Zhao1, Kee Siong Ng1, Marcus Hutter1,2 (1Australian National University; 2Google DeepMind)
Pseudocode | Yes | Algorithm 1: Dynamic Hedge (modifies Growing Hedge (Mourtada and Maillard 2017)); Algorithm 2: Dynamic Hedge AIXI
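To make the Dynamic Hedge idea concrete, here is a minimal exponential-weights sketch over a changing pool of specialists. This is a simplified illustration, not the paper's Algorithm 1: the entry weight for a new specialist and the class name are assumptions, and the Growing Hedge re-weighting details from Mourtada and Maillard (2017) are not reproduced.

```python
import math

class DynamicHedgeSketch:
    """Exponential-weights mixture over a pool of experts that can
    grow (new specialist added) and shrink (lowest-weight dropped)."""

    def __init__(self, eta=1.0):
        self.eta = eta          # learning rate (the paper uses eta = 1)
        self.weights = {}       # expert id -> unnormalised weight

    def add_expert(self, name):
        # Assumption: a new expert enters with the current average
        # weight so it is not immediately dominated.
        avg = (sum(self.weights.values()) / len(self.weights)
               if self.weights else 1.0)
        self.weights[name] = avg

    def drop_lowest(self):
        # Mirrors "drop the model with the lowest weight".
        if self.weights:
            worst = min(self.weights, key=self.weights.get)
            del self.weights[worst]

    def update(self, losses):
        # Standard Hedge step: w_i <- w_i * exp(-eta * loss_i).
        for name, loss in losses.items():
            self.weights[name] *= math.exp(-self.eta * loss)

    def posterior(self):
        total = sum(self.weights.values())
        return {k: v / total for k, v in self.weights.items()}
```

In the paper's instantiation each expert would be a Φ-BCTW environment model and the loss its (log-)prediction error; here losses are abstract numbers.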
Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of source code for the described methodology.
Open Datasets | Yes | Biased Rock-Paper-Scissors (RPS): this domain is taken from (Farias et al. 2010). Taxi: the Taxi environment was first introduced in (Dietterich 2000). We use the network dataset from (Rossi and Ahmed 2015; Guimerà et al. 2003), which contains 1133 nodes and 5451 edges.
Dataset Splits | No | The paper mentions 'mean learning curves... with standard deviations computed over five random seeds' but does not provide explicit train/validation/test dataset split percentages, sample counts, or specific predefined split methodologies.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running its experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') that are necessary for reproducing the experiments.
Experiment Setup | Yes | In all our experiments, we drop the Φ-BCTW model with the lowest weight and introduce a new Φ-BCTW model every 4K steps. The model is also pre-trained on the preceding 4K steps to ensure it does not perform too poorly when it is first introduced. The parameter p starts at p = 0.05 and increases by 0.05 every 4K steps. The full details of the experiment design are provided in the extended version of the paper (Yang-Zhao, Ng, and Hutter 2023). In practice, we let η = 1, ν = 1 and instantiate Dynamic Hedge AIXI in the case where each specialist is a Φ-BCTW model.