Self-Predictive Universal AI
Authors: Elliot Catt, Jordi Grau-Moya, Marcus Hutter, Matthew Aitchison, Tim Genewein, Grégoire Delétang, Kevin Li, Joel Veness
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While our work is mainly theoretical, we also conducted experiments (see Appendix B) comparing self-prediction, using a Self-AIXI approximation, against the pure planning approach, using an AIXI approximation, using Context Tree Weighting as predictor and Monte-Carlo Tree Search for the Q-value estimates. ... Results. Figure 1 shows learning curves for the Cheeze Maze, Tiger and 4x4 Grid domains respectively. ... The final performance (as evaluated by the average reward per step over the final 2000 timesteps) of each agent configuration is shown in Table 2. |
| Researcher Affiliation | Industry | Elliot Catt, Jordi Grau-Moya, Marcus Hutter, Matthew Aitchison, Tim Genewein, Grégoire Delétang, Kevin Li Wenliang, Joel Veness. Google DeepMind. ecatt@google.com |
| Pseudocode | No | The paper provides definitions, theorems, and proofs but does not include any pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluated across 5 stochastic, partially observable and history dependent domains: Cheeze Maze, Kuhn Poker, 4x4 Grid, Tiger and Biased Scissor/Paper/Rock. The description for each of these domains can once again be found in [20]. |
| Dataset Splits | No | The paper describes an online learning setup with a single run across timesteps and rolling average reward for evaluation, but does not provide explicit train/validation/test dataset splits or cross-validation details. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, or cloud computing resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions techniques like Monte-Carlo Tree Search and Context Tree Weighting but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For all experiments, we use a finite m-horizon undiscounted return setup, with each domain-specific horizon choice given by Table 3 in [20]. The CTW depth parameter for both the environment model (ξ̂ := CTW_d) and the self-prediction model (ζ̂ := CTW_d) was also chosen to match Table 3 in [20]. ... Each environment was evaluated by performing a single online run across 10^4 timesteps; ... At each timestep t, both agents pick either a random action with probability ϵ_t := 0.2 · 0.999^t, or otherwise return the estimated best action according to action-value estimates computed with 500 simulations. |
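The exploration schedule quoted in the setup row (random action with probability ϵ_t := 0.2 · 0.999^t, otherwise the action with the highest Q-value estimate) can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function names (`epsilon`, `select_action`) and the dictionary representation of Q-value estimates are our own assumptions.

```python
import random

def epsilon(t: int) -> float:
    # Decaying exploration probability from the paper: 0.2 * 0.999^t.
    return 0.2 * 0.999 ** t

def select_action(t, actions, q_values, rng=random):
    # With probability epsilon(t) pick a uniformly random action;
    # otherwise act greedily w.r.t. the Q-value estimates
    # (which the paper computes via 500 MCTS simulations).
    if rng.random() < epsilon(t):
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[a])
```

At t = 0 the agent explores 20% of the time; by t ≈ 2300 the probability has decayed below 2%, so the single 10^4-step online run is dominated by greedy action selection in its later stages.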