Robust agents learn causal world models

Authors: Jonathan Richens, Tom Everitt

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In Appendix F we demonstrate learning the underlying CBN from regret-bounded policies using simulated data for randomly generated CIDs similar to Figure 1, and explore how the accuracy of the approximate CBN scales with the regret bound (Figure 3)." |
| Researcher Affiliation | Industry | Jonathan Richens, Google DeepMind (jonrichens@deepmind.com); Tom Everitt, Google DeepMind |
| Pseudocode | Yes | Algorithm 1, "Identify qcrit, d2, d3 using policy oracle"; Algorithm 2, "Graph Learner for simple CID". A toy sketch of the policy-oracle idea appears below the table. |
| Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology, and no link to a code repository. |
| Open Datasets | No | The paper uses synthetic, simulated data (e.g., "randomly generated CBNs") rather than a publicly available dataset with an access link or citation. A sketch of such generation appears below the table. |
| Dataset Splits | No | Because the data are simulated and randomly generated, the paper specifies no fixed training/validation/test splits in the conventional sense; results are averaged over "randomly generated environments". |
| Hardware Specification | No | The paper does not state the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list software dependencies, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | No | The paper describes the simulation logic for the regret-bounded agent and how the CBNs are randomly generated, but it does not report hyperparameters (e.g., learning rate, batch size, epochs) or other system-level training settings common in machine-learning experimental setups. |
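
The quoted algorithm captions indicate the experimental loop: query a (regret-bounded) policy oracle on a family of decision tasks and read causal quantities off the utility level at which the oracle's preferred action flips. Below is a minimal, hypothetical sketch of that idea for a single binary quantity. The names `make_oracle` and `estimate_qcrit`, the safe/risky task construction, and all parameter values are illustrative assumptions; this is not the paper's Algorithms 1 and 2, which operate on general CIDs.

```python
import numpy as np

def make_oracle(p_y, regret_bound, rng=None):
    """Toy regret-bounded policy oracle (hypothetical stand-in).

    Decision task: pick 'safe' (known utility q) or 'risky' (utility 1
    when Y = 1, hence expected utility p_y). The oracle may return any
    action whose regret against the optimum is within regret_bound.
    """
    rng = np.random.default_rng(rng)

    def oracle(q):
        regrets = {"safe": max(p_y - q, 0.0), "risky": max(q - p_y, 0.0)}
        allowed = [a for a, r in regrets.items() if r <= regret_bound]
        return rng.choice(allowed)

    return oracle

def estimate_qcrit(oracle, tol=1e-3, trials=32):
    """Binary-search the utility q at which the oracle's choice flips.

    The flip point approximates p_y with error governed by the regret
    bound: inside the band |q - p_y| <= regret_bound the oracle may pick
    either action, so a majority vote over repeated queries is used.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        q = (lo + hi) / 2
        risky_votes = sum(oracle(q) == "risky" for _ in range(trials))
        if risky_votes > trials / 2:
            lo = q  # oracle still favours the risky arm: p_y likely above q
        else:
            hi = q
    return (lo + hi) / 2

oracle = make_oracle(p_y=0.37, regret_bound=0.05, rng=0)
print(estimate_qcrit(oracle))  # close to 0.37, up to +/- the regret bound
```

In this toy setting the recovered threshold lands within roughly one regret bound of the true probability, which matches the qualitative scaling the paper explores in Figure 3: tighter regret bounds yield more accurate approximate CBNs.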
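
Since the experiments run on generated rather than downloaded data, a reproduction would begin by sampling random environments. The sketch below assumes binary variables and a lower-triangular random DAG; `random_cbn`, `sample`, and the default edge probability are hypothetical choices, as the paper's generation settings are not reported in the reviewed text.

```python
import numpy as np

def random_cbn(n_vars, edge_prob=0.5, rng=None):
    """Sample a random causal Bayesian network over binary variables.

    The adjacency matrix is strictly lower-triangular (row i holds the
    parents of node i), so index order is already a topological order.
    Each node gets one Bernoulli parameter per joint parent setting.
    """
    rng = np.random.default_rng(rng)
    adj = np.tril(rng.random((n_vars, n_vars)) < edge_prob, k=-1)
    cpts = [rng.random(2 ** int(adj[i].sum())) for i in range(n_vars)]
    return adj, cpts

def sample(adj, cpts, rng=None):
    """Draw one joint sample from the CBN by ancestral sampling."""
    rng = np.random.default_rng(rng)
    x = np.zeros(len(cpts), dtype=int)
    for i in range(len(cpts)):
        parents = np.flatnonzero(adj[i])
        # Read the parents' values as a binary index into node i's CPT.
        idx = int("".join(map(str, x[parents])), 2) if parents.size else 0
        x[i] = rng.random() < cpts[i][idx]
    return x

adj, cpts = random_cbn(n_vars=6, rng=0)
data = np.stack([sample(adj, cpts, rng=s) for s in range(1000)])
```

Averaging any downstream metric over many independent draws of `random_cbn` then mirrors the paper's protocol of reporting results averaged over randomly generated environments.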