Robust agents learn causal world models
Authors: Jonathan Richens, Tom Everitt
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Appendix F we demonstrate learning the underlying CBN from regret-bounded policies using simulated data for randomly generated CIDs similar to Figure 1, and explore how the accuracy of the approximate CBN scales with the regret bound (Figure 3). |
| Researcher Affiliation | Industry | Jonathan Richens, Google DeepMind; Tom Everitt, Google DeepMind; jonrichens@deepmind.com |
| Pseudocode | Yes | Algorithm 1 Identify qcrit, d2, d3 using policy oracle. ... Algorithm 2 Graph Learner for simple CID |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper uses 'synthetic data' and 'simulated data' generated randomly (e.g., 'randomly generated CBNs'), not a publicly available dataset with a specific access link or citation. |
| Dataset Splits | No | The paper uses simulated and randomly generated data. It does not specify fixed training, validation, or test splits in the conventional sense used for fixed datasets. Results are averaged over 'randomly generated environments'. |
| Hardware Specification | No | The paper does not mention any specific hardware specifications (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, for replicating the experiments. |
| Experiment Setup | No | The paper describes the simulation logic for the regret-bounded agent and how CBNs are randomly generated, but it does not provide specific hyperparameters (e.g., learning rate, batch size, epochs) or detailed system-level training settings commonly found in machine learning experimental setups. |