Robust agents learn causal world models

Authors: Jonathan Richens, Tom Everitt

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In Appendix F we demonstrate learning the underlying CBN from regret-bounded policies using simulated data for randomly generated CIDs similar to Figure 1, and explore how the accuracy of the approximate CBN scales with the regret bound (Figure 3)." |
| Researcher Affiliation | Industry | Jonathan Richens, Google DeepMind (jonrichens@deepmind.com); Tom Everitt, Google DeepMind |
| Pseudocode | Yes | Algorithm 1, "Identify qcrit, d2, d3 using policy oracle"; Algorithm 2, "Graph Learner for simple CID". A toy sketch of the policy-oracle idea appears below the table. |
| Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology, and no link to a code repository. |
| Open Datasets | No | The paper uses synthetic, simulated data (e.g., "randomly generated CBNs") rather than a publicly available dataset with an access link or citation. A sketch of such generation appears below the table. |
| Dataset Splits | No | Because the data are simulated and randomly generated, the paper specifies no fixed training/validation/test splits in the conventional sense; results are averaged over "randomly generated environments". |
| Hardware Specification | No | The paper does not state the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list software dependencies, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | No | The paper describes the simulation logic for the regret-bounded agent and how the CBNs are randomly generated, but it does not report hyperparameters (e.g., learning rate, batch size, epochs) or other system-level training settings common in machine-learning experimental setups. |
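
The quoted algorithm captions indicate the experimental loop: query a (regret-bounded) policy oracle on a family of decision tasks and read causal quantities off the utility level at which the oracle's preferred action flips. Below is a minimal, hypothetical sketch of that idea for a single binary quantity. The names `make_oracle` and `estimate_qcrit`, the safe/risky task construction, and all parameter values are illustrative assumptions; this is not the paper's Algorithms 1 and 2, which operate on general CIDs.

```python
import numpy as np

def make_oracle(p_y, regret_bound, rng=None):
    """Toy regret-bounded policy oracle (hypothetical stand-in).

    Decision task: pick 'safe' (known utility q) or 'risky' (utility 1
    when Y = 1, hence expected utility p_y). The oracle may return any
    action whose regret against the optimum is within regret_bound.
    """
    rng = np.random.default_rng(rng)

    def oracle(q):
        regrets = {"safe": max(p_y - q, 0.0), "risky": max(q - p_y, 0.0)}
        allowed = [a for a, r in regrets.items() if r <= regret_bound]
        return rng.choice(allowed)

    return oracle

def estimate_qcrit(oracle, tol=1e-3, trials=32):
    """Binary-search the utility q at which the oracle's choice flips.

    The flip point approximates p_y with error governed by the regret
    bound: inside the band |q - p_y| <= regret_bound the oracle may pick
    either action, so a majority vote over repeated queries is used.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        q = (lo + hi) / 2
        risky_votes = sum(oracle(q) == "risky" for _ in range(trials))
        if risky_votes > trials / 2:
            lo = q  # oracle still favours the risky arm: p_y likely above q
        else:
            hi = q
    return (lo + hi) / 2

oracle = make_oracle(p_y=0.37, regret_bound=0.05, rng=0)
print(estimate_qcrit(oracle))  # close to 0.37, up to +/- the regret bound
```

In this toy setting the recovered threshold lands within roughly one regret bound of the true probability, which matches the qualitative scaling the paper explores in Figure 3: tighter regret bounds yield more accurate approximate CBNs.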
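
Since the experiments run on generated rather than downloaded data, a reproduction would begin by sampling random environments. The sketch below assumes binary variables and a lower-triangular random DAG; `random_cbn`, `sample`, and the default edge probability are hypothetical choices, as the paper's generation settings are not reported in the reviewed text.

```python
import numpy as np

def random_cbn(n_vars, edge_prob=0.5, rng=None):
    """Sample a random causal Bayesian network over binary variables.

    The adjacency matrix is strictly lower-triangular (row i holds the
    parents of node i), so index order is already a topological order.
    Each node gets one Bernoulli parameter per joint parent setting.
    """
    rng = np.random.default_rng(rng)
    adj = np.tril(rng.random((n_vars, n_vars)) < edge_prob, k=-1)
    cpts = [rng.random(2 ** int(adj[i].sum())) for i in range(n_vars)]
    return adj, cpts

def sample(adj, cpts, rng=None):
    """Draw one joint sample from the CBN by ancestral sampling."""
    rng = np.random.default_rng(rng)
    x = np.zeros(len(cpts), dtype=int)
    for i in range(len(cpts)):
        parents = np.flatnonzero(adj[i])
        # Read the parents' values as a binary index into node i's CPT.
        idx = int("".join(map(str, x[parents])), 2) if parents.size else 0
        x[i] = rng.random() < cpts[i][idx]
    return x

adj, cpts = random_cbn(n_vars=6, rng=0)
data = np.stack([sample(adj, cpts, rng=s) for s in range(1000)])
```

Averaging any downstream metric over many independent draws of `random_cbn` then mirrors the paper's protocol of reporting results averaged over randomly generated environments.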