Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization

Authors: Mirco Mutti, Riccardo De Santi, Emanuele Rossi, Juan Felipe Calderon, Michael Bronstein, Marcello Restelli

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5 (Numerical Validation) states: 'We empirically validate the theoretical findings of this work by experimenting on a synthetic example where each environment is a person, and the MDP represents how a series of actions the person can take influences their weight (W) and academic performance (A).' A hypothetical sketch of such an environment is given below the table.
Researcher Affiliation | Collaboration | 1 Politecnico di Milano, 2 Università di Bologna, 3 ETH Zurich, 4 Imperial College London, 5 Twitter, 6 University of Oxford
Pseudocode | Yes | Algorithm 1 (Causal Transition Model Estimation), Algorithm 2 (MDP Causal Structure Estimation), Algorithm 3 (MDP Bayesian Network Estimation). A generic sketch of count-based transition-model estimation follows the table.
Open Source Code | No | The paper states 'The appendix of this paper can be found at https://arxiv.org/abs/2202.06545', which points to the paper's appendix rather than a code repository, and it makes no statement about releasing code for the methodology.
Open Datasets | No | The paper states 'We empirically validate the theoretical findings of this work by experimenting on a synthetic example...' and refers to 'Appendix B for details on how transition models of different environments are generated.' The data are therefore synthetically generated, and no concrete access information (link, DOI, or repository) is provided.
Dataset Splits | No | The paper mentions that 'A class M of 3 environments is used to estimate the causal transition model' for validation, but it does not report percentages or counts for training, validation, or test splits.
Hardware Specification | No | The paper describes its numerical validation in Section 5 but does not specify any hardware details (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks) used for the experiments.
Experiment Setup | No | The paper states 'All experiments are repeated 10 times' and reports the sample counts K used by Algorithm 1, but it does not provide further setup details such as hyperparameter values (e.g., learning rate, batch size, epochs) or optimizer settings.
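
The synthetic environment referenced in the Research Type row is not available as code. The snippet below is a minimal, hypothetical sketch of what such a per-person environment could look like, assuming two binary state variables W (weight) and A (academic performance) and a small discrete action set; the class name ToyPersonMDP, the action semantics, and all probabilities are illustrative assumptions, not the authors' implementation (the paper defers the actual generation of transition models to its Appendix B).

```python
import numpy as np

class ToyPersonMDP:
    """Hypothetical per-person environment with two binary state variables:
    W (weight indicator) and A (academic performance indicator).
    Actions 0/1/2 might stand for exercise/study/rest; the flip
    probabilities are drawn at random so that each 'person'
    (environment) has its own dynamics."""

    def __init__(self, seed=0, n_actions=3):
        self.rng = np.random.default_rng(seed)
        # Probability that each variable flips under each action.
        self.p_w_flip = self.rng.uniform(0.1, 0.9, size=n_actions)
        self.p_a_flip = self.rng.uniform(0.1, 0.9, size=n_actions)
        self.state = (0, 0)

    def step(self, action):
        w, a = self.state
        if self.rng.random() < self.p_w_flip[action]:
            w = 1 - w
        if self.rng.random() < self.p_a_flip[action]:
            a = 1 - a
        self.state = (w, a)
        return self.state
```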
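
Algorithm 1 (Causal Transition Model Estimation) is reported only as pseudocode. Below is a generic, count-based sketch of estimating a factored transition model from sampled transitions, assuming a tabular setting in which each binary variable is conditioned only on its own current value and the action; the paper instead estimates the model over a learned causal parent set, so this illustrates the general technique rather than reproducing Algorithm 1.

```python
from collections import defaultdict

def estimate_factored_transition_model(transitions, n_vars):
    """Laplace-smoothed estimate of P(x_i' = 1 | x_i, a) for each binary
    state variable i, from (state, action, next_state) tuples collected
    across environments (e.g., the class M of 3 environments mentioned
    in the table)."""
    counts = defaultdict(lambda: [1.0, 1.0])  # [count of 0s, count of 1s]
    for state, action, next_state in transitions:
        for i in range(n_vars):
            counts[(i, state[i], action)][next_state[i]] += 1.0
    return {key: ones / (zeros + ones) for key, (zeros, ones) in counts.items()}


# Hypothetical usage: pool K random-action transitions from 3 environments,
# mirroring (loosely) the protocol reported in the Experiment Setup row.
envs = [ToyPersonMDP(seed=s) for s in range(3)]
data = []
for env in envs:
    for _ in range(100):  # K = 100 is an arbitrary choice for illustration
        s = env.state
        a = int(env.rng.integers(3))
        s_next = env.step(a)
        data.append((s, a, s_next))
model = estimate_factored_transition_model(data, n_vars=2)
```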