Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization
Authors: Mirco Mutti, Riccardo De Santi, Emanuele Rossi, Juan Felipe Calderon, Michael Bronstein, Marcello Restelli
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Numerical Validation): 'We empirically validate the theoretical findings of this work by experimenting on a synthetic example where each environment is a person, and the MDP represents how a series of actions the person can take influences their weight (W) and academic performance (A).' |
| Researcher Affiliation | Collaboration | 1 Politecnico di Milano, 2 Università di Bologna, 3 ETH Zurich, 4 Imperial College London, 5 Twitter, 6 University of Oxford |
| Pseudocode | Yes | Algorithm 1 Causal Transition Model Estimation, Algorithm 2 MDP Causal Structure Estimation, Algorithm 3 MDP Bayesian Network Estimation |
| Open Source Code | No | The paper states 'The appendix of this paper can be found at https://arxiv.org/abs/2202.06545.', which points to the paper's appendix, not an explicit code repository or a statement about code release for the methodology. |
| Open Datasets | No | The paper states 'We empirically validate the theoretical findings of this work by experimenting on a synthetic example...' and refers to 'Appendix B for details on how transition models of different environments are generated.' This indicates a generated dataset, but no concrete access information (link, DOI, repository) for public availability is provided. |
| Dataset Splits | No | The paper mentions using 'A class M of 3 environments is used to estimate the causal transition model' for validation, but does not provide specific percentages or counts for training, validation, or test splits of the data. |
| Hardware Specification | No | The paper describes its numerical validation in Section 5 but does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks) used for the experiments. |
| Experiment Setup | No | The paper states 'All experiments are repeated 10 times' and mentions sample counts K from Algorithm 1, but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, epochs) or optimizer settings. |