Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
A Dynamical View of the Question of Why
Authors: Mehdi Fatemi, Sindhu C. M. Gowda
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, in fairly intricate experiments and through sheer learning, our framework reveals and quantifies causal links, which otherwise seem inexplicable." The paper also contains a section titled "5 EXPERIMENTS". |
| Researcher Affiliation | Collaboration | 1 Microsoft Research, 2 University of Toronto and Vector Institute |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured steps formatted like code. |
| Open Source Code | Yes | Our code and pretrained models to replicate the analysis (including figures) presented in this paper is publicly available at: https://github.com/fatemi/dynamical-causality. |
| Open Datasets | No | "For the optimizer, we used PyTorch's implementation of Adam optimizer with the Huber loss function, and the results are obtained after training over 200 epochs of 250,000 steps each (each action is repeated 4 times, hence each epoch involves one million Atari frames)." and "For the remaining 64 samples, we choose 32 samples uniformly from the train data and append it with 6 uniformly selected event B, hypoglycemia transitions (r = 1 and terminal state = True), 2 uniformly selected hyperglycemia transitions (r = 0 and terminal state = True), and remaining samples are sampled from non-zero action samples." Although "train data" is mentioned, the paper gives no split percentages and no methodology for generating this data from the simulator or a larger dataset, which would be needed to reproduce the exact data partitions. |
| Dataset Splits | No | No explicit mention of a 'validation' set or its specific split information (percentages, sample counts, or methodology) was found. |
| Hardware Specification | No | No specific hardware details (such as exact GPU/CPU models, processor types, or memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions software such as PyTorch but does not provide version numbers for it or for other key software components, which are needed to reproduce the software environment. |
| Experiment Setup | Yes | Both our network architecture and the base pipeline have the same structure as the original DQN paper (Mnih et al., 2015). In particular, there are 3 convolutional layers followed by 2 fully-connected linear layers... For the optimizer, we used PyTorch's implementation of Adam optimizer with the Huber loss function, and the results are obtained after training over 200 epochs of 250,000 steps each... We use a simple deep network with 3 fully connected layers with GELU (Gaussian error linear unit) activation. In particular, we have 13 × 30, 30 × 30, and 30 × 1 fully connected layers... We use a learning rate of 0.00001 and minibatch size of 128. |
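
The numbers quoted in the Experiment Setup row can be checked with a little arithmetic. The following is a minimal sketch, not the authors' code: the class and function names are hypothetical, the layer shapes are read as 13 × 30, 30 × 30, and 30 × 1 (an assumption about the extracted text), and the presence of bias terms is also an assumption, since the quoted passage does not mention them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtariTrainingSetup:
    """Training-length figures quoted in the table (hypothetical container)."""
    epochs: int = 200
    steps_per_epoch: int = 250_000
    action_repeat: int = 4  # "each action is repeated 4 times"

    @property
    def frames_per_epoch(self) -> int:
        # Each agent step consumes `action_repeat` emulator frames.
        return self.steps_per_epoch * self.action_repeat

    @property
    def total_frames(self) -> int:
        return self.epochs * self.frames_per_epoch


def mlp_parameter_count(layer_sizes, bias=True):
    """Count the weights (and, optionally, biases) of a fully connected stack."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out
        if bias:
            total += fan_out
    return total


# Values stated in the table for the small fully connected network.
MLP_LAYER_SIZES = (13, 30, 30, 1)  # assumed reading of the quoted layer shapes
LEARNING_RATE = 0.00001
MINIBATCH_SIZE = 128

atari = AtariTrainingSetup()

if __name__ == "__main__":
    print("frames per epoch:", atari.frames_per_epoch)
    print("total frames:", atari.total_frames)
    print("MLP parameters (with biases):", mlp_parameter_count(MLP_LAYER_SIZES))
```

Under these assumptions, the per-epoch frame count matches the "one million Atari frames" figure quoted in the Open Datasets row, and the fully connected network is very small compared with the DQN-style convolutional network used for Atari.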