Smoke and Mirrors in Causal Downstream Tasks
Authors: Riccardo Cadei, Lukas Lindorfer, Sylvia Cremer, Cordelia Schmid, Francesco Locatello
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test the practical impact of these considerations, we recorded ISTAnt, the first real-world benchmark for causal inference downstream tasks on high-dimensional observations, as an RCT studying how garden ants (Lasius neglectus) respond to microparticles applied onto their colony members by hygienic grooming. Comparing 6,480 models fine-tuned from state-of-the-art visual backbones, we find that the sampling and modeling choices significantly affect the accuracy of the causal estimate, and that classification accuracy is not a proxy thereof. We further validated the analysis, repeating it on a synthetically generated visual dataset controlling the causal model. (A minimal sketch of the RCT effect estimator this downstream task targets follows the table.) |
| Researcher Affiliation | Academia | Riccardo Cadei¹, Lukas Lindorfer¹, Sylvia Cremer¹, Cordelia Schmid², and Francesco Locatello¹ (¹Institute of Science and Technology Austria; ²Inria, École normale supérieure, CNRS, PSL Research University) |
| Pseudocode | No | The paper describes methods and theoretical derivations in text and equations, but it does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code: https://github.com/CausalLearningAI/ISTAnt |
| Open Datasets | Yes | To test the practical impact of these considerations, we recorded ISTAnt, the first real-world benchmark for causal inference downstream tasks on high-dimensional observations... Data: https://doi.org/10.6084/m9.figshare.26484934.v2... Since our ground-truth estimate of the causal effect depends on the trial's design, we propose a new synthetic benchmark based on MNIST [LeCun, 1998] controlling for the causal model, and we replicated the analysis. (A hedged sketch of such a controlled synthetic generator follows the table.) |
| Dataset Splits | Yes | For validation (used to generate Figure 6) we consider 1,000 random frames from D_u. |
| Hardware Specification | Yes | We run all the analyses using 48GB of RAM, 20 CPU cores, and a single node GPU (NVIDIA GeForce RTX 2080 Ti). (Appendix D. Detailed Experimental Settings) ... We run all the analyses using 10GB of RAM, 8 CPU cores, and a single node GPU (NVIDIA GeForce RTX 2080 Ti). (Appendix E.3 Results for Causal MNIST) |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'BCELoss', which are common machine learning components. However, it does not pin software dependencies to version numbers (e.g., 'PyTorch 1.x', 'Python 3.x'). |
| Experiment Setup | Yes | Modeling: We modeled f as a composition of a frozen pre-trained encoder e and a multi-layer perceptron h fine-tuned on D_s. For the encoder, we compared six different established Vision Transformers (ViT), mainly varying the training procedure... For each representation extracted we trained different heads, varying the number of hidden layers (1 or 2 layers with 256 nodes each with ReLU activation), learning rates (0.05, 0.005, 0.0005) for the Adam optimizer [Kingma and Ba, 2014] (10 epochs) and target... For each configuration, we repeated the training with five different random seeds. A summary of the architectures and training description is in Appendix D.2. (A training-sweep sketch matching this grid follows the table.) |
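
The Research Type row describes a causal downstream task: treatment-effect estimation from model predictions in a randomized controlled trial. Below is a minimal sketch of the canonical difference-in-means ATE estimator for an RCT; the paper's exact estimator and aggregation over frames are not specified in this report, so the variable names and the toy data are illustrative assumptions.

```python
import numpy as np

def estimate_ate(y_hat: np.ndarray, t: np.ndarray) -> float:
    """Difference-in-means ATE estimate for a randomized trial.

    y_hat: predicted outcomes (e.g., per-frame grooming indicators),
           shape (n,).
    t:     binary treatment assignment, shape (n,).
    """
    return y_hat[t == 1].mean() - y_hat[t == 0].mean()

# Toy usage with synthetic assignments and predictions (illustrative only).
rng = np.random.default_rng(0)
t = rng.integers(0, 2, size=1000)
y_hat = rng.binomial(1, 0.3 + 0.1 * t)  # outcome rate shifts with treatment
print(estimate_ate(y_hat.astype(float), t))
```

Because treatment is randomized, this simple contrast is unbiased for the ATE when the true outcomes are used; the paper's point is that plugging in *predicted* outcomes can bias it even when classification accuracy is high.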
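
The Open Datasets row mentions a synthetic MNIST-based benchmark that controls the causal model. The report does not describe how treatment is rendered in the paper's Causal MNIST, so the sketch below is an assumption-laden stand-in: treatment is assigned at random, marked visually by pixel inversion (a hypothetical choice), and the outcome is sampled with a treatment-dependent rate so the ground-truth ATE is known by construction.

```python
import numpy as np
from torchvision import datasets

def make_causal_mnist(root="./data", p0=0.3, p1=0.6, seed=0):
    """Synthetic RCT on top of MNIST with a known ground-truth ATE.

    T ~ Bernoulli(0.5) is rendered into the image (here: inverting pixel
    intensities, a hypothetical marker), and Y ~ Bernoulli(p_T), so the
    true ATE is p1 - p0 by construction.
    """
    rng = np.random.default_rng(seed)
    mnist = datasets.MNIST(root, train=True, download=True)
    x = mnist.data.numpy().astype(np.float32) / 255.0
    t = rng.integers(0, 2, size=len(x))
    x[t == 1] = 1.0 - x[t == 1]  # visual marker of treatment
    y = rng.binomial(1, np.where(t == 1, p1, p0))
    return x, t, y  # images, treatment, outcome; true ATE = p1 - p0
```

Controlling the generative process this way lets one compare the plug-in causal estimate against a known target, which is exactly what a real trial like ISTAnt cannot provide.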
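
The Experiment Setup row specifies the reported grid: a frozen ViT encoder, MLP heads with 1 or 2 hidden layers of 256 ReLU units, Adam at three learning rates, 10 epochs, BCELoss, and five seeds per configuration. Here is a minimal PyTorch sketch of that setup; `encoder` and `loader` are hypothetical placeholders for the pre-trained backbone and the annotated-frame dataloader, which this report does not define.

```python
import itertools
import torch
import torch.nn as nn

def make_head(dim_in: int, hidden_layers: int) -> nn.Sequential:
    """MLP head: 1 or 2 hidden layers of 256 ReLU units, sigmoid output."""
    layers, d = [], dim_in
    for _ in range(hidden_layers):
        layers += [nn.Linear(d, 256), nn.ReLU()]
        d = 256
    layers += [nn.Linear(d, 1), nn.Sigmoid()]
    return nn.Sequential(*layers)

def train_head(encoder, head, loader, lr, epochs=10, device="cpu"):
    """Fine-tune only the head on frozen-encoder features with BCELoss."""
    encoder.eval()  # encoder stays frozen throughout
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():           # no gradients into the encoder
                z = encoder(x.to(device))
            pred = head(z).squeeze(1)
            loss = loss_fn(pred, y.float().to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head

# The reported sweep: head depths x learning rates x 5 random seeds
# (the full 6,480-model grid additionally varies encoders and targets).
grid = itertools.product([1, 2], [0.05, 0.005, 0.0005], range(5))
```

Freezing the encoder and sweeping only cheap heads is what makes training 6,480 models tractable, and it isolates the effect of representation and head choices on the downstream causal estimate.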