Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Missingness Bias in Model Debugging
Authors: Saachi Jain, Hadi Salman, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To systematically measure the impacts of missingness bias, we iteratively remove subregions from the input and analyze the types of mistakes that our models make. See Appendix A for experimental details. We perform an extensive study across various: architectures (Appendix C.3), missingness approximations (Appendix C.4), subregion sizes (Appendix C.5), subregion shapes: patches vs superpixels (Appendix C.6), and datasets (Appendix E). |
| Researcher Affiliation | Collaboration | 1Massachusetts Institute of Technology 2Microsoft Research |
| Pseudocode | No | The paper describes experimental procedures and methods in paragraph text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/madrylab/missingness. |
| Open Datasets | Yes | We train our models on ImageNet (Russakovsky et al., 2015), with a custom (research, non-commercial) license, as found here: https://paperswithcode.com/dataset/imagenet. |
| Dataset Splits | Yes | For all experiments in this paper, we consider a 10,000-image subset of the original ImageNet validation set (we take every 5th image). |
| Hardware Specification | Yes | For ImageNet, we train our models on 4 V100 GPUs each, and training took around 12 hours for ResNet-18 and ViT-T, and around 20 hours for ResNet-50 and ViT-S. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For ResNets, we train using SGD with batch size of 512, momentum of 0.9, and weight decay of 1e-4. We train for 90 epochs with an initial learning rate of 0.1 that drops by a factor of 10 every 30 epochs. |
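
The Dataset Splits and Experiment Setup rows describe two procedures that are easy to make concrete: taking every 5th image of the validation set, and a step-decay learning rate schedule. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names `step_lr` and `every_kth`, the `RESNET_CONFIG` dictionary, the 0-indexed epoch convention, and the starting offset of 0 for the subset are all hypothetical choices made for illustration. The only figure not quoted in the table is the size of the ImageNet validation set (50,000 images).

```python
# Illustrative sketch only; names and conventions are assumptions,
# not taken from the authors' repository.

# Hyperparameters quoted in the Experiment Setup row (ResNets).
RESNET_CONFIG = {
    "optimizer": "SGD",
    "batch_size": 512,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "epochs": 90,
    "base_lr": 0.1,
    "lr_drop_every": 30,   # epochs between drops
    "lr_drop_factor": 10.0,
}


def step_lr(epoch, base_lr=0.1, drop_every=30, factor=10.0):
    """Learning rate at a 0-indexed epoch under step decay.

    Assumes the drops take effect at epochs 30 and 60 (0-indexed).
    """
    return base_lr / (factor ** (epoch // drop_every))


def every_kth(items, k=5, offset=0):
    """Keep every k-th item; the starting offset is an assumption."""
    return items[offset::k]


if __name__ == "__main__":
    # Learning rate just before and just after each drop.
    for epoch in (0, 29, 30, 59, 60, 89):
        print(epoch, step_lr(epoch))

    # The ImageNet validation set has 50,000 images, so keeping
    # every 5th index yields the 10,000-image subset.
    val_indices = list(range(50_000))
    print(len(every_kth(val_indices)))
```

With these assumptions the schedule uses three learning rates over the 90 epochs (0.1, 0.01, 0.001), and the subset size matches the 10,000 images reported in the table. Which image the subsampling starts from is not stated in the quoted text, so `offset` is left as a parameter.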