Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations
Authors: Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Equipped with ImageNet-X, we investigate 2,200 current recognition models and study the types of mistakes as a function of model's (1) architecture, e.g. transformer vs. convolutional, (2) learning paradigm, e.g. supervised vs. self-supervised, and (3) training procedures, e.g. data augmentation. |
| Researcher Affiliation | Industry | Fundamental AI Research (FAIR), Meta AI |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | We release all the ImageNet-X annotations along with an open-source toolkit to probe existing or new models' failure types. The data and code are available at https://facebookresearch.github.io/imagenetx/site/home. |
| Open Datasets | Yes | To address this need, we introduce ImageNet-X, a set of sixteen human annotations of factors such as pose, background, or lighting for the entire ImageNet-1k validation set as well as a random subset of 12k training images. |
| Dataset Splits | Yes | ImageNet-X contains human annotations for each of the 50,000 images in the validation set of the ImageNet dataset and 12,000 random sample from the training set. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were provided. |
| Software Dependencies | No | The paper mentions data preprocessing using 'Pandas and Numpy, both freely available Python packages', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Each run across all policies share the exact same optimizer (SGD), weight-decay (1e-5), mini-batch size (512), number of epochs (80), and data ordering through training. |
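
The shared hyperparameters quoted in the Experiment Setup row can be written down as a small configuration sketch. This is an illustration only, not the authors' code: the class name `SharedSetup`, the function `configs_for`, the `data_order_seed` field, and the policy names are hypothetical. Only the optimizer, weight decay, mini-batch size, and number of epochs come from the quoted text. A fixed seed is one assumed way to keep the data ordering identical across runs.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SharedSetup:
    """Hyperparameters the paper reports as identical across all policies."""
    optimizer: str = "SGD"
    weight_decay: float = 1e-5
    batch_size: int = 512
    epochs: int = 80
    # Assumption: a fixed seed stands in for "same data ordering through training".
    data_order_seed: int = 0


def configs_for(policies, shared=SharedSetup()):
    """Build one run config per policy; only the policy name differs between runs."""
    return {name: {"policy": name, **asdict(shared)} for name in policies}


# Hypothetical policy names, used purely for illustration.
runs = configs_for(["policy_a", "policy_b"])
```

Keeping the shared values in one frozen object makes it harder for a single run to drift from the others, so that differences between runs can be attributed to the policy alone.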