Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Towards Formalizing Spuriousness of Biased Datasets Using Partial Information Decomposition
Authors: Barproda Halder, Faisal Hamman, Pasan Dissanayake, Qiuyi Zhang, Ilia Sucholutsky, Sanghamitra Dutta
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we also perform empirical evaluation to demonstrate the trends of unique, redundant, and synergistic information, as well as our proposed spuriousness measure across 6 benchmark datasets under various experimental settings. We observe an agreement between our preemptive measure of dataset spuriousness and post-training model generalization metrics such as worst-group accuracy, further supporting our proposition. |
| Researcher Affiliation | Collaboration | Barproda Halder EMAIL Department of Electrical and Computer Engineering University of Maryland, College Park [...] Qiuyi Zhang EMAIL Google Research [...] Ilia Sucholutsky EMAIL Department of Computer Science Princeton University |
| Pseudocode | Yes | Algorithm 1: Spuriousness Disentangler: An Autoencoder-Based Explainability Framework |
| Open Source Code | Yes | The code is available at https://github.com/Barproda/spuriousness-disentangler. |
| Open Datasets | Yes | Our evaluation spans six datasets: Waterbird (Wah et al., 2011), Adult (Becker & Kohavi, 1996), CelebA (Lee et al., 2020), Dominoes (Shah et al., 2020), Spawrious (Lynch et al., 2023), and Colored MNIST (Arjovsky et al., 2019). |
| Dataset Splits | Yes | Table 5: Summary of the datasets (Waterbird: Train 3,498 / 184 / 56 / 1,057; Validation 467 / 466 / 133 / 133; Test 2,255 / 2,255 / 642 / 642) |
| Hardware Specification | Yes | All experiments are executed on NVIDIA RTX A4500. |
| Software Dependencies | No | The paper mentions 'DIT package (James et al., 2018)' but does not specify a version number for this or any other software component. |
| Experiment Setup | Yes | The hyperparameters are as follows: a batch size of 64, a learning rate of 0.001, a Cosine Annealing LR scheduler, an Adam optimizer with a weight decay of 0.0001, 50 pretraining epochs, followed by 100 epochs of additional training. When fine-tuning ResNet-50 we use the following hyperparameters: batch size of 64, learning rate of 0.0001, Cosine Annealing LR scheduler, stochastic gradient descent (SGD) optimizer with a weight decay of 0.0001, binary cross-entropy as the loss function, and 100 epochs. |
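
The two hyperparameter settings quoted in the Experiment Setup row can be written down as a small configuration sketch, which may help when trying to reproduce the runs. This is not the authors' code: the names `TrainConfig`, `FIRST_SETTING`, `RESNET50_FINETUNE`, and `cosine_annealing_lr` are hypothetical, and the schedule uses the standard closed form of cosine annealing with two assumptions the row does not state, a minimum learning rate of 0 and a period equal to the number of training epochs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the Experiment Setup row."""
    batch_size: int
    learning_rate: float
    weight_decay: float
    optimizer: str
    epochs: int
    pretrain_epochs: int = 0


# First setting: Adam, 50 pretraining epochs followed by 100 more epochs.
FIRST_SETTING = TrainConfig(
    batch_size=64, learning_rate=1e-3, weight_decay=1e-4,
    optimizer="adam", epochs=100, pretrain_epochs=50,
)

# Second setting: fine-tuning ResNet-50 with SGD and binary cross-entropy.
RESNET50_FINETUNE = TrainConfig(
    batch_size=64, learning_rate=1e-4, weight_decay=1e-4,
    optimizer="sgd", epochs=100,
)


def cosine_annealing_lr(base_lr: float, epoch: int, t_max: int,
                        eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing without restarts.

    ASSUMPTION: eta_min and t_max are not given in the source;
    eta_min=0 and t_max=number of epochs are illustrative choices.
    """
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


if __name__ == "__main__":
    cfg = RESNET50_FINETUNE
    for epoch in (0, cfg.epochs // 2, cfg.epochs):
        lr = cosine_annealing_lr(cfg.learning_rate, epoch, cfg.epochs)
        print(f"epoch {epoch:3d}: lr = {lr:.6g}")
```

Under these assumptions the learning rate starts at the quoted base value, reaches half of it at the midpoint, and decays to zero at the final epoch. Whether the scheduler in the first setting spans the pretraining phase is not stated in the row, so the sketch keeps the two phases as separate fields.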