Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
NeuronTune: Towards Self-Guided Spurious Bias Mitigation
Authors: Guangtao Zheng, Wenqian Ye, Aidong Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across different architectures and data modalities demonstrate that our method significantly mitigates spurious bias in a self-guided way. Experiments on vision and text datasets with different model architectures confirm the effectiveness of our method. (Section 1, Introduction) Section 5 is titled "Experiments" and includes details on datasets (Waterbirds, CelebA, ImageNet-9, ImageNet-A, MultiNLI, CivilComments), experimental setup, and comparison tables (Tables 1, 2, 3, 4, 5). |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Virginia, Charlottesville, VA, USA. Correspondence to: Guangtao Zheng <EMAIL>. |
| Pseudocode | No | The paper describes the practical implementation of NeuronTune in Section 4.2 with a flowchart in Figure 1. However, it does not include a clearly labeled pseudocode or algorithm block with structured steps. |
| Open Source Code | Yes | Code is available at https://github.com/gtzheng/NeuronTune. |
| Open Datasets | Yes | We tested NeuronTune on four image datasets and two text datasets, each with different types of spurious attributes. (1) Waterbirds (Sagawa et al., 2019) is an image dataset... (2) CelebA (Liu et al., 2015) is a large-scale image dataset... (3) ImageNet-9 (Xiao et al., 2021) is a subset of ImageNet (Deng et al., 2009)... (4) ImageNet-A (Hendrycks et al., 2021) is a dataset of real-world images... (5) MultiNLI (Williams et al., 2018) is a text classification dataset... (6) CivilComments (Borkan et al., 2019) is a binary text classification dataset... |
| Dataset Splits | Yes | The dataset uses standard splits provided by the WILDS benchmark (Koh et al., 2021). (Section 5.1) Table 8. Numbers of samples in different groups and different splits of the four datasets. (Appendix A.7) We divided D_val into two equal halves: one half (denoted as D_val/2) was used as D_Ide, while the other half served as D_Tune. (Section 5.5) |
| Hardware Specification | No | The paper mentions using ResNet-50, ResNet-18, and BERT models, which are deep neural networks, implying the use of powerful computational resources. However, it does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for conducting the experiments. |
| Software Dependencies | No | The paper mentions using 'ResNet-50', 'ResNet-18', and 'BERT' as backbones and 'SGD' and 'AdamW' as optimizers. It also refers to 'Cosine Annealing' and 'Linear' learning rate schedulers. However, it does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers, nor the version of the programming language used. |
| Experiment Setup | Yes | Table 9. Hyperparameters for ERM training. (Appendix A.8) Table 10. Hyperparameters for NeuronTune. (Appendix A.8) These tables provide specific details such as initial learning rate, number of epochs, learning rate scheduler, optimizer, backbone, weight decay, and batch size for different datasets. |
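The validation-split procedure quoted in the Dataset Splits row (D_val divided into two equal halves, one for identification and one for tuning) can be sketched as follows. This is a minimal illustration assuming a shuffled 50/50 index split; the function name `split_val` and all variable names are hypothetical and not taken from the authors' code.

```python
import random

def split_val(indices, seed=0):
    """Split validation indices into two equal halves: (D_Ide, D_Tune).

    Sketch of the split described in Section 5.5 of the paper; the
    shuffling and seeding details here are assumptions.
    """
    rng = random.Random(seed)
    shuffled = indices[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Example: split 100 validation samples into two disjoint halves of 50.
d_ide, d_tune = split_val(list(range(100)))
```

The two halves are disjoint and together cover the full validation set, matching the paper's description of using one half as D_Ide and the other as D_Tune.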