Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Robust Classification via Regression for Learning with Noisy Labels
Authors: Erik Englesson, Hossein Azizpour
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments, that increases our understanding of the method and shows its strong performance compared to baselines on several datasets (Section 4). |
| Researcher Affiliation | Academia | Erik Englesson, Hossein Azizpour KTH Royal Institute of Technology EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at: github.com/ErikEnglesson/SGN. |
| Open Datasets | Yes | We conduct experiments on the CIFAR-N (Wei et al., 2021b), Clothing1M (Xiao et al., 2015), and (mini) WebVision (Li et al., 2017) datasets. |
| Dataset Splits | Yes | For the experiments on the CIFAR (including CIFAR-N) datasets, we implement all baselines in the same shared code base to have an as conclusive comparison as possible. To achieve the best possible performance in this setup, we do a search for method-specific hyperparameters for each method based on a noisy validation set. |
| Hardware Specification | No | All experiments were performed using the supercomputing resource Berzelius provided by the National Supercomputer Centre at Linköping University and the Knut and Alice Wallenberg foundation. |
| Software Dependencies | No | The paper mentions using "TensorFlow Probability (Dillon et al., 2017)" but does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | All methods use the same WideResNet (WRN-28-2) architecture, with a constant learning rate (0.01), SGD with momentum (0.9) and weight decay (5e-4), batch size of 128, and standard data augmentation (crop and flip). We used 300 training epochs, but found that the baselines that estimate shifts/labels, ELR (Liu et al., 2020), SOP (Liu et al., 2022), NAL (Lu et al., 2022), and ours, benefited from more training epochs and were trained for twice as long. |
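
The Experiment Setup row can be restated as a small configuration object, which makes the shared hyperparameters and the doubled schedule easier to check at a glance. This is only an illustrative sketch, not the authors' code: `TrainConfig`, `config_for`, and `LONG_SCHEDULE_METHODS` are hypothetical names, and using "SGN" as the label for the paper's own method is an assumption based on the repository path.

```python
from dataclasses import dataclass

# Methods reported as benefiting from twice as many epochs.
# "SGN" as the label for "ours" is an assumption taken from the repository path.
LONG_SCHEDULE_METHODS = {"ELR", "SOP", "NAL", "SGN"}


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row (hypothetical container)."""
    architecture: str = "WRN-28-2"
    learning_rate: float = 0.01      # constant learning rate
    momentum: float = 0.9            # SGD with momentum
    weight_decay: float = 5e-4
    batch_size: int = 128
    augmentation: tuple = ("crop", "flip")
    epochs: int = 300


def config_for(method: str) -> TrainConfig:
    """Return the shared config, doubling the epochs for the listed methods."""
    base = TrainConfig()
    if method.upper() in LONG_SCHEDULE_METHODS:
        return TrainConfig(epochs=2 * base.epochs)
    return base
```

Calling `config_for("ELR")` yields a 600-epoch schedule, while any method name outside the set keeps the default 300 epochs; all other fields are identical across methods, matching the "same shared setup" described in the row.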