Don’t fear the unlabelled: safe semi-supervised learning via debiasing
Authors: Hugo Schmutz, Olivier Humbert, Pierre-Alexandre Mattei
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate debiased versions of different existing SSL methods, such as Pseudo-label and FixMatch, and show that debiasing can compete with classic deep SSL techniques in various settings by providing better-calibrated models. Additionally, we provide a theoretical explanation of the intuition behind the popular SSL methods. (Section 4, Experiments:) We evaluate the performance of DeSSL against different classic methods. In particular, we perform experiments with varying nl on different datasets: MNIST, DermaMNIST, CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and five small datasets of MedMNIST (Yang et al., 2021; 2023) with a fixed nl. (A hedged code sketch of such a debiased objective is given after the table.) |
| Researcher Affiliation | Academia | Hugo Schmutz (hugo.schmutz@inria.fr), Olivier Humbert, Pierre-Alexandre Mattei. Université Côte d'Azur, Inria, Maasai, LJAD, CNRS, Nice, France; Université Côte d'Azur, TIRO-MATOs, UMR E 4320 CEA, Nice, France; Centre Antoine Lacassagne, Nice, France |
| Pseudocode | No | The paper describes algorithms and methods through text and mathematical equations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | An implementation of a debiased version of FixMatch is available at https://github.com/HugoSchmutz/DeFixmatch |
| Open Datasets | Yes | We evaluate the performance of DeSSL against different classic methods. In particular, we perform experiments with varying nl on different datasets: MNIST, DermaMNIST, CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and five small datasets of MedMNIST (Yang et al., 2021; 2023) with a fixed nl. |
| Dataset Splits | Yes | We test the influence of the hyperparameters and report the accuracy, the cross-entropy and the expected calibration error (ECE, Guo et al., 2017) at the epoch of best validation accuracy, using 10% of nl as the validation set. DeFixmatch also performs better than FixMatch on STL-10, using 4200 labelled data for training and 800 for validation. (A short sketch of the binned ECE estimator is given after the table.) |
| Hardware Specification | No | Deep learning experiments of this work required approximately 11,250 hours of GPU computation. In particular, FixMatch was trained using 4 GPUs. The paper does not specify the model or type of GPUs used, nor other hardware details such as CPU, memory, or specific cloud instances. |
| Software Dependencies | No | Python (Van Rossum & Drake Jr, 1995), PyTorch (Paszke et al., 2019), TensorFlow (Abadi et al., 2015), Scikit-learn (Pedregosa et al., 2011), Seaborn (Waskom et al., 2017), Python Imaging Library (Lundh et al., 2012), NumPy (Harris et al., 2020), Pandas (McKinney et al., 2010), RandAugment (Cubuk et al., 2020). The paper lists software libraries and their respective citations, but it does not provide specific version numbers for these components (e.g., 'PyTorch 1.9' or 'Scikit-learn 0.24'), which are crucial for reproducibility. |
| Experiment Setup | Yes | We train a LeNet-like architecture using nl = 1000 labelled data on 10 different splits of the training dataset into a labelled and an unlabelled set. We train a CNN-13 from Tarvainen & Valpola (2017) on 5 different splits, using nl = 4000 and the rest of the dataset as unlabelled. We used λ = 1 and a confidence threshold of 0.70 for Pseudo-label. We optimised the model's weights using stochastic gradient descent (SGD) with a learning rate of 0.1. The modified supervised loss is $\mathcal{L}(\theta; x, y) = \tfrac{1}{2}\big(\mathbb{E}_{x_1 \sim \mathrm{weak}(x)}[-\log p_\theta(y \mid x_1)] + \mathbb{E}_{x_2 \sim \mathrm{strong}(x)}[-\log p_\theta(y \mid x_2)]\big)$ (Eq. 46), where $x_1$ is a weak augmentation of $x$ and $x_2$ a strong one. This modification encourages choosing λ = 1/2, since the original FixMatch implementation used λ = 1. (A minimal code sketch of this loss is given after the table.) |
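
The experiment-setup row above quotes Eq. (46), in which the supervised loss averages the cross-entropy over a weak and a strong augmentation of each labelled example. Below is a minimal PyTorch sketch of that term, assuming the two augmented views are computed upstream; `model`, `x_weak`, and `x_strong` are placeholder names for illustration and do not come from the authors' code.

```python
import torch.nn.functional as F

def fixmatch_supervised_loss(model, x_weak, x_strong, y):
    # Average the cross-entropy over a weak and a strong augmentation of the
    # same labelled batch. The 1/2 factor matches Eq. (46) and is why the
    # quoted text suggests lambda = 1/2 rather than FixMatch's lambda = 1.
    return 0.5 * (F.cross_entropy(model(x_weak), y)
                  + F.cross_entropy(model(x_strong), y))
```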
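
The research-type row describes debiased versions of Pseudo-label and FixMatch. As a hedged sketch of what such a debiased objective can look like, the PyTorch snippet below subtracts the unsupervised surrogate evaluated on the labelled batch from the usual SSL objective; the surrogate shown (confidence-thresholded pseudo-labelling with threshold 0.70) and the default λ = 1 mirror the quoted setup, but the function names and structure are illustrative, not the authors' implementation.

```python
import torch.nn.functional as F

def pseudo_label_surrogate(logits, threshold=0.70):
    """One possible unsupervised surrogate H: confidence-thresholded
    pseudo-label cross-entropy, as in the quoted Pseudo-label setup."""
    probs = logits.softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = (conf >= threshold).float()
    per_sample = F.cross_entropy(logits, pseudo, reduction="none")
    return (mask * per_sample).mean()

def debiased_ssl_loss(model, x_lab, y_lab, x_unlab, lam=1.0):
    """Supervised cross-entropy plus lam * (H on unlabelled - H on labelled);
    subtracting H evaluated on the labelled batch is the debiasing step."""
    logits_lab = model(x_lab)
    logits_unlab = model(x_unlab)
    supervised = F.cross_entropy(logits_lab, y_lab)
    return supervised + lam * (pseudo_label_surrogate(logits_unlab)
                               - pseudo_label_surrogate(logits_lab))
```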
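
The dataset-splits row reports the expected calibration error (ECE, Guo et al., 2017) alongside accuracy and cross-entropy. For reference, here is a short NumPy sketch of the standard binned ECE estimator; the 15-bin default is an assumption, not a value stated in the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """Binned ECE: average |accuracy - confidence| per bin, weighted by bin size."""
    conf = probs.max(axis=1)                    # predicted confidence
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece
```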