When Optimizing $f$-Divergence is Robust with Label Noise

Authors: Jiaheng Wei, Yang Liu

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we validate our analysis of Df measures robustness via a set of empirical evaluations on 5 datasets: MNIST (LeCun et al. (1998)), Fashion-MNIST (Xiao et al. (2017)), CIFAR-10 and CIFAR-100 (Krizhevsky et al. (2009)), and Clothing1M (Xiao et al. (2015)).
Researcher Affiliation | Academia | Jiaheng Wei and Yang Liu, Department of Computer Science and Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA. {jiahengwei, yangliu}@ucsc.edu
Pseudocode | Yes | Algorithm 1: Maximizing Df measures, one-step gradient (see the sketch after this table).
Open Source Code | Yes | Our code is available at https://github.com/UCSC-REAL/Robust-f-divergence-measures.
Open Datasets | Yes | In this section, we validate our analysis of Df measures robustness via a set of empirical evaluations on 5 datasets: MNIST (LeCun et al. (1998)), Fashion-MNIST (Xiao et al. (2017)), CIFAR-10 and CIFAR-100 (Krizhevsky et al. (2009)), and Clothing1M (Xiao et al. (2015)).
Dataset Splits | No | The paper mentions training and testing datasets but does not give explicit split percentages or the methodology for creating the splits; it states only that "Omitted experiment details are available in the appendix."
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | In experiments, since the estimation of product noisy distribution are unstable when trained on CIFAR-100 training dataset, we use CE as a warm-up (120 epochs) and then switch to train with Df measures. ... At step t, update h_t by ascending its stochastic gradient with learning rate η_t. (A training-schedule sketch follows the table.)
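The Pseudocode row above refers to Algorithm 1, a one-step gradient ascent on a variational lower bound of a Df measure between predictions paired with their own noisy labels (the joint distribution) and predictions paired with independently drawn noisy labels (the product of marginals). The following is a minimal sketch of such a step, not the authors' released code: it assumes KL as the chosen f-divergence (conjugate f*(t) = exp(t - 1)), uses g = log h(x) as an illustrative variational function, and the names model, optimizer, x, and noisy_y are placeholders.

```python
import torch
import torch.nn.functional as F

def df_step(model, optimizer, x, noisy_y):
    """One stochastic gradient ascent step on the variational bound
    E_P[g] - E_Q[f*(g)], where P pairs each prediction with its own noisy
    label and Q pairs predictions with independently permuted noisy labels."""
    probs = F.softmax(model(x), dim=1)           # h(x): predicted class posteriors
    g = torch.log(probs + 1e-8)                  # illustrative variational function g = log h(x)

    # Joint term: g evaluated at each example's own noisy label.
    joint = g.gather(1, noisy_y.view(-1, 1)).mean()

    # Product-of-marginals term: pair predictions with shuffled noisy labels.
    perm = torch.randperm(noisy_y.size(0))
    conjugate = torch.exp(g.gather(1, noisy_y[perm].view(-1, 1)) - 1.0).mean()  # f*(t) = exp(t - 1) for KL

    loss = -(joint - conjugate)                  # ascend the bound by descending its negative
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return (joint - conjugate).item()            # current estimate of the Df lower bound
```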
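The Experiment Setup row reports a cross-entropy warm-up of 120 epochs on CIFAR-100 before switching to Df training. Below is a minimal sketch of that schedule, assuming the df_step helper from the previous sketch; the tiny model, random data, and TOTAL_EPOCHS value are hypothetical stand-ins just to make the loop run, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(32, 10)                        # placeholder classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x_all = torch.randn(256, 32)                     # stand-in features
noisy_y_all = torch.randint(0, 10, (256,))       # stand-in noisy labels
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x_all, noisy_y_all), batch_size=64)

WARMUP_EPOCHS = 120   # CE warm-up length quoted for CIFAR-100
TOTAL_EPOCHS = 200    # illustrative; not specified in the excerpt

for epoch in range(TOTAL_EPOCHS):
    for x, noisy_y in loader:
        if epoch < WARMUP_EPOCHS:
            # Warm-up phase: ordinary cross-entropy on the noisy labels.
            loss = F.cross_entropy(model(x), noisy_y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        else:
            # After warm-up: one ascent step on the Df lower bound (df_step above).
            df_step(model, optimizer, x, noisy_y)
```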