Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization
Authors: Yivan Zhang, Gang Niu, Masashi Sugiyama
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the effectiveness of the proposed method through experiments on benchmark and real-world datasets. |
| Researcher Affiliation | Academia | 1The University of Tokyo, Japan 2RIKEN AIP, Japan. |
| Pseudocode | No | The paper describes algorithmic steps in prose (e.g., Dirichlet posterior update) but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that its source code is open-source or publicly available. |
| Open Datasets | Yes | We evaluated our method on three image classification datasets, namely MNIST (LeCun et al., 1998), CIFAR-10, and CIFAR-100 (Krizhevsky, 2009). We also evaluated our method on a real-world noisy label dataset, Clothing1M (Xiao et al., 2015). |
| Dataset Splits | No | The paper uses well-known datasets that have standard splits but does not explicitly state the training, validation, and test split percentages or sample counts within the text, nor does it cite a specific source for these predefined splits. |
| Hardware Specification | Yes | We implemented data-parallel distributed training on 64 NVIDIA Tesla P100 GPUs by PyTorch (Paszke et al., 2019). |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not specify a version number for PyTorch or any other software libraries or dependencies, which is necessary for reproducibility. |
| Experiment Setup | Yes | For the gradient-based estimation, we initialized the unconstrained matrix with diagonal elements of log(0.5) and off-diagonal elements of log(0.5/(K - 1)), so after applying softmax the diagonal elements are 0.5. For the Dirichlet posterior update method, we initialized the concentration matrix with diagonal elements of 10 for MNIST and 100 otherwise and off-diagonal elements of 0. We set β = (0.999, 0.01) and γ = 0.1. We sampled 512 (the same as the batch size) pairs in each batch to calculate the pairwise total variation distance. Other hyperparameters are provided in Appendix E. |
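
The initialization quoted in the Experiment Setup row can be checked with a short sketch. The code below is illustrative only and is not the authors' implementation: all function names are hypothetical, the softmax is assumed to be applied row-wise, and the pair-sampling scheme (uniform, with replacement) is an assumption, since the summary does not say how the 512 pairs are drawn. It uses plain Python rather than PyTorch to stay self-contained.

```python
import math
import random


def init_unconstrained(K):
    """K x K matrix of logits: log(0.5) on the diagonal,
    log(0.5 / (K - 1)) everywhere else (as quoted in the setup)."""
    off = math.log(0.5 / (K - 1))
    return [[math.log(0.5) if i == j else off for j in range(K)]
            for i in range(K)]


def softmax_rows(W):
    """Row-wise softmax (assumed axis), shifted by the row max for stability."""
    out = []
    for row in W:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def init_concentration(K, diag):
    """Dirichlet concentration matrix: `diag` on the diagonal
    (10 for MNIST, 100 otherwise, per the setup) and 0 off the diagonal."""
    return [[float(diag) if i == j else 0.0 for j in range(K)]
            for i in range(K)]


def total_variation(p, q):
    """Total variation distance between two categorical distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mean_pairwise_tv(probs, num_pairs, rng):
    """Average TV distance over `num_pairs` index pairs drawn uniformly
    with replacement (an assumed sampling scheme)."""
    n = len(probs)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.randrange(n), rng.randrange(n)
        total += total_variation(probs[i], probs[j])
    return total / num_pairs


# Usage: a 10-class transition matrix at initialization.
T = softmax_rows(init_unconstrained(10))
A = init_concentration(10, diag=100)
```

Because each row of logits exponentiates to 0.5 on the diagonal plus K - 1 entries of 0.5/(K - 1), the row already sums to 1, so the softmax leaves the diagonal at 0.5 as the setup states.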