Learning the Latent Causal Structure for Modeling Label Noise

Authors: Yexiong Lin, Yu Yao, Tongliang Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we report the experiment results of our method. We first compare the effectiveness of the proposed data-generating process with existing methods. We then compare the estimation error of noise transition matrices with other methods and the classification performance of the proposed method with that of state-of-the-art methods on synthetic and real-world noisy datasets."
Researcher Affiliation | Academia | Yexiong Lin, Yu Yao, Tongliang Liu; Sydney AI Centre, The University of Sydney
Pseudocode | Yes | "Algorithm 1 CSGN"
Open Source Code | Yes | "The code is available at: https://github.com/tmllab/2024_NeurIPS_CSGN."
Open Datasets | Yes | "We empirically verify the performance of our method on three synthetic datasets, i.e., Fashion-MNIST [51], CIFAR-10 [26], CIFAR-100 [26], and two real-world datasets, i.e., CIFAR-N [48] and WebVision [29]."
Dataset Splits | Yes | "We follow the previous work [7] to train the model on the first 50 classes of the Google image subset and test the model on the WebVision validation set and the ImageNet ILSVRC12 validation set."
Hardware Specification | Yes | "We implement our algorithm using PyTorch and conduct experiments on eight RTX-3090 GPUs."
Software Dependencies | No | The paper mentions using "PyTorch" but does not specify a version number for it or any other software libraries or dependencies used in the experiments.
Experiment Setup | Yes | "The initial learning rate for SGD was set at 0.02 and for Adam at 0.001. Our networks were trained for 200 epochs with a batch size of 64. Both learning rates were reduced by a factor of 10 after 100 epochs. For experiments on WebVision, we changed the weight decay of SGD to 0.001. The initial learning rate for SGD was set at 0.04 and for Adam at 0.004. Other parameters of optimizers remain unchanged. Our networks were trained for 80 epochs with a batch size of 16. Both learning rates were reduced by a factor of 10 after 40 epochs."
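
The Experiment Setup row quotes two optimizer schedules (one for the synthetic datasets, one for WebVision). As a reading aid, here is a minimal PyTorch sketch of the synthetic-dataset schedule, assuming the standard torch.optim SGD/Adam optimizers and a MultiStepLR scheduler; the split of model components across the two optimizers, the module shapes, and the SGD momentum are hypothetical, since the quoted text does not specify them.

```python
import torch
from torch.optim import SGD, Adam
from torch.optim.lr_scheduler import MultiStepLR

# Hypothetical stand-ins for the two groups of parameters trained with
# different optimizers; the paper does not state which components use SGD
# and which use Adam, so this split is an assumption for illustration.
classifier = torch.nn.Linear(512, 10)
generator = torch.nn.Linear(64, 512)

# Synthetic-dataset setting quoted above: SGD lr 0.02, Adam lr 0.001,
# 200 epochs, batch size 64, both learning rates divided by 10 after epoch 100.
sgd = SGD(classifier.parameters(), lr=0.02, momentum=0.9)  # momentum assumed
adam = Adam(generator.parameters(), lr=0.001)

sgd_sched = MultiStepLR(sgd, milestones=[100], gamma=0.1)
adam_sched = MultiStepLR(adam, milestones=[100], gamma=0.1)

num_epochs, batch_size = 200, 64
# For WebVision the quoted setting instead uses: SGD lr 0.04 with weight
# decay 0.001, Adam lr 0.004, 80 epochs, batch size 16, decay after epoch 40.

for epoch in range(num_epochs):
    # ... one training epoch over mini-batches of size `batch_size` ...
    sgd_sched.step()
    adam_sched.step()
```

This sketch only reproduces the learning-rate schedule described in the quote; the actual loss and training loop of CSGN are given in Algorithm 1 of the paper and in the released code.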