Self-cognitive Denoising in the Presence of Multiple Noisy Label Sources

Authors: Yi-Xuan Sun, Ya-Lin Zhang, Bin Han, Longfei Li, Jun Zhou

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments on various datasets demonstrate the superiority of our method. We empirically validate the proposed method across various datasets and demonstrate that our method surpasses other competing approaches in performance.
Researcher Affiliation | Industry | Ant Group, Hangzhou, China. Correspondence to: Jun Zhou <jun.zhoujun@antgroup.com>.
Pseudocode | Yes | Algorithm 1 Self-cognitive Denoising for Multiple noisy label sources (SDM)
Open Source Code | No | The paper does not provide a link or explicit statement about the availability of open-source code for the described methodology.
Open Datasets | Yes | Six benchmark datasets are adopted in experiments, i.e., Yelp, IMDb, AG News (AN), SVHN, MNIST, and Bank. Details about these datasets are given in Appendix B.1 due to space constraints. For these NLP datasets, we utilize BERT (Devlin et al., 2018) to pre-extract features... For this dataset, we utilize ResNet-18 (He et al., 2016) to pre-extract features... For this dataset, we utilize LeNet (LeCun et al., 1998) to pre-extract features... This tabular dataset is from the UCI repository (Markelle et al., 2013). (See the feature pre-extraction sketch after the table.)
Dataset Splits | Yes | For each dataset, we use 70% for training, 25% for testing, and 5% for validation. (See the split sketch after the table.)
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software like Adam, LightGBM, BERT, ResNet-18, and LeNet, but it does not specify version numbers for these components, which is necessary for reproducibility.
Experiment Setup | Yes | During training, we use Adam (Kingma & Ba, 2015) with an initial learning rate of 0.001, a batch size of 256, and training epochs of 100 for both GΘ and gθ. We set the hyperparameters P = 80, T = 0.1, t0 = 20, λ = 0.9, α = β = 1 in all the experiments. For the multi-tower neural networks GΘ, we use a 3-layer MLP with hidden dimension 128, whose first layer extracts the public features among noisy label sources, and the other two layers are constructed with s = 4 towers to model the information of each source. Similarly, we use a simple 3-layer MLP with hidden dimension 128 to construct the single-tower neural network gθ. (See the configuration sketch after the table.)
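The Open Datasets row reports that BERT is used to pre-extract features for the NLP datasets (Yelp, IMDb, AG News). Since no code is released, the following is a minimal sketch of that step using the Hugging Face transformers interface; the checkpoint ("bert-base-uncased"), the [CLS] pooling, and the sequence length are assumptions, not choices confirmed by the paper.

```python
# Sketch of the feature pre-extraction step described in the Open Datasets row.
# Checkpoint, pooling strategy, and max_length are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def extract_features(texts, batch_size=64):
    """Pre-extract fixed BERT features so the denoising networks train on vectors."""
    feats = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True, truncation=True,
                          max_length=128, return_tensors="pt")
        out = bert(**batch)
        # Use the [CLS] token embedding as a fixed-length feature (an assumption;
        # mean pooling over tokens would be an equally plausible choice).
        feats.append(out.last_hidden_state[:, 0])
    return torch.cat(feats, dim=0)
```

The image datasets (SVHN, MNIST) would be handled analogously with ResNet-18 and LeNet encoders, per the same row.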
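The Dataset Splits row states a 70% / 25% / 5% train / test / validation split. A minimal sketch of that split is below; the paper does not name a splitting tool or random seed, so scikit-learn's train_test_split and seed 0 are assumptions.

```python
# Sketch of the 70% train / 25% test / 5% validation split from the Dataset Splits row.
from sklearn.model_selection import train_test_split

def split_dataset(X, y, seed=0):
    # First carve off the 25% test portion.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    # Then take 5% of the full data (0.05 / 0.75 of the remainder) as validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.05 / 0.75, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```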
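The Experiment Setup row fully specifies the two networks and the optimizer: a 3-layer multi-tower MLP GΘ (shared first layer, then s = 4 two-layer towers, hidden dimension 128), a 3-layer single-tower MLP gθ, and Adam with learning rate 0.001, batch size 256, and 100 epochs. The PyTorch sketch below mirrors that description; the input dimension, number of classes, ReLU activations, and the use of separate optimizers are assumptions, and the SDM training loop itself (hyperparameters P, T, t0, λ, α, β) is not reproduced here because the paper's Algorithm 1 is not available in code form.

```python
# Sketch of the networks and optimizer settings quoted in the Experiment Setup row.
import torch
import torch.nn as nn

class MultiTowerMLP(nn.Module):
    """G_Theta: one shared layer followed by s source-specific two-layer towers."""
    def __init__(self, in_dim, num_classes, hidden=128, num_sources=4):
        super().__init__()
        # First layer extracts features shared across noisy label sources.
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Remaining two layers are replicated as one tower per source (s = 4).
        self.towers = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, num_classes))
            for _ in range(num_sources)
        ])

    def forward(self, x):
        h = self.shared(x)
        # One prediction per noisy label source, stacked along a new source axis.
        return torch.stack([tower(h) for tower in self.towers], dim=1)

class SingleTowerMLP(nn.Module):
    """g_theta: a plain 3-layer MLP with hidden dimension 128."""
    def __init__(self, in_dim, num_classes, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_classes))

    def forward(self, x):
        return self.net(x)

# Example instantiation; 768-dim inputs (BERT features) and 2 classes are illustrative.
G = MultiTowerMLP(in_dim=768, num_classes=2, num_sources=4)
g = SingleTowerMLP(in_dim=768, num_classes=2)
opt_G = torch.optim.Adam(G.parameters(), lr=0.001)
opt_g = torch.optim.Adam(g.parameters(), lr=0.001)
# Training then runs for 100 epochs with batch size 256, as reported in the row above.
```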