Self-cognitive Denoising in the Presence of Multiple Noisy Label Sources
Authors: Yi-Xuan Sun, Ya-Lin Zhang, Bin Han, Longfei Li, Jun Zhou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on various datasets demonstrate the superiority of our method. We empirically validate the proposed method across various datasets and demonstrate that our method surpasses other competing approaches in performance. |
| Researcher Affiliation | Industry | 1Ant Group, Hangzhou, China. Correspondence to: Jun Zhou <jun.zhoujun@antgroup.com>. |
| Pseudocode | Yes | Algorithm 1 Self-cognitive Denoising for Multiple noisy label sources (SDM) |
| Open Source Code | No | The paper does not provide a link or explicit statement about the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Six benchmark datasets are adopted in experiments, i.e., Yelp, IMDb, AG News (AN), SVHN, MNIST, and Bank. Details about these datasets are given in Appendix B.1 due to space constraints. For these NLP datasets, we utilize BERT (Devlin et al., 2018) to pre-extract features... For this dataset, we utilize ResNet-18 (He et al., 2016) to pre-extract features... For this dataset, we utilize LeNet (LeCun et al., 1998) to pre-extract features... This tabular dataset is from the UCI repository (Markelle et al., 2013). |
| Dataset Splits | Yes | For each dataset, we use 70% for training, 25% for testing, and 5% for validation. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like Adam, LightGBM, BERT, ResNet-18, and LeNet, but it does not specify version numbers for these components, which is necessary for reproducibility. |
| Experiment Setup | Yes | During training, we use Adam (Kingma & Ba, 2015) with an initial learning rate of 0.001, a batch size of 256, and training epochs of 100 for both GΘ and gθ. We set the hyperparameters P = 80, T = 0.1, t0 = 20, λ = 0.9, α = β = 1 in all the experiments. For the multi-tower neural networks GΘ, we use a 3-layer MLP with hidden dimension 128, whose first layer extracts the public features among noisy label sources, and the other two layers are constructed with s = 4 towers to model the information of each source. Similarly, we use a simple 3-layer MLP with hidden dimension 128 to construct the single-tower neural network gθ. |
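
For reference, the 70%/25%/5% train/test/validation partition reported in the Dataset Splits row could be reproduced with a shuffled-index split along the following lines. This is a minimal sketch: the paper only states the proportions, so the random seed and the splitting utility are assumptions.

```python
import numpy as np

def split_indices(n_samples, seed=0):
    """Split sample indices into 70% train, 25% test, 5% validation.

    The seed and the use of a simple shuffled-index split are assumptions;
    the paper only specifies the 70/25/5 proportions.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.70 * n_samples)
    n_test = int(0.25 * n_samples)
    train_idx = idx[:n_train]
    test_idx = idx[n_train:n_train + n_test]
    val_idx = idx[n_train + n_test:]
    return train_idx, test_idx, val_idx
```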
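The Experiment Setup row describes the two networks concretely enough to sketch them: the multi-tower network GΘ is a 3-layer MLP with hidden dimension 128 whose first layer is shared across noisy label sources and whose remaining two layers form s = 4 per-source towers, while gθ is a plain 3-layer MLP with hidden dimension 128, both trained with Adam (learning rate 0.001, batch size 256, 100 epochs). The PyTorch sketch below follows that description; the activation functions, output dimensions, input feature size, and loss/training loop (the SDM denoising procedure itself) are not given in the quoted text and are assumptions.

```python
import torch
import torch.nn as nn

class MultiTowerMLP(nn.Module):
    """Sketch of the multi-tower network G_Theta: a shared first layer
    followed by s per-source towers of two layers each (hidden dim 128).
    Activations and output handling are assumptions."""

    def __init__(self, in_dim, num_classes, num_sources=4, hidden=128):
        super().__init__()
        # first layer: public features shared among noisy label sources
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # remaining two layers: one tower per noisy label source
        self.towers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_classes),
            )
            for _ in range(num_sources)
        ])

    def forward(self, x):
        h = self.shared(x)
        # one prediction per noisy label source
        return [tower(h) for tower in self.towers]

class SingleTowerMLP(nn.Module):
    """Sketch of the single-tower network g_theta: a plain 3-layer MLP
    with hidden dimension 128."""

    def __init__(self, in_dim, num_classes, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)

# Optimizer settings quoted from the paper; the 768-dim input (BERT
# features) and binary output are illustrative assumptions.
G_theta = MultiTowerMLP(in_dim=768, num_classes=2, num_sources=4)
g_theta = SingleTowerMLP(in_dim=768, num_classes=2)
optimizer = torch.optim.Adam(
    list(G_theta.parameters()) + list(g_theta.parameters()), lr=0.001
)
batch_size, epochs = 256, 100
```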