Can Less be More? When Increasing-to-Balancing Label Noise Rates Considered Beneficial

Authors: Yang Liu, Jialu Wang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We formally establish the effectiveness of the proposed solution and demonstrate it with extensive experiments. In order to verify the power of our increasing-to-balancing method, we conduct extensive experiments on both unconstrained learning and constrained learning settings.
Researcher Affiliation | Academia | Yang Liu, Computer Science and Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, yangliu@ucsc.edu; Jialu Wang, Computer Science and Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, faldict@ucsc.edu
Pseudocode | Yes | We show pseudocode for an implementation of estimating PA in Figure ?? in Appendix ??. We summarize NOISE+ in Algorithm 1. Figure 3: Pseudocode for Flip. Flip takes the dataset and a small probability ϵ as input, and only flips positive examples with probability ϵ. (A minimal sketch of Flip is given after the table.)
Open Source Code | Yes | The code for reproducing the experimental results is available at https://github.com/UCSC-REAL/CanLessBeMore.
Open Datasets | Yes | The datasets include the UCI Adult Income dataset [9], the COMPAS recidivism dataset [2], the FairFace [15] face attribute dataset, and the CIFAR-10 [16] dataset.
Dataset Splits | No | The paper mentions training and testing sets, but it does not provide specific details on validation splits, such as percentages, sample counts, or references to predefined validation sets.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or types of computing instances used for running the experiments. It only mentions implementing models and training them.
Software Dependencies | No | The paper mentions various models and losses (e.g., one-layer perceptron, cross-entropy, peer loss, MLP, ResNet-50, vision transformer) but does not specify any software libraries or their version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | No | The paper names the types of models used (e.g., one-layer perceptron, MLP, ResNet-50) and states that experiments were repeated over 5 runs with different random seeds. However, it lacks specific hyperparameter details such as learning rate, batch size, number of epochs, or optimizer settings, which are crucial for full reproducibility. The phrase 'Without a careful tuning of training parameters' implies these details are not provided. (An illustrative training sketch follows the table.)
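As a rough illustration of the Flip routine quoted in the Pseudocode row, the sketch below assumes binary labels encoded as 0/1 and uses NumPy for array handling; the encoding, the library, and the function signature are assumptions rather than details taken from the paper.

```python
import numpy as np

def flip(labels, epsilon, rng=None):
    """Flip each positive label (y == 1) to 0 independently with probability epsilon.

    Negative labels are left untouched, mirroring the description that Flip
    only flips positive examples with probability epsilon.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels).copy()
    # Select positive examples and draw an independent coin flip for each one.
    flip_mask = (labels == 1) & (rng.random(labels.shape) < epsilon)
    labels[flip_mask] = 0
    return labels
```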
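The Experiment Setup row notes that runs were repeated over 5 random seeds but that hyperparameters are not reported. The PyTorch sketch below only illustrates what such a seed-repeated training loop might look like; the hidden size, learning rate, number of epochs, optimizer, and data loader are all assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

def run_one_seed(seed, train_loader, input_dim, epochs=20, lr=1e-3):
    """Train a small MLP under one random seed (all hyperparameters assumed)."""
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model

# Repeat over 5 seeds, matching the repetition count reported in the paper:
# models = [run_one_seed(s, train_loader, input_dim) for s in range(5)]
```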