Locally Adaptive Label Smoothing Improves Predictive Churn

Authors: Dara Bahri, Heinrich Jiang

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "we present several baselines for reducing churn and show that training on soft labels obtained by adaptively smoothing each example's label based on the example's neighboring labels often outperforms the baselines on churn while improving accuracy on a variety of benchmark classification tasks and model architectures." "We now describe the experimental methodology and results for validating our proposed method."
Researcher Affiliation | Industry | Google Research, Mountain View, USA. Correspondence to: Dara Bahri <dbahri@google.com>.
Pseudocode | Yes | Algorithm 1: Deep k-NN locally adaptive label smoothing (a hedged sketch of the idea appears after this table).
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology, nor a link to it.
Open Datasets | Yes | MNIST, Fashion-MNIST, SVHN, CelebA (Liu et al., 2018), and the UCI Phishing dataset (Dua & Graff, 2017).
Dataset Splits | No | "We use the standard train and test splits, which consist of 162770 and 19962 images respectively" (for CelebA); 7406 train and 3649 test examples (for Phishing). The paper gives explicit train and test counts or uses standard splits, but it does not describe a separate validation split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments; it only mentions general settings such as the optimizer and minibatch size.
Software Dependencies | No | The paper mentions using the Adam optimizer and specific model architectures such as a LeNet-5 CNN, but it does not give version numbers for software libraries or dependencies (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | "For all datasets we use the Adam optimizer with default learning rate 0.001. We use a minibatch size of 128 throughout." The paper also gives epoch counts and architecture details per dataset, e.g., a "three-layer MLP with 256 hidden units and ReLU activations for 20 epochs" for MNIST (a sketch of this setup appears after the table).
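
To make the method summarized in the table more concrete, here is a minimal NumPy sketch of locally adaptive label smoothing as suggested by the abstract and the title of Algorithm 1: each example's one-hot label is mixed with the empirical label distribution of its k nearest neighbors in a deep embedding space, with a per-example smoothing strength. The embedding source, the choice of k and alpha, and the exact mixing rule below are assumptions made for illustration, not the paper's Algorithm 1 itself.

```python
import numpy as np

def knn_adaptive_soft_labels(embeddings, labels, num_classes, k=10, alpha=0.1):
    """Sketch: soft labels from mixing each one-hot label with the label
    distribution of its k nearest neighbors in embedding space.

    embeddings: (n, d) per-example features (e.g., penultimate-layer
                activations of an auxiliary model -- an assumption here).
    labels:     (n,) integer class labels.
    Returns:    (n, num_classes) soft-label matrix.
    """
    n = embeddings.shape[0]
    one_hot = np.eye(num_classes)[labels]                      # (n, C)

    # Pairwise squared Euclidean distances (fine for small n; use an ANN
    # library at scale).
    d2 = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                                # exclude self
    nn_idx = np.argsort(d2, axis=1)[:, :k]                      # (n, k)

    # Empirical label distribution over each example's neighborhood.
    neighbor_dist = one_hot[nn_idx].mean(axis=1)                # (n, C)

    # Locally adaptive smoothing strength: smooth more when the neighborhood
    # disagrees with the example's own label (assumed form of "adaptive").
    agreement = neighbor_dist[np.arange(n), labels]             # in [0, 1]
    strength = alpha * (1.0 - agreement)                        # (n,)

    soft = (1.0 - strength)[:, None] * one_hot + strength[:, None] * neighbor_dist
    return soft / soft.sum(axis=1, keepdims=True)
```

The resulting soft labels would replace the hard labels during training, which is how the method reduces churn while keeping accuracy, per the abstract.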
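
Below is a sketch of the quoted MNIST experiment setup: a three-layer MLP with 256 hidden units and ReLU activations, Adam with the default 0.001 learning rate, minibatch size 128, and 20 epochs. The use of Keras/TensorFlow is an assumption (the paper names no framework, per the Software Dependencies row), as is the reading of "three-layer" as three hidden layers.

```python
import tensorflow as tf

# Framework choice is an assumption; the paper does not name one.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Assumed reading: three hidden layers of 256 ReLU units each.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Adam with default learning rate 0.001, as stated in the paper.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Minibatch size 128, 20 epochs, as stated for MNIST.
model.fit(x_train, y_train, batch_size=128, epochs=20,
          validation_data=(x_test, y_test))
```

In the paper's method, the hard labels here would be replaced by the adaptively smoothed soft labels (with a categorical cross-entropy loss on soft targets); this sketch shows only the base training configuration quoted in the table.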