Locally Adaptive Label Smoothing Improves Predictive Churn
Authors: Dara Bahri, Heinrich Jiang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We present several baselines for reducing churn and show that training on soft labels obtained by adaptively smoothing each example's label based on the example's neighboring labels often outperforms the baselines on churn while improving accuracy on a variety of benchmark classification tasks and model architectures. We now describe the experimental methodology and results for validating our proposed method." (A hedged sketch of this neighbor-based smoothing step appears after the table.) |
| Researcher Affiliation | Industry | Google Research, Mountain View, USA. Correspondence to: Dara Bahri <dbahri@google.com>. |
| Pseudocode | Yes | Algorithm 1 Deep k-NN locally adaptive label smoothing |
| Open Source Code | No | The paper does not provide a statement or link indicating that code for the described methodology has been open-sourced. |
| Open Datasets | Yes | MNIST, Fashion-MNIST, SVHN, CelebA (Liu et al., 2018), and the UCI Phishing dataset (Dua & Graff, 2017). |
| Dataset Splits | No | Standard train/test splits are used, e.g., "We use the standard train and test splits, which consist of 162770 and 19962 images respectively" for CelebA, and 7406 train / 3649 test examples for Phishing. The paper gives explicit train and test counts or cites standard splits, but it does not describe a separate validation split. |
| Hardware Specification | No | The paper does not specify the hardware used for its experiments (e.g., GPU/CPU models or memory); it reports only training settings such as the optimizer and minibatch size. |
| Software Dependencies | No | The paper mentions the Adam optimizer and specific model architectures such as a LeNet-5 CNN, but it does not give version numbers for software libraries or dependencies (e.g., Python, TensorFlow, or PyTorch). |
| Experiment Setup | Yes | "For all datasets we use the Adam optimizer with default learning rate 0.001. We use a minibatch size of 128 throughout." The paper also gives per-dataset epoch counts and architecture details, e.g., a three-layer MLP with 256 hidden units and ReLU activations trained for 20 epochs on MNIST. (A minimal training-configuration sketch follows the table.) |
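
The method summarized in the Research Type and Pseudocode rows smooths each example's label toward the empirical label distribution of its nearest neighbors. The snippet below is a minimal sketch of that idea under stated assumptions, not a reproduction of the paper's Algorithm 1: the embedding used for neighbor search, the Euclidean distance, and the names `k` and `alpha` are illustrative choices.

```python
# Hedged sketch of k-NN-based locally adaptive label smoothing.
# Assumptions: the mixing rule, the neighbor metric, and the parameter
# names below are illustrative; the paper's Algorithm 1 may differ.
import numpy as np

def knn_smoothed_labels(embeddings, labels, num_classes, k=10, alpha=0.1):
    """Mix each one-hot label with the empirical label distribution
    of its k nearest neighbors in embedding space."""
    one_hot = np.eye(num_classes)[labels]                 # (n, C)
    # Pairwise Euclidean distances; fine for a small n, use an ANN
    # library (e.g., FAISS or ScaNN) at scale.
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)                # (n, n)
    np.fill_diagonal(dists, np.inf)                       # exclude self-matches
    neighbor_idx = np.argsort(dists, axis=1)[:, :k]       # (n, k)
    neighbor_dist = one_hot[neighbor_idx].mean(axis=1)    # (n, C)
    # Locally adaptive smoothing: interpolate toward the neighborhood mix.
    return (1.0 - alpha) * one_hot + alpha * neighbor_dist
```

The resulting soft labels would then replace the hard labels as training targets in a standard cross-entropy loss.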
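
The Experiment Setup row reports Adam with the default learning rate of 0.001, a minibatch size of 128, and, for MNIST, a three-layer MLP with 256 hidden units and ReLU activations trained for 20 epochs. Below is a minimal Keras sketch of that configuration; the framework choice, the interpretation of "three-layer" as two hidden layers plus an output layer, and the use of the soft labels from the previous sketch as targets are assumptions, not details confirmed by the paper.

```python
# Hedged sketch of the reported MNIST training configuration:
# Adam (lr 0.001), batch size 128, MLP with 256-unit ReLU layers, 20 epochs.
import tensorflow as tf

def build_mlp(num_classes=10):
    # Layer count is an assumption about the paper's "three-layer MLP".
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(num_classes),  # logits
    ])

model = build_mlp()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # default Adam lr
    # Soft labels from the k-NN sketch are probability vectors, so use
    # categorical cross-entropy against raw logits.
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# x_train: (n, 28, 28) images; y_soft: (n, 10) soft labels from the k-NN step.
# model.fit(x_train, y_soft, batch_size=128, epochs=20)
```

The commented `model.fit` call marks where the smoothed labels would plug in; `x_train` and `y_soft` are placeholder names for data prepared elsewhere.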