Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment
Authors: Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Bautista Martin, Shih-Yu Sun, Carlos Guestrin, Josh Susskind
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show how this formulation improves performance by simultaneously optimizing the evaluation metric and smoothing the loss landscape. We verify our method in metric learning and classification scenarios, showing considerable improvements over the state-of-the-art on a diverse set of tasks. |
| Researcher Affiliation | Industry | Apple Inc., Cupertino, United States. Correspondence to: Chen Huang <chen-huang@apple.com>. |
| Pseudocode | Yes | Algorithm 1 Reinforcement Learning for ALA |
| Open Source Code | No | The paper does not provide any specific statements about releasing source code, nor does it include links to a code repository. |
| Open Datasets | Yes | We experiment on CIFAR-10 (Krizhevsky, 2009) with 50k images for training and 10k images for testing. ... The SOP dataset (Song et al., 2016), and face recognition (FR) experiments on the LFW dataset (Huang et al., 2007). ... ImageNet (Deng et al., 2009) classifier |
| Dataset Splits | Yes | For training a loss controller, we divide the training set randomly into a new training set of 40k images and a validation set of 10k images. ... The SOP dataset contains 120,053 images of 22,634 categories. The first 10,000 and 1,318 categories are used for training and validation, and the remaining are used for testing. (A sketch of the CIFAR-10 split appears below the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments, only general optimization and architecture details. |
| Software Dependencies | No | The paper mentions optimizers like 'Momentum-SGD' and 'RMSProp optimizer' but does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their specific versions) required for replication. |
| Experiment Setup | Yes | The ALA controller is instantiated as an MLP consisting of 2 hidden layers each with 32 ReLU units. Our state s_t includes a sequence of validation statistics observed from the past 10 time-steps. We use a learning rate of 0.001 for policy learning. Training episodes are collected from all child networks every K = 200 gradient descent iterations. We set the discount factor γ = 0.9 (Equation 4), loss parameter updating step β = 0.1 and distance offset α = 1 (Equation 10). (A sketch of this controller configuration appears below the table.) |
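
To make the split quoted in the Dataset Splits row concrete, here is a minimal sketch of the CIFAR-10 40k/10k train/validation split. The paper only states that the split is random; the data-loading framework, transform pipeline, and random seed below are assumptions for illustration.

```python
# Minimal sketch of the CIFAR-10 split described in the paper:
# 40k training / 10k validation images drawn randomly from the
# 50k-image training set, with the 10k test set left untouched.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # augmentation pipeline unspecified in the paper

full_train = datasets.CIFAR10(root="./data", train=True,
                              download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False,
                            download=True, transform=transform)

# Random 40k/10k split: the child network trains on train_set, while
# the loss controller observes validation statistics on val_set.
generator = torch.Generator().manual_seed(0)  # seed is an assumption
train_set, val_set = random_split(full_train, [40_000, 10_000],
                                  generator=generator)
```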
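
The Experiment Setup row also pins down the controller architecture and the RL hyperparameters. Below is a hedged sketch of that configuration: an MLP with 2 hidden layers of 32 ReLU units, a state built from validation statistics over the past 10 time-steps, and policy learning at a 0.001 learning rate. The number of statistics per step (`stats_per_step`), the action-space size (`num_actions`), and the choice of RMSProp for the controller are assumptions; the paper defines the state and action spaces per task and does not publish reference code.

```python
# Hedged sketch of the ALA loss controller and hyperparameters quoted above.
import torch
import torch.nn as nn

GAMMA = 0.9   # discount factor (Equation 4 in the paper)
BETA = 0.1    # loss-parameter updating step
ALPHA = 1.0   # distance offset (Equation 10)
K = 200       # gradient-descent iterations between controller episodes

class ALAController(nn.Module):
    """MLP policy: 2 hidden layers of 32 ReLU units, as stated in the paper.
    Input is a flattened window of validation statistics from the past
    `history` time-steps; output is a distribution over loss-parameter
    updates. stats_per_step and num_actions are illustrative assumptions."""

    def __init__(self, stats_per_step: int = 4, history: int = 10,
                 num_actions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(stats_per_step * history, 32),
            nn.ReLU(),
            nn.Linear(32, 32),
            nn.ReLU(),
            nn.Linear(32, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Probability distribution over candidate loss-parameter updates.
        return torch.softmax(self.net(state), dim=-1)

def discounted_returns(rewards, gamma=GAMMA):
    # Standard discounted return for the controller's episodes; the paper's
    # exact reward (validation-metric improvement) is summarized, not shown.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

controller = ALAController()
# Learning rate of 0.001 for policy learning, per the Experiment Setup row;
# the optimizer choice for the controller itself is an assumption.
optimizer = torch.optim.RMSprop(controller.parameters(), lr=1e-3)
```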