Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment

Authors: Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Bautista Martin, Shih-Yu Sun, Carlos Guestrin, Josh Susskind

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show how this formulation improves performance by simultaneously optimizing the evaluation metric and smoothing the loss landscape. We verify our method in metric learning and classification scenarios, showing considerable improvements over the state-of-the-art on a diverse set of tasks.
Researcher Affiliation | Industry | Apple Inc., Cupertino, United States. Correspondence to: Chen Huang <chen-huang@apple.com>.
Pseudocode | Yes | Algorithm 1: Reinforcement Learning for ALA. (A hedged sketch of this controller loop appears below the table.)
Open Source Code | No | The paper does not provide any specific statements about releasing source code, nor does it include links to a code repository.
Open Datasets | Yes | We experiment on CIFAR-10 (Krizhevsky, 2009) with 50k images for training and 10k images for testing. ... The SOP dataset (Song et al., 2016), and face recognition (FR) experiments on the LFW dataset (Huang et al., 2007). ... ImageNet (Deng et al., 2009) classifier
Dataset Splits | Yes | For training a loss controller, we divide the training set randomly into a new training set of 40k images and a validation set of 10k images. ... The SOP dataset contains 120,053 images of 22,634 categories. The first 10,000 and 1,318 categories are used for training and validation, and the remaining are used for testing. (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments, only general optimization and architecture details.
Software Dependencies | No | The paper mentions optimizers like 'Momentum-SGD' and 'RMSProp optimizer' but does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their specific versions) required for replication.
Experiment Setup | Yes | The ALA controller is instantiated as an MLP consisting of 2 hidden layers each with 32 ReLU units. Our state s_t includes a sequence of validation statistics observed from past 10 time-steps. We use a learning rate of 0.001 for policy learning. Training episodes are collected from all child networks every K = 200 gradient descent iterations. We set the discount factor γ = 0.9 (Equation 4), loss parameter updating step β = 0.1 and distance offset α = 1 (Equation 10). (A hedged controller sketch appears below.)
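The Experiment Setup row pins down the controller architecture and the main hyperparameters, but no reference code is released. Below is a minimal sketch of how such a controller could be instantiated, assuming PyTorch; the class name ALAController, the state and action dimensions, the tanh action bound, and the use of RMSProp for the policy are assumptions layered on top of the quoted values (2 hidden layers of 32 ReLU units, learning rate 0.001, β = 0.1, γ = 0.9), not the authors' implementation.

import torch
import torch.nn as nn

# Hedged sketch of the ALA loss controller described in the Experiment Setup
# row. Only the architecture (2 hidden layers x 32 ReLU units), the policy
# learning rate (0.001), the update step beta = 0.1, and the discount
# gamma = 0.9 come from the paper; class/argument names, dimensions, the
# tanh bound, and RMSProp for the policy are assumed.
class ALAController(nn.Module):
    def __init__(self, state_dim: int, num_loss_params: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_loss_params),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # One bounded adjustment per loss parameter; the exact action
        # parameterization in Algorithm 1 may differ.
        return torch.tanh(self.net(state))


BETA = 0.1    # loss-parameter updating step (from the paper)
GAMMA = 0.9   # discount factor (from the paper)

controller = ALAController(state_dim=10, num_loss_params=4)
policy_opt = torch.optim.RMSprop(controller.parameters(), lr=1e-3)

# Illustrative use: apply a controller action to the current loss parameters
# and compute a discounted return over per-episode rewards.
loss_params = torch.zeros(4)
state = torch.randn(10)                      # stand-in for validation statistics from past 10 steps
with torch.no_grad():
    loss_params += BETA * controller(state)  # assumed update form: sigma <- sigma + beta * action

rewards = [0.2, 0.1, 0.3]                    # illustrative metric improvements per episode
ret = 0.0
for r in reversed(rewards):
    ret = r + GAMMA * ret                    # discounted return with gamma = 0.9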
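Similarly, the Dataset Splits row gives enough detail to reproduce the CIFAR-10 split used for controller training (40k training / 10k validation images out of the original 50k training set, plus the standard 10k test set). One way to realize that split, assuming torchvision and a fixed seed for reproducibility (neither the seed nor the use of random_split is specified by the paper), is:

import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Reproduce the 40k/10k random split of the CIFAR-10 training set described
# in the Dataset Splits row. The transform, seed, and use of random_split
# are assumptions; only the 40k/10k sizes and the 10k test set come from
# the quoted text.
transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

generator = torch.Generator().manual_seed(0)  # fixed seed for a reproducible split
train_set, val_set = random_split(full_train, [40_000, 10_000], generator=generator)

print(len(train_set), len(val_set), len(test_set))  # 40000 10000 10000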