Training Deep Neural Networks via Direct Loss Minimization

Authors: Yang Song, Alexander Schwing, Richard Zemel, Raquel Urtasun

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach in the context of maximizing average precision for ranking problems. Towards this goal, we develop a novel dynamic programming algorithm that can efficiently compute the weight updates. Our approach proves superior to a variety of baselines in the context of action classification and object detection, especially in the presence of label noise. (A hedged sketch of the update rule appears after this table.)
Researcher Affiliation | Academia | Yang Song (songyang12@mails.tsinghua.edu.cn), Dept. of Physics, Tsinghua University, Beijing 100084, China; Alexander G. Schwing (aschwing@cs.toronto.edu), Richard S. Zemel (zemel@cs.toronto.edu), Raquel Urtasun (urtasun@cs.toronto.edu), Dept. of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
Pseudocode | Yes | Figure 1. Our algorithm for direct loss minimization. Figure 2. Our algorithm for AP loss-augmented maximization or minimization.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | In the next experiment we use the PASCAL VOC2012 action classification dataset provided by Everingham et al. (2014). For object detection we use the PASCAL VOC2012 object detection dataset collected by Everingham et al. (2014).
Dataset Splits | Yes | We then randomly divide the generated data into a training set containing 10,000 elements and a test set containing the rest. For each of the 10 target classes, we divide the trainval dataset into equal-sized training, validation and test sets.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) are provided for the experimental setup.
Software Dependencies | No | The paper mentions using specific network architectures (e.g., Krizhevsky et al. (2012)), but does not provide version numbers for any software libraries or frameworks used.
Experiment Setup | Yes | For all algorithms we used the entire available training set in a single batch and performed 300 iterations. For our final results, we use a learning rate of 0.1, a regularization parameter of 1 × 10^-7, and ϵ = 0.1 for all classes. (A hedged configuration sketch appears after this table.)
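
The Research Type and Pseudocode rows describe the paper's core idea: run loss-augmented inference and take a finite-difference of score gradients to approximate the gradient of the expected task loss. The sketch below is a minimal, hedged reconstruction of that general update, not the authors' code: `score_fn`, `argmax_inference`, and `loss_augmented_argmax` are hypothetical placeholders, and the AP-specific dynamic program of Figure 2 is not reproduced here.

```python
# Hedged sketch of one direct loss minimization step (PyTorch-style).
# Only the overall update rule follows the paper; all names below are
# illustrative assumptions.
import torch

def direct_loss_step(score_fn, params, x, y_true,
                     argmax_inference, loss_augmented_argmax,
                     epsilon=0.1, lr=0.1, positive_update=True):
    """Update params with the finite-difference gradient estimate
    (sign / epsilon) * (dF(x, y_dir)/dw - dF(x, y_hat)/dw)."""
    sign = 1.0 if positive_update else -1.0

    # Standard inference: y_hat = argmax_y F(x, y; w).
    y_hat = argmax_inference(score_fn, x)
    # Loss-augmented inference: y_dir = argmax_y [F(x, y; w) + sign * eps * L(y, y_true)].
    y_dir = loss_augmented_argmax(score_fn, x, y_true, sign * epsilon)

    # Differentiate the score gap with respect to the network weights.
    score_gap = score_fn(x, y_dir) - score_fn(x, y_hat)
    grads = torch.autograd.grad(score_gap, params)

    # Descend the estimated gradient of the expected task loss.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * (sign / epsilon) * g
```

In the paper, the task loss is one minus average precision, and the loss-augmented argmax over rankings is what the dynamic programming algorithm of Figure 2 computes efficiently.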
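The Experiment Setup row can likewise be read as a small configuration. Only the values below (full-batch training, 300 iterations, learning rate 0.1, regularization 1 × 10^-7, ϵ = 0.1) come from the paper; the dictionary keys and the loop structure are assumptions for illustration.

```python
# Hedged sketch of the reported training configuration; structure and names
# are assumptions, only the numeric values are taken from the paper.
CONFIG = {
    "iterations": 300,             # 300 update iterations
    "batch": "full training set",  # the entire training set as a single batch
    "learning_rate": 0.1,
    "regularization": 1e-7,        # weight regularization parameter
    "epsilon": 0.1,                # finite-difference step in the direct loss update
}

def train(params, training_set, update_step, config=CONFIG):
    # Full-batch training: every iteration uses all training examples.
    for _ in range(config["iterations"]):
        update_step(params, training_set,
                    lr=config["learning_rate"],
                    epsilon=config["epsilon"],
                    reg=config["regularization"])
```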