Training Deep Neural Networks via Direct Loss Minimization
Authors: Yang Song, Alexander Schwing, Richard Zemel, Raquel Urtasun
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach in the context of maximizing average precision for ranking problems. Towards this goal, we develop a novel dynamic programming algorithm that can efficiently compute the weight updates. Our approach proves superior to a variety of baselines in the context of action classification and object detection, especially in the presence of label noise. |
| Researcher Affiliation | Academia | Yang Song (songyang12@mails.tsinghua.edu.cn), Dept. of Physics, Tsinghua University, Beijing 100084, China; Alexander G. Schwing (aschwing@cs.toronto.edu), Richard S. Zemel (zemel@cs.toronto.edu), Raquel Urtasun (urtasun@cs.toronto.edu), Dept. of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada |
| Pseudocode | Yes | Figure 1. Our algorithm for direct loss minimization. Figure 2. Our algorithm for AP loss-augmented maximization or minimization. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | In the next experiment we use the PASCAL VOC2012 action classification dataset provided by Everingham et al. (2014). For object detection we use the PASCAL VOC2012 object detection dataset collected by Everingham et al. (2014). |
| Dataset Splits | Yes | We then randomly divide the generated data into a training set containing 10,000 elements and a test set containing the rest. For each of the 10 target classes, we divide the trainval dataset into equal-sized training, validation and test sets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) are provided for the experimental setup. |
| Software Dependencies | No | The paper mentions using specific network architectures (e.g., Krizhevsky et al. (2012)), but does not provide version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | For all algorithms we used the entire available training set in a single batch and performed 300 iterations. For our final results, we use a learning rate of 0.1, a regularization parameter of 1 × 10⁻⁷, and ϵ = 0.1 for all classes. |
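
The Pseudocode row above refers to the paper's direct loss minimization update (Figure 1). As a rough illustration only, the sketch below computes the finite-difference direct loss gradient for a linear scoring function F(x, y; w) = wᵀφ(x, y) with brute-force loss-augmented inference over a small label set. The function names and the brute-force search are assumptions, not the authors' code; the paper applies the same update to deep networks, where the feature-map difference is replaced by backpropagation through F at the two inferred labels.

```python
import numpy as np

def direct_loss_gradient(phi, y_true, w, task_loss, labels, eps=0.1, positive=True):
    """Finite-difference direct loss gradient (in the spirit of the paper's Figure 1).

    phi: feature map for a fixed input x, phi(y) -> np.ndarray
    y_true: ground-truth label
    w: current weight vector
    task_loss: task_loss(y, y_true) -> float (e.g. 1 - AP for ranking)
    labels: iterable of candidate labels (brute-force inference; assumption)
    eps: finite-difference step (the paper reports eps = 0.1)
    positive: use the positive (+eps * loss) update direction
    """
    sign = 1.0 if positive else -1.0
    # Standard inference: highest-scoring label under the current weights.
    y_w = max(labels, key=lambda y: w @ phi(y))
    # Loss-augmented inference: score perturbed by +/- eps times the task loss.
    y_dir = max(labels, key=lambda y: w @ phi(y) + sign * eps * task_loss(y, y_true))
    # Gradient estimate of the expected task loss with respect to w.
    return sign * (phi(y_dir) - phi(y_w)) / eps
```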
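Continuing the sketch, a full-batch training loop using the hyperparameters quoted in the Experiment Setup row (300 iterations, learning rate 0.1, L2 regularization 1 × 10⁻⁷, ϵ = 0.1). This is an illustrative reconstruction around the linear helper above, not the authors' implementation.

```python
import numpy as np

def train_direct_loss(data, w0, task_loss, labels, phi,
                      iters=300, lr=0.1, reg=1e-7, eps=0.1):
    """Full-batch direct-loss training with the quoted hyperparameters.

    data: list of (x, y_true) pairs; phi(x, y) returns the joint feature vector.
    The deep-network case would replace the linear gradient with backprop
    through F(x, y, w) at y_direct and y_w.
    """
    w = w0.copy()
    for _ in range(iters):
        grad = np.zeros_like(w)
        for x, y_true in data:  # the entire training set as a single batch
            grad += direct_loss_gradient(lambda y: phi(x, y), y_true, w,
                                         task_loss, labels, eps=eps)
        grad = grad / len(data) + reg * w  # add the L2 regularization term
        w -= lr * grad  # descend on the estimated expected task loss
    return w
```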
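For the Dataset Splits row, a minimal sketch of the equal-sized training/validation/test division described for the PASCAL VOC2012 trainval data; the helper name and the fixed random seed are hypothetical (the paper does not state a seed).

```python
import numpy as np

def equal_three_way_split(indices, seed=0):
    """Randomly divide a set of example indices into (roughly) equal-sized
    training, validation and test subsets, as described per target class."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(indices)
    n = len(shuffled) // 3
    return shuffled[:n], shuffled[n:2 * n], shuffled[2 * n:]
```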