Learning with Average Top-k Loss

Authors: Yanbo Fan, Siwei Lyu, Yiming Ying, Baogang Hu

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the applicability of minimum average top-k learning for binary classification and regression using synthetic and real datasets. We perform extensive experiments to validate the effectiveness of the MATk learning. In this section, we demonstrate the behaviors of MATk learning coupled with different individual losses for binary classification and regression on synthetic and real datasets.
Researcher Affiliation | Academia | 1 Department of Computer Science, University at Albany, SUNY; 2 Department of Mathematics and Statistics, University at Albany, SUNY; 3 National Laboratory of Pattern Recognition, CASIA; 4 University of Chinese Academy of Sciences (UCAS); {yanbo.fan,hubg}@nlpr.ia.cn, slyu@albany.edu, yying@albany.edu
Pseudocode | Yes | Algorithm: The reformulation of the ATk loss in Eq.(2) also facilitates the development of optimization algorithms for minimum ATk learning. Practical supervised learning problems usually use a parametric form of f, written f(x; w) with parameter w, so the corresponding minimum ATk objective becomes

$$\min_{w,\,\lambda \ge 0} \ \frac{1}{n}\sum_{i=1}^{n}\left[\ell(f(x_i; w), y_i) - \lambda\right]_{+} + \Omega(w) + \frac{k}{n}\lambda \qquad (3)$$

It is not hard to see that if $\ell(f(x; w), y)$ is convex with respect to w, the objective function in Eq.(3) is jointly convex in w and $\lambda$. This leads to an immediate stochastic (projected) gradient descent method [3, 21] for solving (3). For instance, with $\Omega(w) = \frac{1}{2C}\|w\|^2$, where C > 0 is a regularization factor, at the t-th iteration the corresponding MATk objective can be minimized by first randomly sampling $(x_{i_t}, y_{i_t})$ from the training set and then updating the parameters as

$$w^{(t+1)} \leftarrow w^{(t)} - \eta_t\left(\partial_w \ell(f(x_{i_t}; w^{(t)}), y_{i_t})\,\mathbb{I}_{[\ell(f(x_{i_t}; w^{(t)}), y_{i_t}) > \lambda^{(t)}]} + \tfrac{1}{C} w^{(t)}\right)$$

$$\lambda^{(t+1)} \leftarrow \left[\lambda^{(t)} - \eta_t\left(\tfrac{k}{n} - \mathbb{I}_{[\ell(f(x_{i_t}; w^{(t)}), y_{i_t}) > \lambda^{(t)}]}\right)\right]_{+}$$

where $\partial_w \ell(f(x; w), y)$ denotes the sub-gradient with respect to w, and $\eta_t \propto 1/t$ is the step size.
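The stochastic (projected) subgradient updates quoted above are simple enough to sketch directly. The following is a minimal illustration, not the authors' implementation: the choice of the logistic individual loss, the step size schedule eta_t = 1/t, the iteration budget, and all function and variable names are assumptions.

```python
import numpy as np

def matk_sgd(X, y, k, C, T=5000, seed=0):
    """Stochastic (projected) sub-gradient descent for the MATk objective
    in Eq.(3), with Omega(w) = ||w||^2 / (2C) and the logistic individual
    loss l(f, y) = log(1 + exp(-y * f)). Sketch only: the paper does not
    fix these implementation details."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, lam = np.zeros(d), 0.0
    for t in range(1, T + 1):
        i = rng.integers(n)                       # sample (x_it, y_it)
        margin = y[i] * (X[i] @ w)
        loss = np.log1p(np.exp(-margin))          # individual loss value
        ind = float(loss > lam)                   # I[l(...) > lambda^(t)]
        eta = 1.0 / t                             # assumed step size eta_t
        grad_w = -y[i] * X[i] / (1.0 + np.exp(margin))  # d loss / d w
        w -= eta * (grad_w * ind + w / C)         # w-update
        lam = max(0.0, lam - eta * (k / n - ind)) # projected lambda-update
    return w, lam
```

Note that the `max(0.0, ...)` projection realizes the `[.]_+` in the lambda-update, keeping the constraint lambda >= 0 satisfied at every iteration.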
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It cites external libraries such as LIBSVM, but does not provide its own code.
Open Datasets | Yes | We conduct experiments on binary classification using eight benchmark datasets from the UCI (https://archive.ics.uci.edu/ml/datasets.html) and KEEL (http://sci2s.ugr.es/keel/datasets.php) data repositories...
Dataset Splits | Yes | For each dataset, we randomly sample 50%, 25%, 25% examples as training, validation and testing sets, respectively.
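The 50/25/25 split protocol is straightforward to reproduce. A minimal sketch follows; the random seed and the handling of remainders when n is not divisible by 4 are assumptions, as the paper does not specify them.

```python
import numpy as np

def split_50_25_25(n, seed=0):
    """Randomly partition n example indices into 50% train, 25% validation,
    and 25% test, following the protocol described in the paper."""
    idx = np.random.default_rng(seed).permutation(n)
    n_tr, n_va = n // 2, n // 4
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
```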
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | During training, we select parameters C (regularization factor) and k (number of top losses) on the validation set. Parameter C is searched on grids of log10 scale in the range of [10^-5, 10^5] (extended when the optimal value is on the boundary), and k is searched on grids of log10 scale in the range of [1, n].
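The search grids described above can be constructed as follows. This is a hypothetical sketch: the paper states the log10-scale ranges but not the grid density, so one point per decade for C and ten points for k are assumptions.

```python
import numpy as np

def parameter_grids(n):
    """Build validation-search grids: C on a log10 grid over [1e-5, 1e5]
    (one point per decade, assumed) and k on a log10 grid over [1, n]
    (ten points, assumed), with k rounded to unique integers."""
    C_grid = 10.0 ** np.arange(-5, 6)
    k_grid = np.unique(np.round(
        10.0 ** np.linspace(0, np.log10(n), 10)).astype(int))
    return C_grid, k_grid
```

Rounding k to unique integers matters because k indexes the number of top individual losses, so it must be an integer in [1, n].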