Learning with Average Top-k Loss
Authors: Yanbo Fan, Siwei Lyu, Yiming Ying, Baogang Hu
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the applicability of minimum average top-k learning for binary classification and regression using synthetic and real datasets. We perform extensive experiments to validate the effectiveness of MATk learning. In this section, we demonstrate the behaviors of MATk learning coupled with different individual losses for binary classification and regression on synthetic and real datasets. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University at Albany, SUNY 2Department of Mathematics and Statistics, University at Albany, SUNY 3National Laboratory of Pattern Recognition, CASIA 4University of Chinese Academy of Sciences (UCAS) {yanbo.fan,hubg}@nlpr.ia.cn, slyu@albany.edu, yying@albany.edu |
| Pseudocode | Yes | Algorithm: The reformulation of the ATk loss in Eq. (2) also facilitates development of optimization algorithms for minimum ATk learning. As practical supervised learning problems usually use a parametric form of f, as f(x; w), where w is the parameter, the corresponding minimum ATk objective becomes min_{w, λ≥0} (1/n) Σᵢ₌₁ⁿ [ℓ(f(xᵢ; w), yᵢ) − λ]₊ + Ω(w) + (k/n)λ (3). It is not hard to see that if ℓ(f(x; w), y) is convex with respect to w, the objective function in Eq. (3) is jointly convex in w and λ. This leads to an immediate stochastic (projected) gradient descent method [3, 21] for solving (3). For instance, with Ω(w) = (1/2C)‖w‖², where C > 0 is a regularization factor, at the t-th iteration the MATk objective can be minimized by first randomly sampling (x_{iₜ}, y_{iₜ}) from the training set and then updating the parameters as w⁽ᵗ⁺¹⁾ ← w⁽ᵗ⁾ − ηₜ ( ∂_w ℓ(f(x_{iₜ}; w⁽ᵗ⁾), y_{iₜ}) · I[ℓ(f(x_{iₜ}; w⁽ᵗ⁾), y_{iₜ}) > λ⁽ᵗ⁾] + (1/C) w⁽ᵗ⁾ ) and λ⁽ᵗ⁺¹⁾ ← [ λ⁽ᵗ⁾ − ηₜ ( k/n − I[ℓ(f(x_{iₜ}; w⁽ᵗ⁾), y_{iₜ}) > λ⁽ᵗ⁾] ) ]₊, where ∂_w ℓ(f(x; w), y) denotes the subgradient with respect to w and ηₜ ∝ 1/√t is the step size. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It cites external libraries like LIBSVM, but does not provide its own code. |
| Open Datasets | Yes | We conduct experiments on binary classification using eight benchmark datasets from the UCI3 and KEEL4 data repositories... 3https://archive.ics.uci.edu/ml/datasets.html 4http://sci2s.ugr.es/keel/datasets.php |
| Dataset Splits | Yes | For each dataset, we randomly sample 50%, 25%, 25% examples as training, validation and testing sets, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | During training, we select parameters C (regularization factor) and k (number of top losses) on the validation set. Parameter C is searched on grids of log₁₀ scale in the range of [10⁻⁵, 10⁵] (extended when the optimal value is on the boundary), and k is searched on grids of log₁₀ scale in the range of [1, n]. |
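The stochastic (projected) subgradient updates quoted in the Pseudocode row can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: it instantiates ℓ as the hinge loss ℓ(f(x; w), y) = [1 − y⟨w, x⟩]₊ with Ω(w) = ‖w‖²/(2C), and the function name `matk_sgd` and its arguments are assumptions made for this sketch.

```python
import numpy as np

def matk_sgd(X, y, k, C, n_epochs=50, seed=0):
    """Stochastic (projected) subgradient descent for the minimum ATk
    objective of Eq. (3), sketched with the hinge loss and L2 regularizer.
    Illustrative reconstruction only; not the authors' implementation."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    lam = 0.0  # lambda, constrained to lambda >= 0
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):  # randomly sample (x_it, y_it)
            t += 1
            eta = 1.0 / np.sqrt(t)  # step size eta_t proportional to 1/sqrt(t)
            margin = y[i] * (X[i] @ w)
            loss = max(0.0, 1.0 - margin)  # hinge loss
            ind = 1.0 if loss > lam else 0.0  # indicator I[l > lambda]
            # subgradient of the hinge loss w.r.t. w (zero when margin >= 1)
            grad_l = -y[i] * X[i] if margin < 1.0 else np.zeros(d)
            # w-update: gated loss subgradient plus gradient of ||w||^2 / (2C)
            w = w - eta * (grad_l * ind + w / C)
            # lambda-update with projection onto the constraint lambda >= 0
            lam = max(0.0, lam - eta * (k / n - ind))
    return w, lam
```

Setting k = n recovers ordinary average-loss SGD; smaller k focuses the updates on the largest individual losses. In the paper's experiments, C and k themselves are chosen on the validation set via the log₁₀ grid search described in the Experiment Setup row.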