Learning with Average Top-k Loss
Authors: Yanbo Fan, Siwei Lyu, Yiming Ying, Baogang Hu
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the applicability of minimum average top-k learning for binary classification and regression using synthetic and real datasets. We perform extensive experiments to validate the effectiveness of MATk learning. In this section, we demonstrate the behaviors of MATk learning coupled with different individual losses for binary classification and regression on synthetic and real datasets. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University at Albany, SUNY 2Department of Mathematics and Statistics, University at Albany, SUNY 3National Laboratory of Pattern Recognition, CASIA 4University of Chinese Academy of Sciences (UCAS) {yanbo.fan,hubg}@nlpr.ia.cn, slyu@albany.edu, yying@albany.edu |
| Pseudocode | Yes | Algorithm: The reformulation of the ATk loss in Eq. (2) also facilitates development of optimization algorithms for minimum ATk learning. As practical supervised learning problems usually use a parametric form of f, as f(x; w), where w is the parameter, the corresponding minimum ATk objective becomes min_{w, λ≥0} (1/n) Σᵢ₌₁ⁿ [ℓ(f(xᵢ; w), yᵢ) − λ]₊ + Ω(w) + (k/n)λ (3). It is not hard to see that if ℓ(f(x; w), y) is convex with respect to w, the objective function in Eq. (3) is jointly convex in w and λ. This leads to an immediate stochastic (projected) gradient descent method [3, 21] for solving (3). For instance, with Ω(w) = (1/2C)‖w‖², where C > 0 is a regularization factor, at the t-th iteration the MATk objective can be minimized by first randomly sampling (x_{iₜ}, y_{iₜ}) from the training set and then updating the parameters as w⁽ᵗ⁺¹⁾ ← w⁽ᵗ⁾ − ηₜ ( ∂_w ℓ(f(x_{iₜ}; w⁽ᵗ⁾), y_{iₜ}) · I[ℓ(f(x_{iₜ}; w⁽ᵗ⁾), y_{iₜ}) > λ⁽ᵗ⁾] + (1/C) w⁽ᵗ⁾ ) and λ⁽ᵗ⁺¹⁾ ← [ λ⁽ᵗ⁾ − ηₜ ( k/n − I[ℓ(f(x_{iₜ}; w⁽ᵗ⁾), y_{iₜ}) > λ⁽ᵗ⁾] ) ]₊, where ∂_w ℓ(f(x; w), y) denotes the subgradient with respect to w and ηₜ ∝ 1/√t is the step size. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It cites external libraries like LIBSVM, but does not provide its own code. |
| Open Datasets | Yes | We conduct experiments on binary classification using eight benchmark datasets from the UCI3 and KEEL4 data repositories... 3https://archive.ics.uci.edu/ml/datasets.html 4http://sci2s.ugr.es/keel/datasets.php |
| Dataset Splits | Yes | For each dataset, we randomly sample 50%, 25%, 25% examples as training, validation and testing sets, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | During training, we select parameters C (regularization factor) and k (number of top losses) on the validation set. Parameter C is searched on grids of log₁₀ scale in the range of [10⁻⁵, 10⁵] (extended when the optimal value is on the boundary), and k is searched on grids of log₁₀ scale in the range of [1, n]. |
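The stochastic (projected) subgradient updates quoted in the Pseudocode row can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: it instantiates ℓ as the hinge loss ℓ(f(x; w), y) = [1 − y⟨w, x⟩]₊ with Ω(w) = ‖w‖²/(2C), and the function name `matk_sgd` and its arguments are assumptions made for this sketch.

```python
import numpy as np

def matk_sgd(X, y, k, C, n_epochs=50, seed=0):
    """Stochastic (projected) subgradient descent for the minimum ATk
    objective of Eq. (3), sketched with the hinge loss and L2 regularizer.
    Illustrative reconstruction only; not the authors' implementation."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    lam = 0.0  # lambda, constrained to lambda >= 0
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):  # randomly sample (x_it, y_it)
            t += 1
            eta = 1.0 / np.sqrt(t)  # step size eta_t proportional to 1/sqrt(t)
            margin = y[i] * (X[i] @ w)
            loss = max(0.0, 1.0 - margin)  # hinge loss
            ind = 1.0 if loss > lam else 0.0  # indicator I[l > lambda]
            # subgradient of the hinge loss w.r.t. w (zero when margin >= 1)
            grad_l = -y[i] * X[i] if margin < 1.0 else np.zeros(d)
            # w-update: gated loss subgradient plus gradient of ||w||^2 / (2C)
            w = w - eta * (grad_l * ind + w / C)
            # lambda-update with projection onto the constraint lambda >= 0
            lam = max(0.0, lam - eta * (k / n - ind))
    return w, lam
```

Setting k = n recovers ordinary average-loss SGD; smaller k focuses the updates on the largest individual losses. In the paper's experiments, C and k themselves are chosen on the validation set via the log₁₀ grid search described in the Experiment Setup row.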