Non-convex Conditional Gradient Sliding

Authors: Chao Qu, Yan Li, Huan Xu

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we test our algorithm in the batch (NCGS) and finite-sum (NCGS-VR) settings and compare them with their Frank-Wolfe counterparts (FW and SVFW (Reddi et al., 2016b)), as well as with SVRG, a projection-based algorithm.
Researcher Affiliation | Academia | EE Faculty, Technion, Israel; H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, USA.
Pseudocode | Yes | Algorithm 1: Procedure of u+ = condg(l, u, γ, η); Algorithm 2: Non-convex conditional gradient sliding (NCGS); Algorithm 3: Stochastic non-convex conditional gradient sliding; Algorithm 4: Variance-reduction non-convex conditional gradient sliding (NCGS-VR). (An illustrative sketch of the condg procedure is given after the table.)
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | Yes | We test our algorithms on three real datasets: Aloi (n = 108000, d = 128) (Geusebroek et al., 2005), Covertype (n = 581012, d = 54) (Blackard & Dean, 1999), and the Sensorless Drive Diagnosis Data Set (n = 58509, d = 49) (Bator, 2015).
Dataset Splits | No | The paper describes the datasets used and how they were generated or processed, but it does not specify any training, validation, or test splits by percentage, absolute numbers, or reference to standard splits.
Hardware Specification | No | The paper mentions 'cpu time' in its figures, implying CPU usage, but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instances used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers used in the experiments.
Experiment Setup | Yes | In particular, we optimize the following trace-norm-constrained non-convex problem using the candidate algorithms: min_θ Σ_{(i,j)∈Ω} f_{i,j}(θ) s.t. ‖θ‖_* ≤ R (9), where Ω is the set of observed entries, f_{i,j} = 1 − exp(−(θ_{i,j} − Y_{i,j})²/σ), Y_{i,j} is the observation of the (i, j)-th entry, and ‖·‖_* is the nuclear norm. Here f_{i,j} is a smoothed ℓ0 loss with enhanced robustness to outliers in the data, so it can solve the sparse + low-rank matrix completion problem of Chandrasekaran et al. (2009). Obviously, f_{i,j} is non-convex and satisfies the assumptions of Algorithms 2, 3, and 4. We compare our non-convex conditional gradient sliding method with the Frank-Wolfe method in Fig. 1. In particular, we report the result of the batch setting in the left panel. The dimension of the matrix is 200 × 200, the rank is r = 5, and the probability of observing each entry is 0.1. The sparse noise is sampled uniformly from [−3, 3], and each entry is corrupted by noise with probability 0.05. We set σ = 1, R = 5 in Problem (9). We observe that our Algorithm 2 (NCGS) clearly outperforms the non-convex Frank-Wolfe method (FW). In the right panel, we treat Problem (9) as a finite-sum problem, solve it using Algorithm 4 (NCGS-VR), and compare the performance with SVFW (Reddi et al., 2016b). We set the dimension of the matrix as 400 × 400, rank r = 8, σ = 1, R = 8. The sparse noise and the probability of observing each entry are generated in the same way as in the batch setting. We observe that our NCGS-VR needs around 50 units of CPU time to reach 10^{-3} accuracy in the squared gradient mapping, while SVFW needs more than 300. We choose the mini-batch size b = n^{2/3} and m = n^{1/3} as in Reddi et al. (2016a).
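
The condg procedure (Algorithm 1) is the projection-free subroutine that NCGS calls in place of a Euclidean projection. Below is a minimal sketch of how such a routine could look, assuming (as in conditional gradient sliding) that condg(l, u, γ, η) approximately minimizes ⟨l, x⟩ + (1/(2γ))‖x − u‖² over the feasible set with Frank-Wolfe steps until the Frank-Wolfe gap falls below η. The explicit lmo argument and the max_iters cap are illustrative additions, not part of the paper's pseudocode.

```python
import numpy as np

def condg(l, u, gamma, eta, lmo, max_iters=1000):
    """Illustrative sketch of a condg-style subroutine.

    Approximately solves  min_{x in X}  <l, x> + 1/(2*gamma) * ||x - u||^2
    with Frank-Wolfe steps, stopping once the Frank-Wolfe gap <= eta.
    `lmo(g)` is a linear minimization oracle returning argmin_{x in X} <g, x>.
    """
    ut = u.copy()
    for _ in range(max_iters):
        g = l + (ut - u) / gamma          # gradient of the quadratic subproblem at ut
        vt = lmo(g)                       # Frank-Wolfe vertex
        gap = np.vdot(g, ut - vt)         # Frank-Wolfe (Wolfe) gap
        if gap <= eta:
            break
        d = vt - ut
        # exact line search for the quadratic subproblem, clipped to [0, 1]
        alpha = min(1.0, gap / (np.vdot(d, d) / gamma))
        ut = ut + alpha * d
    return ut
```

Because the subproblem is a simple quadratic, the line-search step has a closed form, so each inner iteration costs one call to the linear minimization oracle plus elementwise operations.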
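
To make the experiment setup concrete, the sketch below shows how the two pieces of Problem (9) could be evaluated: the smoothed ℓ0 loss with its entrywise gradient over the observed set Ω, and a linear minimization oracle over the nuclear-norm ball {θ : ‖θ‖_* ≤ R}, which returns the top singular pair of the gradient scaled by −R. The function names and the dense 0/1 mask representation are illustrative choices, not the authors' code.

```python
import numpy as np

def smoothed_l0_loss_and_grad(theta, Y, mask, sigma=1.0):
    """Loss and gradient for Problem (9):
    f_ij(theta) = 1 - exp(-(theta_ij - Y_ij)^2 / sigma), summed over observed entries.
    `mask` is a 0/1 array marking the observed set Omega."""
    r = (theta - Y) * mask                     # residuals on observed entries only
    e = np.exp(-r**2 / sigma)
    loss = np.sum(mask) - np.sum(e[mask > 0])  # sum of 1 - exp(...) over Omega
    grad = (2.0 / sigma) * r * e               # entrywise gradient; zero off Omega
    return loss, grad

def lmo_nuclear_ball(G, R):
    """Linear minimization oracle over {X : ||X||_* <= R}:
    argmin <G, X> = -R * u1 v1^T, with (u1, v1) the top singular pair of G."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return -R * np.outer(U[:, 0], Vt[0, :])
```

Combined with the condg sketch above, lmo_nuclear_ball would play the role of the linear minimization oracle, e.g. condg(grad, theta, gamma, eta, lmo=lambda g: lmo_nuclear_ball(g, R)). A full SVD is used here only for simplicity; for the 200 × 200 and 400 × 400 matrices in the experiments this is cheap, and a power or Lanczos method could replace it at larger scale.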