Adaptive Accelerated Gradient Converging Method under Hölderian Error Bound Condition

Authors: Mingrui Liu, Tianbao Yang

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct some experiments to demonstrate the effectiveness of adaAGC for solving problems of type (1). Specifically, we compare adaAGC, PG with option II that returns the solution with the historically minimal proximal gradient, FISTA, and unconditional restarting FISTA (urFISTA) [6] for optimizing the squared hinge loss (classification), the square loss (regression), and the Huber loss (with ρ = 1, regression) with ℓ1 and ℓ∞ regularization, which are cases of (11); we also consider the ℓ1-constrained ℓp norm regression (7) with varying p. We use three datasets from the LibSVM website [5]: splice (n = 1000, d = 60) for classification, and bodyfat (n = 252, d = 14) and cpusmall (n = 8192, d = 12) for regression." (A minimal sketch of one such problem instance appears after this table.)
Researcher Affiliation | Academia | Mingrui Liu, Tianbao Yang, Department of Computer Science, The University of Iowa, Iowa City, IA 52242. {mingrui-liu, tianbao-yang}@uiowa.edu
Pseudocode | Yes | Algorithm 1: ADG
Open Source Code | No | The paper provides no explicit statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | "We use three datasets from the LibSVM website [5]: splice (n = 1000, d = 60) for classification, and bodyfat (n = 252, d = 14) and cpusmall (n = 8192, d = 12) for regression." (See the data-loading sketch after this table.)
Dataset Splits | No | The paper mentions training data (e.g., "training examples") but specifies no split percentages (e.g., an 80/10/10 split), absolute sample counts, or predefined splits with citations for training, validation, or test sets. No cross-validation setup is mentioned either.
Hardware Specification | No | The paper gives no hardware details such as CPU or GPU models, processor types, or memory used to run the experiments; it only names the datasets used.
Software Dependencies | No | The paper lists no specific software components with version numbers (e.g., Python, PyTorch, CUDA, or solvers such as CPLEX) needed to replicate the experiments.
Experiment Setup | Yes | "For problems covered by (11), we fix λ = 1/n, and the parameter s in (7) is set to s = 100. We use backtracking in PG, adaAGC, and FISTA to search for the smoothness parameter. In adaAGC, we set c0 = 2, γ = 2 for the ℓ1-constrained ℓp norm regression and c0 = 10, γ = 2 for the remaining problems. For fairness, urFISTA and adaAGC use the same initial estimate of the unknown parameter (i.e., c). Each algorithm starts from the same initial point, set to zero; we stop each algorithm when the norm of its proximal gradient falls below a prescribed threshold ϵ and report the total number of proximal mappings." (A sketch of this protocol follows the table.)
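
The experiments compare solvers on composite objectives of the form smooth loss plus a simple regularizer. Below is a minimal sketch of one such instance from the Research Type row, the squared hinge loss with ℓ1 regularization minimized by plain proximal gradient (PG). The fixed step size and the names X, y, lam are illustrative assumptions, not the paper's implementation (which searches for the smoothness parameter by backtracking; see the last sketch below).

```python
import numpy as np

def squared_hinge_grad(w, X, y):
    # Gradient of f(w) = (1/n) * sum_i max(0, 1 - y_i * x_i^T w)^2.
    margins = np.maximum(1.0 - y * (X @ w), 0.0)
    return (-2.0 / X.shape[0]) * (X.T @ (y * margins))

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient(X, y, lam, step, iters=1000):
    # Plain PG: gradient step on the smooth loss, prox step on the l1 term.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = soft_threshold(w - step * squared_hinge_grad(w, X, y), step * lam)
    return w
```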
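
The three datasets in the Open Datasets row are distributed in LibSVM's sparse text format, so loading them in Python is one call per file. A minimal sketch, assuming scikit-learn and hypothetical local copies of the downloads:

```python
from sklearn.datasets import load_svmlight_file

X_splice, y_splice = load_svmlight_file("splice")        # n = 1000, d = 60
X_bodyfat, y_bodyfat = load_svmlight_file("bodyfat")     # n = 252,  d = 14
X_cpusmall, y_cpusmall = load_svmlight_file("cpusmall")  # n = 8192, d = 12
```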
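
Finally, the protocol in the Experiment Setup row (zero initial point, backtracking on the smoothness estimate, stop once the proximal-gradient norm falls below ϵ, report the number of proximal mappings) can be sketched as a generic driver. This is a hedged reconstruction, not the authors' code: f, grad_f, and prox are assumed callables for the chosen loss and regularizer, and the doubling factor in the backtracking is an assumption.

```python
import numpy as np

def run_pg(f, grad_f, prox, d, eps=1e-6, L0=1.0, max_iter=100000):
    # f, grad_f: smooth loss and its gradient; prox(v, t): proximal mapping
    # of t * regularizer. w starts at zero, the common initial point.
    w, L, n_prox = np.zeros(d), L0, 0
    for _ in range(max_iter):
        g, fw = grad_f(w), f(w)
        while True:  # backtracking search for the smoothness parameter L
            w_new = prox(w - g / L, 1.0 / L)
            n_prox += 1  # every prox call counts toward the reported metric
            if f(w_new) <= fw + g @ (w_new - w) + 0.5 * L * np.sum((w_new - w) ** 2):
                break
            L *= 2.0  # assumed doubling on failure of the descent condition
        G = L * (w - w_new)          # proximal gradient
        w = w_new
        if np.linalg.norm(G) < eps:  # prescribed stopping threshold
            return w, n_prox
    return w, n_prox
```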