Gradient Descent Converges Linearly for Logistic Regression on Separable Data

Authors: Kyriakos Axiotis, Maxim Sviridenko

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In order to numerically validate our algorithm, we run logistic regression on the UCI adult binary classification dataset. In order to simulate a separable dataset, we first run gradient descent on the whole data, and then discard the misclassified data points. This gives us a separable dataset. Then, we run two variants of gradient descent: one with constant step size given by β^{-1}, and one with increasing step size given by η_t = β^{-1} f(x_0)/f(x_t), with no other change. This is motivated by our findings, which suggest that the step size should increase proportionally to the decrease of the loss. As we can see in Figure 2, the error in the case of fixed step size decays as poly(1/t), while in the case of increasing step size we have linear convergence (albeit with a low rate because the margins are on the order of 10^{-6}).
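The data-preparation step quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a feature matrix X of shape (n, d) and labels y in {-1, +1} already loaded from the UCI adult dataset, and the helper names, step size, and iteration count are placeholders.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Average logistic loss and its gradient for labels y in {-1, +1}."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    # d/dw log(1 + exp(-y w.x)) = -y x * sigmoid(-y w.x)
    weights = y / (1.0 + np.exp(margins))
    grad = -(X * weights[:, None]).mean(axis=0)
    return loss, grad

def make_separable_subset(X, y, step=0.1, iters=1000):
    """Run plain gradient descent on the full data, then discard the
    points the resulting classifier misclassifies (as described above)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        _, g = logistic_loss_grad(w, X, y)
        w -= step * g
    keep = y * (X @ w) > 0  # keep only correctly classified points
    return X[keep], y[keep]
```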
Researcher Affiliation | Collaboration | This work was performed while the first author was at MIT. ¹Google Research, New York, NY, USA. ²Yahoo! Research, New York, NY, USA.
Pseudocode | Yes | Algorithm 1: Greedy Coordinate Descent
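Algorithm 1 itself is not reproduced in this summary. For orientation only, a generic greedy (Gauss-Southwell) coordinate descent step for the logistic loss looks like the sketch below, reusing logistic_loss_grad from the previous snippet; the paper's actual coordinate-selection and step-size rules may differ.

```python
def greedy_coordinate_descent(X, y, step=0.1, iters=1000):
    """Gauss-Southwell rule: update only the coordinate whose partial
    derivative is largest in absolute value at each iteration."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        _, g = logistic_loss_grad(w, X, y)
        i = int(np.argmax(np.abs(g)))  # greedily chosen coordinate
        w[i] -= step * g[i]
    return w
```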
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There are no explicit statements about code release, repository links, or mentions of code in supplementary materials.
Open Datasets | Yes | In order to numerically validate our algorithm, we run logistic regression on the UCI adult binary classification dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It only mentions using the 'whole data' and then creating a 'separable dataset' by discarding points.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions running logistic regression.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment.
Experiment Setup | Yes | Then, we run two variants of gradient descent: one with constant step size given by β^{-1}, and one with increasing step size given by η_t = β^{-1} f(x_0)/f(x_t), with no other change.
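For concreteness, the two variants quoted above can be sketched as below. This is an assumption-laden illustration, not the authors' implementation: β is taken to be a known smoothness bound for the logistic loss on the separable subset, and the increasing schedule simply rescales the base step 1/β by f(x_0)/f(x_t), as the excerpt describes.

```python
def run_gd(X, y, beta, iters=5000, increasing_step=False):
    """Gradient descent with constant step 1/beta, or with the
    increasing schedule eta_t = (1/beta) * f(x_0) / f(x_t)."""
    w = np.zeros(X.shape[1])
    f0, _ = logistic_loss_grad(w, X, y)
    losses = []
    for _ in range(iters):
        ft, g = logistic_loss_grad(w, X, y)
        eta = (f0 / ft) / beta if increasing_step else 1.0 / beta
        w -= eta * g
        losses.append(ft)
    return w, losses

# Hypothetical usage on the separable subset built earlier
# (beta=0.25 is a placeholder smoothness bound for normalized features):
# Xs, ys = make_separable_subset(X, y)
# _, losses_const = run_gd(Xs, ys, beta=0.25)
# _, losses_incr  = run_gd(Xs, ys, beta=0.25, increasing_step=True)
```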