Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization

Authors: Tyler B. Johnson, Carlos Guestrin

NeurIPS 2016

Reproducibility assessment: each entry below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental. "We include empirical evaluations that compare the scalability of screening and working set methods on real-world problems. ... While many screening tests have been proposed for large-scale optimization, we have not seen the scalability of screening studied in prior literature. Surprisingly, although our screening test significantly improves upon many prior results, we find that screening scales poorly as the size of the problem increases. In fact, in many cases, screening has negligible effect on overall convergence times. In contrast, our working set algorithm improves convergence times considerably in a number of cases. This result suggests that compared to screening, working set algorithms are significantly more useful for scaling optimization to large problems."
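For context on what a screening test does, below is a minimal sketch of a duality-gap-based ("gap safe") screening rule for the Lasso. This is a standard illustration of safe feature elimination, not the paper's unified piecewise-linear test, and all variable names are assumptions.

```python
import numpy as np

def gap_safe_screen_lasso(A, b, x, lam):
    """Return a boolean mask of features that can be safely discarded.

    Illustrative gap-safe screening rule for the Lasso
        min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
    (a standard screening test; NOT the paper's piecewise-linear test).
    """
    residual = b - A @ x
    # Feasible dual point obtained by rescaling the residual.
    scale = max(lam, np.max(np.abs(A.T @ residual)))
    theta = residual / scale
    # Primal and dual objectives give the duality gap.
    primal = 0.5 * residual @ residual + lam * np.abs(x).sum()
    dual = 0.5 * (b @ b) - 0.5 * np.sum((b - lam * theta) ** 2)
    gap = max(primal - dual, 0.0)
    # Radius of a ball that is guaranteed to contain the dual optimum.
    radius = np.sqrt(2.0 * gap) / lam
    col_norms = np.linalg.norm(A, axis=0)
    # Feature j is provably zero at the optimum if this test passes.
    return np.abs(A.T @ theta) + radius * col_norms < 1.0
```

Any feature flagged by the mask can be removed from the problem without changing the solution, which is what makes such tests attractive in principle; the result quoted above is that this pruning alone did not help much at scale.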
Researcher Affiliation Academia Tyler B. Johnson University of Washington, Seattle tbjohns@washington.edu Carlos Guestrin University of Washington, Seattle guestrin@cs.washington.edu
Pseudocode: Yes. The paper presents Algorithm 1, PW-BLITZ.
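As a rough illustration of the working set pattern that algorithms of this family follow, a generic sketch is given below. It is not the paper's Algorithm 1; `priority` and `solve_subproblem` are hypothetical placeholders standing in for problem-specific components.

```python
def working_set_solver(n_constraints, priority, solve_subproblem,
                       initial_size=100, tol=1e-6, max_iters=50):
    """Generic working set loop: repeatedly solve a small subproblem
    restricted to the constraints judged most relevant, then expand.

    `priority(i, solution)` scores how likely constraint i is to be active
    at the optimum; `solve_subproblem(indices)` returns the solution of the
    restricted problem together with a duality-gap estimate for the full
    problem evaluated at that solution.  Both are hypothetical placeholders.
    """
    solution = None
    working_set = set(range(min(initial_size, n_constraints)))
    for _ in range(max_iters):
        solution, gap = solve_subproblem(sorted(working_set))
        if gap <= tol:
            return solution  # gap certifies (near-)optimality on the full problem
        # Grow the working set with the highest-priority excluded constraints.
        excluded = [i for i in range(n_constraints) if i not in working_set]
        excluded.sort(key=lambda i: priority(i, solution), reverse=True)
        working_set.update(excluded[:max(1, len(working_set))])
    return solution
```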
Open Source Code: No. The paper does not contain an unambiguous statement of source code release for the described methodology or a direct link to a code repository.
Open Datasets: Yes. "We train an SVM model on the Higgs boson dataset [2]. This dataset was generated by a team of particle physicists. The classification task is to determine whether an event corresponds to the Higgs boson. In order to learn an accurate model, we performed feature engineering on this dataset, resulting in 8010 features. In this experiment, we consider subsets of examples with size m = 10^4, 10^5, and 10^6." Footnotes: [1] https://www.kaggle.com/c/ClaimPredictionChallenge and [2] https://archive.ics.uci.edu/ml/datasets/HIGGS
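To make the subset sizes concrete, a sketch of loading the UCI HIGGS data and drawing subsets of m = 10^4, 10^5, and 10^6 examples might look like the following. The file name and column layout are assumptions, and the paper's feature engineering (8010 features) is omitted.

```python
import numpy as np
import pandas as pd

# HIGGS.csv.gz downloaded from https://archive.ics.uci.edu/ml/datasets/HIGGS
# (assumed layout: first column is the binary label, remaining columns are features).
data = pd.read_csv("HIGGS.csv.gz", header=None)
y = data.iloc[:, 0].to_numpy()
X = data.iloc[:, 1:].to_numpy()

rng = np.random.default_rng(0)
subsets = {}
for m in (10**4, 10**5, 10**6):
    idx = rng.choice(len(y), size=m, replace=False)
    subsets[m] = (X[idx], y[idx])  # the paper additionally engineers 8010 features; not shown here
```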
Dataset Splits: No. The paper mentions '250,000 training instances' and 'subsets of examples with size m = 10^4, 10^5, and 10^6', but does not provide specific percentages or counts for training, validation, or test splits. It hints at a validation set by mentioning 'minimizes validation loss' but lacks details on its size or how it was separated.
Hardware Specification: No. The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies: No. The paper mentions implementing 'dual coordinate ascent (DCA)' and referencing the 'LIBLINEAR library [12]', but it does not provide specific version numbers for any software, libraries, or solvers used in the experiments.
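For readers unfamiliar with the referenced solver, below is a minimal dual coordinate ascent sketch for an L1-loss linear SVM in the style popularized by LIBLINEAR. It is a simplified illustration under assumed settings (no shrinking, fixed epoch count), not the paper's implementation.

```python
import numpy as np

def dca_linear_svm(X, y, C=1.0, epochs=10):
    """Dual coordinate ascent for the L1-loss linear SVM dual:
        max_alpha  sum_i alpha_i - 0.5 * ||sum_i alpha_i y_i x_i||^2
        s.t.       0 <= alpha_i <= C
    Labels y must be in {-1, +1}.  Simplified: no shrinking, fixed epochs.
    """
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    q = np.einsum("ij,ij->i", X, X)  # diagonal of the Gram matrix, q_i = ||x_i||^2
    for _ in range(epochs):
        for i in np.random.permutation(n):
            if q[i] == 0.0:
                continue
            # Gradient of 0.5*||w||^2 - sum(alpha) with respect to alpha_i.
            grad = y[i] * (w @ X[i]) - 1.0
            # Projected single-coordinate Newton step, clipped to [0, C].
            new_alpha = np.clip(alpha[i] - grad / q[i], 0.0, C)
            # Maintain the invariant w = sum_i alpha_i y_i x_i.
            w += (new_alpha - alpha[i]) * y[i] * X[i]
            alpha[i] = new_alpha
    return w, alpha
```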
Experiment Setup: No. The paper mentions setting 'λ so that exactly 5% of groups have nonzero weight', notes that 'C is a tuning parameter', and applies screening 'After every five DCA epochs'. However, it does not provide specific concrete hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or detailed system-level training settings.
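As an illustration of the "λ chosen so that 5% of groups have nonzero weight" criterion, one way to realize it is a bisection search on λ between 0 and λ_max. The `fit_group_lasso` solver below is a hypothetical placeholder, not code from the paper, and the search procedure itself is an assumption about how such a target could be met.

```python
import numpy as np

def lambda_for_target_sparsity(fit_group_lasso, lam_max, n_groups,
                               target_frac=0.05, iters=30):
    """Bisect on lambda so that roughly `target_frac` of groups are nonzero.

    `fit_group_lasso(lam)` is a hypothetical solver returning a list of
    per-group weight vectors; `lam_max` is the smallest lambda at which all
    groups are zero.  Larger lambda produces fewer nonzero groups.
    """
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        groups = fit_group_lasso(lam)
        frac_nonzero = sum(np.any(g != 0) for g in groups) / n_groups
        if frac_nonzero > target_frac:
            lo = lam  # too many active groups: raise the midpoint (more regularization)
        else:
            hi = lam  # too few active groups: lower the midpoint (less regularization)
    return 0.5 * (lo + hi)
```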