Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators

Authors: Kai Zhong, Ian En-Hsu Yen, Inderjit S Dhillon, Pradeep K Ravikumar

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, the proposed algorithm converges considerably faster than current state-of-the-art on the problems of sequence labeling and hierarchical classification. ... (Section 6, Numerical Experiments) We compare our approach, Prox-QN, with four other methods: Proximal Gradient (Prox-GD), OWL-QN [23], SGD [21] and BCD [16].
Researcher Affiliation | Academia | Kai Zhong (Institute for Computational Engineering & Sciences), Ian E.H. Yen, Inderjit S. Dhillon, Pradeep Ravikumar (Department of Computer Science), University of Texas at Austin. zhongkai@ices.utexas.edu, {ianyen,inderjit,pradeepr}@cs.utexas.edu
Pseudocode | Yes | Algorithm 1: Proximal Quasi-Newton Algorithm (Prox-QN); see the hedged sketch after this table.
Open Source Code | No | The paper only provides a link to a third-party tool (OWL-QN) used for comparison, not the open-source code for their proposed method (Prox-QN). The text states: 'For OWL-QN, we directly use the OWL-QN optimizer developed by Andrew et al.'
Open Datasets | Yes | The sequence-labeling (OCR) dataset was preprocessed by Taskar et al. [19] and originally collected by Kassel [20]; it contains 6,877 words (instances) and is available at http://www.seas.upenn.edu/~taskar/ocr/. The hierarchical-classification dataset comes from Task 1 of the dry-run dataset of LSHTC1 (http://lshtc.iit.demokritos.gr/node/1); it has 4,463 samples, each with J = 51,033 raw features, and its hierarchical tree has 2,388 classes, including 1,139 leaf labels.
Dataset Splits | Yes | We randomly divide the dataset into two parts: a training part with 6,216 words and a testing part with 661 words.
Hardware Specification | Yes | All the experiments are executed on a 2.8GHz Intel Xeon E5-2680 v2 Ivy Bridge processor with 1/4TB memory and Linux OS.
Software Dependencies | No | The paper mentions the 'OWL-QN optimizer' and the 'svm-scale program in the LIBSVM package' but does not provide specific version numbers for any software dependencies required to reproduce the experiments.
Experiment Setup | Yes | For OWL-QN, we directly use the OWL-QN optimizer developed by Andrew et al., where we set the memory size as m = 10, which is the same as that in Prox-QN. ... In our experiment, λ is set as 100. ... The learning rate η0 for SGD is tuned to be 2×10⁻⁴ for best performance. In BCD, the unigram parameters are grouped into J blocks according to the x features while the bigram parameters are grouped into one block. ... We set λ = 1 to achieve a relatively high testing accuracy and high sparsity of the optimal solution. The SGD initial learning rate is tuned to be η0 = 10 for best performance. (These settings are collected in the configuration sketch after this table.)
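
To make the Pseudocode row concrete, here is a minimal Python sketch of the proximal quasi-Newton idea behind Algorithm 1 (Prox-QN). It is not the paper's implementation: the L-BFGS Hessian approximation and coordinate-descent subproblem solver of Algorithm 1 are replaced by a scalar Barzilai-Borwein curvature estimate, for which the L1-regularized quadratic model has a closed-form soft-thresholding solution. The function names (`prox_qn_sketch`, `soft_threshold`), the backtracking schedule, and the stopping rule are illustrative choices, not from the paper.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_qn_sketch(loss_val, loss_grad, w0, lam, max_iter=200, tol=1e-6):
    """Minimize loss(w) + lam * ||w||_1.

    Simplified sketch: a scalar Barzilai-Borwein curvature estimate h
    stands in for the paper's L-BFGS Hessian approximation, so the
    L1-regularized model step reduces to one soft-thresholding.
    """
    w = np.asarray(w0, dtype=float).copy()
    g = loss_grad(w)
    h = 1.0  # scalar Hessian approximation
    for _ in range(max_iter):
        # Closed-form minimizer of the scaled quadratic model plus the L1 term.
        d = soft_threshold(w - g / h, lam / h) - w
        if np.linalg.norm(d) <= tol:
            break
        # Backtracking line search on the true composite objective.
        f_w = loss_val(w) + lam * np.abs(w).sum()
        step, w_new = 1.0, w + d
        while loss_val(w_new) + lam * np.abs(w_new).sum() > f_w and step > 1e-10:
            step *= 0.5
            w_new = w + step * d
        g_new = loss_grad(w_new)
        # Barzilai-Borwein secant estimate of curvature for the next model.
        s, y = w_new - w, g_new - g
        h = max(float(y @ s) / max(float(s @ s), 1e-12), 1e-4)
        w, g = w_new, g_new
    return w
```

Plugging in, say, `loss_val = lambda w: 0.5 * np.sum((X @ w - y) ** 2)` and `loss_grad = lambda w: X.T @ (X @ w - y)` turns this into a toy Lasso solver; the paper's computationally intensive M-estimators (CRF sequence labeling, hierarchical classification) would substitute their own smooth losses and gradients.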
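
The hyperparameters quoted in the Experiment Setup row can be collected in one place for anyone re-running the comparison. The dictionary layout and key names below are mine; the values are the ones quoted above, and the association of λ = 100 / η0 = 2×10⁻⁴ with the sequence-labeling task and λ = 1 / η0 = 10 with the hierarchical-classification task is inferred from the ordering of the quotes (the BCD unigram/bigram grouping belongs to the CRF sequence-labeling setup), not stated explicitly in the row itself.

```python
# Settings quoted in the Experiment Setup row; structure and names are
# illustrative, the per-task assignment is inferred from quote ordering.
EXPERIMENT_SETUP = {
    "lbfgs_memory_m": 10,                 # same for Prox-QN and OWL-QN
    "sequence_labeling": {                # OCR dataset
        "lambda": 100,                    # L1 regularization weight
        "sgd_eta0": 2e-4,                 # tuned SGD initial learning rate
        "bcd_blocks": "unigram parameters in J blocks by x feature, "
                      "bigram parameters in one block",
    },
    "hierarchical_classification": {      # LSHTC1 dry-run Task 1
        "lambda": 1,                      # chosen for accuracy and sparsity
        "sgd_eta0": 10,                   # tuned SGD initial learning rate
    },
}
```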