Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators
Authors: Kai Zhong, Ian En-Hsu Yen, Inderjit S Dhillon, Pradeep K Ravikumar
NeurIPS 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, the proposed algorithm converges considerably faster than current state-of-the-art on the problems of sequence labeling and hierarchical classification. ... 6 Numerical Experiments We compare our approach, Prox-QN, with four other methods, Proximal Gradient (Prox-GD), OWLQN [23], SGD [21] and BCD [16]. |
| Researcher Affiliation | Academia | Kai Zhong 1 Ian E.H. Yen 2 Inderjit S. Dhillon 2 Pradeep Ravikumar 2 1 Institute for Computational Engineering & Sciences 2 Department of Computer Science University of Texas at Austin EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Proximal Quasi-Newton Algorithm (Prox-QN) |
| Open Source Code | No | The paper only provides a link to a third-party tool (OWL-QN) used for comparison, not the open-source code for their proposed method (Prox-QN). The text states: 'For OWL-QN, we directly use the OWL-QN optimizer developed by Andrew et al.¹'. |
| Open Datasets | Yes | The dataset² was preprocessed by Taskar et al. [19] and was originally collected by Kassel [20], and contains 6877 words (instances). ² http://www.seas.upenn.edu/~taskar/ocr/ ... The dataset comes from Task1 of the dry-run dataset of LSHTC1³. It has 4,463 samples, each with J=51,033 raw features. The hierarchical tree has 2,388 classes which includes 1,139 leaf labels. ³ http://lshtc.iit.demokritos.gr/node/1 |
| Dataset Splits | Yes | We randomly divide the dataset into two part: training part with 6216 words and testing part with 661 words. |
| Hardware Specification | Yes | All the experiments are executed on 2.8GHz Intel Xeon E5-2680 v2 Ivy Bridge processor with 1/4TB memory and Linux OS. |
| Software Dependencies | No | The paper mentions 'OWL-QN optimizer' and 'svm-scale program in the LIBSVM package' but does not provide specific version numbers for any software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | For OWL-QN, we directly use the OWL-QN optimizer developed by Andrew et al.¹, where we set the memory size as m = 10, which is the same as that in Prox-QN. ... In our experiment, λ is set as 100... The learning rate η0 for SGD is tuned to be 2 × 10⁻⁴ for best performance. In BCD, the unigram parameters are grouped into J blocks according to the x features while the bigram parameters are grouped into one block. ... We set λ = 1 to achieve a relative high testing accuracy and high sparsity of the optimal solution. The SGD initial learning rate is tuned to be η0 = 10 for best performance. |
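
As an illustration of the Dataset Splits row, a random division of the 6877 words into 6216 training and 661 testing instances could be sketched as follows. This is a minimal sketch, not the authors' procedure: the seed, the function name `random_split`, and the use of index lists are assumptions made for illustration, since the quoted text does not say how the split was generated.

```python
import random

# Sizes quoted in the table above.
N_TOTAL = 6877  # words (instances) in the sequence labeling dataset
N_TRAIN = 6216  # training words
N_TEST = 661    # testing words


def random_split(n_total, n_train, seed=0):
    """Shuffle instance indices and cut them into train and test lists.

    `seed` is a hypothetical choice; the paper does not report one.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = random_split(N_TOTAL, N_TRAIN)
```

Fixing the seed makes the split repeatable across runs, but it will not reproduce the paper's exact partition unless the original assignment of words to each part is published.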