Greedy Column Subset Selection: New Bounds and Distributed Algorithms
Authors: Jason Altschuler, Aditya Bhaskara, Gang Fu, Vahab Mirrokni, Afshin Rostamizadeh, Morteza Zadimoghaddam
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we study the greedy algorithm for the column subset selection problem from a theoretical and empirical perspective and show its effectiveness in a distributed setting. ... Finally, we validate the effectiveness of this distributed algorithm via an empirical study. ... In this section we present an empirical investigation of the GREEDY, GREEDY++ and DISTGREEDY algorithms. |
| Researcher Affiliation | Collaboration | Jason Altschuler (JASONMA@PRINCETON.EDU), Princeton University, Princeton, NJ 08544; Aditya Bhaskara (BHASKARA@CS.UTAH.EDU), School of Computing, 50 S. Central Campus Drive, Salt Lake City, UT 84112; Gang (Thomas) Fu (THOMASFU@GOOGLE.COM), Vahab Mirrokni (MIRROKNI@GOOGLE.COM), Afshin Rostamizadeh (ROSTAMI@GOOGLE.COM), Morteza Zadimoghaddam (ZADIM@GOOGLE.COM), Google, 76 9th Avenue, New York, NY 10011 |
| Pseudocode | Yes | Algorithm 1 GREEDY(A ∈ R^{m×n_A}, B ∈ R^{m×n_B}, k ≤ n_B); Algorithm 2 DISTGREEDY(A, B, k, ℓ). A minimal sketch of the greedy selection rule appears after the table. |
| Open Source Code | No | The paper does not provide access to source code for the described methodology; no links or explicit statements about a code release are present. |
| Open Datasets | Yes | We investigate using these algorithms using two datasets, one with a small set of columns (mnist)... as well as a sparse dataset with a large number of columns (news20.binary)... Both datasets can be found at: www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html. |
| Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention a validation dataset split. For MNIST, it states "60,000 instances to train with and 10,000 instances for our test set." For news20.binary, it states "14,996 examples to train with and hold-out 5,000 examples to test with." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the "LIBLINEAR library (Fan et al., 2008)" for training a linear SVM model, but it does not specify a version number for this library or any other software dependencies. |
| Experiment Setup | Yes | For both datasets we report test error for the best choice of regularization parameter c ∈ {10^-3, ..., 10^4}. We run GREEDY++ and DISTGREEDY with (n/k) log(10) marginal gain evaluations per iteration, and the distributed algorithm uses s = √(n/k) machines with each machine receiving √(nk) columns. (Sketches of this parameter arithmetic and of the downstream SVM evaluation appear after the table.) |
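
The Pseudocode row points to the paper's Algorithm 1 (GREEDY), which selects columns of B so as to maximize the squared Frobenius norm of the projection of A onto their span. The snippet below is a minimal NumPy sketch of that greedy rule written from the problem statement, not the authors' code; the function name `greedy_css` and the brute-force QR-based gain computation are illustrative choices (a practical implementation would update residuals incrementally rather than refactorizing for every candidate).

```python
import numpy as np

def greedy_css(A, B, k):
    """Greedy generalized column subset selection (illustrative sketch).

    Repeatedly adds the column of B that most increases ||P_S A||_F^2,
    where P_S is the orthogonal projection onto the span of the columns
    selected so far.
    """
    n_B = B.shape[1]
    selected = []
    for _ in range(k):
        best_gain, best_j = -np.inf, None
        for j in range(n_B):
            if j in selected:
                continue
            # Orthonormal basis for the span of the candidate selection.
            Q, _ = np.linalg.qr(B[:, selected + [j]])
            gain = np.linalg.norm(Q.T @ A, "fro") ** 2  # ||P_S A||_F^2
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
    return selected
```

Setting B = A recovers the classical column subset selection objective; Algorithm 2 (DISTGREEDY) distributes this selection step across machines as described in the Experiment Setup row.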
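
The Experiment Setup quantities follow from simple arithmetic once n (number of candidate columns) and k (number of columns to select) are fixed. The helper below is a worked example under the assumption that log(10) denotes the natural logarithm; the function name and the example values n = 784 (MNIST's pixel columns) with k = 16 are illustrative, not taken from the paper.

```python
import math

def distgreedy_setup(n, k):
    """Worked arithmetic for the experiment-setup quantities (illustrative only)."""
    # Sampled marginal-gain evaluations per iteration for GREEDY++ / DISTGREEDY.
    evals_per_iteration = (n / k) * math.log(10)
    # Distributed partitioning: s = sqrt(n/k) machines, so each machine holds
    # n / s = sqrt(n * k) columns and all n columns are covered exactly once.
    machines = math.sqrt(n / k)
    columns_per_machine = n / machines  # equals sqrt(n * k)
    return evals_per_iteration, machines, columns_per_machine

# Example: n = 784 (MNIST), k = 16 (arbitrary illustrative value) gives roughly
# 112.8 sampled evaluations per iteration, 7 machines, and 112 columns per machine.
print(distgreedy_setup(784, 16))
```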
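
The downstream evaluation described in the Software Dependencies and Experiment Setup rows trains a linear SVM on the selected columns and reports test error for the best regularization value in {10^-3, ..., 10^4}. The sketch below uses scikit-learn's LinearSVC, which wraps LIBLINEAR; the choice of scikit-learn and the helper name `best_svm_test_error` are assumptions, since the paper only states that it uses the LIBLINEAR library.

```python
from sklearn.svm import LinearSVC  # scikit-learn's wrapper around LIBLINEAR

def best_svm_test_error(X_train, y_train, X_test, y_test, selected_columns):
    """Train a linear SVM on the selected columns for each C in {1e-3, ..., 1e4}
    and return the lowest test error (illustrative sketch, not the paper's code)."""
    errors = []
    for c in (10.0 ** p for p in range(-3, 5)):
        clf = LinearSVC(C=c)
        clf.fit(X_train[:, selected_columns], y_train)
        errors.append(1.0 - clf.score(X_test[:, selected_columns], y_test))
    return min(errors)
```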