On Coresets for Regularized Regression
Authors: Rachit Chhaya, Anirban Dasgupta, Supratim Shit
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that the modified version of lasso also induces sparsity in the solution, similar to the original lasso. We also obtain smaller coresets for ℓp regression with ℓp regularization. We extend our methods to multi-response regularized regression. Finally, we empirically demonstrate the coreset performance for the modified lasso and for ℓ1 regression with ℓ1 regularization. In this section we describe the empirical results supporting our claims. We performed experiments for the modified lasso and the RLAD problem. |
| Researcher Affiliation | Academia | 1Computer Science and Engineering, Indian Institute of Technology Gandhinagar, Gandhinagar, India . Correspondence to: Rachit Chhaya <rachit.chhaya@iitgn.ac.in>. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | We also provide open code for the same here. |
| Open Datasets | Yes | We generated a matrix A of size 100000 × 30 in which there are a few rows with high leverage scores. The construction of this matrix is described in (Yang et al., 2015), where they refer to it as an NG matrix. We used the Combined Cycle Power Plant Data Set (Tüfekci, 2014), available at the UCI Machine Learning repository. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning was found. The paper mentions using "entire data" or "subsample" without specifying splits. |
| Hardware Specification | Yes | All the experiments were performed in Matlab R2017a on a machine with 16GB memory and 8 cores of 3.40 GHz. |
| Software Dependencies | Yes | All the experiments were performed in Matlab R2017a on a machine with 16GB memory and 8 cores of 3.40 GHz. |
| Experiment Setup | Yes | We generated a matrix A of size 100000 × 30 in which there are a few rows with high leverage scores. The construction of this matrix is described in (Yang et al., 2015), where they refer to it as an NG matrix. The NG (non-uniform leverage scores with good condition number) matrix is generated by the following Matlab command: `NG = [alpha*randn(n-d/2, d/2), (10^(-8))*rand(n-d/2, d/2); zeros(d/2, d/2), eye(d/2)];` Here we used alpha = 0.00065 to get a condition number of about 5. A solution vector x ∈ R^30 was generated randomly, and a response vector b = Ax + (10^-5 · ‖b‖₂/‖e‖₂) e, where e is a vector of noise, was also generated. In our first experiment we solved the modified lasso problem on the entire data matrix A and response vector b to see the effect on the sparsity of the solution for different values of λ. |
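
The quoted Matlab one-liner builds the NG matrix as a 2×2 block matrix: a Gaussian block and a tiny uniform block on top, and a zero block next to an identity block at the bottom (the identity rows carry the high leverage scores). A minimal NumPy sketch of that construction, plus the noisy response vector, might look like the following. This is a translation for illustration, not the authors' code; the function name `make_ng_matrix`, the seeds, and the reading of ‖b‖₂ as ‖Ax‖₂ in the noise scale (b is the quantity being defined) are all assumptions.

```python
import numpy as np

def make_ng_matrix(n=100_000, d=30, alpha=0.00065, seed=0):
    """Sketch of the NG (non-uniform leverage scores, good condition number)
    matrix, translated from the quoted Matlab command:
    NG = [alpha*randn(n-d/2, d/2), 1e-8*rand(n-d/2, d/2);
          zeros(d/2, d/2),         eye(d/2)]
    """
    rng = np.random.default_rng(seed)
    h = d // 2
    # Top block row: (n - d/2) rows, d columns.
    top = np.hstack([alpha * rng.standard_normal((n - h, h)),
                     1e-8 * rng.random((n - h, h))])
    # Bottom block row: the identity rows with high leverage scores.
    bottom = np.hstack([np.zeros((h, h)), np.eye(h)])
    return np.vstack([top, bottom])

# Small n for a quick check; the paper uses n = 100000, d = 30.
A = make_ng_matrix(n=1000, d=30)
x = np.random.default_rng(1).standard_normal(A.shape[1])
e = np.random.default_rng(2).standard_normal(A.shape[0])
b_clean = A @ x
# Noise scaled as in the quoted setup (assumption: ||b|| means ||Ax|| here).
b = b_clean + 1e-5 * (np.linalg.norm(b_clean) / np.linalg.norm(e)) * e
```

The key point of the construction is the bottom-right identity block: those d/2 rows each have leverage score 1, so uniform sampling would almost surely miss them, while leverage-score-based coreset sampling keeps them.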