Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On Coresets for Regularized Regression
Authors: Rachit Chhaya, Anirban Dasgupta, Supratim Shit
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that the modified version of lasso also induces sparsity in solution, similar to the original lasso. We also obtain smaller coresets for ℓp regression with ℓp regularization. We extend our methods to multi response regularized regression. Finally, we empirically demonstrate the coreset performance for the modified lasso and the ℓ1 regression with ℓ1 regularization. In this section we describe the empirical results supporting our claims. We performed experiments for the modified lasso and the RLAD problem. |
| Researcher Affiliation | Academia | 1Computer Science and Engineering, Indian Institute of Technology Gandhinagar, Gandhinagar, India . Correspondence to: Rachit Chhaya <EMAIL>. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | We also provide open code for the same here |
| Open Datasets | Yes | We generated a matrix A of size 100000 × 30 in which there are a few rows with high leverage scores. The construction of this matrix is described in (Yang et al., 2015) where they refer to it as an NG matrix. We used the Combined Cycle Power Plant Data Set (Tüfekci, 2014) available at the UCI Machine Learning Repository. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning was found. The paper mentions using "entire data" or "subsample" without specifying splits. |
| Hardware Specification | Yes | All the experiments were performed in Matlab R2017a on a machine with 16GB memory and 8 cores of 3.40 GHz. |
| Software Dependencies | Yes | All the experiments were performed in Matlab R2017a on a machine with 16GB memory and 8 cores of 3.40 GHz. |
| Experiment Setup | Yes | We generated a matrix A of size 100000 × 30 in which there are a few rows with high leverage scores. The construction of this matrix is described in (Yang et al., 2015) where they refer to it as an NG matrix. The NG (non-uniform leverage scores with good condition number) matrix is generated by the following Matlab command: NG = [alpha*randn(n-d/2,d/2) (10^(-8))*rand(n-d/2,d/2); zeros(d/2,d/2) eye(d/2)]; Here we used alpha = 0.00065 to get a condition number of about 5. A solution vector x ∈ ℝ^30 was generated randomly and a response vector b = Ax + 10^(-5) (‖b‖₂/‖e‖₂) e, where e is a vector of noise, was also generated. In our first experiment we solved the modified lasso problem on the entire data matrix A and response vector b to see the effect on sparsity of solution for different values of λ. |
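The quoted Matlab command for the NG matrix is garbled by PDF extraction, so the exact block layout is uncertain. Below is a hedged NumPy sketch assuming the intended structure is a horizontal concatenation of `alpha*randn` and `1e-8*rand` blocks on top of a `[zeros | eye]` block, which makes the last d/2 rows high-leverage, as the report describes; `make_ng_matrix` and its defaults are illustrative names, not from the paper.

```python
import numpy as np

def make_ng_matrix(n=100_000, d=30, alpha=0.00065, seed=0):
    """Sketch of the NG (non-uniform leverage scores, good condition
    number) matrix. Block layout is an assumption reconstructed from
    the garbled Matlab quote: the top n-d/2 rows are
    [alpha*randn | 1e-8*rand] and the bottom d/2 rows are
    [zeros | eye], so the bottom rows carry high leverage scores."""
    rng = np.random.default_rng(seed)
    h = d // 2
    top = np.hstack([alpha * rng.standard_normal((n - h, h)),
                     1e-8 * rng.random((n - h, h))])
    bottom = np.hstack([np.zeros((h, h)), np.eye(h)])
    return np.vstack([top, bottom])

# Small instance for a quick sanity check of the shape and structure.
A = make_ng_matrix(n=1000, d=30)
print(A.shape)  # (1000, 30)
```

With these assumed blocks the identity rows at the bottom are (nearly) the only mass in the last d/2 columns, which is what drives their leverage scores toward 1 and motivates non-uniform sampling for the coreset.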