Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Coresets for Regularized Regression

Authors: Rachit Chhaya, Anirban Dasgupta, Supratim Shit

ICML 2020

Reproducibility variables, each listed with the automated Result and the supporting LLM response quoted from the paper:

Research Type: Experimental
"We empirically show that the modified version of lasso also induces sparsity in the solution, similar to the original lasso. We also obtain smaller coresets for ℓp regression with ℓp regularization. We extend our methods to multi-response regularized regression. Finally, we empirically demonstrate the coreset performance for the modified lasso and the ℓ1 regression with ℓ1 regularization. In this section we describe the empirical results supporting our claims. We performed experiments for the modified lasso and the RLAD problem."

Researcher Affiliation: Academia
"1Computer Science and Engineering, Indian Institute of Technology Gandhinagar, Gandhinagar, India. Correspondence to: Rachit Chhaya <EMAIL>."

Pseudocode: No
"No structured pseudocode or algorithm blocks were found in the paper."

Open Source Code: Yes
"We also provide open code for the same here"

Open Datasets: Yes
"We generated a matrix A of size 100000 × 30 in which there are a few rows with high leverage scores. The construction of this matrix is described in (Yang et al., 2015), where they refer to it as an NG matrix. We used the Combined Cycle Power Plant Data Set (Tüfekci, 2014) available at the UCI Machine Learning Repository."

Dataset Splits: No
"No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning was found. The paper mentions using 'entire data' or 'subsample' without specifying splits."

Hardware Specification: Yes
"All the experiments were performed in Matlab R2017a on a machine with 16GB memory and 8 cores of 3.40 GHz."

Software Dependencies: Yes
"All the experiments were performed in Matlab R2017a on a machine with 16GB memory and 8 cores of 3.40 GHz."

Experiment Setup: Yes
"We generated a matrix A of size 100000 × 30 in which there are a few rows with high leverage scores. The construction of this matrix is described in (Yang et al., 2015), where they refer to it as an NG matrix. The NG (non-uniform leverage scores with good condition number) matrix is generated by the following MATLAB command: NG = [alpha*randn(n-d/2,d/2) (10^(-8))*rand(n-d/2,d/2); zeros(d/2,d/2) eye(d/2)]; Here we used alpha = 0.00065 to get a condition number of about 5. A solution vector x ∈ R^30 was generated randomly, and a response vector b = Ax + 10^(-5) (||b||_2 / ||e||_2) e, where e is a vector of noise, was also generated. In our first experiment we solved the modified lasso problem on the entire data matrix A and response vector b to see the effect on the sparsity of the solution for different values of λ."
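The quoted setup can be sketched in Python/NumPy; this is a hypothetical re-implementation of the quoted MATLAB one-liner, not the authors' code, and it interprets the ||b||_2 term in the noise scale as the norm of the noiseless Ax:

```python
import numpy as np

n, d = 100_000, 30
alpha = 0.00065  # quoted value, chosen for a condition number of about 5

rng = np.random.default_rng(0)  # seed is arbitrary, for repeatability only

# NG matrix following the quoted construction: a small Gaussian block next
# to a tiny uniform block on top, and [zeros | identity] on the bottom, so
# the last d/2 rows carry disproportionately high leverage scores.
A = np.block([
    [alpha * rng.standard_normal((n - d // 2, d // 2)),
     1e-8 * rng.random((n - d // 2, d // 2))],
    [np.zeros((d // 2, d // 2)), np.eye(d // 2)],
])

# Random ground-truth solution and noisy response,
# b = Ax + 1e-5 * (||Ax||_2 / ||e||_2) * e
x = rng.standard_normal(d)
e = rng.standard_normal(n)
Ax = A @ x
b = Ax + 1e-5 * (np.linalg.norm(Ax) / np.linalg.norm(e)) * e
```

The block layout mirrors MATLAB's bracket concatenation: space-separated blocks sit side by side, semicolon-separated groups stack vertically, giving an n × d matrix overall.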