reproducibilityindex.ai

Coresets for Regressions with Panel Data

Authors: Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we assess our approach with a synthetic and a real-world dataset; the coreset sizes constructed using our approach are much smaller than the full dataset, and coresets indeed accelerate the computation of the regression objective. We implement our coreset algorithms for GLSE, and compare the performance with uniform sampling on synthetic datasets and a real-world dataset.
Researcher Affiliation	Collaboration	Lingxiao Huang (Huawei); K. Sudhir (Yale University); Nisheeth K. Vishnoi (Yale University)
Pseudocode	Yes	Algorithm 1: CGLSE: Coreset construction of GLSE. Algorithm 2: CGLSEk: Coreset construction of GLSEk.
Open Source Code	Yes	Code is available at https://github.com/huanglx12/Coresets-for-regressions-with-panel-data.
Open Datasets	No	The paper describes the synthetic and real-world datasets used, but does not provide concrete access information (e.g., URL, DOI, specific citation for public availability) for either dataset.
Dataset Splits	No	The paper mentions running experiments on the 'full dataset' and 'coresets' but does not specify any train/validation/test splits, cross-validation, or other data partitioning strategies used for reproduction.
Hardware Specification	Yes	The experiments are conducted using PyCharm on a 4-core desktop CPU with 8 GB RAM.
Software Dependencies	No	The paper mentions using PyCharm as an IDE and implementing IRLS, but does not provide specific version numbers for any programming languages or libraries.
Experiment Setup	Yes	We vary ε = 0.1, 0.2, 0.3, 0.4, 0.5 and generate 100 independent random tuples ζ = (β, ρ) ∈ ℝ^{d+q} (the same as described in the generation of the synthetic dataset). For each ε, we run our algorithm CGLSE and Uni to generate coresets. We also implement IRLS [32] for solving GLSE. We run IRLS on both the full dataset and coresets and record the runtime.
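
The evaluation loop in the Experiment Setup row can be illustrated with a minimal sketch. It covers only a uniform-sampling baseline in the spirit of Uni, and it makes several assumptions that are not from the paper: an ordinary least-squares objective stands in for GLSE, the coreset size is varied directly instead of being derived from ε, the data are freshly generated Gaussian toy data, and all function names (squared_error, uniform_coreset, max_relative_error) are hypothetical. It does not implement CGLSE or IRLS.

```python
import random


def squared_error(rows, beta, weights=None):
    """Weighted sum of squared residuals over (x, y) rows.

    Assumption: plain least squares, used here as a stand-in for GLSE.
    """
    if weights is None:
        weights = [1.0] * len(rows)
    total = 0.0
    for (x, y), w in zip(rows, weights):
        pred = sum(xi * bi for xi, bi in zip(x, beta))
        total += w * (y - pred) ** 2
    return total


def uniform_coreset(rows, size, rng):
    """Sample `size` rows without replacement; each gets weight n / size."""
    idx = rng.sample(range(len(rows)), size)
    weight = len(rows) / size
    return [rows[i] for i in idx], [weight] * size


def max_relative_error(rows, core_rows, core_weights, betas):
    """Largest relative gap between coreset and full objective over `betas`."""
    worst = 0.0
    for beta in betas:
        full = squared_error(rows, beta)
        approx = squared_error(core_rows, beta, core_weights)
        worst = max(worst, abs(approx - full) / full)
    return worst


rng = random.Random(0)
n, d = 2000, 3  # toy sizes, chosen only for illustration

# Toy regression data: y = <x, true_beta> + Gaussian noise.
true_beta = [rng.gauss(0, 1) for _ in range(d)]
rows = []
for _ in range(n):
    x = [rng.gauss(0, 1) for _ in range(d)]
    y = sum(xi * bi for xi, bi in zip(x, true_beta)) + rng.gauss(0, 1)
    rows.append((x, y))

# 100 independent random parameter vectors, mirroring the 100 random tuples.
betas = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(100)]

results = {}
for size in (50, 200, 1000):
    core_rows, core_weights = uniform_coreset(rows, size, rng)
    results[size] = max_relative_error(rows, core_rows, core_weights, betas)
    print(f"coreset size {size}: max relative error {results[size]:.3f}")
```

Weighting each sampled row by n / size makes the coreset objective an unbiased estimate of the full objective, which is why uniform sampling is a natural baseline to compare against.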