Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Data subsampling for Poisson regression with pth-root-link
Authors: Han Cheng Lie, Alexander Munteanu
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | G.3 Experimental illustration. All experiments were run on a commodity machine with Intel Core i7-7700K processor (4 cores, 4.2GHz, 32GB RAM) and took overall around 50 minutes to complete. The Python code of [34] was adapted to the Poisson regression setting. We applied it with the appropriate p ∈ {1, 2} to the datasets with dimensions n = 100 000, d = 7 generated as detailed in the previous section. We compared our method to uniform sampling as a baseline. |
| Researcher Affiliation | Academia | Han Cheng Lie Institut für Mathematik Universität Potsdam Germany EMAIL. Alexander Munteanu Department of Statistics TU Dortmund University Germany EMAIL |
| Pseudocode | Yes | G.1 Pseudocode. Here we give pseudocode for our coreset construction Algorithm 1 and for the subsequent optimization procedure Algorithm 2: |
| Open Source Code | Yes | Our new code is available at https://github.com/Tim907/poisson-regression/. |
| Open Datasets | No | G.2 Synthetic data generation. We generated for each p ∈ {1, 2} a dataset with dimensions n = 100 000, d = 7 with n labels corresponding to each point. (The paper describes generating synthetic data but does not state that it is publicly available or provide access information for it.) |
| Dataset Splits | No | The paper discusses 'reduced size' for subsampling but does not explicitly provide details about training, validation, or test dataset splits (percentages, counts, or splitting methodology). |
| Hardware Specification | Yes | All experiments were run on a commodity machine with Intel Core i7-7700K processor (4 cores, 4.2GHz, 32GB RAM) |
| Software Dependencies | No | The Python code of [34] was adapted to the Poisson regression setting. (It mentions 'Python code' but does not specify Python version or any other software dependencies with version numbers.) |
| Experiment Setup | Yes | We applied it with the appropriate p ∈ {1, 2} to the datasets with dimensions n = 100 000, d = 7 generated as detailed in the previous section. We compared our method to uniform sampling as a baseline. We varied the reduced size between 50 and 600 in equal increments of size 50. For each reduced size and each method, we performed 201 independent repetitions. |
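
The experiment grid quoted in the table (reduced sizes 50 to 600 in steps of 50, 201 repetitions per size) can be sketched for the uniform sampling baseline as follows. This is a minimal illustration, not the authors' code, which is at the repository linked above. Several things here are assumptions: the function names are hypothetical, the loss assumes a Poisson mean of the form (x·β)^p for the pth-root link, each sampled point is reweighted by n/k, the loss is evaluated at a fixed β rather than re-optimized on each subsample, and the median relative error is used as the summary statistic.

```python
import math
import random
import statistics

# Grid from the quoted setup: 50, 100, ..., 600 and 201 repetitions.
REDUCED_SIZES = list(range(50, 601, 50))
REPETITIONS = 201


def uniform_subsample(n, k, rng):
    """Draw k of n indices uniformly without replacement.

    Each sampled point gets weight n / k (an assumed, standard reweighting)
    so that the weighted subsample loss estimates the full-data loss.
    """
    idx = rng.sample(range(n), k)
    return idx, [n / k] * k


def poisson_nll(beta, X, y, p, idx=None, weights=None):
    """Weighted Poisson negative log-likelihood, dropping the log(y!) term.

    Assumes the mean is (x . beta)^p, i.e. a pth-root link.
    Returns inf if any linear predictor is not strictly positive.
    """
    if idx is None:
        idx = range(len(X))
        weights = [1.0] * len(X)
    total = 0.0
    for i, w in zip(idx, weights):
        z = sum(a * b for a, b in zip(X[i], beta))
        if z <= 0:
            return math.inf
        total += w * (z ** p - p * y[i] * math.log(z))
    return total


def run_uniform_baseline(X, y, beta, p, seed=0, repetitions=REPETITIONS):
    """Median relative loss error of uniform sampling for each reduced size."""
    rng = random.Random(seed)
    n = len(X)
    full = poisson_nll(beta, X, y, p)
    results = {}
    for k in REDUCED_SIZES:
        errors = []
        for _ in range(repetitions):
            idx, w = uniform_subsample(n, k, rng)
            approx = poisson_nll(beta, X, y, p, idx, w)
            errors.append(abs(approx - full) / abs(full))
        results[k] = statistics.median(errors)
    return results
```

Calling `run_uniform_baseline(X, y, beta, p=1)` on a dataset given as lists of rows returns a dictionary mapping each reduced size to its median relative error. A coreset method would replace `uniform_subsample` with its own sampling distribution and weights, keeping the same grid and repetition count for comparison.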