Data subsampling for Poisson regression with pth-root-link
Authors: Han Cheng Lie, Alexander Munteanu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | G.3 Experimental illustration. All experiments were run on a commodity machine with Intel Core i7-7700K processor (4 cores, 4.2GHz, 32GB RAM) and took overall around 50 minutes to complete. The Python code of [34] was adapted to the Poisson regression setting. We applied it with the appropriate p ∈ {1, 2} to the datasets with dimensions n = 100 000, d = 7 generated as detailed in the previous section. We compared our method to uniform sampling as a baseline. |
| Researcher Affiliation | Academia | Han Cheng Lie, Institut für Mathematik, Universität Potsdam, Germany, hanlie@uni-potsdam.de. Alexander Munteanu, Department of Statistics, TU Dortmund University, Germany, alexander.munteanu@tu-dortmund.de |
| Pseudocode | Yes | G.1 Pseudocode. Here we give pseudocode for our coreset construction Algorithm 1 and for the subsequent optimization procedure Algorithm 2: |
| Open Source Code | Yes | Our new code is available at https://github.com/Tim907/poisson-regression/. |
| Open Datasets | No | G.2 Synthetic data generation. We generated for each p ∈ {1, 2} a dataset with dimensions n = 100 000, d = 7 with n labels corresponding to each point. (The paper describes generating synthetic data but does not state that it is publicly available or provide access information for it.) |
| Dataset Splits | No | The paper discusses 'reduced size' for subsampling but does not explicitly provide details about training, validation, or test dataset splits (percentages, counts, or splitting methodology). |
| Hardware Specification | Yes | All experiments were run on a commodity machine with Intel Core i7-7700K processor (4 cores, 4.2GHz, 32GB RAM) |
| Software Dependencies | No | The Python code of [34] was adapted to the Poisson regression setting. (It mentions 'Python code' but does not specify Python version or any other software dependencies with version numbers.) |
| Experiment Setup | Yes | We applied it with the appropriate p ∈ {1, 2} to the datasets with dimensions n = 100 000, d = 7 generated as detailed in the previous section. We compared our method to uniform sampling as a baseline. We varied the reduced size between 50 and 600 in equal increments of size 50. For each reduced size and each method, we performed 201 independent repetitions. |
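
The experiment setup quoted in the table can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The grid of reduced sizes, the repetition count, and the dimensions are taken from the excerpts above. The function names `uniform_subsample` and `weighted_poisson_loss`, sampling without replacement, the reweighting by n / k, and the loss written for a mean of the form (xᵀβ)^p are assumptions made here for illustration only.

```python
import numpy as np

N, D = 100_000, 7                          # dataset dimensions quoted above
REDUCED_SIZES = list(range(50, 601, 50))   # 50, 100, ..., 600
REPETITIONS = 201                          # repetitions per size and method


def uniform_subsample(n, k, rng):
    """Hypothetical uniform baseline: draw k row indices without replacement
    and give each sampled row the weight n / k (an assumption, not from the paper)."""
    idx = rng.choice(n, size=k, replace=False)
    weights = np.full(k, n / k)
    return idx, weights


def weighted_poisson_loss(X, y, beta, p, weights=None):
    """Weighted Poisson negative log-likelihood, up to terms constant in beta,
    assuming the mean is (x^T beta)^p (pth-root link) with x^T beta > 0."""
    z = X @ beta
    if np.any(z <= 0):
        return float("inf")                # outside the assumed domain
    w = np.ones(len(y)) if weights is None else weights
    return float(np.sum(w * (z ** p - p * y * np.log(z))))
```

A full run would loop over every value in `REDUCED_SIZES` and repeat each configuration `REPETITIONS` times for both the coreset method and the uniform baseline, comparing the loss of the model fitted on the subsample against the loss of the model fitted on the full data.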