Regression with Label Permutation in Generalized Linear Model
Authors: Guanhua Fang, Ping Li
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Multiple numerical experiments are provided and corroborate our theoretical findings. |
| Researcher Affiliation | Collaboration | Guanhua Fang: School of Management, Fudan University, 670 Guoshun Road, Shanghai 200433, China (fanggh@fudan.edu.cn). Ping Li: LinkedIn Ads, 700 Bellevue Way NE, Bellevue, WA 98004, USA (pinli@linkedin.com). |
| Pseudocode | Yes | Algorithm 1 Two-step Estimation. Algorithm 2 Warm start for maximum likelihood estimation. Algorithm 3 Maximum likelihood (ML) estimation. Algorithm 4 ML estimation with warm start for missing observations. Algorithm 5 Two-step Estimation with missing observations. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about releasing the source code for the methodology described. |
| Open Datasets | No | The paper uses generated data for simulations and a real financial dataset (Dow Jones Industrial Average) for which no public access information (link, DOI, or specific citation for public availability) is provided. It mentions: 'The Dow Jones Industrial Average is a stock market index... The dataset consists of weekly price for each of thirty stocks in the first half of year 2011.' |
| Dataset Splits | No | The paper describes simulation settings (e.g., varying m, q, n, and percentage of permuted labels) and real-world data processing but does not specify train/validation/test splits for reproducibility, nor does it refer to standard predefined splits. |
| Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments is mentioned. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python version, library versions like PyTorch, TensorFlow, or specific solvers). |
| Experiment Setup | Yes | Setting 1: In the first simulation setting, we evaluate the performance of the maximum likelihood estimation method. We set n to 256 and 512 and let 25% or 33% of the labels be permuted. We vary m over {log_2 n, 2 log_2 n, ..., 20 log_2 n} and set the observation rate q at different levels. For the design matrix X, each row independently follows a multivariate Gaussian distribution N(0, I_p/p) with p = 10. For the coefficient matrix B, each element is an i.i.d. standard Gaussian random variable. |
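
Since no source code is released, the data-generating part of Setting 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions not stated in the excerpt above: the function name `simulate_setting1` and its parameters are hypothetical, m is taken to be the number of response columns (so B is p x m), the response uses a Gaussian identity-link model with unit noise as one GLM instance, and q is taken to be the per-entry probability that a response is observed.

```python
import numpy as np

def simulate_setting1(n=256, p=10, m_mult=1, perm_frac=0.25, q=1.0, seed=0):
    """Hypothetical generator for Setting 1 (assumptions noted inline)."""
    rng = np.random.default_rng(seed)

    # m ranges over multiples of log2(n); m_mult selects which multiple.
    m = int(m_mult * np.log2(n))

    # Each row of X ~ N(0, I_p / p): i.i.d. entries with variance 1/p.
    X = rng.normal(scale=1.0 / np.sqrt(p), size=(n, p))

    # Each entry of B is i.i.d. standard Gaussian (assumed shape p x m).
    B = rng.standard_normal((p, m))

    # Shuffle a perm_frac fraction of row labels among themselves.
    # A random shuffle may leave a few of the chosen rows in place.
    k = int(round(perm_frac * n))
    idx = rng.choice(n, size=k, replace=False)
    perm = np.arange(n)
    perm[idx] = idx[rng.permutation(k)]

    # Assumption: Gaussian response with identity link and unit noise.
    Y = X @ B + rng.standard_normal((n, m))
    Y_perm = Y[perm]

    # Assumption: each entry is observed independently with probability q.
    mask = rng.random((n, m)) < q
    Y_obs = np.where(mask, Y_perm, np.nan)
    return X, B, perm, Y_obs, mask

X, B, perm, Y_obs, mask = simulate_setting1(n=256, m_mult=2, perm_frac=0.25, q=0.8)
```

Returning `perm` and `mask` alongside the data makes it possible to score a permutation-recovery method against the ground truth; other GLM families would replace only the line that draws `Y`.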