Regression with Label Permutation in Generalized Linear Model

Authors: Guanhua Fang, Ping Li

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Multiple numerical experiments are provided and corroborate our theoretical findings.
Researcher Affiliation | Collaboration | Guanhua Fang, School of Management, Fudan University, 670 Guoshun Road, Shanghai 200433, China (fanggh@fudan.edu.cn); Ping Li, LinkedIn Ads, 700 Bellevue Way NE, Bellevue, WA 98004, USA (pinli@linkedin.com)
Pseudocode | Yes | Algorithm 1: Two-step estimation. Algorithm 2: Warm start for maximum likelihood estimation. Algorithm 3: Maximum likelihood (ML) estimation. Algorithm 4: ML estimation with warm start for missing observations. Algorithm 5: Two-step estimation with missing observations.
Open Source Code | No | The paper does not provide a direct link or explicit statement about releasing the source code for the methodology described.
Open Datasets | No | The paper uses generated data for simulations and a real financial dataset (Dow Jones Industrial Average) for which no public access information (link, DOI, or specific citation for public availability) is provided. It mentions: 'The Dow Jones Industrial Average is a stock market index... The dataset consists of weekly price for each of thirty stocks in the first half of year 2011.'
Dataset Splits | No | The paper describes simulation settings (e.g., varying m, q, n, and the percentage of permuted labels) and real-world data processing, but it does not specify train/validation/test splits for reproducibility, nor does it refer to standard predefined splits.
Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments is mentioned.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python version, library versions such as PyTorch or TensorFlow, or specific solvers).
Experiment Setup | Yes | Setting 1: In the first simulation setting, we evaluate the performance of the maximum likelihood estimation method. We set n to be 256 and 512 and let 25% or 33% of the labels be permuted. We vary m over {log_2 n, 2 log_2 n, ..., 20 log_2 n} and set the observation rate q at different levels. For the design matrix X, each row independently follows a multivariate Gaussian distribution N(0, I_p/p) (p = 10). For the coefficient matrix B, each element is an i.i.d. standard Gaussian random variable.
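Since no code is released, the data generation described in Setting 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function name make_setting1, the p x m shape of B, the reading of "permuted" as shuffling a random subset of row indices among themselves, and the Bernoulli(q) observation mask are all assumptions. The GLM response distribution is not stated in this summary, so only the linear predictor X B is produced.

```python
import numpy as np

def make_setting1(n=256, p=10, m_mult=1, frac_permuted=0.25, q=1.0, seed=0):
    """Hypothetical generator for the Setting 1 inputs (assumptions noted inline)."""
    rng = np.random.default_rng(seed)

    # m is taken from the grid {log2 n, 2 log2 n, ..., 20 log2 n}
    m = int(m_mult * np.log2(n))

    # Each row of X ~ N(0, I_p / p): independent entries with variance 1/p
    X = rng.normal(0.0, np.sqrt(1.0 / p), size=(n, p))

    # B has i.i.d. standard Gaussian entries; the p x m shape is an assumption
    B = rng.standard_normal((p, m))

    # Linear predictor only; the GLM family is left unspecified here
    eta = X @ B

    # Assumption: shuffle a random subset of round(frac * n) row labels among
    # themselves. Some may map back to themselves, so at most k rows move.
    k = int(round(frac_permuted * n))
    idx = rng.choice(n, size=k, replace=False)
    perm = np.arange(n)
    perm[idx] = rng.permutation(idx)

    # Assumption: each entry is observed independently with probability q
    mask = rng.random((n, m)) < q

    return X, B, eta, perm, mask

X, B, eta, perm, mask = make_setting1(n=256, m_mult=2, frac_permuted=0.25, q=1.0)
```

Here eta[perm] would give the row-permuted predictor, and mask marks which entries are treated as observed. Sweeping m_mult over 1 to 20 and n over {256, 512} covers the grid described above.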