reproducibilityindex.ai

Regression with Label Permutation in Generalized Linear Model

Authors: Guanhua Fang, Ping Li

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Multiple numerical experiments are provided and corroborate our theoretical findings.
Researcher Affiliation	Collaboration	Guanhua Fang: School of Management, Fudan University, 670 Guoshun Road, Shanghai 200433, China (fanggh@fudan.edu.cn). Ping Li: LinkedIn Ads, 700 Bellevue Way NE, Bellevue, WA 98004, USA (pinli@linkedin.com).
Pseudocode	Yes	Algorithm 1 Two-step Estimation. Algorithm 2 Warm start for maximum likelihood estimation. Algorithm 3 Maximum likelihood (ML) estimation. Algorithm 4 ML estimation with warm start for missing observations. Algorithm 5 Two-step Estimation with missing observations.
Open Source Code	No	The paper does not provide a direct link or explicit statement about releasing the source code for the methodology described.
Open Datasets	No	The paper uses generated data for simulations and a real financial dataset (Dow Jones Industrial Average) for which no public access information (link, DOI, or specific citation for public availability) is provided. It mentions: 'The Dow Jones Industrial Average is a stock market index... The dataset consists of weekly price for each of thirty stocks in the first half of year 2011.'
Dataset Splits	No	The paper describes simulation settings (e.g., varying m, q, n, and percentage of permuted labels) and real-world data processing but does not specify train/validation/test splits for reproducibility, nor does it refer to standard predefined splits.
Hardware Specification	No	No specific hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments is mentioned.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers (e.g., Python version, library versions like PyTorch, TensorFlow, or specific solvers).
Experiment Setup	Yes	Setting 1: In the first simulation setting, we evaluate the performance of the maximum likelihood estimation method. We set n to 256 and 512 and let 25% or 33% of the labels be permuted. We vary m over $\{\log_2 n, 2\log_2 n, \ldots, 20\log_2 n\}$ and set the observation rate q at different levels. For the design matrix X, each row independently follows a multivariate Gaussian distribution $N(0, I_p/p)$ with p = 10. For the coefficient matrix B, each element is an i.i.d. standard Gaussian random variable.
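The Setting 1 description is concrete enough to sketch a data generator. The following is a minimal sketch, not the authors' code (none is released), written under stated assumptions. It assumes a Gaussian response with identity link and a hypothetical noise level `sigma`, since the GLM family is not given in this row. It treats a "permuted" label as one that is moved to a different row. It also assumes that each entry of Y is observed independently with probability q. The function names and the `sigma` and `seed` parameters are hypothetical. A row distributed as N(0, I_p/p) has independent entries with variance 1/p.

```python
import math
import random


def m_grid(n, max_mult=20):
    """m values {log2 n, 2 log2 n, ..., 20 log2 n}; n = 256, 512 are powers of 2."""
    base = int(round(math.log2(n)))
    return [k * base for k in range(1, max_mult + 1)]


def partial_permutation(n, frac, rng):
    """Permute round(frac * n) labels so that every selected label moves.

    The selected indices are cycled (s0 -> s1 -> ... -> s0); a plain shuffle
    of the subset could leave some selected labels in place.
    """
    perm = list(range(n))
    k = int(round(frac * n))
    chosen = rng.sample(range(n), k)
    if k >= 2:
        for i in range(k):
            perm[chosen[i]] = chosen[(i + 1) % k]
    return perm


def simulate(n=256, p=10, m=8, q=0.5, frac=0.25, sigma=0.1, seed=0):
    """Hypothetical Setting 1 generator (Gaussian / identity link ASSUMED)."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(p)  # rows of X ~ N(0, I_p / p): entry variance 1/p
    X = [[rng.gauss(0.0, sd) for _ in range(p)] for _ in range(n)]
    B = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(p)]  # i.i.d. N(0, 1)
    perm = partial_permutation(n, frac, rng)
    Y = []
    for i in range(n):
        row = []
        for j in range(m):
            # Response i is generated from the (possibly permuted) row perm[i] of X.
            mean = sum(X[perm[i]][k] * B[k][j] for k in range(p))
            y = mean + rng.gauss(0.0, sigma)  # ASSUMPTION: additive Gaussian noise
            row.append(y if rng.random() < q else None)  # observed with probability q
        Y.append(row)
    return X, B, perm, Y


if __name__ == "__main__":
    n = 256
    X, B, perm, Y = simulate(n=n, m=m_grid(n)[0], q=0.5, frac=0.25)
    moved = sum(perm[i] != i for i in range(n))
    observed = sum(v is not None for row in Y for v in row)
    print(f"m = {len(B[0])}, permuted labels = {moved}, observed entries = {observed}")
```

Cycling the chosen indices makes the permuted fraction exact (64 of 256 labels at 25%), which keeps the "percentage of permuted labels" knob meaningful. Swapping in another GLM family (e.g., Poisson or logistic) would change only the line that draws `y` from `mean`.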