Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Sparse Modal Regression with Mode-Invariant Skew Noise

Authors: Kazuki Koyama, Takayuki Kawashima, Hironori Fujisawa

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical experiments on artificial and real-world data demonstrate that the proposed method performs significantly better and is more stable than other existing methods for various skew-noise data." "In Section 4, numerical experiments on artificial and real-world data are demonstrated."
Researcher Affiliation | Academia | Kazuki Koyama (EMAIL), The Graduate University for Advanced Studies (SOKENDAI); Takayuki Kawashima (EMAIL), Tokyo Institute of Technology; Hironori Fujisawa (EMAIL), The Institute of Statistical Mathematics / The Graduate University for Advanced Studies (SOKENDAI) / RIKEN Center for Advanced Intelligence Project
Pseudocode | Yes | Algorithm 1: Optimization of the proposed method
    Require: hyper-parameter λ and initialized β, σ, α
     1: while until convergence do
     2:   while until convergence (of β) do
     3:     for n = 1, ..., N do
     4:       calculate z_n and w_n as follows:
     5:         z_n ← …
     6:         w_n ← …
     7:     end for
     8:     β ← argmin_β [(1 + ρ_α²) / (Nσ²(1 − ρ_α²)²)] Σ_{n=1}^N (y_n − X_n⊤β)² + λ‖β‖₁
     9:   end while
    10:   σ ← argmin_{σ: σ > 0} ℓ(θ | D_N) with L-BFGS-B (or other valid algorithm)
    11:   α ← argmin_α ℓ(θ | D_N) with L-BFGS (or other valid algorithm)
    12: end while
Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository.
Open Datasets | Yes | "We applied the proposed method to the following two medical datasets: PDGFR (Platelet Derived Growth Factor Receptor) consists of N = 79 samples and P = 320 features, where the outcome is the ability to inhibit PDGFR phosphorylation (Guha & Jurs, 2004). MTP (Melting Point) includes N = 274 samples and P = 1142 features, where the outcome is the melting point of drug-like compounds (Karthikeyan et al., 2005)." "We applied the proposed method to the Engineering Graduate Salary (EGS) prediction data (Aggarwal et al., 2016)."
Dataset Splits | Yes | "For each trial, the regularization coefficient λ was adjusted by 5-fold cross-validation based on the log-likelihood loss, in which the numbers of training and validation data for each trial were set to 400 and 100, respectively." "The regularization coefficient λ was tuned by 5-fold cross-validation with 80% training and 20% validation data." "We used 80% of the samples for training and the remaining 20% for testing. Then, we tuned each regularization coefficient λ with 5-fold cross-validation using 20% of the training samples (i.e., 16% of all samples) as validation data."
Hardware Specification | No | The paper does not contain any specific hardware details, such as CPU or GPU models or memory specifications, used for running the experiments.
Software Dependencies | No | "As shown in line 8 of Algorithm 1, we can rewrite the β update as another Lasso-type problem and then use well-known software, e.g., the sklearn.linear_model package of Python." "In this paper, we employ the L-BFGS algorithm (Liu & Nocedal, 1989), which is an iterative method for solving non-linear optimization problems. In particular, for the inequality constraint σ > 0, we can utilize the L-BFGS-B algorithm (Byrd et al., 1995; Zhu et al., 1997), which extends the L-BFGS algorithm to handle bounded constraints." The paper mentions software packages like 'sklearn.linear_model' and 'scipy.optimize' (for L-BFGS) but does not specify their version numbers.
Experiment Setup | Yes | "The sample size was set to N = 500. We conducted 50 experiments with different random seeds. For each trial, the regularization coefficient λ was adjusted by 5-fold cross-validation based on the log-likelihood loss, in which the numbers of training and validation data for each trial were set to 400 and 100, respectively." "The tuning power parameter in the Yeo-Johnson transformation was determined by maximum likelihood estimation." "We used 80% of the samples for training and the remaining 20% for testing. Then, we tuned each regularization coefficient λ with 5-fold cross-validation using 20% of the training samples (i.e., 16% of all samples) as validation data. These samples were generated randomly, and 30 trials were conducted with different random seeds."
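The alternating structure quoted in the Pseudocode row (a Lasso-type β step solved with sklearn.linear_model, then a bounded σ step solved with L-BFGS-B) can be sketched in a few lines. This is a minimal illustration only: the paper's skew-noise log-likelihood ℓ(θ | D_N), its z_n / w_n auxiliary updates, and the α step are not reproduced here, and `fit_alternating` is a hypothetical name. A plain Gaussian negative log-likelihood stands in so the loop runs end to end.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import Lasso

def fit_alternating(X, y, lam=0.1, n_outer=5):
    """Sketch of Algorithm 1's outer loop (Gaussian stand-in likelihood)."""
    N, P = X.shape
    beta, sigma = np.zeros(P), 1.0
    for _ in range(n_outer):
        # beta-step: a Lasso-type problem (cf. line 8 of Algorithm 1),
        # delegated to sklearn.linear_model as the paper suggests
        beta = Lasso(alpha=lam, fit_intercept=False,
                     max_iter=10_000).fit(X, y).coef_

        # sigma-step: bounded scalar minimization with L-BFGS-B,
        # enforcing the inequality constraint sigma > 0 via `bounds`
        def nll(s):
            r = y - X @ beta
            return N * np.log(s[0]) + 0.5 * np.sum(r ** 2) / s[0] ** 2

        sigma = minimize(nll, x0=[sigma], method="L-BFGS-B",
                         bounds=[(1e-6, None)]).x[0]
    return beta, sigma
```

The α step of Algorithm 1 would be handled analogously with unconstrained L-BFGS, since α carries no positivity constraint.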
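The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows (repeated seeded trials, an 80%/20% train/test split, and 5-fold cross-validation over λ on the training portion) can be mirrored with scikit-learn. This sketch substitutes mean squared error for the paper's log-likelihood loss and a plain Lasso for the proposed estimator; `run_trials` and the synthetic data are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, train_test_split

def run_trials(n_trials=5, N=500, P=20, grid=(0.01, 0.05, 0.1, 0.5)):
    """Repeated seeded trials with an 80/20 split and 5-fold CV over lambda."""
    test_errors = []
    for seed in range(n_trials):
        # synthetic sparse-regression data, regenerated per seed
        rng = np.random.default_rng(seed)
        beta_true = np.zeros(P)
        beta_true[:3] = (2.0, -1.5, 1.0)
        X = rng.normal(size=(N, P))
        y = X @ beta_true + rng.normal(scale=0.5, size=N)

        # 80% train / 20% test (400 / 100 samples when N = 500)
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)

        # 5-fold CV on the training set: each fold holds out 20% of the
        # training samples (16% of all samples) as validation data
        cv = KFold(n_splits=5, shuffle=True, random_state=seed)

        def cv_mse(lam):
            errs = []
            for tr, va in cv.split(X_tr):
                m = Lasso(alpha=lam, max_iter=10_000).fit(X_tr[tr], y_tr[tr])
                errs.append(np.mean((y_tr[va] - m.predict(X_tr[va])) ** 2))
            return np.mean(errs)

        best = min(grid, key=cv_mse)
        model = Lasso(alpha=best, max_iter=10_000).fit(X_tr, y_tr)
        test_errors.append(np.mean((y_te - model.predict(X_te)) ** 2))
    return np.mean(test_errors), np.std(test_errors)
```

Reporting the mean and standard deviation of the test error across seeds matches how results over the paper's 50 (artificial-data) and 30 (real-data) trials would typically be aggregated.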