Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Inference on High-dimensional Single-index Models with Streaming Data
Authors: Dongxiao Han, Jinhan Xie, Jin Liu, Liuquan Sun, Jian Huang, Bei Jiang, Linglong Kong
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of the proposed method, extensive simulation studies have been conducted. We provide applications to Nasdaq stock prices and financial distress data sets. |
| Researcher Affiliation | Academia | Dongxiao Han (School of Statistics and Data Science, KLMDASR, LEBPS, and LPMC, Nankai University, Tianjin 300071, China); Jinhan Xie (Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming 650091, China; Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB T6G 2G1, Canada); Jin Liu (School of Statistics and Data Science, KLMDASR, LEBPS, and LPMC, Nankai University, Tianjin 300071, China); Liuquan Sun (Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100190, China); Jian Huang (Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China); Bei Jiang (Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB T6G 2G1, Canada); Linglong Kong (Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB T6G 2G1, Canada) |
| Pseudocode | Yes | Algorithm 1: Online estimation for the SIMs. Input: streaming data sets D1, …, Ds, … and tuning parameters λ1, …, λs, …, γ1, …, γs, …. 1. Calculate the offline lasso penalized estimators β̂1^(1), β̂2^(1) via (2) and (3) based on D1; 2. update n1 H1^(1) and n2 H2^(1); 3. for s = 2, 3, …: (i) read the current data set Ds; (ii) calculate the online lasso penalized estimators β̂1^(s) and β̂2^(s) via (5) and (6); (iii) update and store the summary statistics {β̂1^(s), β̂2^(s), Σ_{j=1}^{s} nj H1^(j), Σ_{j=1}^{s} nj H2^(j)}; (iv) calculate β̂ave^(s) = {β̂1^(s) + β̂2^(s)}/2; (v) release data set Ds from memory; end for. Output: β̂ave^(s) for s = 1, 2, … |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | In this section, we illustrate our method with the financial distress data set, which is available from https://www.kaggle.com/datasets/shebrahimi/financial-distress. |
| Dataset Splits | Yes | The data are split into m = 10 batches. We take the first two-year data set as the first data batch (n1 = 164) to guarantee a sufficiently large sample size at the initial stage and each subsequent one-year data set as a later data batch (nj = 82, j = 2, …, m − 1). In addition, the sample size of the final batch is nm = 72. Hence, the streaming data consists of m = 10 data batches with a total sample size Nm = 892. ... we split the data into m = 10 batches randomly, take the n1 = 108 observations as the first batch, and set each of the remaining 9 batches to contain nj = 100 observations. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | The tuning parameters λs and γs, s = 1, …, m, are chosen by the modified BIC (Wang et al., 2007). For example, we obtain λs by minimizing BIC(λs) = log[(β̂(λs) − β̂2^(s−1))ᵀ {Σ_{j=1}^{s−1} nj H1^(j)/(2Ns)} (β̂(λs) − β̂2^(s−1)) + Σ_{i=1}^{ns} l(Yi^(s), Xi^(s)ᵀ β̂(λs))/Ns] + CNs ‖β̂(λs)‖₀ log(Ns/2)/Ns, where β̂(λs) is obtained from (5), CNs = c log log(p), c is a constant, and ‖·‖₀ denotes the number of nonzero elements in a vector. Furthermore, we choose the robustification parameter τ in the Huber loss such that 80% of the prediction errors are in [−τ, τ]. ... hs = argmin_{h ∈ Sh} [tr{2 H1^(s) Ω̂1^(s−1)(h)/ns} − log det{Ω̂1^(s−1)(h)}]. |
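The core trick of the quoted Algorithm 1 — updating and storing summary statistics so that each data batch can be released from memory — can be sketched for the much simpler unpenalized squared-loss case. This is a minimal illustration of the streaming-summary idea, not the paper's lasso-penalized single-index estimator; `stream_ols` and `solve` are hypothetical names:

```python
# Streaming least squares: after each batch we keep only the accumulated
# X'X and X'y, yet the estimate after batch s equals the estimator fit on
# all data seen so far -- the reason Algorithm 1 can discard raw batches.

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination (a: list of lists)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def stream_ols(batches):
    """Process (X, y) batches; store only accumulated X'X and X'y."""
    p = len(batches[0][0][0])
    xtx = [[0.0] * p for _ in range(p)]   # running sum of X'X
    xty = [0.0] * p                       # running sum of X'y
    estimates = []
    for X, y in batches:                  # (i) read the current batch D_s
        for xi, yi in zip(X, y):          # (iii) update summary statistics
            for a_ in range(p):
                xty[a_] += xi[a_] * yi
                for b_ in range(p):
                    xtx[a_][b_] += xi[a_] * xi[b_]
        estimates.append(solve(xtx, xty))  # online estimate after D_s
        # (v) the raw batch (X, y) can now be released from memory
    return estimates
```

With noiseless data generated from y = x1 + 2·x2, the estimate after the final batch recovers (1, 2) exactly, matching what a single fit on the pooled data would give.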
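The modified-BIC tuning described in the Experiment Setup row can be illustrated with a generic criterion of the Wang et al. (2007) form — log(fit) plus a CN·log(N)/N penalty on the number of nonzero coefficients — scored over a grid of candidate estimates. This sketch uses a plain squared-error fit term rather than the paper's streaming criterion, and all function names are hypothetical:

```python
import math

def l0_norm(beta, tol=1e-12):
    """||beta||_0: the number of (numerically) nonzero coefficients."""
    return sum(1 for b in beta if abs(b) > tol)

def modified_bic(beta, X, y, c=1.0):
    """log(mean squared error) + C_N * log(N)/N * ||beta||_0,
    with C_N = c * log(log(p)) as in the quoted criterion."""
    n, p = len(y), len(beta)
    mse = sum((yi - sum(b * xij for b, xij in zip(beta, xi))) ** 2
              for xi, yi in zip(X, y)) / n
    c_n = c * math.log(math.log(p))
    return math.log(mse) + c_n * math.log(n) / n * l0_norm(beta)

def select_lambda(candidates, X, y):
    """candidates: {lambda: beta_hat(lambda)}; return the BIC minimizer."""
    return min(candidates,
               key=lambda lam: modified_bic(candidates[lam], X, y))
```

Given two candidates — a sparse vector close to the truth and a dense, worse-fitting one — the criterion prefers the sparse candidate, which is exactly the behavior the ‖·‖₀ penalty is designed to produce.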
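The quoted rule for the Huber robustification parameter — choose τ so that 80% of the prediction errors fall in [−τ, τ] — amounts to taking an upper quantile of the absolute prediction errors. A minimal sketch (the quantile convention and the names `choose_tau` / `huber_loss` are my own, not from the paper):

```python
import math

def choose_tau(errors, coverage=0.8):
    """Smallest tau such that at least `coverage` of the errors
    lie in [-tau, tau], i.e. an order statistic of |errors|."""
    abs_err = sorted(abs(e) for e in errors)
    k = math.ceil(coverage * len(abs_err))  # rank of the covering error
    return abs_err[k - 1]

def huber_loss(r, tau):
    """Huber loss: quadratic for |r| <= tau, linear beyond tau."""
    return 0.5 * r * r if abs(r) <= tau else tau * (abs(r) - 0.5 * tau)
```

In practice τ would be recomputed from the prediction errors of the current fit; by construction the chosen τ covers at least the requested fraction of errors.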