Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Systematically Exploring Associations among Multivariate Data

Authors: Lifeng Zhang (pp. 6786-6794)

AAAI 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The mechanisms of these measures are proved in theory and demonstrated with numerical analyses. Subsequently, empirical studies are performed to evaluate the effectiveness of the new statistics and make comparisons with previous approaches.
Researcher Affiliation	Academia	Lifeng Zhang, School of Information, Renmin University of China, 59, Zhongguancun Street, Haidian, Beijing, P.R. China, 100872, EMAIL
Pseudocode	Yes	Algorithm 1 NN algorithm based data reordering. Input: Euclidean distance matrix of sample data {x(t)}, denoted by [λpq]N×N where λpq = ‖x(p) − x(q)‖; Output: concomitants {y[k:N]\|1 ≤ k ≤ N}; Start on data point t ← 1 as the current data point, set n(1) ← 1 and y[1:N] ← y(t); for k ← 1 to N − 1 do Find out the shortest distance connecting the current data point t and an unvisited data point i ∉ {n(1), …, n(k)} that i* ← arg min_i λit; Move the current data point to t ← i*, set n(k+1) ← i* and y[k+1:N] ← y(i*); end for
Open Source Code	No	The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a direct link to a code repository.
Open Datasets	Yes	Examples 5, called two-spirals problem, is a benchmark task for nonlinear classification, which consists of two spirals each with 200 samples in a 2-D space. nCor based statistics were used to explore a real-world data set that consists of 357 social, economic, health, and political indicators for 202 countries around the world for the time period from 1960 through 2005. It was originally collected from the World Health Organization (WHO) and partner organizations (Rosling 2008; W.H.O. 2009).
Dataset Splits	No	No specific training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split references) are explicitly provided in the paper. It mentions generating data of length 1000 and the two-spirals problem with 200 samples, but not how they are partitioned for training, validation, or testing.
Hardware Specification	No	No specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies	No	No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) are mentioned in the paper. It refers to various methods (MIC, dCor, MI, CODCF, RDC) and the use of 'linear regression and feedforward artificial neural network (ANN)' but without version details.
Experiment Setup	No	The paper does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or other system-level training settings for the models (e.g., ANNs) used in the empirical studies. It only states that '10 ANNs was trained for each case' but lacks further configuration details.
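
The nearest-neighbour reordering quoted in the Pseudocode row can be sketched in a few lines of Python. This is an illustrative reading of Algorithm 1, not the author's implementation (the paper releases no code): the function name `nn_reorder`, the 0-based indexing, and the tie-breaking rule (lowest index wins) are assumptions made here.

```python
import math


def nn_reorder(x, y):
    """Greedy nearest-neighbour reordering (hypothetical helper).

    Starting from the first sample, repeatedly jump to the closest
    unvisited point in x and record the matching y value.
    Returns (visit order, y reordered along that path).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n == 0:
        return [], []

    # Treat scalar samples as 1-D points so math.dist handles both cases.
    pts = [tuple(p) if hasattr(p, "__iter__") else (p,) for p in x]

    # Euclidean distance matrix [lambda_pq], N x N.
    dist = [[math.dist(p, q) for q in pts] for p in pts]

    t = 0  # current point; the pseudocode's t <- 1 in 1-based indexing
    order = [t]
    visited = {t}
    for _ in range(n - 1):
        # Closest unvisited point to the current point t.
        # Assumption: ties are broken by the lowest index.
        t = min(
            (i for i in range(n) if i not in visited),
            key=lambda i: dist[t][i],
        )
        order.append(t)
        visited.add(t)

    # The concomitants: y values in visiting order.
    return order, [y[i] for i in order]


order, concomitants = nn_reorder([0.0, 10.0, 1.0, 9.0], [10, 20, 30, 40])
print(order, concomitants)
```

Building the full distance matrix costs O(N^2) memory, which matches the algorithm's stated input; for large samples a spatial index would be a reasonable substitute, though the paper does not discuss one.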