Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quantifying the Informativeness of Similarity Measurements

Authors: Austin J. Brockmeier, Tingting Mu, Sophia Ananiadou, John Y. Goulermas

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that informativeness is a useful criterion for selecting kernel parameters, choosing the dimension for kernel-based nonlinear dimensionality reduction, and identifying structured graphs. We also consider the problem of finding a maximally informative correlation matrix around a target matrix, and explore parameterizing the optimization in terms of the coordinates of the sample or through a lower-dimensional embedding. In the latter case, we find that maximizing the Bures-based informativeness measure, which is maximal for centered rank-1 correlation matrices, is equivalent to minimizing a specific matrix norm, and present an algorithm to solve the minimization problem using the norm's proximal operator. The proposed correlation denoising algorithm consistently improves spectral clustering. Overall, we find informativeness to be a novel and useful criterion for identifying non-trivial correlation structure.
Researcher Affiliation | Academia | Austin J. Brockmeier (EMAIL), Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK; Tingting Mu (EMAIL), School of Computer Science, University of Manchester, Manchester M1 7DN, UK; Sophia Ananiadou (EMAIL), School of Computer Science, University of Manchester, Manchester M1 7DN, UK; John Y. Goulermas (EMAIL), Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK
Pseudocode | Yes | Algorithm 1: Correlation Matrix Denoising
Open Source Code | Yes | MATLAB code that implements the informativeness measures and reproduces the figures and tables is available at http://pcwww.liv.ac.uk/~goulerma/software/brockmeier17a-code.zip.
Open Datasets | Yes | We test the dimensionality selection method on a set of UCI data sets (Lichman, 2013). A Gaussian kernel function with a heuristic bandwidth is used for all cases. Specifically, the bandwidth is a linear combination of the minimum and maximum Euclidean distances, σ = d_min/2 + (2/9)(d_max/2 − d_min/2), where d_min and d_max are the minimum and maximum pairwise distances (a similar heuristic was used by Shi and Malik, 2000). We then apply the denoising process to a set of grayscale images of handwritten digits in the USPS data set (Hull, 1994). The initial correlation matrix is formed by using a Gaussian kernel between images for five thumbnail image data sets: ORL, MNIST, UMIST, USPS, and COIL-20. The kernel bandwidth heuristic described in Section 5.2 is again used to select this parameter.
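The bandwidth heuristic and the Gaussian kernel construction quoted above can be sketched as follows. This is a minimal Python illustration (the paper's released code is MATLAB); the exact weighting σ = d_min/2 + (2/9)(d_max/2 − d_min/2) is a reconstruction of a garbled formula in the extracted text and should be treated as an assumption.

```python
import numpy as np

def heuristic_bandwidth(X):
    """Bandwidth from the min/max pairwise Euclidean distances.

    Assumed form (reconstructed): sigma = d_min/2 + (2/9)(d_max/2 - d_min/2).
    """
    # All pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    # Off-diagonal (upper-triangle) distances only.
    iu = np.triu_indices_from(D, k=1)
    d_min, d_max = D[iu].min(), D[iu].max()
    return d_min / 2 + (2.0 / 9.0) * (d_max / 2 - d_min / 2)

def gaussian_kernel(X, sigma):
    """Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    diff = X[:, None, :] - X[None, :, :]
    D2 = (diff ** 2).sum(-1)
    return np.exp(-D2 / (2.0 * sigma ** 2))
```

Because the kernel has unit diagonal, the resulting matrix is already a correlation matrix, which is what the denoising procedure operates on.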
Dataset Splits | Yes | In all cases, we use a first nearest neighbor classifier with half of the instances for training; the average classification accuracy is recorded in Table 6.
Hardware Specification | Yes | Computation time logged in MATLAB R2015b on a 2.8 GHz Intel Core i7 with 16 GB RAM.
Software Dependencies | Yes | Computation time logged in MATLAB R2015b on a 2.8 GHz Intel Core i7 with 16 GB RAM.
Experiment Setup | Yes | We first find the bandwidth σ that maximizes the informativeness using a golden-section search over θ ∈ [−5, 5], where 2σ^2 = 10^θ. Then, we optimize the sample coordinates using the conjugate gradient method implemented in minFunc (Schmidt, 2012) with the gradients given in Section 4.1. For all methods, the optimization is performed in terms of K̃ = (1 − η)K + ηI, where η = 10^−6, to ensure the correlation matrix is positive definite. A log-barrier term of log(1 − (1/n^2)·1ᵀK1) is added to the cost function for the CKA measure. For the Bures-based measure, a smoothing parameter of γ = 10^−9 is used.
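The setup quoted above combines a golden-section search over the log-bandwidth with a small ridge regularization of the correlation matrix. A minimal Python sketch of that scaffolding is below; the informativeness objective itself is not reproduced in this report, so a placeholder objective `f` stands in for it, and the helper names are illustrative, not the paper's.

```python
import numpy as np

def regularize(K, eta=1e-6):
    # K_tilde = (1 - eta) K + eta I, keeping the correlation matrix positive definite.
    n = K.shape[0]
    return (1.0 - eta) * K + eta * np.eye(n)

def sigma_from_theta(theta):
    # The search is parameterized so that 2 sigma^2 = 10^theta.
    return np.sqrt(10.0 ** theta / 2.0)

def golden_section_max(f, lo=-5.0, hi=5.0, tol=1e-4):
    """Golden-section search for a maximizer of a unimodal f on [lo, hi]."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, ~0.618
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # Maximum lies in [a, d]; reuse c as the new upper interior point.
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:
            # Maximum lies in [c, b]; reuse d as the new lower interior point.
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)
```

In the paper's pipeline, `f(theta)` would evaluate the informativeness of the kernel matrix built with `sigma_from_theta(theta)`; the subsequent coordinate optimization (conjugate gradients via minFunc) is not sketched here.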