Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Semi-Supervised Eigenvectors for Large-Scale Locally-Biased Learning

Authors: Toke J. Hansen, Michael W. Mahoney

JMLR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 5, we present an empirical analysis, including both toy data to illustrate how the knobs of our method work, as well as applications to realistic machine learning and data analysis problems." (Section 5, Empirical Results)
Researcher Affiliation | Academia | Toke J. Hansen, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, 2800 Lyngby, Denmark; Michael W. Mahoney, International Computer Science Institute and Dept. of Statistics, University of California, Berkeley, CA 94720-1776, USA
Pseudocode | Yes | Algorithm 1: Main algorithm to compute semi-supervised eigenvectors
Require: L_G, D_G, s, κ = [κ1, ..., κk]^T, ε, such that s^T D_G 1 = 0, s^T D_G s = 1, and κ^T 1 ≤ 1
 1: X = [1]
 2: for t = 1 to k do
 3:   F F^T ← I − D_G X (X^T D_G D_G X)^{-1} X^T D_G
 4:   λ2 ← the smallest eigenvalue satisfying F F^T L_G F F^T v2 = λ2 F F^T D_G F F^T v2
 5:   γ⁻ ← −vol(G), γ⁺ ← λ2
 6:   repeat
 7:     γt ← (γ⁻ + γ⁺)/2 (binary search over γt)
 8:     xt ← (F F^T (L_G − γt D_G) F F^T)^+ F F^T D_G s
 9:     Normalize xt such that xt^T D_G xt = 1
10:     if (xt^T D_G s)^2 > κt then γ⁻ ← γt else γ⁺ ← γt end if
11:   until |(xt^T D_G s)^2 − κt| ≤ ε or |(γ⁻ + γ⁺)/2 − γt| ≤ ε
12:   Augment X with xt by letting X = [X, xt]
13: end for
Open Source Code | No | The paper does not provide an explicit statement of open-source code release for the methodology described, nor does it include a link to a code repository. The mention of "our software distribution" is ambiguous and does not confirm public availability of the specific research code.
Open Datasets | Yes | Congressional voting data: "In Section 5.2, we consider roll call voting data from the United States Congress that are based on (Poole and Rosenthal, 1991)." Handwritten image data: "In Section 5.3, we consider data from the MNIST digit data set (Lecun and Cortes)." Large-scale network data: "These improvements are demonstrated on data sets from the DIMACS implementation challenge, as well as on large web-crawls with more than 3 billion non-zeros in the adjacency matrix (Boldi et al., 2004, 2011; Boldi and Vigna, 2004)."
Dataset Splits | Yes | "For each Congress we perform 5-fold cross validation based on 80 samples and leave out the remaining 20 samples to estimate an unbiased test error." (Section 5.2) "Figure 11 shows results based on a k-nearest neighbor graph constructed from 5% and 10% of the training data, where in both cases we used 10% for the test data." (Section 5.3.4)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instances) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper refers to computational methods and algorithms such as the conjugate gradient method, the Spectral Graph Transducer (SGT) of Joachims (2003), and the Push algorithm of Andersen et al. (2006), but it does not specify any software libraries, frameworks, or solvers with explicit version numbers.
Experiment Setup | Yes | "Furthermore, we fix the regularization parameter of the SGT to c = 3200, and for simplicity we fix γ = 0 for all semi-supervised eigenvectors, implicitly defining the effective κ = [κ1, ..., κk]^T." (Section 5.3) "...we compare with a standard conjugate gradient implementation using a tolerance of 1e-6..." (Section 5.4)
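The Algorithm 1 pseudocode extracted above can be sketched in NumPy. The following is an illustrative reimplementation under our own assumptions, not the authors' released code: the toy graph, the seed construction, the helper name `semi_supervised_eigenvector`, and the restriction to the first outer iteration (t = 1) are all ours. It projects out the all-ones direction, brackets γt between −vol(G) and λ2, and bisects until the seed correlation (x^T D_G s)^2 matches the target κ.

```python
import numpy as np

def semi_supervised_eigenvector(A, seed, kappa, eps=1e-6, max_iter=60):
    """Sketch of one outer iteration (t = 1) of Algorithm 1: binary-search
    gamma_t so the solution's seed correlation (x^T D s)^2 matches kappa."""
    n = A.shape[0]
    d = A.sum(axis=1).astype(float)
    D = np.diag(d)
    L = D - A                                # combinatorial graph Laplacian
    vol = d.sum()
    one = np.ones(n)

    # Seed vector: D-orthogonalize an indicator against 1 and D-normalize,
    # so that s^T D 1 = 0 and s^T D s = 1, as the Require clause demands.
    s = np.zeros(n)
    s[seed] = 1.0
    s -= (s @ D @ one) / (one @ D @ one) * one
    s /= np.sqrt(s @ D @ s)

    # Projection F F^T = I - D X (X^T D^2 X)^{-1} X^T D, with X = [1] initially.
    X = one[:, None]
    P = np.eye(n) - D @ X @ np.linalg.inv(X.T @ D @ D @ X) @ X.T @ D

    # lambda_2: smallest generalized eigenvalue of (L, D) restricted to range(P).
    U, sv, _ = np.linalg.svd(P)
    B = U[:, sv > 1e-10]                     # orthonormal basis of range(P)
    Lr, Dr = B.T @ L @ B, B.T @ D @ B
    w, V = np.linalg.eigh(Dr)
    Dri = V @ np.diag(w ** -0.5) @ V.T       # Dr^{-1/2}
    lam2 = np.linalg.eigvalsh(Dri @ Lr @ Dri)[0]

    lo, hi = -vol, lam2                      # binary-search interval for gamma_t
    for _ in range(max_iter):
        gamma = 0.5 * (lo + hi)
        x = np.linalg.pinv(P @ (L - gamma * D) @ P) @ (P @ D @ s)
        x /= np.sqrt(x @ D @ x)              # normalize so x^T D x = 1
        corr = (x @ D @ s) ** 2
        if abs(corr - kappa) <= eps:
            break
        if corr > kappa:                     # too seed-correlated: raise gamma
            lo = gamma
        else:
            hi = gamma
    return x, gamma, corr

# Toy graph (our choice): two triangles joined by a bridge, seed in one triangle.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
x, gamma, corr = semi_supervised_eigenvector(A, seed=0, kappa=0.5)
```

Note the pseudo-inverse in the inner loop mirrors step 8 of the pseudocode; a practical large-scale implementation would instead solve the projected system iteratively (e.g., with conjugate gradients, as the paper's Section 5.4 comparison suggests).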