Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Semi-Supervised Eigenvectors for Large-Scale Locally-Biased Learning
Authors: Toke J. Hansen, Michael W. Mahoney
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we present an empirical analysis, including both toy data to illustrate how the knobs of our method work, as well as applications to realistic machine learning and data analysis problems. (Section 5, Empirical Results) |
| Researcher Affiliation | Academia | Toke J. Hansen EMAIL Department of Applied Mathematics and Computer Science Technical University of Denmark Richard Petersens Plads, 2800 Lyngby, Denmark Michael W. Mahoney EMAIL International Computer Science Institute and Dept. of Statistics University of California Berkeley, CA 94720-1776, USA |
| Pseudocode | Yes | Algorithm 1 Main algorithm to compute semi-supervised eigenvectors. Require: L_G, D_G, s, κ = [κ1, . . . , κk]^T, ϵ, such that s^T D_G 1 = 0, s^T D_G s = 1, κ^T 1 ≤ 1. 1: X = [1] 2: for t = 1 to k do 3: FF^T ← I − D_G X (X^T D_G D_G X)^(−1) X^T D_G 4: γ⁺ ← λ2, where FF^T L_G FF^T v2 = λ2 FF^T D_G FF^T v2 5: γ⁻ ← −vol(G) 6: repeat 7: γt ← (γ⁻ + γ⁺)/2 (binary search over γt) 8: xt ← (FF^T (L_G − γt D_G) FF^T)^+ FF^T D_G s 9: Normalize xt such that xt^T D_G xt = 1 10: if (xt^T D_G s)^2 > κt then γ⁻ ← γt else γ⁺ ← γt end if 11: until \|(xt^T D_G s)^2 − κt\| ≤ ϵ or \|(γ⁻ + γ⁺)/2 − γt\| ≤ ϵ 12: Augment X with xt by letting X = [X, xt] 13: end for |
| Open Source Code | No | The paper does not provide an explicit statement of open-source code release for the methodology described, nor does it include a link to a code repository. The mention of 'our software distribution' is ambiguous and does not confirm public availability of the specific research code. |
| Open Datasets | Yes | Congressional voting data. In Section 5.2, we consider roll call voting data from the United States Congress that are based on (Poole and Rosenthal, 1991). Handwritten image data. In Section 5.3, we consider data from the MNIST digit data set (LeCun and Cortes). Large-scale network data. These improvements are demonstrated on data sets from the DIMACS implementation challenge, as well as on large web-crawls with more than 3 billion non-zeros in the adjacency matrix (Boldi et al., 2004, 2011; Boldi and Vigna, 2004). |
| Dataset Splits | Yes | For each Congress we perform 5-fold cross validation based on 80 samples and leave out the remaining 20 samples to estimate an unbiased test error. (Section 5.2) ...Figure 11 shows results based on a k-nearest neighbor graph constructed from 5% and 10% of the training data, where in both cases we used 10% for the test data. (Section 5.3.4) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instances) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper refers to computational methods and algorithms such as 'conjugate gradient method', 'Spectral Graph Transducer (SGT) of Joachims (2003)', and the 'Push algorithm by Andersen et al. (2006)', but it does not specify any software libraries, frameworks, or solvers with explicit version numbers. |
| Experiment Setup | Yes | Furthermore, we fix the regularization parameter of the SGT to c = 3200, and for simplicity we fix γ = 0 for all semi-supervised eigenvectors, implicitly defining the effective κ = [κ1, . . . , κk]T . (Section 5.3) ...we compare with a standard conjugate gradient implementation using a tolerance of 1e-6... (Section 5.4) |
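For readers who want to experiment with the procedure quoted in the Pseudocode row, here is a minimal dense-NumPy sketch of Algorithm 1. It is an illustrative reimplementation, not the authors' released code: the function name, the toy "barbell" graph, and the use of `numpy.linalg.pinv` for the pseudoinverse solve are our own choices, and at the scale of the paper's experiments one would use iterative solvers (e.g., conjugate gradient) rather than dense factorizations.

```python
import numpy as np

def semi_supervised_eigenvectors(L, D, s, kappas, eps=1e-6, max_iter=100):
    """Sketch of Algorithm 1 (semi-supervised eigenvectors).

    L : combinatorial graph Laplacian, D : diagonal degree matrix,
    s : seed vector satisfying s^T D 1 = 0 and s^T D s = 1,
    kappas : correlation target kappa_t for each of the k vectors.
    """
    n = L.shape[0]
    vol = np.trace(D)                  # vol(G) = sum of node degrees
    X = np.ones((n, 1))                # start with the trivial all-ones eigenvector
    out = []
    for kappa_t in kappas:
        # Line 3: orthogonal projector onto the complement of span(D X),
        # i.e. onto vectors x with x^T D X = 0.
        DX = D @ X
        P = np.eye(n) - DX @ np.linalg.pinv(DX.T @ DX) @ DX.T
        # Line 4: lambda_2 = smallest generalized eigenvalue of the projected
        # pencil (P L P, P D P), computed in an orthonormal basis of range(P).
        U, sv, _ = np.linalg.svd(P)
        Q = U[:, sv > 0.5]
        Rinv = np.linalg.inv(np.linalg.cholesky(Q.T @ D @ Q))
        lam2 = np.linalg.eigvalsh(Rinv.T @ (Q.T @ L @ Q) @ Rinv).min()
        lo, hi = -vol, lam2            # Line 5: gamma ranges over (-vol(G), lambda_2)
        for _ in range(max_iter):
            gamma = 0.5 * (lo + hi)    # Line 7: binary search over gamma_t
            # Line 8: x_t = (P (L - gamma D) P)^+ P D s
            x = np.linalg.pinv(P @ (L - gamma * D) @ P) @ (P @ D @ s)
            x /= np.sqrt(x @ D @ x)    # Line 9: normalize so x^T D x = 1
            corr = (x @ D @ s) ** 2
            if abs(corr - kappa_t) <= eps:
                break                  # Line 11: correlation on target
            if corr > kappa_t:
                lo = gamma             # too seed-correlated: raise gamma
            else:
                hi = gamma             # too delocalized: lower gamma
        X = np.column_stack([X, x])    # Line 12: augment X with x_t
        out.append(x)
    return out

# Tiny demo (our own toy example): two triangles joined by an edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
L = D - A
ones = np.ones(6)
s = np.zeros(6); s[0] = 1.0
s -= ones * (ones @ D @ s) / (ones @ D @ ones)   # enforce s^T D 1 = 0
s /= np.sqrt(s @ D @ s)                          # enforce s^T D s = 1
x = semi_supervised_eigenvectors(L, D, s, kappas=[0.5], max_iter=200)[0]
```

The returned vector is D-normalized, D-orthogonal to the all-ones vector, and has squared seed correlation (x^T D s)^2 driven to the target κ by the bisection over γ.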