Continuous Partitioning for Graph-Based Semi-Supervised Learning

Authors: Chester Holtz, Pengwen Chen, Zhengchao Wan, Chung-Kuan Cheng, Gal Mishne

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that CutSSL significantly surpasses the current state-of-the-art on k-nearest neighbor graphs and large real-world graph benchmarks across a variety of label rates, class imbalance, and label imbalance regimes. From Section 5 (Experiments): We evaluate our method on three image datasets: MNIST [12], Fashion-MNIST [33], and CIFAR-10 [23], using pretrained autoencoders as feature extractors as in Calder et al. [10] (see appendix for details). We also evaluate CutSSL on various real-world networks, including the Cora citation network and the OGB [18] Arxiv and Products networks.
Researcher Affiliation | Academia | Chester Holtz (Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA; chholtz@ucsd.edu); Pengwen Chen (Department of Applied Mathematics, National Chung Hsing University, South District, Taichung, Taiwan; pengwen@email.nchu.edu.tw); Zhengchao Wan (Department of Mathematics, University of Missouri, Columbia, MO; zwan@missouri.edu); Chung-Kuan Cheng (Department of Computer Science, University of California San Diego, La Jolla, CA; ckcheng@ucsd.edu); Gal Mishne (Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA; gmishne@ucsd.edu)
Pseudocode | Yes | Algorithm 1, CutSSL. Input: Laplacian L, labels B, initialization X_0. Output: one-hot label predictions X.
1: function ADMM(L_{u,s}, X_0)
2:   X ← X_0
3:   while not converged do
4:     X_{t+1} = L_{u,s}^{-1} (B + T_t - 1_n µ_1^T - µ_2 1_k^T - Λ_t)   (µ_1, µ_2 given by Eq. 15)
5:     T_{t+1} ← max(Λ_t + X_{t+1}, 0)   (projection of T)
6:     Λ_{t+1} = Λ_t + X_{t+1} - T_{t+1}   (update multipliers)
7:   end while
8:   return X_t
9: end function
[A NumPy sketch of this loop appears after the table.]
Open Source Code | Yes | Our implementation is available on GitHub: https://github.com/Mishne-Lab/CutSSL
Open Datasets | Yes | We evaluate our method on three image datasets: MNIST [12], Fashion-MNIST [33] and CIFAR-10 [23]... We also evaluate CutSSL on various real-world networks, including the Cora citation network and the OGB [18] Arxiv and Products networks. [A data-loading sketch appears after the table.]
Dataset Splits | No | The paper describes using a 'partially-labeled training set consisting of labeled examples and unlabeled examples' and evaluates across 'label rates' (e.g., 1 or 3 labels per class). It also states 'We used all available data to construct the graph...'. However, it does not explicitly provide percentages or counts for distinct train/validation/test splits, nor does it refer to predefined splits for these purposes.
Hardware Specification | Yes | All experiments were evaluated on Colab instances. These instances were equipped with 2-core AMD EPYC 7B12 CPUs and 13 GB of RAM.
Software Dependencies | No | The paper mentions 'an implementation of the core algorithm and experimental setup in Python and NumPy and the graphlearning package [9]'. However, it does not provide specific version numbers for Python, NumPy, or the graphlearning package, which are required for reproducible software dependencies. [A version-recording snippet appears after the table.]
Experiment Setup | Yes | In practice, we set s = 0.1 for all experiments. The procedure presented in Alg. 1 is run for 100 iterations. In Table 1 we present the accuracy of our method on CIFAR-10 across various label rates... The graph was constructed as a k-nearest neighbor graph with Gaussian edge weights given by w_ij = exp(-4 ||v_i - v_j||^2 / d_k(v_i)^2), where v_i are the latent variables for image i, and d_k(v_i) is the distance in the latent space between v_i and its k-th nearest neighbor. We used k = 10 in all experiments and symmetrize W... The autoencoder was trained for 100 epochs on each dataset. [A graph-construction sketch appears after the table.]
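
A minimal NumPy/SciPy sketch of the ADMM loop in Algorithm 1, assuming L_{u,s} is a sparse positive-definite matrix (for example, the unlabeled block of the Laplacian shifted by s·I) and standing in zeros for the multipliers µ_1, µ_2, since the paper's Eq. (15) is not quoted above. The function and variable names are ours, not the authors'.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import factorized

    def admm_sketch(L_us, B, X0, n_iters=100):
        # Prefactor the sparse system once; every X-update reuses the factorization.
        solve = factorized(sp.csc_matrix(L_us))
        n, k = X0.shape
        X = X0.copy()
        T = np.maximum(X0, 0.0)      # auxiliary variable, kept nonnegative
        Lam = np.zeros_like(X0)      # scaled dual variable (Lambda)
        for _ in range(n_iters):
            # Placeholder multipliers: the paper derives mu1, mu2 in its Eq. (15),
            # which is not reproduced here.
            mu1 = np.zeros(k)
            mu2 = np.zeros(n)
            rhs = B + T - np.outer(np.ones(n), mu1) - np.outer(mu2, np.ones(k)) - Lam
            X = np.column_stack([solve(rhs[:, j]) for j in range(k)])
            T = np.maximum(Lam + X, 0.0)   # project onto the nonnegative orthant
            Lam = Lam + X - T              # dual (multiplier) update
        return X

Per the Experiment Setup row, the paper runs this loop for 100 iterations with s = 0.1.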
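
For the image benchmarks, a hedged sketch of how they are commonly loaded with the graphlearning package named in the Software Dependencies row. The calls gl.datasets.load, gl.weightmatrix.knn, and gl.trainsets.generate are graphlearning utilities as we understand them, and the 'vae' metric stands in for the pretrained-autoencoder features; CutSSL itself is not part of graphlearning, so the package's Laplace learning is used here purely as a stand-in model.

    import numpy as np
    import graphlearning as gl

    # MNIST labels and a 10-NN graph over autoencoder-style ('vae') features.
    labels = gl.datasets.load('mnist', labels_only=True)
    W = gl.weightmatrix.knn('mnist', 10, metric='vae')

    # A low label rate: 1 labeled example per class.
    train_ind = gl.trainsets.generate(labels, rate=1)
    train_labels = labels[train_ind]

    # Stand-in model (not CutSSL): Laplace learning on the same graph.
    model = gl.ssl.laplace(W)
    pred = model.fit_predict(train_ind, train_labels)

    # Accuracy measured over unlabeled nodes only.
    mask = np.ones(len(labels), dtype=bool)
    mask[train_ind] = False
    print('accuracy: %.2f%%' % (100 * np.mean(pred[mask] == labels[mask])))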
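
Because no versions are pinned, a generic snippet (ours, not from the paper) that a reader replicating the setup can run to record the environment they actually used:

    import sys
    from importlib.metadata import version  # standard library, Python 3.8+

    print('python', sys.version.split()[0])
    for pkg in ('numpy', 'graphlearning'):
        print(pkg, version(pkg))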
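
A short sketch of the k-nearest-neighbor graph construction quoted in the Experiment Setup row, assuming V is an (n, d) array of latent codes; scikit-learn's NearestNeighbors and the elementwise-max symmetrization are our choices, since the quoted text does not say how W is built or symmetrized.

    import numpy as np
    import scipy.sparse as sp
    from sklearn.neighbors import NearestNeighbors

    def knn_gaussian_graph(V, k=10):
        # Ask for k+1 neighbors because each point's nearest neighbor is itself.
        dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(V).kneighbors(V)
        dk = dist[:, -1]                                          # distance to the k-th neighbor
        w = np.exp(-4.0 * dist[:, 1:] ** 2 / dk[:, None] ** 2)    # w_ij from the quoted formula
        n = V.shape[0]
        rows = np.repeat(np.arange(n), k)
        W = sp.csr_matrix((w.ravel(), (rows, idx[:, 1:].ravel())), shape=(n, n))
        return W.maximum(W.T)                                     # one common way to symmetrize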