Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Improved large-scale graph learning through ridge spectral sparsification
Authors: Daniele Calandriello, Alessandro Lazaric, Ioannis Koutis, Michal Valko
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our theoretical ļ¬ndings by testing how (ε, γ)-sparsiļ¬ers improves computational complexity without sacriļ¬cing ļ¬nal accuracy. Dataset. We run experiments on the Amazon co-purchase graph (Sadhanala et al., 2016). This graph ļ¬ts our setting: It cannot be generated from vectorial data and is only artiļ¬cially sparse, since the crawler that created it had no access to the true private co-purchase network held by Amazon. To compensate, Gleich & Mahoney (2015) use a densiļ¬cation procedure that given the graph adjacency matrix AG, computes all k-step neighbors AG,k Pk s=1 As G. We make the graph unweighted for numerical stability. The ļ¬nal graph has n = 334, 863 nodes and m = 98, 465, 352 edges, with an average degree of 294. We followed an approach similar to Sadhanala et al. (2016) and introduce a hand-designed smooth signal as a target. We then perform 2000 iterations of the power method to compute an approximation of the smallest eigenvector vmin, which is used as a smooth function over the graph. Baselines. For all setups, we compute an exact solution (up to convergence error) using a fast linear solver. Computing this EXACT baseline requires O(m log(n)) time and O(m) space and achieves the best performance. Afterwards, we compare three different sparsiļ¬cation procedures to eval- Improved Large-Scale Graph Learning through Ridge Spectral Sparsiļ¬cation uate if they can accelerate computation while preserving accuracy. We run Di SRe with different values of γ depending on the setting. For empirically strong heuristics, we attempted to uniformly subsample the edges, but at the sparsity level achieved by the other methods, the uniformly sampled sparsiļ¬er is disconnected and highly inaccurate. Instead, we compare to the state-of-the-art k-neighbors (k N) heuristic by Sadhanala et al. (2016), which is just as fast as uniform sampling and more accurate in practice. |
| Researcher Affiliation | Collaboration | 1Seque L team, INRIA Lille Nord Europe, France 2LCSL, IIT, Italy, and MIT, USA. 3New Jersey Institute of Technology, USA 4Facebook AI Research, Paris, France. |
| Pseudocode | Yes | Algorithm 1 The Di SRe algorithm. Input: G Output: HG 1: Partition G into k sub-graphs: 2: H1,ā Gā {(ei,j, qe = 1, ep1,e = 1)} 3: Initialize set S1 = {H1,ā}k ā=1 4: for h = 1, . . . , k 1 do 5: Pick two sparsiļ¬ers Hh,i , Hh,i from Sh 6: H Merge-Resparsify(Hh,i, Hh,i ) 7: Place H back into Sh+1 8: end for 9: Return HG, the last sparsiļ¬er in Sk |
| Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the methodology described. |
| Open Datasets | Yes | Dataset. We run experiments on the Amazon co-purchase graph (Sadhanala et al., 2016). |
| Dataset Splits | No | We set f = vmin and test different levels of noise, log10(Ļ) { 3, 2, 1, 0}. After constructing the sparsiļ¬er H, we compute an approximate solution ef using LAPSMO (Eq. 2) with Ī» {10 3, 10 2, 10 1, 1, 10}. ... The labels are generated taking the sign of f = vmin and ā {20, 346, 672, 1000} labels are revealed. The labeled nodes are chosen at random so that 0 and 1 labels are balanced in the dataset. |
| Hardware Specification | No | Experiments presented in this paper were carried out using the Grid 5000 testbed, supported by a scientiļ¬c interest group hosted by Inria and including CNRS, RENATER, and several universities as well as other organizations (see https://www.grid5000.fr). |
| Software Dependencies | No | The paper mentions using a 'fast linear solver' and 'SSD solver' but does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We then perform 2000 iterations of the power method to compute an approximation of the smallest eigenvector vmin, which is used as a smooth function over the graph. ... We set f = vmin and test different levels of noise, log10(Ļ) { 3, 2, 1, 0}. After constructing the sparsiļ¬er H, we compute an approximate solution ef using LAPSMO (Eq. 2) with Ī» {10 3, 10 2, 10 1, 1, 10}. ... We run Stable-HFS with Ī» {10 6, 10 4, 10 2, 1}. ... We repeat each experiment 10 times with different sparsiļ¬ers and report the average performance of ef on the speciļ¬c task and its standard deviation. |