Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improved Large-Scale Graph Learning through Ridge Spectral Sparsification
Authors: Daniele Calandriello, Alessandro Lazaric, Ioannis Koutis, Michal Valko
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our theoretical findings by testing how (ε, γ)-sparsifiers improve computational complexity without sacrificing final accuracy. Dataset. We run experiments on the Amazon co-purchase graph (Sadhanala et al., 2016). This graph fits our setting: it cannot be generated from vectorial data and is only artificially sparse, since the crawler that created it had no access to the true private co-purchase network held by Amazon. To compensate, Gleich & Mahoney (2015) use a densification procedure that, given the graph adjacency matrix A_G, computes all k-step neighbors A_{G,k} = ∑_{s=1}^{k} A_G^s. We make the graph unweighted for numerical stability. The final graph has n = 334,863 nodes and m = 98,465,352 edges, with an average degree of 294. We followed an approach similar to Sadhanala et al. (2016) and introduce a hand-designed smooth signal as a target. We then perform 2000 iterations of the power method to compute an approximation of the smallest eigenvector v_min, which is used as a smooth function over the graph. Baselines. For all setups, we compute an exact solution (up to convergence error) using a fast linear solver. Computing this EXACT baseline requires O(m log(n)) time and O(m) space and achieves the best performance. Afterwards, we compare three different sparsification procedures to evaluate if they can accelerate computation while preserving accuracy. We run DiSRe with different values of γ depending on the setting. For empirically strong heuristics, we attempted to uniformly subsample the edges, but at the sparsity level achieved by the other methods, the uniformly sampled sparsifier is disconnected and highly inaccurate. Instead, we compare to the state-of-the-art k-neighbors (kN) heuristic by Sadhanala et al. (2016), which is just as fast as uniform sampling and more accurate in practice. (A hedged sketch of the densification and power-method steps appears after the table.) |
| Researcher Affiliation | Collaboration | 1SequeL team, INRIA Lille - Nord Europe, France; 2LCSL, IIT, Italy, and MIT, USA; 3New Jersey Institute of Technology, USA; 4Facebook AI Research, Paris, France. |
| Pseudocode | Yes | Algorithm 1 The DiSRe algorithm. Input: G. Output: H_G. 1: Partition G into k sub-graphs {G_ℓ} 2: H_{1,ℓ} ← G_ℓ = {(e_{i,j}, q_e = 1, p̃_{1,e} = 1)} 3: Initialize set S_1 = {H_{1,ℓ}}_{ℓ=1}^{k} 4: for h = 1, ..., k − 1 do 5: Pick two sparsifiers H_{h,i}, H_{h,i′} from S_h 6: H ← Merge-Resparsify(H_{h,i}, H_{h,i′}) 7: Place H back into S_{h+1} 8: end for 9: Return H_G, the last sparsifier in S_k. (A runnable sketch of this control flow appears after the table.) |
| Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the methodology described. |
| Open Datasets | Yes | Dataset. We run experiments on the Amazon co-purchase graph (Sadhanala et al., 2016). |
| Dataset Splits | No | We set f = v_min and test different levels of noise, log10(σ) ∈ {−3, −2, −1, 0}. After constructing the sparsifier H, we compute an approximate solution f̃ using LAPSMO (Eq. 2) with λ ∈ {10^−3, 10^−2, 10^−1, 1, 10}. ... The labels are generated taking the sign of f = v_min and ℓ ∈ {20, 346, 672, 1000} labels are revealed. The labeled nodes are chosen at random so that 0 and 1 labels are balanced in the dataset. |
| Hardware Specification | No | Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER, and several universities as well as other organizations (see https://www.grid5000.fr). |
| Software Dependencies | No | The paper mentions using a 'fast linear solver' and 'SDD solver' but does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We then perform 2000 iterations of the power method to compute an approximation of the smallest eigenvector v_min, which is used as a smooth function over the graph. ... We set f = v_min and test different levels of noise, log10(σ) ∈ {−3, −2, −1, 0}. After constructing the sparsifier H, we compute an approximate solution f̃ using LAPSMO (Eq. 2) with λ ∈ {10^−3, 10^−2, 10^−1, 1, 10}. ... We run Stable-HFS with λ ∈ {10^−6, 10^−4, 10^−2, 1}. ... We repeat each experiment 10 times with different sparsifiers and report the average performance of f̃ on the specific task and its standard deviation. (A hedged sketch of this parameter sweep appears after the table.) |
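The densification and smooth-signal steps quoted in the Research Type row are compact enough to reconstruct. The sketch below is an illustration under stated assumptions, not the authors' code: the function names (`densify`, `smooth_signal`), the shift `2 * d_max` used to turn the smallest Laplacian eigenpair into the dominant one, and the deflation of the constant eigenvector are choices made here; the paper only specifies the k-step sum A_{G,k} = ∑_{s=1}^{k} A_G^s, the binarization, and 2000 power-method iterations.

```python
# A minimal sketch, assuming scipy/numpy; `densify` and `smooth_signal` are
# names invented here, not taken from the paper.
import numpy as np
import scipy.sparse as sp

def densify(A: sp.csr_matrix, k: int) -> sp.csr_matrix:
    """k-step densification A_{G,k} = sum_{s=1}^{k} A_G^s (Gleich & Mahoney,
    2015), then binarized, mirroring 'we make the graph unweighted'."""
    A = A.astype(float)
    A_k, power = A.copy(), A.copy()
    for _ in range(k - 1):
        power = power @ A            # A_G^s for s = 2, ..., k
        A_k = A_k + power
    A_k.data[:] = 1.0                # unweighted: keep only edge presence
    A_k.setdiag(0)                   # drop self-loops created by the powers
    A_k.eliminate_zeros()
    return A_k.tocsr()

def smooth_signal(A: sp.csr_matrix, iters: int = 2000, seed: int = 0) -> np.ndarray:
    """Approximate the smallest non-trivial Laplacian eigenvector v_min via a
    shifted power method; the shift and the deflation are assumptions here."""
    n = A.shape[0]
    deg = np.asarray(A.sum(axis=1)).ravel()
    L = sp.diags(deg) - A                    # combinatorial Laplacian
    M = sp.eye(n) * (2.0 * deg.max()) - L    # shift: lambda_max(L) <= 2 d_max
    ones = np.ones(n) / np.sqrt(n)           # trivial eigenvector (eigenvalue 0)
    v = np.random.default_rng(seed).standard_normal(n)
    for _ in range(iters):
        v = M @ v
        v -= (v @ ones) * ones               # deflate the constant direction
        v /= np.linalg.norm(v)
    return v                                 # smooth target f = v_min
```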
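Algorithm 1's divide-and-conquer structure can likewise be expressed as a short driver loop. In this hedged sketch, the `Sparsifier` container and the round-robin edge partition are stand-ins, and `Merge-Resparsify` is deliberately left as a caller-supplied function, since its internals (effective-resistance estimation and edge resampling) live in the paper rather than here.

```python
# Sketch of Algorithm 1's control flow only; Merge-Resparsify is not
# implemented, and the data layout is an assumption.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Edge = Tuple[int, int]

@dataclass
class Sparsifier:
    # each entry: (edge e_{i,j}, multiplicity q_e, stored probability p~_e)
    entries: List[Tuple[Edge, int, float]]

def disre(edges: List[Edge], k: int,
          merge_resparsify: Callable[[Sparsifier, Sparsifier], Sparsifier]) -> Sparsifier:
    # Step 1: partition G into k sub-graphs (naive round-robin edge split)
    chunks = [edges[i::k] for i in range(k)]
    # Steps 2-3: each sub-graph starts as a trivial sparsifier with
    # q_e = 1 and p~_{1,e} = 1 (every edge kept)
    S = [Sparsifier([(e, 1, 1.0) for e in chunk]) for chunk in chunks]
    # Steps 4-8: k - 1 rounds of picking two sparsifiers, merging and
    # re-sparsifying, then placing the result back into the pool
    while len(S) > 1:
        S.append(merge_resparsify(S.pop(0), S.pop(0)))
    # Step 9: the last sparsifier approximates the whole graph G
    return S[0]
```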
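Finally, the parameter sweep in the Dataset Splits and Experiment Setup rows maps onto a straightforward grid loop. Everything below the signature is an assumption: `build_sparsifier` and `solve_lapsmo` are hypothetical callables standing in for the paper's sparsification routine and LAPSMO (Eq. 2) solver, and relative ℓ2 error is only one plausible performance measure; the 10 repetitions with fresh sparsifiers and the mean ± standard deviation reporting follow the quoted protocol.

```python
# Hedged sketch of the evaluation grid; the two callables are hypothetical.
import numpy as np

def run_sweep(f, build_sparsifier, solve_lapsmo,
              noise_exps=(-3, -2, -1, 0),
              lambdas=(1e-3, 1e-2, 1e-1, 1.0, 10.0),
              n_repeats=10):
    """Grid over noise level sigma and regularization lambda; each cell is
    repeated with a freshly built sparsifier, reporting mean and std."""
    results = {}
    for e in noise_exps:
        sigma = 10.0 ** e                                # log10(sigma) = e
        for lam in lambdas:
            errs = []
            for rep in range(n_repeats):
                rng = np.random.default_rng(rep)
                y = f + sigma * rng.standard_normal(f.shape[0])  # noisy target
                H = build_sparsifier(seed=rep)           # fresh sparsifier
                f_hat = solve_lapsmo(H, y, lam)          # LAPSMO, Eq. 2
                errs.append(np.linalg.norm(f_hat - f) / np.linalg.norm(f))
            results[(sigma, lam)] = (float(np.mean(errs)), float(np.std(errs)))
    return results
```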