Triangle and Four Cycle Counting with Predictions in Graph Streams
Authors: Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David Woodruff, Michael Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms. (Section 4: Experiments) |
| Researcher Affiliation | Collaboration | Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, and Sandeep Silwal (Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; {justc, indyk, shyamsn, ronitt, silwal}@mit.edu); Honghao Lin, David P. Woodruff, and Michael Zhang (Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; {honghaol, dwoodruf, jinyaoz}@andrew.cmu.edu); Tal Wagner (Microsoft Research, Redmond, WA 98052, USA; tal.wagner@gmail.com); Talya Eden (Massachusetts Institute of Technology and Boston University, Cambridge, MA 02139, USA; teden@mit.edu) |
| Pseudocode | Yes | Algorithm 1 Counting Triangles in the Adjacency List Model |
| Open Source Code | No | The paper mentions using code from other authors for baselines (e.g., "We use the authors code for our experiments (Shin et al., 2020; Shin, 2020).") and for specific components (e.g., "Here we use the method and code proposed in Zhang & Chen (2018)."). However, it does not explicitly state that the source code for the methodology described in this paper is released or provide a link to it. |
| Open Datasets | Yes | The Oregon and CAIDA datasets come from Leskovec & Krevl (2014); Leskovec et al. (2005), the Wikibooks dataset comes from Rossi & Ahmed (2015), the Reddit dataset comes from Leskovec & Krevl (2014); Kumar et al. (2018), the Twitch dataset comes from Rozemberczki et al. (2019), the Wikipedia dataset comes from Rossi & Ahmed (2015), and the Powerlaw graphs are sampled from the Chung-Lu-Vu random graph model with expected degree of the i-th vertex proportional to 1/i^2 (Chung et al., 2003). |
| Dataset Splits | Yes | In the Wiki Books temporal graph, we use the exact Ne counts on the first half of the graph edges (when sorted by their timestamps) as the predicted values for the second half. To produce an edge embedding f(e) for an edge e = uv from the node embeddings of its endpoints, f(u) and f(v)... Training is done on a prefix of the first half of the edges. For each of the two networks, we start with a graph that has twice as many edges as listed in Table 2... We then randomly remove 50% of the total edges to be the training data set, and use the remaining edges as the graph we test on. |
| Hardware Specification | Yes | All of our graph experiments were done on a CPU with i5 2.7 GHz dual core and 8 GB RAM or a CPU with i7 1.9 GHz 8 core and 16GB RAM. |
| Software Dependencies | No | The paper mentions using "the authors code" for baselines (Think D and WRS) and refers to the GNN method by Zhang & Chen (2018), implying software was used. However, it does not list specific software components with version numbers (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1") that are needed to replicate their experiment. |
| Experiment Setup | Yes | To remedy this discrepancy, we modify our algorithm slightly by setting a fixed fraction of space to use for heavy edges (10% of space for all of our experiments) and setting p correspondingly to use up the rest of the space bound given as input. The constant 0.3 is fixed throughout all our experiments. We then set p = 0.7 Z/m. For the multi-layer version, we use 10% of the total space for storing the top k = Z/10 edges, and 70% of the space for sub-sampling the edges whose oracle-predicted value is very tiny... Then, we use 20% of the space for sub-sampling the remaining edges... |
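The space-allocation scheme quoted in the Experiment Setup row can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the fractions (10% for heavy edges, p = 0.7 Z/m, and the 10%/70%/20% multi-layer split) come directly from the quote, while the function and key names are hypothetical.

```python
def space_budget(Z, m, multilayer=False):
    """Split a total space budget of Z stored edges for a stream of m edges,
    following the fractions quoted in the paper's experiment setup.
    Illustrative sketch only; names are assumptions, not the authors' API."""
    if not multilayer:
        heavy = int(0.1 * Z)   # 10% of space reserved for predicted-heavy edges
        p = 0.7 * Z / m        # sampling rate for the remaining edges
        return {"heavy": heavy, "p": p}
    # Multi-layer version: 10% for the top-k edges, 70% for sub-sampling
    # edges with very tiny predicted values, 20% for the remaining edges.
    return {
        "top_k": Z // 10,
        "tiny": int(0.7 * Z),
        "rest": int(0.2 * Z),
    }
```

For example, with a budget of Z = 1000 edges on a stream of m = 10000 edges, the single-layer variant reserves 100 slots for heavy edges and samples the rest at rate p = 0.07.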
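The Powerlaw graphs in the Open Datasets row are sampled from the Chung-Lu-Vu model with the expected degree of the i-th vertex proportional to 1/i^2. A minimal sampler for that weight sequence might look like the sketch below; the normalization (each pair connected independently with probability w_u * w_v / W) is the standard Chung-Lu construction, and the function name and O(n^2) pair loop are illustrative assumptions rather than the paper's generator.

```python
import random

def chung_lu_graph(n, seed=0):
    """Sample a Chung-Lu-Vu random graph on n vertices where the expected
    degree of vertex i is proportional to 1/i^2 (illustrative sketch)."""
    random.seed(seed)
    w = [1.0 / (i + 1) ** 2 for i in range(n)]  # weight of the i-th vertex
    total = sum(w)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            # Pair (u, v) is an edge independently with prob w_u * w_v / W,
            # capped at 1 in case the product exceeds the normalizer.
            if random.random() < min(1.0, w[u] * w[v] / total):
                edges.add((u, v))
    return edges
```

The heavy skew of the 1/i^2 weights concentrates almost all edges on the first few vertices, which is what makes these graphs a stress test for heavy-edge handling in the streaming algorithms.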