Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Triangle and Four Cycle Counting with Predictions in Graph Streams
Authors: Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David Woodruff, Michael Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms." (Section 4, Experiments) |
| Researcher Affiliation | Collaboration | Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, and Sandeep Silwal (Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA); Honghao Lin, David P. Woodruff, and Michael Zhang (Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA); Tal Wagner (Microsoft Research, Redmond, WA 98052, USA); Talya Eden (Massachusetts Institute of Technology and Boston University, Cambridge, MA 02139, USA) |
| Pseudocode | Yes | Algorithm 1 Counting Triangles in the Adjacency List Model |
| Open Source Code | No | The paper mentions using code from other authors for baselines (e.g., "We use the authors code for our experiments (Shin et al., 2020; Shin, 2020).") and for specific components (e.g., "Here we use the method and code proposed in Zhang & Chen (2018)."). However, it does not state that the source code for the paper's own method is released, nor does it provide a link to it. |
| Open Datasets | Yes | The Oregon and CAIDA datasets come from Leskovec & Krevl (2014); Leskovec et al. (2005), the Wikibooks dataset comes from Rossi & Ahmed (2015), the Reddit dataset comes from Leskovec & Krevl (2014); Kumar et al. (2018), the Twitch dataset comes from Rozemberczki et al. (2019), the Wikipedia dataset comes from Rossi & Ahmed (2015), and the Powerlaw graphs are sampled from the Chung-Lu-Vu random graph model with expected degree of the i-th vertex proportional to 1/i² (Chung et al., 2003). |
| Dataset Splits | Yes | In the Wiki Books temporal graph, we use the exact Ne counts on the first half of the graph edges (when sorted by their timestamps) as the predicted values for the second half. To produce an edge embedding f(e) for an edge e = uv from the node embeddings of its endpoints, f(u) and f(v)... Training is done on a prefix of the first half of the edges. For each of the two networks, we start with a graph that has twice as many edges as listed in Table 2... We then randomly remove 50% of the total edges to be the training data set, and use the remaining edges as the graph we test on. |
| Hardware Specification | Yes | All of our graph experiments were done on a CPU with i5 2.7 GHz dual core and 8 GB RAM or a CPU with i7 1.9 GHz 8 core and 16GB RAM. |
| Software Dependencies | No | The paper mentions using "the authors code" for baselines (Think D and WRS) and refers to the GNN method by Zhang & Chen (2018), implying software was used. However, it does not list specific software components with version numbers (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1") that are needed to replicate their experiment. |
| Experiment Setup | Yes | To remedy this discrepancy, we modify our algorithm slightly by setting a fixed fraction of space to use for heavy edges (10% of space for all of our experiments) and setting p correspondingly to use up the rest of the space bound given as input. The constant 0.3 is fixed throughout all our experiments. We then set p = 0.7·Z/m. For the multi-layer version, we use 10% of the total space for storing the top k = Z/10 edges, and 70% of the space for sub-sampling the edges whose oracle-predicted value is very tiny... Then, we use 20% of the space for sub-sampling the remaining edges... |
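The Open Datasets row mentions that the Powerlaw graphs are sampled from the Chung-Lu-Vu random graph model with expected degree of the i-th vertex proportional to 1/i². A minimal sketch of that generation process is below; the function name, parameters (`avg_deg`, `seed`), and the O(n²) pair loop are illustrative assumptions, not the paper's implementation.

```python
import random

def chung_lu_powerlaw(n, avg_deg=4.0, alpha=2.0, seed=0):
    """Sample an undirected Chung-Lu random graph where the expected
    degree of vertex i is proportional to 1/(i+1)**alpha.

    Illustrative sketch only: the paper does not specify its sampler,
    and this naive version enumerates all O(n^2) vertex pairs.
    """
    rng = random.Random(seed)
    # Raw power-law weights, rescaled so the mean degree is roughly avg_deg.
    w = [1.0 / (i + 1) ** alpha for i in range(n)]
    scale = avg_deg * n / sum(w)
    w = [scale * wi for wi in w]
    total = sum(w)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            # Standard Chung-Lu edge probability, capped at 1.
            if rng.random() < min(1.0, w[u] * w[v] / total):
                edges.add((u, v))
    return edges
```

With alpha = 2 the weights are heavily skewed toward the first few vertices, so the cap at probability 1 matters for the highest-weight pairs.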
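The Dataset Splits row describes sorting the temporal graph's edges by timestamp and using exact counts on the first half as predictions for the second half. A hedged sketch of that split, under the assumption that edges arrive as (edge, timestamp) pairs (the function name and input format are hypothetical):

```python
def temporal_half_split(edges_with_time):
    """Sort edges by timestamp; return (first_half, second_half).

    Mirrors the split described in the quote: predictions are computed
    on the first half and the algorithm is evaluated on the second.
    Input format (edge, timestamp) is an assumption for illustration.
    """
    ordered = [e for e, _ in sorted(edges_with_time, key=lambda et: et[1])]
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]
```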
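The Experiment Setup row quotes a concrete space-budget split for the multi-layer version: 10% of the space stores the top k = Z/10 predicted-heavy edges, 70% goes to sub-sampling the edges the oracle predicts to be tiny (with rate p = 0.7·Z/m over the m stream edges), and 20% sub-samples the rest. The arithmetic can be sketched as follows; the function name and return format are illustrative, not from the paper.

```python
def allocate_space(Z, m):
    """Split a total space budget of Z stored edges per the quoted setup.

    Returns (k_heavy, p_tiny, space_rest):
      - k_heavy: number of top predicted-heavy edges kept (10% of Z),
      - p_tiny: sampling rate for oracle-predicted-tiny edges, 0.7*Z/m,
      - space_rest: space for sub-sampling the remaining edges (20% of Z).
    Illustrative sketch only; the paper's code is not released.
    """
    k_heavy = Z // 10
    p_tiny = min(1.0, 0.7 * Z / m)  # rate over the m edges in the stream
    space_rest = Z // 5
    return k_heavy, p_tiny, space_rest
```

For example, with a budget of Z = 1000 edges on a stream of m = 100000 edges, this yields 100 heavy slots, a sampling rate of 0.007, and 200 slots for the remaining edges.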