Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Triangle and Four Cycle Counting with Predictions in Graph Streams

Authors: Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David Woodruff, Michael Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms." (Section 4, Experiments)
Researcher Affiliation | Collaboration | Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, and Sandeep Silwal (Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA); Honghao Lin, David P. Woodruff, Michael Zhang (Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA); Tal Wagner (Microsoft Research, Redmond, WA 98052, USA); Talya Eden (Massachusetts Institute of Technology and Boston University, Cambridge, MA 02139, USA)
Pseudocode | Yes | "Algorithm 1: Counting Triangles in the Adjacency List Model"
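The paper's Algorithm 1 is not reproduced in this report. As a point of reference for what the streaming algorithms estimate, an exact triangle counter over an undirected edge list is a few lines; this is an illustrative baseline of our own, not the authors' algorithm:

```python
def count_triangles(edges):
    """Exact triangle count of a simple undirected graph given as an
    edge list. Every common neighbor of an edge's endpoints closes a
    triangle; each triangle is seen once per edge, hence the final
    division by 3. (Reference check only, not the paper's Algorithm 1.)"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3
```

An exact counter like this is how streaming estimates are typically validated on small graphs, where the full adjacency structure fits in memory.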
Open Source Code | No | The paper mentions using code from other authors for baselines (e.g., "We use the authors code for our experiments (Shin et al., 2020; Shin, 2020).") and for specific components (e.g., "Here we use the method and code proposed in Zhang & Chen (2018)."). However, it does not explicitly state that the source code for the methodology described in this paper is released, nor does it provide a link to it.
Open Datasets | Yes | The Oregon and CAIDA datasets come from Leskovec & Krevl (2014) and Leskovec et al. (2005); the Wikibooks dataset from Rossi & Ahmed (2015); the Reddit dataset from Leskovec & Krevl (2014) and Kumar et al. (2018); the Twitch dataset from Rozemberczki et al. (2019); the Wikipedia dataset from Rossi & Ahmed (2015); and the Powerlaw graphs are sampled from the Chung-Lu-Vu random graph model with the expected degree of the i-th vertex proportional to 1/i^2 (Chung et al., 2003).
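The Powerlaw graphs above follow the Chung-Lu model, where each vertex i gets a weight w_i and edge (i, j) appears independently with probability min(1, w_i·w_j / W), W being the total weight. A minimal sketch with weights proportional to 1/i^2, matching the quoted setup (the function name and parameters are illustrative, not from the paper):

```python
import random

def chung_lu_graph(n, seed=0):
    """Sample a simple undirected Chung-Lu random graph on n vertices
    with vertex weights w_i proportional to 1/i^2 (i = 1..n), so that
    expected degrees follow the same power law. Illustrative sketch."""
    rng = random.Random(seed)
    w = [1.0 / i**2 for i in range(1, n + 1)]
    W = sum(w)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Each potential edge is included independently.
            if rng.random() < min(1.0, w[i] * w[j] / W):
                edges.append((i, j))
    return edges
```

The quadratic loop is fine for the sketch; large instances would use the standard near-linear Chung-Lu samplers instead.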
Dataset Splits | Yes | "In the Wiki Books temporal graph, we use the exact Ne counts on the first half of the graph edges (when sorted by their timestamps) as the predicted values for the second half." "To produce an edge embedding f(e) for an edge e = uv from the node embeddings of its endpoints, f(u) and f(v)... Training is done on a prefix of the first half of the edges." "For each of the two networks, we start with a graph that has twice as many edges as listed in Table 2... We then randomly remove 50% of the total edges to be the training data set, and use the remaining edges as the graph we test on."
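The two split strategies quoted above (a temporal half split and a random 50% edge removal) can be sketched as follows; the function names are ours, not the authors':

```python
import random

def temporal_split(edges):
    """Sort (u, v, t) edges by timestamp and cut in half: the first
    half supplies the exact counts used as predictions for the second."""
    ordered = sorted(edges, key=lambda e: e[2])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def random_split(edges, seed=0):
    """Randomly remove 50% of the edges as the training set; the
    remaining edges form the graph the algorithm is tested on."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]  # (train, test)
```

Note the temporal split leaks no future information into the predictions, while the random split models a setting where a related sample of the same network is available for training.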
Hardware Specification | Yes | "All of our graph experiments were done on a CPU with i5 2.7 GHz dual core and 8 GB RAM or a CPU with i7 1.9 GHz 8 core and 16 GB RAM."
Software Dependencies | No | The paper mentions using "the authors code" for the baselines (ThinkD and WRS) and refers to the GNN method of Zhang & Chen (2018), implying that third-party software was used. However, it does not list specific software components with version numbers (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1") needed to replicate the experiments.
Experiment Setup | Yes | "To remedy this discrepancy, we modify our algorithm slightly by setting a fixed fraction of space to use for heavy edges (10% of space for all of our experiments) and setting p correspondingly to use up the rest of the space bound given as input." "The constant 0.3 is fixed throughout all our experiments. We then set p = 0.7 Z/m." "For the multi-layer version, we use 10% of the total space for storing the top k = Z/10 edges, and 70% of the space for sub-sampling the edges whose oracle-predicted value is very tiny... Then, we use 20% of the space for sub-sampling the remaining edges..."
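The space split described in that row can be made concrete with a small helper; the function and key names are illustrative, and only the fractions and p = 0.7·Z/m come from the quoted text:

```python
def space_budget(Z, m, multi_layer=False):
    """Allocate a total space bound Z (in stored edges) for a stream
    of m edges, per the fractions quoted in the experiment setup.
    Single-layer: 10% reserved for predicted-heavy edges, with
    sub-sampling rate p = 0.7 * Z / m spending most of the rest.
    Multi-layer: 10% for the top k = Z/10 edges, 70% for edges with
    tiny predicted values, 20% for the remaining edges."""
    if multi_layer:
        return {"top_k": Z // 10,
                "tiny_pred": int(0.7 * Z),
                "rest": int(0.2 * Z)}
    p = min(1.0, 0.7 * Z / m)
    return {"heavy": Z // 10, "sampling_rate": p}
```

For example, with a budget of Z = 1000 edges on a stream of m = 100000 edges, the single-layer split reserves 100 slots for heavy edges and samples the rest at rate 0.007.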