Mayfly: a Neural Data Structure for Graph Stream Summarization
Authors: Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, S. Kevin Zhou
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical studies show that the Mayfly significantly outperforms its handcrafted competitors. |
| Researcher Affiliation | Academia | Yuan Feng1,3, Yukun Cao1,3, Hairu Wang1,3, Xike Xie2,3, and S. Kevin Zhou2,3. 1School of Computer Science, University of Science and Technology of China (USTC), China; 2School of Biomedical Engineering, USTC, China; 3Data Darkness Lab, MIRACLE Center, Suzhou Institute for Advanced Research, USTC, China. {yfung,ykcho,wanghairu}@mail.ustc.edu.cn, xkxie@ustc.edu.cn, s.kevin.zhou@gmail.com |
| Pseudocode | Yes | Algorithm 1: Details of Mayfly Operations |
| Open Source Code | Yes | The code for Mayfly has been included in the supplementary materials. |
| Open Datasets | Yes | We use four commonly used public graph stream datasets, comprising two medium-sized datasets (Lkml, Enron) and two large-scale datasets (Coauthor, Twitter). |
| Dataset Splits | No | Metamorphosis Phase. We split each dataset into Dtrain and Dtest, using a 2:8 ratio based on timestamps. The paper specifies a train/test split (see the split sketch below the table) but does not explicitly mention a separate validation set. |
| Hardware Specification | Yes | All of our experiments run on an NVIDIA DGX workstation with a Xeon-8358 CPU (2.60GHz, 32 cores) and 4 NVIDIA A100 GPUs (6912 CUDA cores and 80GB of memory per GPU). |
| Software Dependencies | No | The paper describes the neural network architectures and activation functions used (e.g., MLP, ReLU) but does not provide version numbers for software libraries such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | Larval Phase. We set γ = 60,000 and use Zipf distributions with α ranging from 0.3 to 0.8 to build the distribution pool P. The total weight sum ranges from 5 to 50 times the number of edges in the graph. The number of training steps is 500,000 and the learning rate is 0.0005. (See the distribution-pool sketch below the table.) |
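To make the quoted 2:8 timestamp split concrete, here is a minimal sketch. The edge-tuple layout, the function name, and the toy data are assumptions for illustration only; the paper's actual preprocessing code ships with its supplementary materials.

```python
# Minimal sketch of the timestamp-based 2:8 Dtrain/Dtest split quoted above.
# The (src, dst, weight, timestamp) tuple layout is an assumption, not the
# paper's actual data format.

def split_by_timestamp(edges, train_frac=0.2):
    """Order edges by timestamp and put the earliest 20% in Dtrain."""
    ordered = sorted(edges, key=lambda e: e[3])  # e[3] is the timestamp
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

# Toy usage: with only three edges, the 20% cut rounds down to an empty Dtrain.
stream = [(1, 2, 1.0, 100), (2, 3, 2.0, 50), (1, 3, 1.0, 200)]
d_train, d_test = split_by_timestamp(stream)
```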
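The larval-phase setup can likewise be sketched. A Zipf law with α < 1 has no normalizable infinite support, so the sketch below truncates the distribution to γ items and normalizes. That truncation, the function names, and the use of multinomial sampling are assumptions; γ, the α range, the 5x-50x weight-sum range, the step count, and the learning rate are the quoted values.

```python
import numpy as np

# Minimal sketch of building the larval-phase distribution pool P.
# Quoted values: GAMMA, ALPHAS, the 5x-50x weight-sum range, TRAIN_STEPS, LR.
# Everything else (truncated-Zipf construction, multinomial draws) is assumed.

GAMMA = 60_000                      # items per synthetic distribution (γ)
ALPHAS = np.arange(0.3, 0.81, 0.1)  # Zipf skew parameters α
TRAIN_STEPS = 500_000               # quoted number of training steps
LR = 0.0005                         # quoted learning rate

def zipf_pmf(alpha, n=GAMMA):
    """Truncated Zipf pmf: p(k) proportional to k^(-alpha) for k = 1..n."""
    ranks = np.arange(1, n + 1, dtype=np.float64)
    p = ranks ** -alpha
    return p / p.sum()

def sample_weights(alpha, num_edges, multiplier, rng):
    """Draw item weights whose sum is `multiplier` times the edge count."""
    return rng.multinomial(num_edges * multiplier, zipf_pmf(alpha))

rng = np.random.default_rng(0)
pool = [sample_weights(a, num_edges=10_000, multiplier=m, rng=rng)
        for a in ALPHAS for m in (5, 50)]  # endpoints of the 5x-50x range
```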