Mayfly: a Neural Data Structure for Graph Stream Summarization

Authors: Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, S. Kevin Zhou

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical studies show that the Mayfly significantly outperforms its handcrafted competitors. Extensive empirical studies show that our proposal significantly outperforms state-of-the-art methods.
Researcher Affiliation | Academia | Yuan Feng 1,3, Yukun Cao 1,3, Hairu Wang 1,3, Xike Xie 2,3, and S. Kevin Zhou 2,3. 1 School of Computer Science, University of Science and Technology of China (USTC), China; 2 School of Biomedical Engineering, USTC, China; 3 Data Darkness Lab, MIRACLE Center, Suzhou Institute for Advanced Research, USTC, China. {yfung,ykcho,wanghairu}@mail.ustc.edu.cn, xkxie@ustc.edu.cn, s.kevin.zhou@gmail.com
Pseudocode | Yes | Algorithm 1: Details of Mayfly Operations
Open Source Code | Yes | The code for Mayfly has been included in the supplementary materials.
Open Datasets | Yes | We use four commonly used public graph stream datasets, comprising two medium-sized datasets (Lkml, Enron) and two large-scale datasets (Coauthor, Twitter).
Dataset Splits | No | Metamorphosis Phase. We split each dataset into D_train and D_test, using a 2:8 ratio based on timestamps. The paper specifies a train/test split but does not explicitly mention a separate validation set. (A sketch of one plausible reading of this split appears after the table.)
Hardware Specification | Yes | All of our experiments run on an NVIDIA DGX workstation with a Xeon-8358 CPU (2.60 GHz, 32 cores) and 4 NVIDIA A100 GPUs (6912 CUDA cores and 80 GB of GPU memory each).
Software Dependencies | No | The paper describes the neural network architectures and activation functions used (e.g., MLP, ReLU) but does not provide version numbers for software libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | Larval Phase. We set γ = 60,000 and use Zipf distributions with α ranging from 0.3 to 0.8 to build the distribution pool P. The total weight sum ranges from 5 to 50 times the number of edges in the graph. The number of training steps is 500,000 and the learning rate is 0.0005. (A sketch of building such a Zipf distribution pool appears after the table.)
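
The timestamp-based 2:8 split is described only at the level quoted above. The following is a minimal Python sketch of one plausible reading, in which the chronologically earliest 20% of edges form D_train and the remaining 80% form D_test. The (src, dst, weight, timestamp) record layout, the function name, and the proportion-by-edge-count interpretation are assumptions for illustration, not the authors' code.

```python
# Sketch of a 2:8 timestamp-based train/test split (assumed reading of
# the paper's "Metamorphosis Phase" description, not the authors' code).
from typing import List, Tuple

Edge = Tuple[int, int, float, float]  # assumed layout: (src, dst, weight, timestamp)

def split_by_timestamp(edges: List[Edge], train_frac: float = 0.2):
    """Sort edges chronologically and cut 2:8 into (D_train, D_test)."""
    ordered = sorted(edges, key=lambda e: e[3])  # ascending timestamps
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

# Usage: the earliest 20% of the stream trains the model, the rest tests it.
stream = [(0, 1, 1.0, 100.0), (1, 2, 2.0, 50.0), (2, 3, 1.0, 200.0),
          (0, 2, 3.0, 150.0), (3, 0, 1.0, 300.0)]
d_train, d_test = split_by_timestamp(stream)
print(len(d_train), len(d_test))  # 1 4
```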
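
Likewise, the larval-phase setup quotes γ = 60,000 and Zipf skews α from 0.3 to 0.8, but not how the distribution pool P is materialized. Below is a hedged sketch under the assumption that each pool entry is a normalized Zipf(α) frequency vector over γ items, later scaled so the total weight lands in the paper's 5 to 50 times the edge count; zipf_frequencies, build_pool, and the scaling step are illustrative names, not the paper's API.

```python
# Sketch of building a pool P of Zipf-distributed frequency vectors for
# the larval (training) phase. Reading gamma as the number of items per
# synthetic distribution is an assumption, not the paper's definition.
import numpy as np

def zipf_frequencies(n_items: int, alpha: float) -> np.ndarray:
    """Normalized Zipf(alpha) frequencies over n_items ranks."""
    ranks = np.arange(1, n_items + 1, dtype=np.float64)
    probs = ranks ** (-alpha)
    return probs / probs.sum()

def build_pool(gamma: int = 60_000, alphas=np.arange(0.3, 0.81, 0.1)):
    """One normalized frequency vector per skewness value alpha."""
    return {round(float(a), 1): zipf_frequencies(gamma, a) for a in alphas}

pool = build_pool()
# Scale a sampled distribution so the total weight is, e.g., 20x the edge
# count of a graph with 1e6 edges (a factor inside the paper's 5-50 range).
weights = pool[0.5] * 20 * 1_000_000
print(weights[:3])
```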