Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be read as estimates. Full accuracy metrics and methodology are described in [1].

Mayfly: a Neural Data Structure for Graph Stream Summarization

Authors: Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, S. Kevin Zhou

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical studies show that the Mayfly significantly outperforms its handcrafted competitors. Extensive empirical studies show that our proposal significantly outperforms state-of-the-art methods.
Researcher Affiliation | Academia | Yuan Feng1,3, Yukun Cao1,3, Hairu Wang1,3, Xike Xie2,3, and S. Kevin Zhou2,3; 1School of Computer Science, University of Science and Technology of China (USTC), China; 2School of Biomedical Engineering, USTC, China; 3Data Darkness Lab, MIRACLE Center, Suzhou Institute for Advanced Research, USTC, China; EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Details of Mayfly Operations
Open Source Code | Yes | The code for Mayfly has been included in the supplementary materials.
Open Datasets | Yes | We use four commonly used public graph stream datasets, comprising two medium-sized datasets (Lkml, Enron) and two large-scale datasets (Coauthor, Twitter).
Dataset Splits | No | Metamorphosis Phase. We split each dataset into Dtrain and Dtest, using a 2:8 ratio based on timestamps. The paper specifies a train/test split but does not explicitly mention a separate validation set.
Hardware Specification | Yes | All of our experiments run on an NVIDIA DGX workstation with a Xeon-8358 CPU (2.60GHz, 32 cores) and 4 NVIDIA A100 GPUs (6912 CUDA cores and 80GB GPU memory each).
Software Dependencies | No | The paper describes the neural network architectures and activation functions used (e.g., MLP, ReLU) but does not provide version numbers for software libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | Larval Phase. We set γ = 60,000 and use Zipf distributions with α ranging from 0.3 to 0.8 to build the distribution pool P. The total weight sum ranges from 5 to 50 times the number of edges in the graph. The number of training steps is 500,000 and the learning rate is 0.0005.
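The Dataset Splits row describes a chronological 2:8 train/test split based on edge timestamps. A minimal sketch of such a split is below; the function name `split_by_timestamp` and the (src, dst, weight, timestamp) tuple layout are illustrative assumptions, not the paper's actual code.

```python
def split_by_timestamp(edges, train_ratio=0.2):
    """Split a graph stream chronologically into train/test sets.

    `edges` is assumed to be a list of (src, dst, weight, timestamp)
    tuples; the paper's 2:8 split corresponds to train_ratio=0.2.
    """
    ordered = sorted(edges, key=lambda e: e[3])  # order by timestamp
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]

# Toy stream of 5 edges: (src, dst, weight, timestamp)
stream = [("a", "b", 1, 3), ("b", "c", 2, 1), ("a", "c", 1, 2),
          ("c", "d", 3, 5), ("d", "a", 1, 4)]
train, test = split_by_timestamp(stream)
# 20% of 5 edges -> the single earliest edge goes to train
```

Note that splitting by timestamp rather than at random preserves the streaming setting: the model is trained only on edges that arrive before the test window.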
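The Experiment Setup row says the training distribution pool P is built from Zipf distributions with α from 0.3 to 0.8, with a total weight sum of 5-50x the number of edges. A hedged sketch of generating such a pool follows; `zipf_edge_weights` and the specific α grid are assumptions for illustration (NumPy's built-in Zipf sampler requires exponents > 1, so a truncated Zipf law is sampled directly here).

```python
import random
from collections import Counter

def zipf_edge_weights(num_edges, alpha, total_weight):
    """Distribute total_weight updates over num_edges edges so that
    edge frequencies follow a truncated Zipf law p_i ∝ (i+1)^(-alpha)."""
    probs = [(i + 1) ** (-alpha) for i in range(num_edges)]
    draws = random.choices(range(num_edges), weights=probs, k=total_weight)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(num_edges)]

# Build a small distribution pool P with alpha in [0.3, 0.8] and a
# total weight sum of 5x-50x the number of edges (the paper's ranges).
num_edges = 100
pool = [
    zipf_edge_weights(num_edges, alpha, num_edges * random.randint(5, 50))
    for alpha in (0.3, 0.4, 0.5, 0.6, 0.7, 0.8)
]
```

Small α values give near-uniform edge weights while larger α concentrates weight on a few heavy edges, so varying α over [0.3, 0.8] exposes the learned structure to a range of skews.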