Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency
Authors: Xinyu He, Dongqi Fu, Hanghang Tong, Ross Maciejewski, Jingrui He
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, based on temporal heterogeneous graph datasets with up to 1 million nodes and 20 million edges, the experiments show that THEPUFF generates utilizable temporal heterogeneous graphs with privacy protected, compared with state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | Xinyu He, Dongqi Fu, Hanghang Tong, Ross Maciejewski, Jingrui He; University of Illinois Urbana-Champaign, Meta AI, Arizona State University; EMAIL, {dongqifu}@meta.com, {rmacieje}@asu.edu |
| Pseudocode | Yes | The general graph perturbation process is summarized in Alg. 1 in Appendix A.3. ... A.3 PSEUDO CODES ... Algorithm 1 Graph Perturbation based on Differential Privacy ... Algorithm 2 Privacy-Utility Adversarial Training ... Algorithm 3 Pseudo-code of Dutil() ... Algorithm 4 Pseudo-code of Assembler |
| Open Source Code | Yes | Dataset statistics and more implementation details are summarized in Appendix A.5. Code is at https://github.com/xinyuu-he/THePUff. |
| Open Datasets | Yes | Datasets. To test the performance, we utilize 4 real-world publicly-available temporal heterogeneous graph datasets from academic citation graphs (DBLP), online rating graphs (ML-100k, ML-20M), and million-node online shopping graphs (Taobao). ... MovieLens-100k [2], DBLP [3], MovieLens-20M [4], and Taobao [5] are publicly available. [2] https://www.kaggle.com/datasets/prajitdatta/movielens-100k-dataset [3] https://www.aminer.org/citation [4] https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset [5] https://tianchi.aliyun.com/dataset/649 |
| Dataset Splits | No | During the adversarial training, we extract sampled subgraphs (e.g., via random walks) as model inputs. The paper discusses input sampling and mini-batches but does not explicitly state train/test/validation splits for the datasets used in evaluation. |
| Hardware Specification | Yes | Machine Configuration. All experiments are performed on a Linux platform with Intel(R) Xeon(R) Gold 6240R CPU and Tesla V100 SXM2 32GB GPU. |
| Software Dependencies | No | SGD optimizer is used for discriminators, while RMSprop optimizer is used for the generator. The paper mentions optimizers (SGD, RMSprop) and model architectures (LSTM, tri-level attention networks) but does not provide specific version numbers for any software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | Hyperparameters. Table 2 is implemented with the following hyperparameters: ϵ = 8 for all datasets, ϵ+ is decided by Eq. 4; batch size = 32 for the MovieLens-100K dataset and the DBLP dataset, 64 for other datasets; node embedding dimension = 128; hidden dimensions are all set to 128; dropout rate = 0.2 in the attention layer; learning rate = 1e-4 for the generator and 1e-3 for discriminators; SGD optimizer is used for discriminators, while RMSprop optimizer is used for the generator; JUST (Hussein et al., 2018) is applied to initialize node embeddings. In the running of JUST, we have the maximum walk length as 100; sample maximum of 10 walks starting from each node. |
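
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object for anyone attempting a re-run. The following is only a minimal sketch: the class name `ThePuffSetup`, its field names, and the dataset labels are hypothetical and are not taken from the authors' repository, and the learning rates are read as 1e-4 (generator) and 1e-3 (discriminators). The privacy budget ϵ+ is omitted because the excerpt only says it is decided by Eq. 4.

```python
from dataclasses import dataclass

# Datasets the excerpt lists with batch size 32; all others use 64.
# The label strings are illustrative, not the repository's identifiers.
SMALL_BATCH_DATASETS = frozenset({"MovieLens-100k", "DBLP"})


@dataclass(frozen=True)
class ThePuffSetup:
    """Hypothetical container for the hyperparameters quoted above."""

    dataset: str
    epsilon: float = 8.0               # privacy budget, same for all datasets
    embedding_dim: int = 128           # node embedding dimension
    hidden_dim: int = 128              # all hidden dimensions
    attention_dropout: float = 0.2     # dropout in the attention layer
    generator_lr: float = 1e-4
    discriminator_lr: float = 1e-3
    generator_optimizer: str = "RMSprop"
    discriminator_optimizer: str = "SGD"
    just_max_walk_length: int = 100    # JUST embedding initialization
    just_max_walks_per_node: int = 10

    @property
    def batch_size(self) -> int:
        # 32 for MovieLens-100k and DBLP, 64 for the remaining datasets.
        return 32 if self.dataset in SMALL_BATCH_DATASETS else 64


if __name__ == "__main__":
    for name in ("MovieLens-100k", "DBLP", "MovieLens-20M", "Taobao"):
        setup = ThePuffSetup(dataset=name)
        print(name, setup.batch_size, setup.generator_lr, setup.discriminator_lr)
```

Keeping the batch size as a derived property rather than a stored field means the per-dataset rule from the excerpt lives in one place.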