Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DyG2Vec: Efficient Representation Learning for Dynamic Graphs
Authors: Mohammad Alomrani, Mahdi Biparva, Yingxue Zhang, Mark Coates
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on 7 benchmark datasets indicate that on average, our model outperforms SoTA baselines on the future link prediction task by 4.23% for the transductive setting and 3.30% for the inductive setting while only requiring 5-10x less training/inference time. Lastly, different aspects of the proposed framework are investigated through experimental analysis and ablation studies. |
| Researcher Affiliation | Collaboration | Mohammad Ali Alomrani EMAIL Huawei Noah's Ark Lab Mahdi Biparva EMAIL Huawei Noah's Ark Lab Yingxue Zhang EMAIL Huawei Noah's Ark Lab Mark Coates EMAIL McGill University |
| Pseudocode | No | The paper describes the methodology in text and uses figures (Figure 1 and Figure 2) to illustrate architectures and frameworks, but it does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | The code is publicly available at https://github.com/huawei-noah/noah-research/tree/master/graph_atlas. |
| Open Datasets | Yes | The code and datasets are publicly available at https://github.com/huawei-noah/noah-research/tree/master/graph_atlas. We use 7 real-world datasets: Wikipedia, Reddit, MOOC, and LastFM (Kumar et al., 2019); Social Evolution, Enron, and UCI (Wang et al., 2021b). |
| Dataset Splits | Yes | We perform the same 70%-15%-15% chronological split for all datasets as in Wang et al. (2021b). |
| Hardware Specification | Yes | All experiments were done on a Ubuntu 20.4 server with 72 Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz cores and a RAM of size 755 Gb. We use a NVIDIA Tesla V100-PCIE-32GB GPU. |
| Software Dependencies | No | The paper mentions using the PyTorch framework (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019) but does not provide specific version numbers for these software dependencies. It also mentions Ubuntu 20.4 as the operating system. |
| Experiment Setup | Yes | We use a constant learning rate of 0.0001 for all datasets and tasks. DyG2Vec is trained for 100 epochs for both downstream and SSL pre-training. For downstream training, we use a constant window size of 64K for all datasets except for MOOC, Social Evolution, and Enron, where we found a smaller window size of 8K works best. The batch size is set to 200 target edges. During SSL pre-training, we use a constant window size of 32K with stride 200. Following previous work (Rossi et al., 2020; Xu et al., 2020), all dynamic node classification training experiments are performed with L2-decay parameter λ = 0.00001 to alleviate over-fitting. Both distortions are applied with dropout probability pd = 0.3. |
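The 70%-15%-15% chronological split reported in the Dataset Splits row can be sketched as below. This is a minimal illustration, not the paper's implementation: the `split_chronological` helper and the `(src, dst, timestamp)` event tuples are assumptions for the sake of the example; the key point is that events are ordered by time before splitting, so no future edge leaks into training.

```python
def split_chronological(events, train_frac=0.70, val_frac=0.15):
    """Split a list of (src, dst, timestamp) edge events by time order.

    Hypothetical sketch of a 70%-15%-15% chronological split in the style
    of Wang et al. (2021b); names and signature are illustrative.
    """
    events = sorted(events, key=lambda e: e[2])  # order by timestamp
    n = len(events)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return events[:train_end], events[train_end:val_end], events[val_end:]

# Example: 20 synthetic edge events with strictly increasing timestamps
events = [(i % 5, (i + 1) % 5, float(i)) for i in range(20)]
train, val, test = split_chronological(events)
print(len(train), len(val), len(test))  # 14 3 3
```

Because the split is by position in the time-sorted sequence rather than by random sampling, every training event precedes every validation event, which in turn precedes every test event.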