Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Self-Supervised Learning of Graph Representations for Network Intrusion Detection

Authors: Lorenzo Guerra, Thomas Chapuis, Guillaume Duc, Pavlo Mozharovskyi, Van-Tam Nguyen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our model on multiple NetFlow-based datasets for Network Intrusion Detection Systems (NIDS), including NF-UNSW-NB15 and NF-CSE-CIC-IDS2018. We conduct extensive experiments on NIDS datasets, covering diverse network environments and a wide range of attack types. GraphIDS achieves state-of-the-art performance, outperforming existing methods by 5% to over 25% in both macro F1 and PR-AUC.
Researcher Affiliation	Collaboration	Lorenzo Guerra1,2 Thomas Chapuis2 Guillaume Duc1 Pavlo Mozharovskyi1 Van-Tam Nguyen1 1LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France {name.surname}@telecom-paris.fr 2Ampere Software Technology, Guyancourt, France {name.surname}@ampere.cars
Pseudocode	No	The paper describes the methodology in sections 3.1, 3.2, 3.3, and 3.4 using descriptive text and figures (e.g., Figure 1, Figure 2, Figure 3), but it does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code and pre-trained models: https://github.com/lorenzo9uerra/GraphIDS.
Open Datasets	Yes	We evaluate our model on multiple NetFlow-based datasets for Network Intrusion Detection Systems (NIDS), including NF-UNSW-NB15 and NF-CSE-CIC-IDS2018. We use both their second version (v2), which includes 43 NetFlow features, and their third version (v3), which extends these with 10 additional temporal features, resulting in a total of 53 features [3][4]. The datasets are available under the "Permitted reuse with commercial use restriction" license (https://guides.library.uq.edu.au/deposit-your-data/license-reuse-noncommercial).
Dataset Splits	Yes	Before training, we replace missing or invalid values with zeros and split the dataset into 80% for the training set, 10% for the validation set, and 10% for the test set, preserving the class distribution of attack types.
Hardware Specification	Yes	We train GraphIDS for a maximum of 100 epochs using the AdamW optimizer on a machine with an NVIDIA A100 GPU, 32 GB of RAM, and 8 CPU cores of an AMD EPYC 7302 processor.
Software Dependencies	Yes	We implemented our model using PyTorch 2.3.1 and DGL 2.3.0.
Experiment Setup	Yes	We train GraphIDS for a maximum of 100 epochs using the AdamW optimizer on a machine with an NVIDIA A100 GPU, 32 GB of RAM, and 8 CPU cores of an AMD EPYC 7302 processor. [...] We initially explored coarse configurations through grid search, followed by Bayesian optimization to tune the hyperparameters based on validation Precision-Recall Area Under Curve (PR-AUC). Table 14 reports the complete set of optimized hyperparameters used for the GraphIDS model.
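The Dataset Splits row describes zero-filling missing or invalid values and an 80/10/10 train/validation/test split that preserves the distribution of attack types. The sketch below illustrates that procedure in plain Python; it is not taken from the GraphIDS repository. The helper names `clean_value` and `stratified_split`, the fixed seed, and the per-class rounding rule are illustrative assumptions, and treating None, NaN, ±inf, and unparseable entries as "invalid" is our reading of the text.

```python
import math
import random
from collections import defaultdict


def clean_value(x):
    """Hypothetical helper: map missing or invalid feature values to 0.0."""
    if x is None:
        return 0.0
    try:
        v = float(x)
    except (TypeError, ValueError):
        return 0.0  # unparseable entry
    return v if math.isfinite(v) else 0.0  # NaN and +/-inf become 0.0


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Hypothetical helper: return (train, val, test) index lists.

    Each class (e.g. attack type) is shuffled and split separately,
    so every partition keeps roughly the original class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n = len(idx)
        n_train = round(n * fractions[0])
        n_val = round(n * fractions[1])
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test set
    return sorted(train), sorted(val), sorted(test)


if __name__ == "__main__":
    row = [1.0, None, float("nan"), float("inf"), "2.5"]
    print([clean_value(x) for x in row])
    labels = ["benign"] * 80 + ["dos"] * 20  # illustrative labels only
    tr, va, te = stratified_split(labels)
    print(len(tr), len(va), len(te))
```

With 80 benign and 20 DoS labels, this yields 80/10/10 indices with exactly two DoS samples in the test set. Splitting within each class, rather than over the pooled data, is what keeps rare attack types represented in every partition.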
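The Research Type and Experiment Setup rows report results in macro F1 and PR-AUC and select hyperparameters on validation PR-AUC. As a rough guide to what these metrics compute, here is a minimal pure-Python sketch. It assumes PR-AUC is computed as step-wise average precision over per-flow attack scores (label 1 = attack, 0 = benign), the convention used by scikit-learn's `average_precision_score`; the report does not state which PR-AUC estimator, or which label set (binary or per attack type), the authors used. The function names are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def average_precision(y_true, scores):
    """PR-AUC as step-wise average precision: sum of (R_n - R_{n-1}) * P_n."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")
    # Rank samples by descending attack score.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    i = 0
    while i < len(pairs):
        # Samples with tied scores are added together as one threshold.
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
    print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

On labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]`, `average_precision` returns about 0.833. Macro F1 weights every class equally, so a rare attack type that is never detected lowers the average as much as a common one, which makes it a demanding metric on imbalanced NIDS data.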