Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

FlowRefiner: A Robust Traffic Classification Framework against Label Noise

Authors: Mingwei Zhan, Ruijie Zhao, Xianwen Deng, Zhi Xue, Qi Li, Zhuotao Liu, Guang Cheng, Ke Xu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our method on four traffic datasets with various classification scenarios across varying noise ratios. Experimental results demonstrate that FLOWREFINER mitigates the impact of label noise and consistently outperforms state-of-the-art baselines by a large margin.
Researcher Affiliation	Academia	Mingwei Zhan¹, Ruijie Zhao²*, Xianwen Deng¹, Zhi Xue¹, Qi Li³, Zhuotao Liu³, Guang Cheng², Ke Xu³. ¹Shanghai Jiao Tong University, ²Southeast University, ³Tsinghua University.
Pseudocode	No	The paper describes the methodology in prose and through architectural diagrams (Figure 1, Figure 5, Figure 6, Figure 7) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The code is available at https://github.com/NSSL-SJTU/FlowRefiner.
Open Datasets	Yes	We conduct our experiments on four real-world traffic datasets: ISCXVPN (45), Cross Platform (46), USTC-TFC (16), and Malware (47)... Malware (47) is a recently published dataset featuring traffic from 10 malware families, available at https://malware-traffic-analysis.net/about.html. Previous studies (20; 21) have revealed significant label noise in the CICIDS2017 dataset (51), and provide a corrected version with revised flow labels.
Dataset Splits	Yes	ISCXVPN (45) ... The dataset consists of 2,275 training samples and 569 test samples. Cross Platform (46) ... The dataset consists of 5,429 training samples and 1,372 test samples. USTC-TFC (16) ... The dataset consists of 1,914 training samples and 483 test samples. Malware (47) ... The dataset consists of 2,938 training samples and 740 test samples. To comprehensively evaluate the robustness of our methods against label noise, we generate noisy datasets for each scenario with different noise ratios (5%, 10%, 20%, 40%, 60%).
Hardware Specification	Yes	All experiments are implemented in four NVIDIA GeForce RTX 3090 GPUs with PyTorch 1.9.0. Our framework currently takes about 3 minutes on an RTX 3090 GPU with 2.79 GB of memory for training.
Software Dependencies	Yes	All experiments are implemented in four NVIDIA GeForce RTX 3090 GPUs with PyTorch 1.9.0.
Experiment Setup	Yes	In the training stage, we set the batch size as 64, the epochs as 20, and the learning rate as 4 × 10⁻³ with the AdamW optimizer (50). In noise detection, we set the parameter of clustering granularity as n = 5, and the top m = 2 most frequent labels in the cluster are defined as the majority labels. The high and low confidence thresholds are set as τ_h = 0.9 and τ_l = 0.7.
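
For readers trying to reproduce the setup quoted in the Dataset Splits and Experiment Setup rows, the following is a minimal Python sketch, not the authors' code. It collects the reported hyperparameters into one dictionary, injects label noise at the listed ratios, and picks the top-m most frequent labels in a cluster. The key names, the function names, and the choice of symmetric (uniform) label flipping are assumptions made for illustration; the quoted text does not state which noise model the paper uses or how the thresholds τ_h and τ_l are applied, so the thresholds are recorded here without any logic attached.

```python
import random
from collections import Counter

# Values quoted in the Experiment Setup row; the key names are hypothetical.
CONFIG = {
    "batch_size": 64,
    "epochs": 20,
    "learning_rate": 4e-3,
    "optimizer": "AdamW",
    "clustering_granularity_n": 5,
    "top_m_majority_labels": 2,
    "tau_high": 0.9,
    "tau_low": 0.7,
}

# Noise ratios quoted in the Dataset Splits row.
NOISE_RATIOS = [0.05, 0.10, 0.20, 0.40, 0.60]


def inject_symmetric_noise(labels, num_classes, ratio, seed=0):
    """Flip round(ratio * len(labels)) labels to a different, uniformly chosen class.

    ASSUMPTION: symmetric label noise; the paper's noise model may differ.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(ratio * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        # Exclude the current label so every selected sample really changes.
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy


def majority_labels(cluster_labels, m=CONFIG["top_m_majority_labels"]):
    """Return the m most frequent labels observed in one cluster."""
    return [label for label, _ in Counter(cluster_labels).most_common(m)]


if __name__ == "__main__":
    clean = [i % 4 for i in range(100)]  # toy labels, not from any dataset
    for ratio in NOISE_RATIOS:
        noisy = inject_symmetric_noise(clean, num_classes=4, ratio=ratio)
        changed = sum(a != b for a, b in zip(clean, noisy))
        print(f"ratio={ratio:.2f} flipped={changed}")
    print(majority_labels([0, 0, 0, 1, 1, 2]))
```

The toy labels in the usage section are synthetic and stand in for the real training labels; swapping in a dataset's label list and class count is enough to generate the noisy variants at each ratio.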