Suitable is the Best: Task-Oriented Knowledge Fusion in Vulnerability Detection
Authors: Jingjing Wang, Minhuan Huang, Yuanping Nie, Xiang Li, Qianjin Du, Wei Kong, Huan Deng, Xiaohui Kuang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that KF-GVD outperforms SOTAs on function-level and statement-level vulnerability detection across various target tasks, with an average increase of 40.9% in precision and 26.1% in recall. |
| Researcher Affiliation | Academia | Jingjing Wang (Institute of Systems Engineering, Academy of Military Sciences, PLA; jennywangel@163.com); Minhuan Huang (Institute of Systems Engineering, Academy of Military Sciences, PLA; darbean@126.com); Yuanping Nie (Institute of Systems Engineering, Academy of Military Sciences, PLA; yuanpingnie@nudt.edu.cn); Xiang Li (Institute of Systems Engineering, Academy of Military Sciences, PLA; ideal_work@163.com); Qianjin Du (Department of Computer Science and Technology, Tsinghua University; dqj20@mails.tsinghua.edu.cn); Wei Kong (School of Information Science and Engineering, Zhejiang Sci-Tech University; kong_wei@ieee.org); Huan Deng (Institute of Systems Engineering, Academy of Military Sciences, PLA; denghuan619@163.com); Xiaohui Kuang (Institute of Systems Engineering, Academy of Military Sciences, PLA; xiaohui_kuang@163.com) |
| Pseudocode | No | The paper describes the method and model architecture in prose and figures (e.g., Figure 3, Figure 5) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The dataset has been uploaded to the supplementary materials, and the detail can be found in Appendix C. |
| Open Datasets | Yes | The source task dataset consists of 80% CWE-119 and CWE-416 type vulnerability information extensively collected from 13 real-world C++ projects from NVD. The remaining 20% is sourced from academic security defects and synthetic data provided by SARD. |
| Dataset Splits | Yes | Train:Validation:Test 8:1:1 |
| Hardware Specification | Yes | We conducted all experiments on a workstation equipped with a Quadro RTX 6000 GPU. |
| Software Dependencies | Yes | CPGs corresponding to source code files were generated using Joern version 1.1.1033. We employed a pre-trained Word2Vec model... The SAGPool model deployed in both source and target tasks were implemented using PyTorch version 1.4.0 and CUDA version 10.2. |
| Experiment Setup | Yes | Model parameter settings: Min count = 0.001; Size = 30; Window = 5; Embedding dim = 300; Hidden dim = 32; Activation function = ReLU; Learning rate = 0.0001; Optimizer = Adam; Train:Validation:Test = 8:1:1 |
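
The reported settings and the 8:1:1 split can be captured in a small configuration sketch. This is an illustration only, not the authors' code: the class name `ReportedSetup`, its field names, the helper `split_dataset`, and the seeded random shuffle are all assumptions, since the table does not say how samples were assigned to each partition.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    # Values copied from the Experiment Setup row; field names are hypothetical.
    min_count: float = 0.001
    size: int = 30
    window: int = 5
    embedding_dim: int = 300
    hidden_dim: int = 32
    activation: str = "relu"
    learning_rate: float = 0.0001
    optimizer: str = "adam"
    split_ratio: tuple = (8, 1, 1)  # Train:Validation:Test


def split_dataset(samples, ratio=(8, 1, 1), seed=0):
    """Shuffle samples and cut them into train/validation/test by ratio.

    The shuffle and the seed are assumptions; the source only gives the ratio.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    total = sum(ratio)
    n_train = len(items) * ratio[0] // total
    n_val = len(items) * ratio[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any rounding remainder lands in test
    return train, val, test


if __name__ == "__main__":
    setup = ReportedSetup()
    train, val, test = split_dataset(range(1000), setup.split_ratio)
    print(len(train), len(val), len(test))  # 800 100 100
```

Fixing the seed keeps the partition reproducible across runs, which matters when comparing against the reported precision and recall figures.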