JUST-IN-TIME VULNERABILITY DETECTION AND LOCALIZATION
DOI:
https://doi.org/10.15625/1813-9663/19102
Keywords:
Just-in-time vulnerability detection, Just-in-time vulnerability localization, Vulnerable commit, Vulnerable statement.
Abstract
Software vulnerabilities have increased dramatically, and multiple severe attacks have occurred in recent years. This makes the early detection and prevention of vulnerabilities a critical challenge for Software Quality Assurance. This paper introduces a novel framework, JULY, which serves the dual purpose of detecting vulnerable commits and localizing the root causes of the vulnerabilities. The fundamental idea of JULY is that whether a commit is vulnerable is determined by the inherent meaning embedded in its changed code. For just-in-time vulnerability detection (JIT-VD), JULY represents each commit as a Code Transformation Graph and employs a Graph Neural Network model to capture the meaning of these graphs and distinguish between vulnerable and non-vulnerable commits. Once a commit is detected as vulnerable, it is passed to the just-in-time vulnerability localization (JIT-VL) model to localize the root causes, which are the vulnerable changed statements. In JIT-VL, JULY encodes each statement using three features: operation, context, and topic. JULY then computes a suspiciousness score for each changed statement and ranks the statements by score. To evaluate the effectiveness of JULY, we conducted several experiments on a dataset of 20,274 commits in 506 C/C++ projects. JULY achieves an improvement of 95% in Top-1 ACC and 63% in MRR compared to the state-of-the-art approaches. Furthermore, when the same portion (i.e., 20%) of modified statements in each commit is examined, JULY finds twice as many vulnerable statements within a given commit as the state-of-the-art approaches.
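The ranking step and the two reported metrics can be illustrated with a small sketch. This is not JULY's implementation: the suspiciousness scores below are made-up numbers, the function names are hypothetical, and Top-1 ACC and MRR are computed with their commonly used definitions (the share of commits whose top-ranked statement is vulnerable, and the mean of 1/rank of the first vulnerable statement), which may differ in detail from the definitions used in the paper.

```python
from typing import Dict, List, Set


def rank_statements(scores: Dict[str, float]) -> List[str]:
    """Order statement ids from most to least suspicious."""
    # Descending score; ties are broken by statement id so the order is deterministic.
    return sorted(scores, key=lambda stmt: (-scores[stmt], stmt))


def first_hit_rank(ranking: List[str], vulnerable: Set[str]) -> int:
    """1-based rank of the first vulnerable statement, or 0 if none is ranked."""
    for position, stmt in enumerate(ranking, start=1):
        if stmt in vulnerable:
            return position
    return 0


def top1_accuracy(ranks: List[int]) -> float:
    """Fraction of commits whose top-ranked statement is vulnerable."""
    return sum(1 for r in ranks if r == 1) / len(ranks)


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Mean of 1/rank over commits; a commit with no hit contributes 0."""
    return sum(1.0 / r for r in ranks if r > 0) / len(ranks)


# Toy data (assumed, not from the paper): two commits with per-statement scores.
commits = [
    ({"s1": 0.9, "s2": 0.4, "s3": 0.1}, {"s1"}),  # vulnerable statement ranked 1st
    ({"s1": 0.2, "s2": 0.7, "s3": 0.5}, {"s3"}),  # vulnerable statement ranked 2nd
]
ranks = [first_hit_rank(rank_statements(scores), vuln) for scores, vuln in commits]
print(ranks, top1_accuracy(ranks), mean_reciprocal_rank(ranks))
```

In this toy case the first commit places its vulnerable statement at rank 1 and the second at rank 2, so Top-1 accuracy is 0.5 and MRR is 0.75.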
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is our results and has not been published before in its current or a substantially similar form and is not under consideration for another publication, does not contain any unlawful statements and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials including tables, diagrams or photographs not owned by me/us.