JUST-IN-TIME VULNERABILITY DETECTION AND LOCALIZATION

Hieu Dinh Vo
Faculty of Information Technology, VNU University of Engineering and Technology, Ha Noi, Viet Nam

DOI:

https://doi.org/10.15625/1813-9663/19102

Keywords:

Just-in-time vulnerability detection, Just-in-time vulnerability localization, Vulnerable commit, Vulnerable statement.

Abstract

The number of software vulnerabilities has increased dramatically, and multiple severe attacks have occurred in recent years. This makes the early detection and prevention of vulnerabilities a critical challenge for Software Quality Assurance. This paper introduces a novel framework, JULY, which serves the dual purpose of detecting vulnerable commits and localizing the root causes of their vulnerabilities. The fundamental idea of JULY is that whether a commit is vulnerable is determined by the meaning embedded in its changed code. For just-in-time vulnerability detection (JIT-VD), JULY represents each commit as a Code Transformation Graph and employs a Graph Neural Network model to capture the meaning of these graphs and distinguish vulnerable from non-vulnerable commits. Once a commit is detected as vulnerable, it is passed to the just-in-time vulnerability localization (JIT-VL) model, which localizes the root causes, that is, the vulnerable changed statements. In JIT-VL, JULY encodes each statement using three features: operation, context, and topic. JULY then computes a suspiciousness score for each changed statement and ranks the statements by score. To evaluate the effectiveness of JULY, we conducted several experiments on a dataset of 20,274 commits from 506 C/C++ projects. JULY achieves an improvement of 95% in Top-1 ACC and 63% in MRR compared to the state-of-the-art approaches. Furthermore, when the same portion (i.e., 20%) of the modified statements in each commit is examined, JULY finds twice as many vulnerable statements within a given commit as the state-of-the-art approaches.
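The ranking step and the two reported metrics can be illustrated with a minimal Python sketch. It assumes the standard definitions of Top-1 accuracy (the share of commits whose top-ranked statement is vulnerable) and MRR (the mean of 1/rank of the first vulnerable statement); the abstract does not spell these out. The function names, statement identifiers, and scores below are hypothetical and are not taken from JULY.

```python
from typing import Dict, List, Set, Tuple

def rank_statements(scores: Dict[str, float]) -> List[str]:
    """Order changed statements from most to least suspicious."""
    return sorted(scores, key=lambda stmt: scores[stmt], reverse=True)

def reciprocal_rank(ranking: List[str], vulnerable: Set[str]) -> float:
    """Return 1/position of the first vulnerable statement, or 0.0 if none is ranked."""
    for position, stmt in enumerate(ranking, start=1):
        if stmt in vulnerable:
            return 1.0 / position
    return 0.0

def evaluate(commits: List[Tuple[Dict[str, float], Set[str]]]) -> Dict[str, float]:
    """Compute Top-1 accuracy and MRR over (scores, vulnerable statements) pairs."""
    top1_hits = 0
    rr_sum = 0.0
    for scores, vulnerable in commits:
        ranking = rank_statements(scores)
        if ranking and ranking[0] in vulnerable:
            top1_hits += 1
        rr_sum += reciprocal_rank(ranking, vulnerable)
    n = len(commits)
    return {"top1_acc": top1_hits / n, "mrr": rr_sum / n}

# Hypothetical suspiciousness scores for two vulnerable commits.
example_commits = [
    ({"a1": 0.9, "a2": 0.4, "a3": 0.1}, {"a1"}),             # vulnerable statement ranked 1st
    ({"b1": 0.2, "b2": 0.8, "b3": 0.5, "b4": 0.1}, {"b3"}),  # vulnerable statement ranked 2nd
]
result = evaluate(example_commits)
print(result)
```

In this toy input the first commit counts as a Top-1 hit and the second contributes a reciprocal rank of 1/2, so a ranking that moves vulnerable statements closer to the top raises both metrics.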

 

 

 

Published

29-03-2024

How to Cite

[1]
H. D. Vo, “JUST-IN-TIME VULNERABILITY DETECTION AND LOCALIZATION”, JCC, vol. 40, no. 1, Mar. 2024.

Section

Articles