Exploring the Potential of LLM-GNN Hybrid Models for Automated Binary Diffing
DE ROSA, STEFANO
2023/2024
Abstract
Comparing binary code through binary function similarity is critical for a number of important tasks, including vulnerability detection, malware analysis, and reverse engineering. However, because source code may be compiled for different architectures, with different compiler optimizations, and may be subject to obfuscation techniques, comparing the resulting binaries is a very complicated task in practice and still relies heavily on expert-crafted feature engineering. This research investigates whether Large Language Models (LLMs) can replace manual feature engineering in binary function similarity tasks, addressing three main research questions: (1) whether LLMs can be effectively leveraged to develop an automatic, expert-free, generic binary diffing tool capable of generalizing across multiple scenarios; (2) what the performance impact is of using intermediate representations rather than raw assembly code in the context of automatic binary diffing; and (3) whether hybrid approaches combining LLMs and Graph Neural Networks (GNNs) can outperform current solutions. To answer these questions, we designed a hybrid model that integrates LLMs, which learn the semantics of basic blocks, with Gated Graph Neural Networks (GGNNs), which capture the global structure of Control Flow Graphs (CFGs). We systematically compared various LLM configurations, varying input representations, pre-training strategies, and fine-tuning techniques. Our results show that while our hybrid approach did not surpass the current state-of-the-art model, it outperformed many existing solutions and demonstrated the potential of LLMs for low-level code analysis. We also found that raw assembly code consistently outperformed intermediate representations such as P-Code, owing to its more concise and direct structure, which better suits the LLM's ability to capture the semantic meaning of basic blocks. In addition to these findings, we contribute a modular framework that allows experimentation with different combinations of LLMs, representations, and fine-tuning techniques. The framework is designed to encourage further research into applying LLMs to binary function similarity and other low-level code analysis tasks.
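To make the hybrid architecture concrete, the following is a minimal PyTorch sketch of the idea the abstract describes: LLM-derived basic-block embeddings are propagated along CFG edges with a gated (GRU-style) update and pooled into a function embedding, which can then be compared by cosine similarity. The LLM encoder is stubbed with random vectors here, and all class names, dimensions, and hyperparameters are illustrative assumptions rather than the thesis's actual implementation.

```python
# Minimal sketch of an LLM + GGNN hybrid function encoder (assumed design,
# not the thesis's code). The LLM stage is stubbed: in practice a pre-trained
# LLM would produce one embedding per basic block.
import torch
import torch.nn as nn

class GGNNFunctionEncoder(nn.Module):
    """Propagates basic-block embeddings along CFG edges with a gated
    update, then mean-pools them into a single function embedding."""
    def __init__(self, dim: int, steps: int = 4):
        super().__init__()
        self.msg = nn.Linear(dim, dim)   # per-step message transform
        self.gru = nn.GRUCell(dim, dim)  # gated node-state update
        self.steps = steps

    def forward(self, block_emb: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # block_emb: (num_blocks, dim) basic-block embeddings (from the LLM)
        # edges:     (2, num_edges) CFG edges as (src, dst) index pairs
        h = block_emb
        src, dst = edges
        for _ in range(self.steps):
            # aggregate messages from CFG predecessors into each block
            agg = torch.zeros_like(h)
            agg.index_add_(0, dst, self.msg(h)[src])
            h = self.gru(agg, h)         # gated update of block states
        return h.mean(dim=0)             # pool blocks into a function vector

# Usage: score two functions by cosine similarity of their embeddings.
dim = 128
enc = GGNNFunctionEncoder(dim)
f1 = enc(torch.randn(5, dim), torch.tensor([[0, 1, 1, 2], [1, 2, 3, 4]]))
f2 = enc(torch.randn(7, dim), torch.tensor([[0, 0, 1, 2], [1, 2, 3, 3]]))
similarity = torch.cosine_similarity(f1, f2, dim=0)
```

The division of labor mirrors the abstract: the LLM captures what each basic block computes, while the gated propagation captures how blocks relate within the CFG, so the pooled vector reflects both local semantics and global structure.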
File | Size | Format | Availability
---|---|---|---
De_Rosa_Automate_Binary_Diffing_with_Deep_Learning.pdf | 3.47 MB | Adobe PDF | not available
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/7865