Exploring the Potential of LLM-GNN Hybrid Models for Automated Binary Diffing
DE ROSA, STEFANO
2023/2024
Abstract
Comparing binary code through binary function similarity is critical for a number of important tasks, including vulnerability detection, malware analysis, and reverse engineering. However, because source code may be compiled for different architectures, with different compiler optimizations, and may be subject to obfuscation techniques, comparing the resulting binaries is a very complicated task in practice and still relies heavily on expert-crafted feature engineering. This research investigates whether Large Language Models (LLMs) can replace manual feature engineering in binary function similarity tasks, addressing three main research questions: (1) whether LLMs can be effectively leveraged to develop an automatic, expert-free, generic binary diffing tool capable of generalizing across multiple scenarios; (2) what the performance impact is of using intermediate representations rather than raw assembly code in the context of automatic binary diffing; and (3) whether hybrid approaches combining LLMs and Graph Neural Networks (GNNs) can outperform current solutions. To answer these questions, we designed a hybrid model that integrates LLMs, which learn the semantics of basic blocks, with Gated Graph Neural Networks (GGNNs), which capture the global structure of Control Flow Graphs (CFGs). We systematically compared various LLM configurations, varying input representations, pre-training strategies, and fine-tuning techniques. Our results show that while our hybrid approach did not surpass the current state-of-the-art model, it outperformed many existing solutions and demonstrated the potential of LLMs for low-level code analysis. We also found that raw assembly code consistently outperformed intermediate representations such as P-Code, owing to its more concise and direct structure, which better suits the LLM's ability to capture the semantic meaning of basic blocks. In addition to these findings, we contribute a modular framework that allows experimentation with different combinations of LLMs, representations, and fine-tuning techniques. The framework is designed to encourage further research into applying LLMs to binary function similarity and other low-level code analysis tasks.
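To make the hybrid architecture concrete, the following is a minimal PyTorch sketch of the idea the abstract describes: LLM-derived basic-block embeddings are propagated along CFG edges with a gated (GRU-style) update and pooled into a function embedding, which can then be compared by cosine similarity. The LLM encoder is stubbed with random vectors here, and all class names, dimensions, and hyperparameters are illustrative assumptions rather than the thesis's actual implementation.

```python
# Minimal sketch of an LLM + GGNN hybrid function encoder (assumed design,
# not the thesis's code). The LLM stage is stubbed: in practice a pre-trained
# LLM would produce one embedding per basic block.
import torch
import torch.nn as nn

class GGNNFunctionEncoder(nn.Module):
    """Propagates basic-block embeddings along CFG edges with a gated
    update, then mean-pools them into a single function embedding."""
    def __init__(self, dim: int, steps: int = 4):
        super().__init__()
        self.msg = nn.Linear(dim, dim)   # per-step message transform
        self.gru = nn.GRUCell(dim, dim)  # gated node-state update
        self.steps = steps

    def forward(self, block_emb: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # block_emb: (num_blocks, dim) basic-block embeddings (from the LLM)
        # edges:     (2, num_edges) CFG edges as (src, dst) index pairs
        h = block_emb
        src, dst = edges
        for _ in range(self.steps):
            # aggregate messages from CFG predecessors into each block
            agg = torch.zeros_like(h)
            agg.index_add_(0, dst, self.msg(h)[src])
            h = self.gru(agg, h)         # gated update of block states
        return h.mean(dim=0)             # pool blocks into a function vector

# Usage: score two functions by cosine similarity of their embeddings.
dim = 128
enc = GGNNFunctionEncoder(dim)
f1 = enc(torch.randn(5, dim), torch.tensor([[0, 1, 1, 2], [1, 2, 3, 4]]))
f2 = enc(torch.randn(7, dim), torch.tensor([[0, 0, 1, 2], [1, 2, 3, 3]]))
similarity = torch.cosine_similarity(f1, f2, dim=0)
```

The division of labor mirrors the abstract: the LLM captures what each basic block computes, while the gated propagation captures how blocks relate within the CFG, so the pooled vector reflects both local semantics and global structure.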
File | Size | Format | Availability
---|---|---|---
De_Rosa_Automate_Binary_Diffing_with_Deep_Learning.pdf | 3.47 MB | Adobe PDF | not available
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/7865