Graph4Ever: Debugging via a History-Aware Code Knowledge Graph

Debugging is a central concern in a programmer's work. It is the process of detecting errors or defects in a piece of software and fixing them so that the program works correctly. Large projects, composed of many source files or scripts and handling large volumes of input/output data, often produce unforeseen results, such as unexpected run-time events or compiler errors that must be resolved. Debugging is typically hard and time-consuming. A common approach, for novice and expert programmers alike, is to search online resources and forums such as Stack Overflow or Stack Exchange for possible solutions to the problem at hand. Nevertheless, making use of such online information is difficult, especially for non-expert users and for those unfamiliar with the programming language in use. Various support systems, including tools that automatically mine helpful Stack Overflow posts, have been proposed to augment IDEs and make online debugging information easier to find. However, these tools are typically not intended for novices, and they often have a limited empirical basis in studies of human behavior. This is where knowledge graphs come into play: they have proven extremely powerful in diverse application scenarios, including semantic search and natural language understanding. In particular, we show how GraphGen4Code, a toolkit for building code knowledge graphs, can substantially improve, among its other use cases, the user's debugging experience. The present analysis focuses on how to improve this debugging process further, starting from Graph4Code and extending it by including a version control system and the debugging algorithms built into the GitHub API.
Ultimately, the main goal is to enrich the information about the code under analysis by incorporating data from a version control system into the knowledge graph, using source code repository mining techniques. As a result, the extended knowledge graph contains the full history (i.e. the commits) of the inspected code and, consequently, more accurate links to documentation and forums, providing an even easier debugging experience. With this extended framework, Graph4Ever, the user can easily understand where in the code an error originated and obtain specific, time-aware suggestions on how to fix it, thanks to the links computed by the knowledge graph and the results of a semi-automated bisection debugging technique such as Git Bisect.
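
The bisection idea mentioned above can be illustrated with a minimal sketch. It is not the Graph4Ever implementation: the function name `first_bad_commit`, the `is_bad` predicate, and the toy commit identifiers are hypothetical, and the sketch assumes a linear history in which the oldest commit is good, the newest is bad, and every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit using binary search.

    Assumptions (hypothetical setup, not taken from the thesis):
    commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and badness never disappears once introduced.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the defect was introduced at mid or earlier
        else:
            lo = mid  # the defect was introduced after mid
    return commits[hi]


# Toy history: the defect is assumed to appear at the sixth commit.
history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
culprit = first_bad_commit(history, lambda c: history.index(c) >= 5)
print(culprit)
```

In practice the predicate would build and test the checked-out revision; here it is a stand-in lambda. The search needs only about log2(n) test runs for n commits, which is what makes bisection attractive on long histories.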
