Recommendation Techniques: Analysis and Implementation of a Book Recommender

The growing availability of content in every area of the entertainment industry calls for effective tools that help users find their way and discover works that interest them. So-called “recommender systems” play a key role here, offering personalized experiences based on each user's preferences. This thesis presents the development of a book recommender system that integrates two complementary approaches: content-based filtering, built on natural language processing models such as Word2Vec, and collaborative filtering, implemented in C#.

Collaborative filtering identifies patterns of behavior among similar users. It rests on the idea that users who had similar preferences in the past are likely to appreciate similar items in the future. This method requires no explicit knowledge of the items' content. It relies solely on interaction data such as ratings, purchases, or clicks; in this case, the reviews that users write for the books they have read. Content-based recommendation, by contrast, uses Word2Vec to analyze the text of each book's plot. Because it generates suggestions from the similarity between contents, this approach can still recommend books to users about whom little information or interaction history is available. The end result is a backend implementation of a hybrid system that combines the two techniques and the strengths of each.

This thesis describes the entire design and implementation cycle of the system. The first step was data collection and cleaning. Next, I studied several natural language processing models: after analyzing and testing two of them, Sentence Transformers (“SBERT”) and Word2Vec, I chose the latter, which in my view proved more useful for the purposes of this thesis. Finally, I worked on the actual integration of the two approaches, and of the system with the available dataset. The result is a system that overcomes the limitations of each approach taken individually, providing the user with more accurate and personalized recommendations.
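To make the interplay between the two approaches concrete, here is a minimal Python sketch under stated assumptions. It is not the implementation described in this thesis, whose collaborative part is written in C#. The collaborative half is plain user-based filtering: cosine similarity between users' rating vectors, followed by a similarity-weighted average of the neighbours' ratings. The content half represents each plot as the average of its Word2Vec word vectors, a common choice assumed here. The hybrid step is an assumed weighted blend with `alpha = 0.5` that falls back to content similarity alone when no other user has rated a book. All data and names (`RATINGS`, `WORD_VECTORS`, `PLOTS`, `predict_cf`, `hybrid_scores`, and the rest), the 1-5 rating scale, and the "liked" threshold of 4 are hypothetical. The toy word vectors stand in for a trained Word2Vec model (in gensim, the model's `wv` attribute).

```python
import math

# Hypothetical toy data: user -> {book_id: rating}, assuming a 1-5 scale.
RATINGS = {
    "alice": {"b1": 5, "b2": 4, "b3": 1},
    "bob": {"b1": 4, "b2": 5, "b4": 4},
    "carol": {"b3": 5, "b4": 2, "b5": 4},
}

# Hypothetical 3-dimensional word vectors standing in for a trained
# Word2Vec model (with gensim these would be looked up in model.wv).
WORD_VECTORS = {
    "wizard": [0.9, 0.1, 0.0],
    "magic": [0.8, 0.2, 0.1],
    "murder": [0.0, 0.9, 0.3],
    "detective": [0.1, 0.8, 0.4],
    "space": [0.1, 0.1, 0.9],
}

# Hypothetical plots, already tokenized; "dragon" is out of vocabulary.
PLOTS = {
    "b1": ["wizard", "magic"],
    "b2": ["magic", "wizard", "dragon"],
    "b3": ["murder", "detective"],
    "b4": ["detective", "space"],
    "b5": ["space"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def user_similarity(r1, r2):
    """Collaborative part: cosine over the union of rated books, missing = 0."""
    books = sorted(set(r1) | set(r2))
    return cosine([r1.get(b, 0) for b in books], [r2.get(b, 0) for b in books])


def predict_cf(user, book, ratings):
    """Similarity-weighted average of other users' ratings for `book`.

    Returns None when no other user has rated the book."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or book not in their:
            continue
        sim = user_similarity(ratings[user], their)
        num += sim * their[book]
        den += abs(sim)
    return num / den if den else None


def plot_vector(tokens, wv):
    """Content part: average the word vectors of the in-vocabulary tokens."""
    vecs = [wv[t] for t in tokens if t in wv]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def content_score(user, book, ratings, plots, wv, liked_threshold=4):
    """Mean plot similarity between `book` and the books the user liked."""
    target = plot_vector(plots[book], wv)
    sims = []
    for liked, rating in ratings[user].items():
        vec = plot_vector(plots[liked], wv)
        if rating >= liked_threshold and target is not None and vec is not None:
            sims.append(cosine(target, vec))
    return sum(sims) / len(sims) if sims else 0.0


def hybrid_scores(user, ratings, plots, wv, alpha=0.5):
    """Assumed weighted hybrid: alpha * CF (rescaled 1-5 -> 0-1) + (1 - alpha) * content."""
    scores = {}
    for book in plots:
        if book in ratings[user]:
            continue  # only recommend books the user has not rated yet
        cf = predict_cf(user, book, ratings)
        cb = content_score(user, book, ratings, plots, wv)
        if cf is None:
            scores[book] = cb  # no collaborative signal: content similarity only
        else:
            scores[book] = alpha * (cf - 1) / 4 + (1 - alpha) * cb
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for book, score in hybrid_scores("alice", RATINGS, PLOTS, WORD_VECTORS):
        print(f"{book}: {score:.3f}")
```

For `alice`, the sketch ranks her two unseen books, `b4` and `b5`. For a user whom no neighbour overlaps with, every `predict_cf` call returns `None` and the ranking relies entirely on plot similarity. This mirrors the cold-start advantage the abstract attributes to the content-based approach.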
