
Explainability Methods for Natural Language Processing: Applications in Sentiment Analysis

BODRIA, FRANCESCO
2018/2019

Abstract

Words are light, like a feather, but sometimes they are heavy and sharp like the blade of a knife. Since the social media boom, this idea has become ever more relevant: the Web produces huge amounts of text, and new algorithms are needed to process it autonomously. Natural Language Processing (NLP) is the sub-field of computer science concerned with building systems that handle natural language, enabling the analysis of large text collections and richer interaction with human users. NLP comprises a variety of specific tasks; this thesis focuses on Sentiment Analysis, the process of computationally identifying and categorising the opinions expressed in a text. Understanding sentiment enables a wide range of analyses of human-generated text: gauging political mood and the response to political campaigns, targeting online advertisement, and detecting potentially dangerous content such as hate speech. As a data-driven task, NLP has been tackled with Machine Learning (ML) algorithms, which have recently achieved state-of-the-art performance on several tasks and are being deployed in an ever-growing range of real-life scenarios. The strongest models in this field are Transformers: by relying entirely on an attention mechanism to draw global dependencies between input and output, they have outperformed models based on recurrent neural networks. However, most of these algorithms provide no human-understandable explanation for their decisions, a limitation that hampers their fairness, accountability and transparency. This scenario raises the following research question: how can an algorithmically predicted sentiment be trusted? The research area of eXplainable Artificial Intelligence (XAI) was introduced to address exactly this issue.
This Master's thesis explores XAI techniques applied to NLP in order to better understand the sentiment classification process. XAI techniques can help analyse how a Transformer makes a prediction: in most cases, the algorithm produces a heat-map highlighting the parts of the input that contributed most to the output. Two main algorithms were analysed: Layer-wise Relevance Propagation (LRP) and Local Interpretable Model-agnostic Explanations (LIME). The aim of the thesis was to take BERT, the Transformer model recently proposed by Google, and apply LRP and LIME to it. LIME can assign an accurate sentiment score to every input word, but these scores are computationally expensive to obtain. Therefore, instead of running the full LIME algorithm, we trained an attention layer on top of the language embeddings produced by BERT; the resulting attention scores are very similar to those produced by LIME while being much cheaper to compute.
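The core LIME idea mentioned above can be illustrated with a minimal sketch: perturb the input by randomly dropping words, query the classifier on each perturbation, and fit a proximity-weighted linear surrogate whose coefficients approximate each word's contribution. Everything here is illustrative (the toy lexicon classifier stands in for BERT, and the kernel width and sample count are arbitrary choices, not the thesis's configuration):

```python
import numpy as np

def toy_classifier(tokens):
    # Hypothetical lexicon-based sentiment scorer standing in for BERT:
    # returns the probability that the text is positive.
    lexicon = {"great": 2.0, "love": 1.5, "boring": -2.0, "bad": -1.5}
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + np.exp(-score))

def lime_word_scores(tokens, classifier, n_samples=1000, seed=0):
    """Minimal LIME-style explanation: the coefficient learned for each
    word approximates how much its presence pushes the prediction."""
    rng = np.random.default_rng(seed)
    d = len(tokens)
    # Binary masks: 1 keeps a word, 0 drops it from the perturbed text.
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1  # include the unperturbed instance
    preds = np.array([
        classifier([t for t, m in zip(tokens, row) if m]) for row in masks
    ])
    # Proximity kernel: perturbations closer to the original weigh more.
    distances = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distances ** 2) / 0.25)
    # Weighted least squares with an intercept column.
    X = np.hstack([np.ones((n_samples, 1)), masks.astype(float)])
    sw = np.sqrt(weights)
    coef = np.linalg.lstsq(X * sw[:, None], preds * sw, rcond=None)[0]
    return dict(zip(tokens, coef[1:]))  # drop the intercept

scores = lime_word_scores("a great film i love it".split(), toy_classifier)
```

The expensive part is visible in the loop: one classifier call per perturbation, which with a full BERT forward pass per call is what makes exact LIME scores costly.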
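The cheaper alternative, an attention layer over the token embeddings, can be sketched as follows. This is a generic softmax attention pooling, not the exact layer trained in the thesis; the dimensions and the random scoring vector `w` are placeholder assumptions (768 matches BERT-base embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(H, w):
    """Score each token embedding in H (n_tokens x dim) with a learned
    vector w, softmax the scores into weights summing to 1, and return
    (weights, weighted-sum sentence vector). The weights play the same
    word-highlighting role as LIME's scores, at the cost of a single
    forward pass."""
    logits = H @ w
    logits = logits - logits.max()               # numerical stability
    alpha = np.exp(logits) / np.exp(logits).sum()
    return alpha, alpha @ H

# Hypothetical setup: 6 tokens with 768-dim embeddings, as BERT-base emits.
H = rng.standard_normal((6, 768))
w = rng.standard_normal(768)
alpha, sentence_vec = attention_pool(H, w)
```

In practice `w` would be trained jointly with the sentiment classifier on top of the pooled vector, so the learned weights `alpha` come for free with every prediction.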


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/96409