Personal-ITY: A YouTube Corpus for Personality Profiling in Italian Social Media

BASSIGNANA, ELISA
2019/2020

Abstract

Nowadays we are surrounded by devices endowed with artificial intelligence whose goal is no longer just to help us perform tasks but also to make the experience enjoyable. The trend is toward human-computer interaction that is as natural as possible and personalized for each of us. Humans bring their personality into every exchange with the external world, so the computational study of personality has become a common need for any tool that involves human-computer interaction. This thesis studies human personality using computational linguistics techniques. The main contribution is Personal-ITY, a novel Italian-language corpus collected from YouTube and annotated with personality labels; compared to previously available resources, it contains more authors and a different textual genre. The corpus is built by exploiting distant supervision to assign Myers-Briggs Type Indicator (MBTI) labels to YouTube comments, and it lends itself to a variety of experiments. I report preliminary automatic personality detection experiments on the Personal-ITY corpus: I developed a set of machine learning prediction models that provide a baseline for future work and show that some personality types are easier to predict than others. I also discuss the benefits of cross-dataset prediction, presenting a set of cross-domain experiments with TwiSty, another MBTI-annotated dataset collected from Twitter. Finally, I propose an in-depth analysis of how linguistic cues relate to the psychological evidence on the personality types analyzed.
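The abstract names distant supervision without describing it, and the full thesis text is not part of this record. As a minimal sketch, assuming labels come from authors who self-declare their four-letter MBTI type in a comment, the Python below shows one way such a heuristic could work. The regex, the rule that discards authors who mention more than one type, and the names `find_declared_types`, `label_authors`, and `split_dimensions` are hypothetical illustrations, not the author's pipeline.

```python
import re
from collections import defaultdict

# The 16 MBTI types are all combinations of four binary dimensions:
# E/I, S/N, T/F, J/P. Matching is case-insensitive (e.g. "infp").
MBTI_PATTERN = re.compile(r"\b([EI][SN][TF][JP])\b", re.IGNORECASE)


def find_declared_types(text):
    """Return the set of distinct MBTI codes mentioned in a comment, upper-cased."""
    return {match.upper() for match in MBTI_PATTERN.findall(text)}


def label_authors(comments):
    """Assign an MBTI label to each author from (author, text) pairs.

    Hypothetical rule: an author is labeled only if, across all of their
    comments, exactly one MBTI type is mentioned; ambiguous authors are dropped.
    """
    declared = defaultdict(set)
    for author, text in comments:
        declared[author] |= find_declared_types(text)
    return {author: next(iter(types))
            for author, types in declared.items() if len(types) == 1}


def split_dimensions(mbti_type):
    """Split a four-letter type into its four binary dimensions."""
    return {"E/I": mbti_type[0], "S/N": mbti_type[1],
            "T/F": mbti_type[2], "J/P": mbti_type[3]}
```

For example, given the invented comments `[("anna", "Sono INFP e mi ritrovo tantissimo"), ("marco", "io sono un entj"), ("luca", "INTP o INTJ? non lo so")]`, `label_authors` would keep `anna` as `INFP` and `marco` as `ENTJ`, and drop `luca` as ambiguous. Splitting each type into four binary dimensions is one common way to frame prediction tasks; this record does not say whether the thesis does so.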
Language: English
Imported from TESIONLINE
Files in this record:
813205_master_thesis.pdf (not available)
Type: Other attached material
Size: 2.08 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/153285