Twitter Dataset of Chilean News Agencies: Topic Extraction and Textual Analysis

SIMONETTI, ANDREA
2017/2018

Abstract

This thesis concerns the textual analysis of Chilean tweets. First, the text is cleaned by applying several Natural Language Processing techniques. The tweets associated with the most frequent hashtags in the dataset are then extracted. Several topic extraction techniques, such as clustering algorithms and topic modeling methods, are applied to this subset. The methods are compared and evaluated with different measures proposed in the literature, such as the similarity between clusters and the coherence of the topics. We then present the characteristics of the extracted topics and some main results. Finally, we outline possible future work on sentiment analysis.
ENG
Files in this item:
File: 762671_thesis.pdf (not available)
Type: Other attached material
Size: 5.67 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/51840