Twitter Dataset of Chilean News Agencies: Topic Extraction and Textual Analysis
SIMONETTI, ANDREA
2017/2018
Abstract
This thesis concerns the textual analysis of Chilean tweets. First, the text is cleaned using several Natural Language Processing techniques. The tweets associated with the most frequent hashtags in the dataset are then extracted. Several topic extraction techniques, including clustering algorithms and topic modeling methods, are applied to this subset. The methods are compared and evaluated using measures proposed in the literature, such as the similarity between clusters and the coherence of the topics. We then present the characteristics of the extracted topics and the main results. Finally, we outline possible future work on sentiment analysis.

File | Size | Format |
---|---|---|
762671_thesis.pdf (not available; type: other attached material) | 5.67 MB | Adobe PDF |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14240/51840