Twitter Dataset of Chilean News Agencies: Topic Extraction and Textual Analysis

This thesis concerns the textual analysis of Chilean tweets. First, the text is cleaned using Natural Language Processing techniques. The tweets associated with the most frequent hashtags in the dataset are then extracted. Several topic extraction techniques, including clustering algorithms and topic modeling methods, are applied to this subset. The methods are compared and evaluated with measures proposed in the literature, such as the similarity between clusters and the coherence of the topics. We then describe the features of the extracted topics and present the main results. Finally, we outline possible future work on sentiment analysis.

SIMONETTI, ANDREA

2017/2018

Short record

	Faculty/Department
	
				MATEMATICA "GIUSEPPE PEANO"
			
	Degree programme
	
				STOCHASTICS AND DATA SCIENCE
			
	Language
	
				ENG
			
	Supervisor
	
				RUFFO, Giancarlo Francesco
			
	Thesis access mode
	
				IMPORTED FROM TESIONLINE
			
	Appears in document types:
	
				Master's degree programme

Files in this item:

File	Size	Format
762671_thesis.pdf (not available; type: other attached material)	5.67 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/51840