Partially supervised learning for the identification of mental illnesses

CERIA, ALBERTO

2016/2017

Abstract

Detecting mental health disorders (e.g. depression, anxiety and bipolar disorder) through the analysis of written linguistic markers expressed in online social network systems, such as Twitter, Facebook and Reddit, is an important application of text classification. In traditional binary classification, classifiers are trained on pre-labeled positive and negative instances. The feasibility of extracting positive samples from social networking data has been demonstrated, e.g. through self-reported diagnosis, but randomly labeling non-positive users as negative instances might introduce undesired sample bias. For example, depressed users might be more likely to write about specific topics than the general population. Randomly selecting users as negative instances may thus result in a model that learns to detect topical differences, but fails to identify the more relevant linguistic markers that truly characterize affected users. Here we outline a methodology that leverages the prevalent homophily of social networks to create representative samples of negative instances. We select users and comments that are related to positive instances in terms of their network connections and treat them as negative instances, thus equalizing irrelevant topical differences. This allows our classifiers to detect differences relevant to the condition under consideration. We compare the results of different selection methods in a controlled scenario, where true negatives are known but hidden from the process. We show that using the underlying network structure to select negative samples results in classification models that approximate, in terms of selected linguistic markers, models that rely on true negative samples.
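The negative-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the social network is available as an undirected edge list of user identifiers, and the function names `select_network_negatives` and `select_random_negatives` are hypothetical. The first function draws negatives from users directly connected to a positive user, the second draws them uniformly from all non-positive users, which is the baseline the abstract warns may capture topical rather than condition-related differences.

```python
import random


def select_network_negatives(edges, positives, k, seed=0):
    """Pick up to k non-positive users adjacent to at least one positive user.

    edges: iterable of (user, user) pairs, treated as undirected (assumption).
    positives: identifiers of users labeled positive (e.g. self-reported).
    """
    positives = set(positives)
    candidates = set()
    for u, v in edges:
        # Keep the non-positive endpoint of any edge touching a positive user.
        if u in positives and v not in positives:
            candidates.add(v)
        elif v in positives and u not in positives:
            candidates.add(u)
    ordered = sorted(candidates)  # sort so the seeded sample is reproducible
    return random.Random(seed).sample(ordered, min(k, len(ordered)))


def select_random_negatives(users, positives, k, seed=0):
    """Baseline: pick up to k non-positive users uniformly at random."""
    pool = sorted(set(users) - set(positives))
    return random.Random(seed).sample(pool, min(k, len(pool)))


# Toy example with hypothetical user identifiers.
edges = [("a", "b"), ("a", "c"), ("d", "e"), ("b", "f")]
positives = {"a"}
users = {u for edge in edges for u in edge}

network_negatives = select_network_negatives(edges, positives, k=2)
random_negatives = select_random_negatives(users, positives, k=2)
```

In the toy graph, only `b` and `c` are connected to the positive user `a`, so the network-based selection is restricted to them, while the random baseline can return any of the five non-positive users.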

Brief record

Faculty/Department: Physics (FISICA)
Degree programme: Physics of Complex Systems (FISICA DEI SISTEMI COMPLESSI)
Language: ENG
Supervisor: PANISSON, Andre'
Thesis access mode: imported from TesiOnline
Appears in types: Master's degree programme (Corso di Laurea Magistrale)

Files in this item:

File	Size	Format
809520_main2.pdf (not available; type: other attached material)	6.37 MB	Adobe PDF

If you are interested in consulting the thesis, go to the Home section at the top right, where you will find information on how to request it. Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/91077