Automatic Geolocation of Wikipedia Articles: An Approach Based on Convolutional Neural Networks and Graph Structures.

CUTTICA, AMEDEO
2019/2020

Abstract

This work addresses the automatic geolocation of documents, specifically Wikipedia articles. Document geolocation is a georeferencing task that consists of locating documents in geographical space. We formalize it as a classification problem over a discrete set of geographical areas. Motivated by advances in deep learning for Natural Language Processing and Information Retrieval, we developed and compared multiple location prediction models based largely on neural network architectures, each leveraging a different type of information: the textual content of each article, the link structure between articles, or both. Finally, given the sparsity of geotagged articles and the constant growth of the Wikipedia ecosystem, we built models that can leverage unlabelled data and, where possible, generalize to articles not seen at training time.
Language: English
Files in this item:
File: 868197_thesisamedeocutticam.sc..pdf (not available)
Type: Other attached material
Size: 3.75 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/154863