Automatic Geolocation of Wikipedia Articles: An Approach Based on Convolutional Neural Networks and Graph Structures.
CUTTICA, AMEDEO
2019/2020
Abstract
The present work focuses on the automatic geolocation of documents, specifically Wikipedia articles. Document geolocation is a georeferencing task that involves locating documents in geographical space. We formalize the task as a classification problem over a discrete set of geographical areas. Motivated by advances in deep learning for Natural Language Processing and Information Retrieval, we developed and compared multiple location prediction models based largely on neural network architectures, each leveraging a different type of information: the textual content of each article, the link structure between articles, or both. Finally, considering the sparsity of geotagged articles and the constant growth of the Wikipedia ecosystem, we built models that can leverage unlabelled data and, when possible, generalize to articles not seen at training time.

File | Size | Format |
---|---|---|
868197_thesisamedeocutticam.sc..pdf (not available; type: other attached material) | 3.75 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/154863