Layout Analysis Based on Deep Learning Methods

ARGIOLAS, EDOARDO

2020/2021

Abstract

One of the steps in the arduous task of automatically extracting information from a document is understanding its layout, that is, the sequence of textual and non-textual elements that constitute it. The somewhat repetitive and predictable properties of certain classes of documents, such as scientific papers and articles, have sometimes made rule-based and heuristic approaches suitable for this task, but the increasing complexity of other kinds of layouts, together with scientific progress in AI-related fields, has recently justified research into deep learning-based techniques for document layout analysis.

This thesis outlines two fundamentally different deep learning-driven approaches to automatically parsing document layouts: object detection based on convolutional neural networks, and sequence modeling via (variational) autoencoders built on self-attention layers. In particular, we stress that while state-of-the-art object detection models have reportedly been shown to detect layout elements with remarkable accuracy, they neither leverage nor learn any relationship between such elements; self-attention mechanisms, on the other hand, are inherently built to model sequences and both the short- and long-distance relationships within them. As a result, the latter approach might prove useful in downstream tasks that rely on inferring the real, non-trivial ordering of layout blocks.

Brief record

Faculty/Department: PHYSICS
Degree programme: PHYSICS OF COMPLEX SYSTEMS
Language: ENG
Supervisor: OSELLA, Matteo
Thesis access mode: IMPORTED FROM TESIONLINE
Appears in types: Master's Degree Programme (Corso di Laurea Magistrale)

Files in this item:

File	Size	Format
904629_adeeplearning-driveninvestigationintodocumentlayoutanalysis.pdf (not available; type: other attached material)	6.62 MB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/78863