One of the steps in the arduous task of automatically extracting information from a given document consists of understanding its layout, that is, the sequence of textual and non-textual elements that constitute it. The somewhat repetitive and predictable properties of certain classes of documents, such as scientific papers and articles, have sometimes made rule-based and heuristic approaches suitable for this task, but the increasing complexity of other kinds of layouts, together with scientific progress in AI-related fields, has recently justified research into deep learning-based techniques for document layout analysis.

This thesis outlines two fundamentally different deep learning-driven approaches to automatically parsing document layouts: object detection based on convolutional neural networks, and sequence modeling via (variational) autoencoders built on self-attention layers. In particular, we stress that while state-of-the-art object detection models have reportedly been shown to detect layout elements with remarkable accuracy, they neither leverage nor learn any relationship between such elements; self-attention mechanisms, on the other hand, are inherently built to model sequences and both the short- and long-distance relationships within them. As a result, the latter approach might prove useful in downstream tasks that rely on inferring the real, non-trivial ordering of layout blocks.
Layout Analysis Based on Deep Learning Methods
ARGIOLAS, EDOARDO
2020/2021
File | Type | Size | Format | Availability
---|---|---|---|---
904629_adeeplearning-driveninvestigationintodocumentlayoutanalysis.pdf | Other attached material | 6.62 MB | Adobe PDF | Not available
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/78863