The semantic complexity, along with intricate lexical richness, holds significant importance in Natural Language Processing (NLP) tasks, aiming to enhance comprehension and accessibility of linguistic information. However, research often focuses on grammar rather than vocabulary and the different levels of word complexity. Selecting a core set of basic words can facilitate effective communication and promote clarity and precision in language use. In recent years, there has been a growing interest in leveraging Machine Learning (ML) techniques and Large Language Models (LLMs) to analyze and understand natural language. The thesis focuses on identifying semantic complexity through these techniques, specifically by creating a mapping between WordNet synsets and a basicness score, enabling an automatic basic-level term categorisation. The proposed methodology combines feature extraction, Machine Learning models and a human-annotated gold standard to predict the complexity level of a synset. By integrating computational and human judgments, it aims to gain a deeper understanding of the factors influencing synset basicness and to develop robust metrics for automatic classification of synset basicness.
Is it complex? An Approach based on Feature Extraction and Machine Learning Models for Basicness Computation.
BIBIRE, RALUCA ANDREEA
2023/2024
Abstract
The semantic complexity, along with intricate lexical richness, holds significant importance in Natural Language Processing (NLP) tasks, aiming to enhance comprehension and accessibility of linguistic information. However, research often focuses on grammar rather than vocabulary and the different levels of word complexity. Selecting a core set of basic words can facilitate effective communication and promote clarity and precision in language use. In recent years, there has been a growing interest in leveraging Machine Learning (ML) techniques and Large Language Models (LLMs) to analyze and understand natural language. The thesis focuses on identifying semantic complexity through these techniques, specifically by creating a mapping between WordNet synsets and a basicness score, enabling an automatic basic-level term categorisation. The proposed methodology combines feature extraction, Machine Learning models and a human-annotated gold standard to predict the complexity level of a synset. By integrating computational and human judgments, it aims to gain a deeper understanding of the factors influencing synset basicness and to develop robust metrics for automatic classification of synset basicness.File | Dimensione | Formato | |
---|---|---|---|
839556_bibire_839556_thesis.pdf
non disponibili
Tipologia:
Altro materiale allegato
Dimensione
2.69 MB
Formato
Adobe PDF
|
2.69 MB | Adobe PDF |
I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/20.500.14240/147613