Is it complex? An Approach based on Feature Extraction and Machine Learning Models for Basicness Computation.

BIBIRE, RALUCA ANDREEA
2023/2024

Abstract

Semantic complexity and lexical richness are important in Natural Language Processing (NLP) tasks that aim to make linguistic information easier to understand and access. However, research often focuses on grammar rather than on vocabulary and the different levels of word complexity. Selecting a core set of basic words can facilitate effective communication and promote clarity and precision in language use. In recent years, there has been growing interest in using Machine Learning (ML) techniques and Large Language Models (LLMs) to analyze and understand natural language. This thesis focuses on identifying semantic complexity through these techniques, specifically by creating a mapping between WordNet synsets and a basicness score, which enables automatic categorisation of basic-level terms. The proposed methodology combines feature extraction, Machine Learning models and a human-annotated gold standard to predict the complexity level of a synset. By integrating computational and human judgments, it aims to gain a deeper understanding of the factors that influence synset basicness and to develop robust metrics for classifying it automatically.

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/147613