Is it complex? An Approach based on Feature Extraction and Machine Learning Models for Basicness Computation.

Semantic complexity, together with lexical richness, plays an important role in Natural Language Processing (NLP) tasks that aim to improve the comprehension and accessibility of linguistic information. However, research often focuses on grammar rather than on vocabulary and the different levels of word complexity. Selecting a core set of basic words can facilitate effective communication and promote clarity and precision in language use. In recent years, there has been growing interest in using Machine Learning (ML) techniques and Large Language Models (LLMs) to analyze and understand natural language. This thesis focuses on identifying semantic complexity through these techniques, specifically by creating a mapping between WordNet synsets and a basicness score, which enables automatic categorisation of basic-level terms. The proposed methodology combines feature extraction, Machine Learning models and a human-annotated gold standard to predict the complexity level of a synset. By integrating computational and human judgments, it aims to gain a deeper understanding of the factors that influence synset basicness and to develop robust metrics for its automatic classification.
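
The abstract does not state which features or models the thesis uses, so the following is only a minimal sketch of the general idea (features per synset, a human-annotated gold standard, a model that predicts a basicness score). Everything in it is an assumption made for illustration: the three features (lemma length, taxonomy depth, number of direct hyponyms), the feature values, the gold scores, the k-nearest-neighbour regressor, and the helper names `fit_bounds`, `scale` and `predict_basicness`.

```python
from math import dist

# Hypothetical toy gold standard: (synset-style identifier, features, score).
# Features are [lemma length, taxonomy depth, direct hyponym count].
# All numbers are invented for illustration; they are not taken from WordNet
# or from the thesis. Scores run from 0 (complex) to 1 (basic).
TRAIN = [
    ("dog.n.01", [3, 8, 18], 0.95),
    ("animal.n.01", [6, 6, 40], 0.80),
    ("canine.n.02", [6, 7, 9], 0.45),
    ("basenji.n.01", [7, 10, 0], 0.10),
    ("chair.n.01", [5, 9, 15], 0.90),
    ("chaise_longue.n.01", [13, 10, 0], 0.15),
]


def fit_bounds(rows):
    """Return (min, max) per feature column, used for min-max scaling."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, bounds):
    """Min-max scale one feature vector; constant columns map to 0.0."""
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(row, bounds)
    ]


def predict_basicness(features, train, k=2):
    """Average the gold scores of the k nearest training synsets."""
    bounds = fit_bounds([feats for _, feats, _ in train])
    query = scale(features, bounds)
    nearest = sorted(
        train, key=lambda item: dist(query, scale(item[1], bounds))
    )[:k]
    return sum(score for _, _, score in nearest) / k


# Two unseen, equally hypothetical feature vectors.
short_shallow = predict_basicness([3, 8, 16], TRAIN)
long_deep = predict_basicness([12, 10, 1], TRAIN)
print(f"short, well-connected term: {short_shallow:.3f}")
print(f"long, leaf-level term:      {long_deep:.3f}")
```

In a real setting the feature vectors would come from a lexical resource such as WordNet and the scores from human annotators; the nearest-neighbour model here merely stands in for whichever Machine Learning models are actually evaluated.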

BIBIRE, RALUCA ANDREEA

2023/2024

Brief record

Faculty/Department	COMPUTER SCIENCE
Degree programme	COMPUTER SCIENCE
Language	ENG
Supervisors	DI CARO, Luigi; TORRIELLI, Federico; SCHIFANELLA, Claudio
Thesis access mode	IMPORTED FROM TESIONLINE
Appears in types	Master's degree programme

Files in this item:

File	Size	Format
839556_bibire_839556_thesis.pdf (not available; type: other attached material)	2.69 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/147613