Detecting Stereotypes in Adolescents: Development of an Annotated Corpus and Classification Experiments with LLMs

CHIERCHIELLO, ELISA
2023/2024

Abstract

This thesis applies machine learning techniques to the detection and analysis of stereotypes in text, with a focus on adolescents. Using the newly created STERHEOSCHOOL corpus together with the existing StereoHoax and FB-Stereotypes corpora, it develops, refines, and compares methods for automated stereotype detection. The work begins with the construction and annotation of STERHEOSCHOOL, a corpus designed to capture how teenagers express stereotypes in educational settings. The corpus is annotated on multiple layers, including stereotype category, stance, and forms of discredit, following an annotation scheme informed by current research in computational linguistics and the social sciences. A neural classifier based on the GilBERTo model is then trained on STERHEOSCHOOL. The first experiments use this corpus alone, to establish a baseline for the model’s effectiveness in stereotype detection. Later experiments add the StereoHoax and FB-Stereotypes datasets, allowing a comparison across different contexts. The work shows that neural language models are a feasible, effective, and scalable approach to the sociolinguistic analysis of bias and stereotypes in digital communication. The results highlight the potential of machine learning for understanding and mitigating stereotypes, and so for supporting more informed and inclusive content strategies.
Language: English
Files in this record:
859193_tesi_chierchiello_elisa.pdf (not available)
Type: Other attached material
Size: 3.78 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/147367