The Credit Approval as Machine Learning problem - A STEM Graduate's Guide to Quantitative Credit Scoring

Banks employ increasingly sophisticated tools to manage risk. Among these risks, credit risk stands out as a potential trigger of systemic crises affecting the entire economic system, as evidenced by the events of 2007-2008. Credit risk management relies on ever-advancing statistical techniques, including Artificial Intelligence (AI) and Machine Learning (ML). These tools are often wielded by professionals with STEM backgrounds, whom banks value for their technical expertise, problem-solving mindset, and quantitative skills. Nevertheless, such professionals often enter the banking world without a financial background, which can negatively affect team dynamics and project outcomes in terms of both time and quality. This work aims to bridge this gap by providing STEM graduates with a guide to applying ML and AI to a critical part of credit risk management: credit scoring. The study begins with an introduction to ML, summarizing its main tools (Chapter 1). Chapter 2, starting from the publicly available “Home Credit” datasets, delves into data exploration, addressing data quality management, the handling of missing values, the aggregation of numerical, categorical and temporal data, and the optimization of data representation. Chapter 3 explores various methodological approaches to building credit scoring models, ranging from traditional methods such as regression to more complex techniques such as Gradient Boosting and Neural Networks. Each model is accompanied by a brief introduction outlining its theoretical background and main hyper-parameters, followed by insights from various tests conducted on the data. While advanced methodologies often yield higher accuracy, they can compromise transparency, a topic discussed in Chapter 4. That chapter explores the issue of explainability, mentioning metrics for its measurement, and offers insights on model fairness and validation, as required by recent regulation.
Lastly, Chapter 5 presents the conclusions, identifying Gradient Boosting as the optimal modeling approach for balancing generalization capacity and explainability. The conclusions include a list of limitations and potential future steps, emphasizing that deploying an ML model in banking goes beyond modeling and involves challenges regarding, for example, technological infrastructure, effective monitoring, corporate culture and transparency with the board, clients and regulators. A comprehensive bibliography concludes the work, addressing much of what is not detailed here.

RUBINO, UMBERTO

2022/2023

Short record

	Faculty/Department
	
				PHYSICS
			
	Degree programme
	
				PHYSICS OF COMPLEX SYSTEMS
			
	Language
	
				ENG
			
	Supervisor
	
				FARISELLI, Piero
			
	Thesis access mode
	
				IMPORTED FROM TESIONLINE
			
	Appears in types:
	
				Master's Degree Programme

Files in this item:

File	Size	Format
932593_thecreditapprovalasmachinelearningproblem-umbertorubinoonlineversion.pdf (not available; type: other attached material)	57.54 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/146781