The Credit Approval as Machine Learning problem - A STEM Graduate's Guide to Quantitative Credit Scoring
RUBINO, UMBERTO
2022/2023
Abstract
Banks employ increasingly sophisticated tools to manage risks. Among these risks, credit risk stands out as a potential trigger for systemic crises affecting the entire economic system, as evidenced by the events of 2007-2008. Credit risk management relies on ever-advancing statistical techniques, including Artificial Intelligence (AI) and Machine Learning (ML). These tools are often wielded by professionals with STEM backgrounds, whom banks value for their technical expertise, problem-solving mindset, and quantitative skills. Nevertheless, these professionals often enter the banking world without a financial background, which can negatively impact team dynamics and project outcomes in terms of both time and quality. This work aims to bridge this gap by providing STEM graduates with a guide to applying ML and AI to a critical part of credit risk management: credit scoring. The study begins with an introduction to ML, summarizing its main tools (Chapter 1). Chapter 2, starting from the publicly available “Home Credit” datasets, delves into data exploration, addressing data quality management, the handling of missing values, the aggregation of numerical, categorical and temporal data, and the optimization of data representation. Chapter 3 explores various methodological approaches to building credit scoring models, ranging from traditional methods like regression to more complex techniques such as Gradient Boosting and Neural Networks. Each model is accompanied by a brief introduction outlining its theoretical background and main hyper-parameters, followed by insights from various tests conducted on the data. While advanced methodologies often yield higher accuracy, they can compromise transparency, a topic discussed in Chapter 4. That chapter explores the issue of explainability and the metrics for measuring it, and offers insights on model fairness and validation, as required by recent regulation.
Lastly, Chapter 5 presents conclusions, with Gradient Boosting identified as the optimal modeling approach for balancing generalization capacity and explainability. The conclusions include a list of limitations and potential future steps, emphasizing that deploying an ML model in banking goes beyond modeling and involves challenges regarding, for example, technological infrastructure, effective monitoring, corporate culture and transparency with the board, clients and regulators. A comprehensive bibliography enriches and concludes the work, addressing much of what is not detailed here.
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/146781