Two predictive models for binary classification: logistic regression and Extreme Gradient Boosting.
Theory, application and commercial consequences

Every company aims to attract as many customers as possible, but a loyalty strategy for retaining existing customers is equally important. Because resources are scarce, it is useful to identify the customers most likely not to renew their existing contracts, an event called churn, and the reasons that make this event likely. Once these customers and the characteristics that make them risky have been identified, it is the responsibility of the company's management to devise and implement solutions.
This business problem translates directly into a statistical prediction problem: given a customer in a specific time period, described by a set of features, estimate the probability of the event "this customer will not renew existing contracts in the next period".
This work describes how the problem was addressed for Cerved, a company operating in the business information sector. Using a data set of information on Cerved customers over the past four years, each labeled "churn" or "non churn", two models were estimated: classic Logistic Regression and a Machine Learning model called Extreme Gradient Boosting, which aggregates decision trees whose structure is estimated iteratively.
The theoretical details of the two models and the technical details of their implementation are reported, together with a detailed description of the data set used to estimate and validate them. The two models are compared by accuracy ratio, the metric used in this project to measure predictive ability.
Given a new customer as input, the models return the estimated probability of churn and the features that most influenced it. This makes it possible to direct company resources only to the customers that the model considers most risky and to act only on the most significant variables.
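
As a rough illustration of the prediction problem described above, the following minimal sketch fits a logistic regression by gradient descent and scores it by accuracy ratio. It is not the thesis's implementation: the data are synthetic, the two features are hypothetical stand-ins for customer attributes, and the accuracy ratio is computed through the standard identity AR = 2·AUC − 1, which may differ from how the thesis computes it. Only the logistic regression side is sketched; Extreme Gradient Boosting is omitted.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Estimated churn probability for feature vector x; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit the weights by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def auc(scores, labels):
    """Probability that a random churner outscores a random non-churner (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy_ratio(scores, labels):
    """Accuracy ratio via the identity AR = 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


# Synthetic data (an assumption for illustration, not Cerved data):
# two hypothetical standardized features and a churn label in {0, 1}.
random.seed(0)
X, y = [], []
for _ in range(400):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    p_true = sigmoid(1.5 * x1 - 1.0 * x2)
    X.append([x1, x2])
    y.append(1 if random.random() < p_true else 0)

# Estimate on the first 300 customers, validate on the remaining 100.
train_X, train_y = X[:300], y[:300]
test_X, test_y = X[300:], y[300:]

w = fit_logistic(train_X, train_y)
test_scores = [predict_proba(w, x) for x in test_X]
ar = accuracy_ratio(test_scores, test_y)
print(f"weights: {w}")
print(f"accuracy ratio on held-out customers: {ar:.3f}")
```

An accuracy ratio of 1 means every churner is ranked above every non-churner, 0 corresponds to a random ranking, and the sign of each fitted weight indicates whether the feature raises or lowers the estimated churn probability.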

TURTURRO, FRANCESCO

2018/2019

Brief record

	Faculty/Department
	
				SCIENZE ECONOMICO-SOCIALI E MATEMATICO-STATISTICHE
			
	Degree program
	
				QUANTITATIVE FINANCE AND INSURANCE - FINANZA QUANTITATIVA E ASSICURAZIONI
			
	Supervisors
	
				MEO, ROSA
FAVARO, STEFANO
			
	Co-examiner
	
				TESSIORE, GIOVANNI
			
	Thesis access mode
	
				THESIS IMPORT ONLY ON ESSE3 SINCE 2018
			
	Appears in types:
	
				Master's degree program

Files in this item:

File	Size	Format
Tesi Turturro.pdf (not available). Description: Two predictive models for binary classification: Logistic Regression and Extreme Gradient Boosting. Theory, application and commercial consequences	572.68 kB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/1941