"Churn" is defined as the client's action of quit paying for a subscription, a utility or a product and it could be related to any kind of business. Churn has become extremely popular in the Data science environment for two reasons above all: firstly, the increase of the number of online services, app and websites where is possible to sign a subscription. Secondly, the amount of data available. WELT.de, a daily newspaper, and BILD.de, a daily tabloid, are two of the most famous newspapers in Germany. Both offer an online premium subscription offering customers interesting features and extras added to the free account available to everyone. The goal of their Data Science team is to study behavioural data collected from those special users, to detect if they are going to “churn” in the upcoming month. In the first chapter the problem is introduced, together with some related works and possible solutions applied on similar projects. Different approaches to the data are studied in the second chapter: collection, cleaning and preparation, together with a robust section on feature selection that could be useful for any kind of problem related to the usage of a website. Machine learning models (for predictions) are presented, both linear and non-linear, from a theoretic point of view: logistic regression, decision trees and random forests among the others. Different metrics are proposed, depending on the goal of the project, AUC and log-loss above all. The core of the thesis is constructed with the results obtained by the above models applied to real data from the newspapers. Side problems are presented as well, such as the privacy protection on personal data and the difficulties in using standard approaches with Big Data. Finally, the last chapter consists of final thoughts on the project and its results, jointly with future works. 
The master thesis is intended to be a “first machine learning project”, for people from different backgrounds aiming to grapple with a real data science project.
Churn Detection in News Sites: Predicting Subscription Cancellation from Behavioral Data
DOLCI, ALESSANDRO
2018/2019
Abstract
"Churn" is the act of a customer ceasing to pay for a subscription, a utility, or a product, and it can affect any kind of business. Churn prediction has become very popular in data science for two main reasons: first, the growing number of online services, apps, and websites that offer subscriptions; second, the amount of data now available. WELT.de, a daily newspaper, and BILD.de, a daily tabloid, are two of the best-known newspapers in Germany. Both offer an online premium subscription with features and extras beyond the free account available to everyone. The goal of their Data Science team is to study behavioural data collected from these premium subscribers in order to detect whether they are going to “churn” in the upcoming month. The first chapter introduces the problem, together with related work and solutions applied in similar projects. The second chapter examines how the data are handled: collection, cleaning, and preparation, together with an extensive section on feature selection that could be useful for any problem related to website usage. Machine learning models for prediction, both linear and non-linear, are then presented from a theoretical point of view: logistic regression, decision trees, and random forests, among others. Different evaluation metrics are proposed depending on the goal of the project, chiefly AUC and log-loss. The core of the thesis consists of the results obtained by applying these models to real data from the two newspapers. Side issues are discussed as well, such as the protection of personal data and the difficulty of using standard approaches with Big Data. Finally, the last chapter presents concluding thoughts on the project and its results, along with future work.
This master's thesis is intended as a “first machine learning project” for people from different backgrounds who want to get to grips with a real data science project.

File | Size | Format |
---|---|---|
870697_dolci_master_thesis.pdf (not available; type: other attached material) | 1.56 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/51246