Genetic Programming for the Estimation of Prediction Intervals (Programmazione Genetica per la stima di Intervalli di Previsione)

This thesis presents four new Genetic Programming (GP) variants for modelling Prediction Intervals (PIs). A PI is the range of values in which a target is expected to fall, given the values of its predictors. In practice, a model must find two functions of the predictors that act as the lower and upper bounds of the interval, such that the interval contains a required proportion of the data, called the desired coverage probability. Two main strategies for predicting PIs can be found in the literature. The first, called "direct", uses all the observations during model training to estimate the PIs directly. The second, called "sequential", first runs a model to obtain a crisp prediction and then uses that prediction to model the lower and upper bounds independently. The first proposed GP variant, called CWC-GP, uses the Coverage and Width Criterion (CWC) as its fitness function and therefore follows a single-objective approach. The CWC measure combines in a single function the two fundamental properties of PIs, i.e. their coverage probability and their width. The second proposed GP variant, called LUBE-GP, instead uses two fitness functions, one for each of these properties, and follows a multi-objective approach. A direct and a sequential version were implemented for each variant, for a total of four approaches. The proposed methods were implemented in Python and tested experimentally to study their performance in predicting PIs and to explore their parameters. For example, all methods were tested with different values of the desired coverage probability, and the multi-objective methods were also tested with different selection procedures. In addition, two methods from the literature, Joint Supervision (JS) and LUBE-NN, were implemented for comparison with the proposed methods.
The analysis of the results suggests that the GP approaches are capable of producing high-quality PIs, i.e. intervals that contain the desired proportion of the data and are sufficiently tight. In particular, the CWC-GP methods appear to be the most promising, as they optimize the width more successfully. The best-performing approach is the Direct CWC-GP, which requires less running time than its sequential version. These preliminary results pave the way for further investigation of the proposed GP approaches.
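
To make the two PI properties named in the abstract concrete, the following minimal Python sketch computes the coverage probability and the normalized mean width of a set of intervals, then combines them into a single score. The combined score follows one commonly cited form of the CWC from the PI literature; that form is an assumption here, and the thesis may define the criterion differently. The function names, the penalty parameter `eta`, and the toy data are all hypothetical and are not taken from the thesis.

```python
import math


def coverage_probability(y, lower, upper):
    """Fraction of targets that fall inside their interval [lower, upper]."""
    inside = sum(1 for t, lo, hi in zip(y, lower, upper) if lo <= t <= hi)
    return inside / len(y)


def normalized_mean_width(y, lower, upper):
    """Mean interval width divided by the range of the targets."""
    target_range = max(y) - min(y)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return mean_width / target_range


def cwc(y, lower, upper, desired_coverage=0.9, eta=50.0):
    """Assumed CWC form: the width is penalized exponentially, but only
    when the observed coverage falls short of the desired coverage."""
    picp = coverage_probability(y, lower, upper)
    nmpiw = normalized_mean_width(y, lower, upper)
    shortfall = 1.0 if picp < desired_coverage else 0.0
    return nmpiw * (1.0 + shortfall * math.exp(-eta * (picp - desired_coverage)))


# Toy data (hypothetical): five targets, one of which lies outside its interval.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
lower = [0.5, 1.5, 3.2, 3.5, 4.5]
upper = [1.5, 2.5, 4.2, 4.5, 5.5]

print("coverage:", coverage_probability(y, lower, upper))
print("normalized width:", normalized_mean_width(y, lower, upper))
print("CWC at 80% desired coverage:", cwc(y, lower, upper, desired_coverage=0.8))
print("CWC at 90% desired coverage:", cwc(y, lower, upper, desired_coverage=0.9))
```

In this sketch a lower score is better. When the desired coverage is met, the score reduces to the normalized width. When it is missed, the exponential term dominates. A single-objective search can therefore trade coverage against width through one number, whereas a multi-objective approach would keep the two quantities as separate objectives.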

TALLONE, NICCOLÒ

2020/2021

Short record

Faculty/Department: MATEMATICA "GIUSEPPE PEANO"
Degree programme: STOCHASTICS AND DATA SCIENCE
Language: ENG
Supervisor: GIACOBINI, Mario Dante Lucio
Thesis access mode: imported from TESIONLINE
Appears in categories: Corso di Laurea Magistrale (Master's degree programme)

Files in this item:

File	Type	Availability	Size	Format
836120_tesi_gp_tallone.pdf	Other attached material	Not available	7.19 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/82120