This thesis presents four new Genetic Programming (GP) variants for modelling Prediction Intervals (PIs). A PI is the range of values in which a target is expected to fall, given the values of its predictors. In practice, a model must search for two functions of the predictors that act as the lower and upper bounds of the interval and that together contain a required portion of the data, called the desired coverage probability. Two main strategies for predicting PIs appear in the literature. The first, called "direct", uses all the observations during model training to estimate the PIs directly. The second, called "sequential", first runs a model to obtain a crisp prediction and then uses that prediction to model the lower and upper bounds independently. The first proposed GP variant, called CWC-GP, uses the Coverage and Width Criterion (CWC) as its fitness function and therefore follows a single-objective approach. The CWC measure combines in a single function the two fundamental properties of a PI, namely its coverage probability and its width. The second proposed GP variant, called LUBE-GP, instead uses two fitness functions, one for each of these properties, and follows a multi-objective approach. Both a direct and a sequential version were implemented for each variant, for a total of four approaches. The proposed methods were implemented in Python and evaluated experimentally to study their performance in predicting PIs and to explore their parameters. For example, all methods were tested with different values of the desired coverage probability, and the multi-objective methods were also tested with different selection procedures. In addition, two methods from the literature, Joint Supervision (JS) and LUBE-NN, were implemented for comparison with the proposed methods.
The analysis of the results suggests that the GP approaches can produce high-quality PIs, i.e. intervals that contain the desired amount of data and are sufficiently tight. In particular, the CWC-GP methods seem the most promising, as they are more successful at optimizing the width. The best-performing approach is the Direct CWC-GP, which requires less running time than its sequential version. These preliminary results pave the way for further investigation of the proposed GP approaches.
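The two properties of a PI named in the abstract, coverage and width, can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas used in the thesis, so the version below assumes a form of the CWC that is common in the LUBE literature: a normalised mean width multiplied by an exponential penalty that is active only when the empirical coverage falls below the desired level. The function names and the parameters `mu` (desired coverage probability) and `eta` (penalty strength) are illustrative assumptions, not the thesis's own code.

```python
import numpy as np


def picp(y, lower, upper):
    """Empirical coverage: fraction of targets inside [lower, upper]."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))


def nmpiw(y, lower, upper):
    """Mean interval width, normalised by the range of the targets."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean(upper - lower) / (np.max(y) - np.min(y)))


def cwc(y, lower, upper, mu=0.9, eta=50.0):
    """Assumed CWC form: width * (1 + penalty), lower is better.

    The penalty is exp(-eta * (coverage - mu)) when coverage < mu
    and 0 otherwise. mu and eta are hypothetical default values.
    """
    coverage = picp(y, lower, upper)
    width = nmpiw(y, lower, upper)
    penalty = np.exp(-eta * (coverage - mu)) if coverage < mu else 0.0
    return float(width * (1.0 + penalty))


# Toy data: every target lies inside its interval of width 1.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lower, upper = y - 0.5, y + 0.5

print(picp(y, lower, upper))   # coverage of the toy intervals
print(nmpiw(y, lower, upper))  # normalised width of the toy intervals
print(cwc(y, lower, upper))    # single-objective score
```

Under these assumptions, a single-objective method would minimise `cwc` directly, whereas a multi-objective method could treat the coverage shortfall and `nmpiw` as two separate objectives.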
Genetic Programming for the Estimation of Prediction Intervals (Programmazione Genetica per la stima di Intervalli di Previsione)
TALLONE, NICCOLÒ
2020/2021
File | Size | Format |
---|---|---|
836120_tesi_gp_tallone.pdf (not available; type: other attached material) | 7.19 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/82120