A Bayesian Nonparametric Approach to Query Optimization

BELLIARDO, ENRICO MARIA
2017/2018

Abstract

This thesis presents several approaches to estimating the number of distinct values in a database relation, a key quantity for query optimization. In particular, we consider data that follow a power-law distribution, an assumption that makes it possible to model various kinds of human activity data. After reviewing the estimators proposed in the literature, we present a Bayesian nonparametric approach based on the Pitman-Yor process that is particularly well suited to this estimation problem. Finally, we test and compare the estimators on real human activity databases.
Imported from TesiOnline
Files in this item:
File: 781348_tesi.pdf (not available)
Type: Other attached material
Size: 1.87 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/51836