A non-parametric approach to query optimization under heavy tails

The necessity to realize requests of information from databases (query) is very popular in many fields, from medical to industrial sector, passing through educational one. In many disciplines the amount of data that should be managed is dramatically huge (big data problem) and even the most advanced processors can take unacceptable time to perform query. We will show that to execute a query in the best possible way it is important to know how many distinct values there are in the considered database: knowing the number of distinct values, it is possible to understand the best order of steps to do in order to construct a good query plan. The right choice of the most efficient algorithm to apply to a particular query is of paramount importance to get the best result in the shortest time. In this thesis two algorithmic estimators for distinct values are implemented using a non-parametric approach for Zipfian distribution, which is a type of distribution with heavy tails. The aim is to analyze their performance both in terms of errors and computational time. Possible future works can be done in this field in order to find estimators that lower execution times (in particular for highly skewed data) and reduce the error for small sample sizes.

A non-parametric approach to query optimization under heavy tails

PALESTINI, SIMONA

2016/2017

Abstract

The necessity to realize requests of information from databases (query) is very popular in many fields, from medical to industrial sector, passing through educational one. In many disciplines the amount of data that should be managed is dramatically huge (big data problem) and even the most advanced processors can take unacceptable time to perform query. We will show that to execute a query in the best possible way it is important to know how many distinct values there are in the considered database: knowing the number of distinct values, it is possible to understand the best order of steps to do in order to construct a good query plan. The right choice of the most efficient algorithm to apply to a particular query is of paramount importance to get the best result in the shortest time. In this thesis two algorithmic estimators for distinct values are implemented using a non-parametric approach for Zipfian distribution, which is a type of distribution with heavy tails. The aim is to analyze their performance both in terms of errors and computational time. Possible future works can be done in this field in order to find estimators that lower execution times (in particular for highly skewed data) and reduce the error for small sample sizes.

Scheda breve

	Facoltà/Dipartimento
	
				SCIENZE ECONOMICO-SOCIALI E MATEMATICO-STATISTICHE
			
	Corso di studio
	
				QUANTITATIVE FINANCE AND INSURANCE  - FINANZA QUANTITATIVA E ASSICURAZIONI
			
	Lingua
	
				ENG
			
	Relatrice / Relatore
	
				FAVARO, Stefano
			
	Modalità consultazione tesi
	
				IMPORT DA TESIONLINE
			
	Appare nelle tipologie:
	
				Corso di Laurea Magistrale

File in questo prodotto:

File	Dimensione	Formato
826271_tesi.pdf non disponibili Tipologia: Altro materiale allegato Dimensione 723.79 kB Formato Adobe PDF	723.79 kB	Adobe PDF

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14240/46931