The era of big data has led to an exponential growth in the amount of information available, surpassing our current capabilities to effectively handle and process it. In response, compact data structures have emerged as a solution to efficiently represent and extract information from large data streams, albeit at the cost of sacrificing accuracy. Sketches, a type of compact randomized data structure, have gained prominence for their ability to estimate various statistics from massive data streams with significant gains in both time and memory efficiency. This paper focuses on token-based data streams, where a summary is created by applying sketching techniques using random hash functions. Specifically, we explore a Bayesian nonparametric (BNP) estimator that models the data stream of tokens through a Dirichlet process (DP) prior. Building upon the work introduced by Favaro et al. (2022), we present a revised version of the estimator and derive an explicit expression for its variance, enabling us to quantify its uncertainty.
Bayesian nonparametric cardinality estimation
FIOCCHI, FILIPPO
2022/2023
Abstract
The era of big data has led to an exponential growth in the amount of information available, surpassing our current capabilities to effectively handle and process it. In response, compact data structures have emerged as a solution to efficiently represent and extract information from large data streams, albeit at the cost of sacrificing accuracy. Sketches, a type of compact randomized data structure, have gained prominence for their ability to estimate various statistics from massive data streams with significant gains in both time and memory efficiency. This paper focuses on token-based data streams, where a summary is created by applying sketching techniques using random hash functions. Specifically, we explore a Bayesian nonparametric (BNP) estimator that models the data stream of tokens through a Dirichlet process (DP) prior. Building upon the work introduced by Favaro et al. (2022), we present a revised version of the estimator and derive an explicit expression for its variance, enabling us to quantify its uncertainty.File | Dimensione | Formato | |
---|---|---|---|
977691_thesis_fiocchi_final.pdf
non disponibili
Tipologia:
Altro materiale allegato
Dimensione
1 MB
Formato
Adobe PDF
|
1 MB | Adobe PDF |
I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/20.500.14240/147966