
The geometric stick-breaking process and a two parameter extension: posterior sampling algorithms in infinite mixture models

CAPPELLO, ALBERTO
2017/2018

Abstract

The goal of this thesis is to introduce the reader to the Two Parameter Geometric Stick Breaking Mixture Model (TGSB) for the problem of Gaussian mixture estimation. In chapter 1 we review the Dirichlet Process Mixture Model (DPMM) and discuss the impact of the concentration parameter on the posterior number of clusters. In chapter 2 we focus on the identifiability issue of the DPMM and review the latent variable approach of Walker (2007), which eases the estimation of mixture models with an infinite number of components. Then, following Fuentes-Garcia (2010) and P. Damien (2013), we discuss how the one parameter Geometric Stick Breaking Mixture Model (GSB) can be used to relax the identifiability issue by constraining the mixing weights to have a geometric structure. In chapter 3 we introduce the TGSB as an extension of the one parameter GSB. The distinctive feature of the TGSB is an additional parameter that controls the rate of decrease of the mixing weights. We expect this to improve Gaussian mixture estimation when the true mixture is characterized by many symmetric modes. For each model we derive the set of full conditional distributions and give pseudocode for the Gibbs sampler. In chapter 4 we analyse and compare the three mixture models in detail, in terms of mixing weights and traceplots of the active components, and we try to explain how they allocate mass in different locations to estimate the true mixture. Lastly, we show the performance of the three models in estimating a Gaussian mixture with symmetric modes. We will see that, although the geometric models do not perform significantly better than the DPMM, they are more parsimonious in the number of clusters and make it clearer how the active components are used to allocate probability mass in different locations.

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/51853