Bayesian Functional Regression with application to shot put data
BERTAINA, SAMUELE
2017/2018
Abstract
This work is motivated by a sports analytics project in collaboration with the University of Kent (Canterbury, UK). The aim is to model the functions that describe the performance of Olympic shot put athletes over their careers. From a methodological point of view, we propose a Bayesian model for longitudinal data. The approach is developed within the framework of functional data analysis and can be applied to a wide range of problems beyond the one considered here. We characterize each athlete's curve as a non-linear combination of a high-dimensional set of basis functions, with a latent factor prior placed on the basis function coefficients to allow for borrowing of strength across individuals, automatic shrinkage, and basis selection. Further, we accommodate functional regression, allowing athlete-specific covariates to affect the shape of the estimated trajectories. Moreover, we include a seasonal random effect to capture heterogeneity in an athlete's mean performance across seasons within his/her career. Our theoretical contribution is to extend the Bayesian Latent Factor Regression model for functional and longitudinal data of Montagna et al. in several ways. To model non-equidistant and wiggly data, which are characteristic of this application, we modify the choice of basis functions and include a season-specific component. Together, these changes yield a trajectory flexible enough to be used for single-athlete trajectory estimation in different so-called centimeter-gram-second sports. After validating the model in several ways, we present our findings for the application. We are able to plot the estimated trajectory and predict its future evolution for each athlete. Furthermore, we obtain information on how covariates affect the evolution of performance over the years. These analyses can be very useful from a sports analytics perspective.
Coaches can use them to set up a training regime that brings an athlete to his/her peak performance during the most important competitions. From an anti-doping perspective, comparing estimated trajectories can help flag suspicious athletes who need to be tested. By looking in particular at abrupt changes in the seasonal random effect and in first-order differences, we can discriminate the usual evolution of performance from unnatural patterns that may be explained by doping. We wrote entirely new code that uses a Gibbs sampler to generate the athletes' underlying functions. The code is written in R, integrated with C++ for the most time-consuming matrix operations. For reproducibility, the code developed for the analysis presented in this thesis is freely available on GitHub at https://github.com/Samubertaina/Master-Thesis-codes.git.

| File | Type | Size | Format |
|---|---|---|---|
| 779599_samuelebertainamasterthesis.pdf (not available) | Other attached material | 11.13 MB | Adobe PDF |

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/49058