Applying machine learning to improve the driving style
PELLEGRINO, MICHELA
2018/2019
Abstract
In this dissertation we adopt an approach based on the theory of causality to analyze a data set from a public transport bus fleet, with the aim of characterizing the overall system formed by vehicle, driver and environment. Beyond the driver's skills and experience, driving style can be influenced by many external factors, including the traffic situation, the environment, the vehicle and the road. The recorded quantities measure the driver's actions (e.g. time spent braking), the bus status (e.g. total bus mass including the passengers on board), the relationship with the environment (e.g. the difference in altitude between departure and arrival), climate factors (e.g. month) and factors related to the traffic situation (e.g. frequency of the intermediate breaks taken along the way). We use Bayesian Networks as the main tool for identifying cause-effect relationships between the recorded features, focusing mainly on how, and how much, fuel consumption is influenced by the circumstances surrounding a trip. The dissertation begins with preliminary statistical tests to pinpoint possible associations between fuel consumption and other variables. Several regression models are then built to predict fuel consumption, and their predictive performance is tested and compared. The best regression models turn out to be those that, drawing inspiration from the law of conservation of energy, use quantities expressing potential energy, kinetic energy and some interaction terms as predictors. Moreover, considering the effects of up-slope and down-slope separately improves the description of our data. The basic concepts of the theory of causality are then reviewed to explain the rationale behind the Bayesian Networks subsequently learned from the observational data. The cause-effect analysis makes it possible to investigate how each feature is influenced by different factors.
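As a rough illustration of what energy-inspired predictors could look like, the sketch below computes potential-energy terms for up-slope and down-slope separately, plus a kinetic-energy term. The function name, argument names, units and the use of a mean speed are assumptions made here for illustration; the abstract does not specify how the dissertation defines these quantities or its interaction terms.

```python
G = 9.81  # gravitational acceleration in m/s^2

def energy_features(mass_kg, climb_m, descent_m, mean_speed_ms):
    """Return hypothetical energy-based predictors for one trip, in megajoules.

    Up-slope and down-slope are kept as separate terms so that a regression
    model can assign them different coefficients.
    """
    return {
        # potential energy gained while climbing: m * g * h_up
        "pe_up_mj": mass_kg * G * climb_m / 1e6,
        # potential energy released while descending: m * g * h_down
        "pe_down_mj": mass_kg * G * descent_m / 1e6,
        # kinetic energy at the (assumed) mean speed: 1/2 * m * v^2
        "ke_mj": 0.5 * mass_kg * mean_speed_ms ** 2 / 1e6,
    }
```

Each returned value could then serve as one column of a regression design matrix, alongside whatever interaction terms the analysis calls for.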
Since correlation does not imply causation, our aim is to separate genuine causal relationships from spurious associations in our observational study, and to evaluate whether an association between two variables X and Y may be contaminated by one or more external common causes Z, referred to as confounders. For example, in our data set the travel distance turns out to induce a spurious association between fuel consumption and total bus mass (including the passengers on board). Paradoxically, our data show that the highest fuel consumption values are associated with low total bus mass. This may be because people prefer other means of transport for long distances; reasonably, the genuine cause of the increase in fuel burned is the length of the trip rather than the lightness of the bus. As a final note on data and methodology, the structure of Gaussian Bayesian Networks is learned from the original continuous variables of the data set, while Multinomial Bayesian Networks are built using discretized data to accommodate the skewness, non-linear relationships and non-Gaussianity of the empirical probability distributions. Model averaging by means of bootstrap resampling is performed to improve the quality of the learned structures.
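The confounding effect described in the abstract can be reproduced on synthetic data. The sketch below is not the dissertation's data or method: the simulated trips, all numeric coefficients and the function names are assumptions chosen only so that distance lowers bus mass and raises fuel use. It compares the plain correlation between mass and fuel with the first-order partial correlation that controls for distance.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after controlling for a single variable zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def simulate_trips(n=2000, seed=0):
    """Generate synthetic trips in which distance is a common cause of mass and fuel."""
    rng = random.Random(seed)
    distance, mass, fuel = [], [], []
    for _ in range(n):
        d = rng.uniform(5.0, 50.0)                       # trip length, km (assumed range)
        m = 16000.0 - 60.0 * d + rng.gauss(0.0, 500.0)   # longer trips carry fewer passengers (assumed)
        f = 0.35 * d + 0.0004 * m + rng.gauss(0.0, 0.5)  # fuel rises with distance and, weakly, with mass
        distance.append(d)
        mass.append(m)
        fuel.append(f)
    return distance, mass, fuel

if __name__ == "__main__":
    distance, mass, fuel = simulate_trips()
    print("corr(mass, fuel):           ", round(pearson(mass, fuel), 3))
    print("corr(mass, fuel | distance):", round(partial_corr(mass, fuel, distance), 3))
```

In this simulation the marginal correlation between mass and fuel is negative, while the partial correlation given distance is positive, which mirrors the paradox noted in the abstract: the apparent effect of a lighter bus disappears once the common cause is taken into account.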
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/51789