Classification and Anomaly Detection on Log Data: A Supervised Machine Learning Approach
PERONI, CLAUDIO
2018/2019
Abstract
Log messages are text reports written by software programs when events occur or status changes take place. Complex IT systems generate large numbers of log entries from many instances of applications deployed across multiple physical machines. The purpose of this study is to build a machine learning and statistical core for a log management application used by a large company. This core is meant to enable the real-time classification of individual logs into criticality levels, as well as the detection of anomalies in logging behavior. Standard log monitoring techniques usually rely on prior knowledge of the structure and content of the logs to identify potentially relevant issues, so such analyses are often tailored to single applications. This study instead develops a general, automated approach based on semantic tags extracted from special recurring keywords. Classification is performed with machine learning algorithms, and anomalies over time are detected with time series models. For the initial analysis, a data set is constructed that incorporates feedback from the client company. The logs are parsed to extract known fields and are then tagged by keyword matching. Each log is classified according to its predicted business criticality, and logs are presented to users in the resulting order of criticality. In particular, Support Vector Machines (SVMs) are used for the criticality classification. The time series analysis examines the trend and seasonality of log frequency over time and applies ARIMA models to it.

File | Size | Format | Availability
---|---|---|---|
847495_thesis.pdf (type: other attached material) | 4.99 MB | Adobe PDF | not available
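The abstract states that logs are tagged by keyword matching and then assigned a criticality level with SVMs. The Python sketch below illustrates that two-stage idea under explicit assumptions. The keyword dictionary, the tag names, the three levels HIGH/MEDIUM/LOW and the training messages are all hypothetical. The linear SVM is trained one-vs-rest with the Pegasos subgradient method in plain Python, which is not necessarily the library or kernel the thesis used.

```python
import random
import re

# Hypothetical keyword -> semantic-tag map; the thesis's real dictionary is not given.
KEYWORD_TAGS = {
    "error": "ERROR", "exception": "ERROR", "failed": "FAILURE",
    "timeout": "TIMEOUT", "refused": "CONNECTION", "disk": "STORAGE",
    "warning": "WARNING", "started": "LIFECYCLE", "stopped": "LIFECYCLE",
}
TAGS = sorted(set(KEYWORD_TAGS.values()))

# Hypothetical labelled examples: (message, criticality level).
TRAIN = [
    ("Connection refused by db host", "HIGH"),
    ("Unhandled exception in payment service", "HIGH"),
    ("Disk usage warning on node 3", "MEDIUM"),
    ("Request timeout, retrying", "MEDIUM"),
    ("Service started", "LOW"),
    ("Service stopped by operator", "LOW"),
]


def tag_log(message):
    """Return the semantic tags whose keywords appear as words in the message."""
    words = re.findall(r"[a-z]+", message.lower())
    return {KEYWORD_TAGS[w] for w in words if w in KEYWORD_TAGS}


def to_vector(tags):
    """Binary bag-of-tags features plus a constant 1.0 acting as the bias term."""
    return [1.0 if t in tags else 0.0 for t in TAGS] + [1.0]


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def train_pegasos(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM (labels +1/-1) trained with Pegasos stochastic subgradient steps.

    The bias is folded into the features and regularised like any other
    weight, a common simplification.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # hinge loss is active for this sample
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def train_one_vs_rest(messages, labels):
    """Train one binary SVM per criticality level (one-vs-rest)."""
    X = [to_vector(tag_log(m)) for m in messages]
    return {c: train_pegasos(X, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}


def predict(models, message):
    """Pick the level whose SVM gives the largest decision value."""
    x = to_vector(tag_log(message))
    return max(models, key=lambda c: dot(models[c], x))


if __name__ == "__main__":
    models = train_one_vs_rest([m for m, _ in TRAIN], [lab for _, lab in TRAIN])
    rank = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
    incoming = ["Node 7 started", "Fatal exception while writing file",
                "Request timeout on gateway"]
    # Present incoming logs ordered by predicted criticality, most critical first.
    for msg in sorted(incoming, key=lambda m: rank[predict(models, m)]):
        print(predict(models, msg), "|", msg)
```

Sorting by the predicted level mirrors the criticality ordering described in the abstract. The classifier sees only tag vectors, not raw text, which is what lets the same model be reused across applications with different log formats. In practice a library implementation such as scikit-learn's `LinearSVC` or `SVC` would normally replace the hand-written trainer.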
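For the time series part, the abstract mentions trend, seasonality and ARIMA models applied to log frequency. One minimal way to combine these is sketched below under stated assumptions. Logs are counted per hour, and a seasonal difference at lag 24 is applied, which cancels a fixed daily profile and reduces a linear trend to a constant. An AR(1) model is then fitted to the result, and hours whose one-step-ahead residual is unusually large are flagged. This amounts to a SARIMA(1,0,0)(0,1,0)[24] model with a constant, fitted by ordinary least squares instead of maximum likelihood. The period, the threshold of 4 robust standard deviations and the synthetic data are hypothetical and not taken from the thesis.

```python
import math
import random
import statistics


def seasonal_difference(series, period):
    """z_t = y_t - y_{t-period}: removes a fixed seasonal profile and turns a
    linear trend into a constant offset (the seasonal "I" in SARIMA)."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def fit_ar1(z):
    """Ordinary least-squares fit of z_t = c + phi * z_{t-1} + e_t; returns (c, phi)."""
    x, y = z[:-1], z[1:]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def detect_anomalies(counts, period=24, threshold=4.0):
    """Return indices of `counts` whose one-step-ahead residual is an outlier."""
    z = seasonal_difference(counts, period)
    c, phi = fit_ar1(z)
    resid = [z[t] - (c + phi * z[t - 1]) for t in range(1, len(z))]
    # Robust noise scale: 1.4826 * median absolute deviation estimates the
    # standard deviation for Gaussian noise and is barely moved by the outliers.
    med = statistics.median(resid)
    scale = 1.4826 * statistics.median(abs(r - med) for r in resid)
    flagged = []
    for i, r in enumerate(resid):
        idx = i + 1 + period  # resid[i] belongs to z[i + 1], i.e. counts[i + 1 + period]
        if abs(r - med) > threshold * scale:
            # Seasonal differencing mirrors a burst one period later; skip that echo.
            if idx - period not in flagged:
                flagged.append(idx)
    return flagged


if __name__ == "__main__":
    rng = random.Random(42)
    period = 24  # hypothetical: hourly counts with a daily cycle
    counts = [100 + 0.5 * t + 30 * math.sin(2 * math.pi * t / period) + rng.gauss(0, 2)
              for t in range(period * 7)]
    counts[100] += 80  # injected burst of log messages
    print(detect_anomalies(counts, period))
```

Because seasonal differencing subtracts each hour from the same hour one period earlier, a single burst also produces a mirrored residual one period later. The sketch simply skips a flag that falls exactly one period after an earlier one, a heuristic that would also hide a genuine repeated anomaly. A fuller implementation would more likely rely on a library model such as statsmodels' `ARIMA` with a `seasonal_order`.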
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14240/51345