Classification and Anomaly Detection on Log Data:
A Supervised Machine Learning Approach

Log messages are text reports written by software programs when events occur or status changes arise. Complex IT systems generate large numbers of log entries from myriad instances of applications deployed on multiple physical machines. The purpose of this study is to create a machine learning and statistical core for an application used as a log management tool by a large company. This core is intended to enable real-time classification of individual logs into criticality levels, as well as anomaly detection in logging behavior. Standard log monitoring techniques usually rely on knowing the structure and content of logs to identify potentially relevant issues, and such analyses are often tailored to single applications. In this study, a general and automated approach is developed based on semantic tags extracted from special recurring keywords. Classification is performed using machine learning algorithms, and temporal anomaly detection is achieved using time series models. For the initial analysis, a data set is constructed incorporating feedback from the client company. The logs are parsed to extract known fields and are subsequently tagged using keyword matching. The logs are classified according to their predicted business criticality and are presented to users in the resulting criticality ordering. In particular, support vector machines (SVMs) are used for the criticality classification of the logs. The time series analysis is based on trend and seasonality analysis and on the application of ARIMA models to the log frequency over time.
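
The parsing and keyword-tagging step mentioned in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not taken from the thesis: the log line layout, the tag vocabulary in `TAG_KEYWORDS`, and the names `LogRecord`, `parse_line`, `tag_record` and `to_features` are all hypothetical. The binary tag vector produced at the end is the kind of input that a classifier such as an SVM could be trained on.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tag vocabulary: each semantic tag is triggered by a few keywords.
TAG_KEYWORDS = {
    "failure": ("error", "exception", "failed", "fatal"),
    "timeout": ("timeout", "timed out"),
    "auth": ("login", "unauthorized", "denied"),
    "resource": ("memory", "disk", "cpu"),
}
TAGS = sorted(TAG_KEYWORDS)  # fixed tag order for the feature vector

# Assumed line layout: "<date> <time> <LEVEL> <application>: <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<app>[^:]+): (?P<message>.*)$"
)


@dataclass
class LogRecord:
    timestamp: str
    level: str
    app: str
    message: str
    tags: set = field(default_factory=set)


def parse_line(line):
    """Extract the known fields, or return None if the layout does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return LogRecord(**match.groupdict())


def tag_record(record):
    """Attach every tag whose keywords occur in the lower-cased message."""
    text = record.message.lower()
    record.tags = {
        tag
        for tag, keywords in TAG_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }
    return record


def to_features(record):
    """Encode the tag set as a binary vector, one position per entry in TAGS."""
    return [1 if tag in record.tags else 0 for tag in TAGS]


line = "2019-03-01 12:00:05 ERROR billing-service: Connection timed out, request failed"
record = tag_record(parse_line(line))
features = to_features(record)
print(sorted(record.tags), features)
```

Substring matching is used here only for brevity. A real tagger would probably need word-boundary matching and a vocabulary agreed with the people who read the logs.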

PERONI, CLAUDIO

2018/2019

Brief record

	Faculty/Department
	
				MATHEMATICS "GIUSEPPE PEANO"
			
	Degree programme
	
				STOCHASTICS AND DATA SCIENCE
			
	Language
	
				ENG
			
	Supervisor
	
				RUFFO, Giancarlo Francesco
			
	Thesis access mode
	
				IMPORTED FROM TESIONLINE
			
	Appears in categories:
	
				Master's degree programme

Files in this item:

File	Size	Format
847495_thesis.pdf (not available; type: other attached material)	4.99 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/51345