GAN oversampling: applying a Generative Adversarial Network to imbalanced datasets.

FANCHIN, FRANCESCO
2017/2018

Abstract

Machine learning has gained increasing importance over the last decade, and class prediction in particular is critical for many real-world applications. Fraud analysis, cancer prediction, and similar tasks are notable examples of imbalanced problems, in which one class of the dataset contains too few instances. Standard machine learning algorithms, such as neural networks, require a minimum amount of data to learn the different classes effectively; in an imbalanced setting, they tend to fail at predicting the minority class. In this thesis, an algorithm was developed to balance the size of the minority class (oversampling), with the aim of improving classifier performance. A Generative Adversarial Network (GAN), a neural network for image generation, was used to artificially generate minority-class data. The GAN was modified according to the features of each dataset. The overall framework was tested on the MNIST dataset of handwritten digits and on various real-world datasets, ranging from seismic bumps to cancer detection. Finally, a comparison with the SMOTE package for imbalanced problems is provided.
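
The balancing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `oversample_minority` and `interpolate` are hypothetical, and the generator used here is a simple interpolation between real minority rows (closer in spirit to SMOTE, without its nearest-neighbour search) standing in for a trained GAN generator.

```python
import random
from collections import Counter


def oversample_minority(X, y, make_sample, seed=0):
    """Append synthetic rows until every class matches the largest class.

    make_sample(rows, rng) is a hypothetical hook: it receives the real rows
    of one class and returns one new synthetic row. A trained GAN generator
    could be plugged in here; that part is an assumption, not shown.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):  # zero iterations for the majority class
            X_out.append(make_sample(rows, rng))
            y_out.append(label)
    return X_out, y_out


def interpolate(rows, rng):
    """Stand-in generator: a random point on the segment between two real rows."""
    a, b = rng.choice(rows), rng.choice(rows)
    t = rng.random()
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


# Toy data: four majority rows (class 0) and two minority rows (class 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [5.0, 5.0], [5.2, 4.8]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = oversample_minority(X, y, interpolate)
print(Counter(y_bal))
```

Keeping the generator behind a single hook means the same balancing loop could be reused to compare different generators on identical data, which is one plausible way to set up a GAN-versus-SMOTE comparison.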
Files in this item:
848585_francescofanchintesilmfsc.pdf (not available)

Type: Other attached material
Size: 3.05 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/50558