Machine Learning has gained increasing importance over the last decade, and class prediction in particular is critical for many real-world applications. Fraud analysis, cancer prediction and many others are notable examples of imbalanced problems, in which one class of the dataset under consideration contains too few instances. Common Machine Learning algorithms, such as neural networks, require a minimum amount of data to learn the different classes effectively: in an imbalanced setting, they fail to predict the minority class. In this thesis, an algorithm for balancing the size of the minority class (oversampling) was developed, with the aim of improving the performance of a classifier. A Generative Adversarial Network (GAN), a neural network for image generation, was used to artificially generate minority-class data. The GAN was modified according to the features of each dataset. The overall framework was tested on the MNIST dataset of handwritten digits and on various real-world datasets, from seismic bumps to cancer detection. Finally, a comparison with the SMOTE package for imbalanced problems is provided.
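To make the idea of oversampling concrete, the following is a minimal sketch of a SMOTE-style interpolation step, the kind of baseline the abstract compares against. It is not the GAN-based method developed in the thesis and not the SMOTE package itself: the function name `smote_like_oversample`, its parameters, and the toy data are all assumptions chosen for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (illustrative only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy imbalanced dataset with made-up values: 8 majority points, 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate just enough synthetic minority points to match the majority class size.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Interpolation of this kind can only produce points on segments between existing minority samples. A generative model such as a GAN instead learns to sample from an approximation of the minority-class distribution, which is the approach the thesis pursues.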
GAN oversampling: application of a Generative Adversarial Network to imbalanced datasets.
FANCHIN, FRANCESCO
2017/2018
File | Size | Format |
---|---|---|
848585_francescofanchintesilmfsc.pdf (not available; type: other attached material) | 3.05 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/50558