
Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions

BORDINO, ALBERTO
Academic year: 2021/2022

Abstract

There is a recent and growing literature investigating large-width limits of neural networks (NNs) whose weights and biases have some known distribution, typically Gaussian. In the last few years, this research direction has produced remarkable results of practical interest, e.g. Bayesian inference under Gaussian stochastic process priors, kernel regression for infinitely wide deep NNs trained via gradient descent, and information propagation within infinitely wide NNs. Motivated by empirical analyses showing the potential of replacing the Gaussian distribution with the Stable distribution for the NN's weights, the aim of this thesis is to study the large-width asymptotics of deep Stable NNs, that is, deep NNs with Stable-distributed weights and biases. In this regard, a recent work characterized the infinitely wide limit of a suitably rescaled deep Stable NN in terms of a Stable stochastic process, but under the strong hypothesis of a sub-linear activation function. The novelty of the results proposed here lies in the choice of the nonlinearity, which may also have linear or super-linear growth, and in the fact that different activation functions require different scalings in order for the network to converge, a critical difference between the Gaussian and Stable settings. In particular, we prove that the use of an activation with linear growth, such as the well-known ReLU function, in the Stable setting requires an extra logarithmic factor in the scaling for the output of the infinitely wide NN to converge.
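To make the objects in the abstract concrete, the following is a minimal sketch of a fully connected layer and of the scalings involved. The notation (width n, stability index alpha, activation sigma, normalization s_n) is introduced here only for illustration and is not fixed in the abstract; the exact form of the logarithmic correction shown below is an assumption consistent with the statement above, and the precise results are given in the thesis.

% Sketch of one hidden layer of width n, with i.i.d. weights w_{ij} and biases b_i,
% rescaled by a normalization s_n (all notation assumed here for illustration):
\[
  f_i^{(\ell)}(x) \;=\; b_i^{(\ell)} + \frac{1}{s_n} \sum_{j=1}^{n} w_{ij}^{(\ell)}\,
  \sigma\!\bigl(f_j^{(\ell-1)}(x)\bigr), \qquad i = 1,\dots,n.
\]
% Gaussian weights: the classical choice s_n = n^{1/2} yields a Gaussian process
% limit for a broad class of activations, including ReLU.
% alpha-Stable weights, sub-linear sigma: s_n = n^{1/alpha} recovers the Stable
% process limit of the earlier work cited in the abstract.
% alpha-Stable weights, sigma of linear growth (e.g. ReLU): per the abstract, an
% extra logarithmic factor is needed, e.g. a normalization of the order
% s_n = (n log n)^{1/alpha}; the precise form, and the super-linear case, are
% treated in the thesis.

Roughly speaking, the heavy tails of Stable weights interact with the growth of the activation, so the normalization must account for both, whereas the variance-based square-root scaling of the Gaussian case is insensitive to the growth rate of the activation within this class.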
Language: English
Files in this item:

File: 856592_thesis_bordino.pdf (not available)
Type: Other attached material
Size: 1.82 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/86493