On the Lazy Training phenomenon in wide Neural Networks
Farina, Rebecca
2021/2022
Abstract
The analysis of large-width Neural Networks is among the main research themes in modern Deep Learning. Recent studies showed that the output of a Gaussian network trained via gradient flow follows a dynamics characterized by a kernel. When the widths of the layers are allowed to go to infinity, this random kernel, evaluated at initialization, converges in probability to a deterministic kernel and remains constant during training. As a consequence, a sufficiently wide Gaussian network behaves as its linearization around the initialization and is thereby equivalent to a kernel regression predictor. Essentially, a Neural Network, which is highly non-linear, boils down to a linear model. This phenomenon is known as lazy training and does not exclusively concern Neural Networks, but any parametric model. Chizat et al. establish a sufficient condition guaranteeing lazy behaviour for a general parametric model. Building on this result, the goal of the thesis is to study the lazy training phenomenon in wide Neural Networks in more detail. The choice of scaling plays a crucial role in determining whether lazy training occurs. In particular, under appropriate conditions on the initialization of the weights and on the activation function, if the scaling factor is sufficiently large, then the wide Neural Network exhibits lazy behaviour.
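For reference, the linearization and the kernel mentioned above can be written down explicitly. The following is a minimal sketch in standard Neural Tangent Kernel notation (network output $f(x;\theta)$, parameters $\theta$, training pairs $(x_i,y_i)$, squared loss); these symbols are not fixed by the abstract and serve only as an illustration.

\[
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^{\top}(\theta-\theta_0),
\qquad
K_{\theta_0}(x,x') = \nabla_\theta f(x;\theta_0)^{\top}\nabla_\theta f(x';\theta_0),
\]

and under gradient flow on the squared loss the network outputs evolve as

\[
\frac{\mathrm{d}}{\mathrm{d}t}\, f_t(x) = -\sum_{i=1}^{n} K_{\theta_t}(x,x_i)\,\bigl(f_t(x_i)-y_i\bigr).
\]

In the lazy regime, $K_{\theta_t}$ stays close to $K_{\theta_0}$ throughout training, so $f_t$ agrees with the kernel regression predictor associated with $K_{\theta_0}$. The scaling referred to in the abstract corresponds to training a rescaled model of the form $\alpha f(x;\theta)$: in the setting of Chizat et al., taking the scale factor $\alpha$ sufficiently large keeps the training trajectory in this linearized regime.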
File | Type | Size | Format | Availability
---|---|---|---|---
858542_rebeccafarinathesis.pdf | Other attached material | 1.74 MB | Adobe PDF | Not available
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/79558