Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

Jentzen A, Riekert A

Research article (journal) | Peer reviewed

Abstract

Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be regarded as discretizations of the corresponding gradient flows (GFs). In this work we analyze GF processes in the training of ANNs with ReLU activation and three layers. In particular, we prove that, if the distribution of the input data is absolutely continuous with respect to the Lebesgue measure, the risk of every bounded GF trajectory converges to the risk of a critical point. In addition, we show that, in the case of a one-dimensional affine target function and a uniform input distribution, the risk of every bounded GF trajectory converges to zero provided that the initial risk is sufficiently small. Finally, we show that the boundedness assumption can be removed if the hidden layer consists of only one neuron.
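To make the relation between GD and GF referred to above concrete, here is a minimal sketch in standard notation (the symbols $\mathcal{R}$, $\Theta_t$, and $\gamma$ are illustrative and not taken from the article): a GF trajectory solves the ordinary differential equation
\[
  \frac{\mathrm{d}}{\mathrm{d}t}\,\Theta_t \;=\; -\,\nabla_\theta \mathcal{R}(\Theta_t), \qquad t \ge 0,
\]
where $\mathcal{R}(\theta)$ denotes the risk (expected loss) of the ReLU ANN with parameter vector $\theta$. The explicit Euler discretization of this ODE with step size $\gamma > 0$ recovers plain gradient descent,
\[
  \theta_{n+1} \;=\; \theta_n \;-\; \gamma\,\nabla_\theta \mathcal{R}(\theta_n), \qquad n \in \mathbb{N}_0.
\]
Since the ReLU function is not differentiable at $0$, the risk is in general nonsmooth and the gradient has to be understood in a suitably generalized sense; the sketch above suppresses this point.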

Publication details

Journal: Journal of Mathematical Analysis and Applications (J. Math. Anal. Appl.)
Volume: 517
Issue: 2
Article number: 126601
Status: Published
Year of publication: 2023
Language of publication: English
DOI: 10.1016/j.jmaa.2022.126601
Link to full text: https://www.sciencedirect.com/science/article/abs/pii/S0022247X22006151
Keywords: Artificial neural networks; Nonconvex optimization; Gradient flow; Nonsmooth optimization; Machine learning

Authors from the University of Münster

Jentzen, Arnulf
Institut für Analysis und Numerik
Riekert, Adrian
Mathematisches Institut