Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

Jentzen A, Riekert A

Research article (journal) | Peer reviewed

Abstract

Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be regarded as discretizations of the corresponding gradient flows (GFs). In this work we analyze GF processes in the training of ANNs with ReLU activation and three layers. In particular, we prove that, if the distribution of the input data is absolutely continuous with respect to the Lebesgue measure, the risk of every bounded GF trajectory converges to the risk of a critical point. In addition, we show that, in the case of a one-dimensional affine target function and a uniform input distribution, the risk of every bounded GF trajectory converges to zero provided that the initial risk is sufficiently small. Finally, we show that the boundedness assumption can be removed if the hidden layer consists of only one neuron.
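To make the relation between GD and GF referred to above concrete, here is a minimal sketch in standard notation (the symbols $\mathcal{R}$, $\Theta_t$, and $\gamma$ are illustrative and not taken from the article): a GF trajectory solves the ordinary differential equation
\[
  \frac{\mathrm{d}}{\mathrm{d}t}\,\Theta_t \;=\; -\,\nabla_\theta \mathcal{R}(\Theta_t), \qquad t \ge 0,
\]
where $\mathcal{R}(\theta)$ denotes the risk (expected loss) of the ReLU ANN with parameter vector $\theta$. The explicit Euler discretization of this ODE with step size $\gamma > 0$ recovers plain gradient descent,
\[
  \theta_{n+1} \;=\; \theta_n \;-\; \gamma\,\nabla_\theta \mathcal{R}(\theta_n), \qquad n \in \mathbb{N}_0.
\]
Since the ReLU function is not differentiable at $0$, the risk is in general nonsmooth and the gradient has to be understood in a suitably generalized sense; the sketch above suppresses this point.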

Publication details

Journal: Journal of Mathematical Analysis and Applications (J. Math. Anal. Appl.)
Volume: 517
Issue: 2
Article number: 126601
Status: Published
Year of publication: 2023
Language of publication: English
DOI: 10.1016/j.jmaa.2022.126601
Link to full text: https://www.sciencedirect.com/science/article/abs/pii/S0022247X22006151
Keywords: Artificial neural networks; Nonconvex optimization; Gradient flow; Nonsmooth optimization; Machine learning

Authors from the University of Münster

Jentzen, Arnulf
Institut für Analysis und Numerik
Riekert, Adrian
Mathematisches Institut