Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

Jentzen A, Riekert A

Research article (journal) | Peer reviewed

Abstract

Gradient descent (GD) type optimization schemes are the standard methods for training artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be regarded as temporal discretizations of the corresponding gradient flows (GFs). In this work we analyze GF processes in the training of ANNs with ReLU activation and three layers. In particular, we prove, in the case where the distribution of the input data is absolutely continuous with respect to the Lebesgue measure, that the risk of every bounded GF trajectory converges to the risk of a critical point. In addition, we show, in the case of a one-dimensional affine target function and a uniform input distribution, that the risk of every bounded GF trajectory converges to zero if the initial risk is sufficiently small. Finally, we show that the boundedness assumption can be removed if the hidden layer consists of only one neuron.
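To illustrate the first sentence of the abstract, the following is a minimal, hedged sketch of GD as an explicit Euler discretization of the gradient flow ODE Theta'(t) = -(gradient of the risk at Theta(t)), for a toy network with a single ReLU hidden neuron and a one-dimensional affine target function, matching the simplest setting the paper considers. All names, initial values, and step sizes below are illustrative assumptions, not taken from the article.

```python
# Toy setting (assumed): network N(x) = v * max(w*x + b, 0) + c should
# approximate the affine target f(x) = 2x + 1 for inputs uniform on [0, 1].
# The risk is the mean squared error, approximated on a uniform grid.

def risk_and_grad(theta, xs):
    """Empirical risk and its (sub)gradient for the toy one-neuron ReLU net."""
    w, b, v, c = theta
    n = len(xs)
    risk = 0.0
    grad = [0.0, 0.0, 0.0, 0.0]
    for x in xs:
        z = w * x + b
        a = z if z > 0 else 0.0              # ReLU activation
        err = v * a + c - (2.0 * x + 1.0)    # residual against target 2x + 1
        risk += err * err / n
        dz = 1.0 if z > 0 else 0.0           # ReLU derivative (0 at the kink)
        grad[0] += 2.0 * err * v * dz * x / n
        grad[1] += 2.0 * err * v * dz / n
        grad[2] += 2.0 * err * a / n
        grad[3] += 2.0 * err / n
    return risk, grad

xs = [i / 200.0 for i in range(201)]   # uniform grid on [0, 1]
theta = [1.0, 0.1, 1.0, 0.0]           # initial parameters (assumed)
step = 0.05                            # Euler step size = GD learning rate
risk0, _ = risk_and_grad(theta, xs)
for _ in range(2000):                  # explicit Euler steps along the flow
    _, g = risk_and_grad(theta, xs)
    theta = [p - step * gi for p, gi in zip(theta, g)]
riskT, _ = risk_and_grad(theta, xs)
```

Along such a discretized trajectory the risk decreases toward zero, consistent with the convergence behavior the article establishes for bounded GF trajectories with small initial risk.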

Details about the publication

Journal: Journal of Mathematical Analysis and Applications (J. Math. Anal. Appl.)
Volume: 517
Issue: 2
Article number: 126601
Status: Published
Release year: 2023
Language in which the publication is written: English
DOI: 10.1016/j.jmaa.2022.126601
Link to the full text: https://www.sciencedirect.com/science/article/abs/pii/S0022247X22006151
Keywords: Artificial neural networks; Nonconvex optimization; Gradient flow; Nonsmooth optimization; Machine learning

Authors from the University of Münster

Jentzen, Arnulf
Institute for Analysis and Numerics
Riekert, Adrian
Mathematical Institute