Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

Jentzen A, Riekert A

Research article (journal) | Peer reviewed

Abstract

Gradient descent (GD) type optimization schemes are the standard methods for training artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be regarded as temporal discretizations of the corresponding gradient flows (GFs). In this work we analyze GF processes in the training of ANNs with ReLU activation and three layers. In particular, we prove, in the case where the distribution of the input data is absolutely continuous with respect to the Lebesgue measure, that the risk of every bounded GF trajectory converges to the risk of a critical point. In addition, we show, in the case of a one-dimensional affine target function and a uniform input distribution, that the risk of every bounded GF trajectory converges to zero if the initial risk is sufficiently small. Finally, we show that the boundedness assumption can be removed if the hidden layer consists of only one neuron.
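To illustrate the first sentence of the abstract, the following is a minimal, hedged sketch of GD as an explicit Euler discretization of the gradient flow ODE Theta'(t) = -(gradient of the risk at Theta(t)), for a toy network with a single ReLU hidden neuron and a one-dimensional affine target function, matching the simplest setting the paper considers. All names, initial values, and step sizes below are illustrative assumptions, not taken from the article.

```python
# Toy setting (assumed): network N(x) = v * max(w*x + b, 0) + c should
# approximate the affine target f(x) = 2x + 1 for inputs uniform on [0, 1].
# The risk is the mean squared error, approximated on a uniform grid.

def risk_and_grad(theta, xs):
    """Empirical risk and its (sub)gradient for the toy one-neuron ReLU net."""
    w, b, v, c = theta
    n = len(xs)
    risk = 0.0
    grad = [0.0, 0.0, 0.0, 0.0]
    for x in xs:
        z = w * x + b
        a = z if z > 0 else 0.0              # ReLU activation
        err = v * a + c - (2.0 * x + 1.0)    # residual against target 2x + 1
        risk += err * err / n
        dz = 1.0 if z > 0 else 0.0           # ReLU derivative (0 at the kink)
        grad[0] += 2.0 * err * v * dz * x / n
        grad[1] += 2.0 * err * v * dz / n
        grad[2] += 2.0 * err * a / n
        grad[3] += 2.0 * err / n
    return risk, grad

xs = [i / 200.0 for i in range(201)]   # uniform grid on [0, 1]
theta = [1.0, 0.1, 1.0, 0.0]           # initial parameters (assumed)
step = 0.05                            # Euler step size = GD learning rate
risk0, _ = risk_and_grad(theta, xs)
for _ in range(2000):                  # explicit Euler steps along the flow
    _, g = risk_and_grad(theta, xs)
    theta = [p - step * gi for p, gi in zip(theta, g)]
riskT, _ = risk_and_grad(theta, xs)
```

Along such a discretized trajectory the risk decreases toward zero, consistent with the convergence behavior the article establishes for bounded GF trajectories with small initial risk.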

Details about the publication

Journal: Journal of Mathematical Analysis and Applications (J. Math. Anal. Appl.)
Volume: 517
Issue: 2
Article number: 126601
Status: Published
Release year: 2023
Language in which the publication is written: English
DOI: 10.1016/j.jmaa.2022.126601
Link to the full text: https://www.sciencedirect.com/science/article/abs/pii/S0022247X22006151
Keywords: Artificial neural networks; Nonconvex optimization; Gradient flow; Nonsmooth optimization; Machine learning

Authors from the University of Münster

Jentzen, Arnulf
Institute for Analysis and Numerics
Riekert, Adrian
Mathematical Institute