Universität Augsburg
Professor Dr. Leon Bungert
Universität Würzburg
will speak on
Wednesday, 15 July 2026
at
4:00 p.m.
in
Room 2004 (L1)
on the topic:
| Abstract: |
| In this talk I will speak about concentration phenomena of self-attention transformers in the regimes of infinitely many layers and tokens. The dynamics are described by the Fokker–Planck equation $\partial_t \rho^\beta_t(x) = -\operatorname{div}\bigl(\rho^\beta_t(x)\, P_x V m_\beta[\rho^\beta_t](x)\bigr)$, $(t, x) \in [0, T] \times \mathbb{S}^{d-1}$, (1) where $\mathbb{S}^{d-1} := \{x \in \mathbb{R}^d : \lvert x \rvert = 1\}$ is the unit sphere in $\mathbb{R}^d$, $T > 0$ is a time horizon, $P_x : \mathbb{R}^d \to \mathbb{R}^d$, $y \mapsto y - \langle x, y \rangle x$ is the projection onto the tangent space $T_x\mathbb{S}^{d-1}$, and $m_\beta[\rho^\beta_t](x) := \frac{\int_{\mathbb{S}^{d-1}} e^{\beta \langle By, x \rangle}\, y \,\mathrm{d}\rho^\beta_t(y)}{\int_{\mathbb{S}^{d-1}} e^{\beta \langle By, x \rangle} \,\mathrm{d}\rho^\beta_t(y)}$ (2) involves the inverse heat parameter $\beta > 0$. The matrices $V, B \in \mathbb{R}^{d \times d}$ contain learned parameters and are assumed to be constant in time. It is known that for $\beta \to \infty$ solutions of (1) converge to solutions of a linear PDE, the solutions of which concentrate as $T \to \infty$ on the dominating eigendirections of the matrix $V B^\top$. In our work we quantify these results by exploiting a striking similarity between (1) and the so-called polarized consensus-based optimization (CBO) method for global optimization. Using a CBO-inspired analysis, we give explicit bounds for the Wasserstein-2 distance between the solution of (1) and a suitable target measure. The proof relies on an application of a quantitative Laplace principle to (2) as well as a Lyapunov-type analysis for the time asymptotics. Our result sheds more light on the interior dynamics of self-attention transformers and might help identify reduced effective models. This is joint work with Albert Alcalde, Konstantin Riedl, and Tim Roith. |
| All are cordially invited. |
| Prof. Dr. Jan-Frederik Pietschmann |
Coffee, tea, and pastries will be served half an hour before the talk in Room 2006 (L1).
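
For readers who want to experiment before the talk, the following is a minimal sketch of a finite-token analogue of equations (1) and (2) in the abstract. It is not taken from the speaker's work. It assumes that the measure is replaced by the empirical measure of n tokens on the sphere, so that each token moves with velocity P_x V m_β(x), and that time is discretized with an explicit Euler step followed by renormalization to the sphere. The function names, the step size `dt`, and the sampling helper are hypothetical choices made for illustration.

```python
import numpy as np


def attention_mean(X, B, beta):
    """Empirical version of (2) for tokens X of shape (n, d).

    Row i of the result is the softmax-weighted token mean seen from x_i,
    with weights proportional to exp(beta * <B x_j, x_i>).
    """
    BX = X @ B.T                      # row j holds B x_j
    logits = beta * (X @ BX.T)        # logits[i, j] = beta * <B x_j, x_i>
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)  # each row sums to one
    return W @ X


def tangent_velocity(X, V, B, beta):
    """Velocity P_x V m_beta(x) for every token, tangent to the sphere."""
    Vm = attention_mean(X, B, beta) @ V.T             # row i holds V m_beta(x_i)
    radial = np.sum(X * Vm, axis=1, keepdims=True)    # <x_i, V m_beta(x_i)>
    return Vm - radial * X                            # remove the radial part


def euler_step(X, V, B, beta, dt):
    """One explicit Euler step, then renormalize (assumed discretization)."""
    Y = X + dt * tangent_velocity(X, V, B, beta)
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)


def sample_sphere(n, d, seed=0):
    """Draw n points uniformly on the unit sphere in R^d (illustrative helper)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)
```

A possible usage: draw tokens with `X = sample_sphere(64, 3)`, choose for instance `V = np.diag([2.0, 1.0, 0.5])` and `B = np.eye(3)`, and repeatedly apply `X = euler_step(X, V, B, beta=8.0, dt=0.05)` while watching how the tokens move on the sphere. The renormalization is only a simple way to keep the discrete iterates on the sphere; other integrators would serve equally well.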