Universität Augsburg
François Charton
FAIR, Meta
will speak on
Monday, 25 August 2025
at
16:00
in
Building K
on the topic:
Abstract:
It is generally understood that transformers struggle to learn arithmetic functions (even integer multiplication): models learn shortcuts, fail to generalize, etc. I investigate a complex arithmetic function, predicting distant terms in the Collatz sequence, and show that transformers can learn it to very high accuracy, but that they do so incrementally, solving the problem for classes of inputs characterized by their binary representation. This learning pattern is independent of the base used for tokenization. An analysis of model errors unveils a hierarchy of error cases, suggesting that, in latent space, all models are very close to learning the Collatz sequence, no matter the base.
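For readers unfamiliar with the task, the following is a minimal sketch of what "predicting a distant term in the Collatz sequence" and "the base used for tokenization" could look like. It assumes the standard Collatz map (halve even numbers, send odd n to 3n + 1) and a plain digit-by-digit encoding; the function names and the choice of example values are illustrative assumptions, not the speaker's actual setup.

```python
def collatz_step(n: int) -> int:
    """One step of the standard Collatz map: n/2 if n is even, 3n+1 if odd."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_term(n: int, k: int) -> int:
    """Return the term reached after applying the Collatz map k times to n."""
    for _ in range(k):
        n = collatz_step(n)
    return n


def to_digits(n: int, base: int) -> list[int]:
    """Digits of a non-negative integer in the given base, most significant first.

    Assumption: each digit plays the role of one token; the real tokenizer may differ.
    """
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits[::-1]


# Hypothetical input/target pair: start at 27 and look 3 steps ahead.
n, k = 27, 3
target = collatz_term(n, k)  # 27 -> 82 -> 41 -> 124
print(to_digits(n, 2), to_digits(target, 2))
print(to_digits(n, 10), to_digits(target, 10))
```

The same pair of integers yields different token sequences in base 2 and base 10, which is the sense in which the choice of base changes what the model sees without changing the underlying function.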
For further information, please visit our website.
You are cordially invited.
Kai Cieliebak, Milan Zerbin