Approximation and Expressive Power of Large Language Models

Gabriel Peyré

Large language models process vast sequences of input tokens by alternating between classical multi-layer perceptron layers and self-attention mechanisms. While the approximation capabilities of perceptrons are relatively well understood, those of attention mechanisms remain less explored. In this talk, I will compare the proof techniques and approximation results associated with these two types of layers, emphasizing key open questions that connect large language models with approximation theory in infinite-dimensional spaces representing input token distributions.
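
For illustration only, a minimal NumPy sketch of the alternation described in the abstract: a single transformer block applying self-attention (which mixes tokens) followed by a token-wise multi-layer perceptron, each with a residual connection. All parameter names (Wq, Wk, Wv, W1, W2, ...) are chosen here for the example, and standard components such as layer normalization and multiple heads are omitted; this is not the speaker's construction, only a schematic of the architecture the talk refers to.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (n_tokens, d). Single-head attention: each token attends to all
    # tokens via softmax-normalized dot products between queries and keys.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return A @ V

def mlp(X, W1, b1, W2, b2):
    # Two-layer perceptron applied independently to each token (ReLU hidden layer).
    return np.maximum(X @ W1 + b1, 0.0) @ W2 + b2

def transformer_block(X, p):
    # One block: attention (tokens interact) then MLP (per-token map),
    # each wrapped in a residual connection.
    X = X + self_attention(X, p["Wq"], p["Wk"], p["Wv"])
    X = X + mlp(X, p["W1"], p["b1"], p["W2"], p["b2"])
    return X

# Toy usage: 5 tokens of dimension 8, hidden width 16.
rng = np.random.default_rng(0)
d, h = 8, 16
p = {
    "Wq": rng.normal(size=(d, d)), "Wk": rng.normal(size=(d, d)), "Wv": rng.normal(size=(d, d)),
    "W1": rng.normal(size=(d, h)), "b1": np.zeros(h),
    "W2": rng.normal(size=(h, d)), "b2": np.zeros(d),
}
X = rng.normal(size=(5, d))
print(transformer_block(X, p).shape)  # (5, 8): same shape as the input token sequence
```

The sketch makes the contrast in the abstract concrete: the MLP acts on each token separately, so its approximation theory is the classical finite-dimensional one, whereas attention couples all tokens and is naturally analyzed as a map on the distribution of input tokens.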