Research

Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent

Sara Dragutinovic, Andrew M Saxe, Aaditya K Singh

Strategy coopetition explains the emergence and transience of in-context learning

Aaditya K Singh, Ted Moskovitz, Sara Dragutinovic, Felix Hill, Stephanie CY Chan, Andrew M Saxe