This year, ALT returns to San Diego, co-located with the Information Theory and Applications (ITA) workshop. Together with Vidya Muthukumar, and in collaboration with Alon Orlitsky, we are organising a one-day symposium, ITALT, open to all participants from ALT and ITA, on Saturday, February 24th. The symposium will feature tutorials, a professional development panel, and social/mentoring activities to bridge the two communities of learning theory and information theory.
Venue: Bahia Resort, San Diego, CA (note: different location from ALT 2024, which will be held on UCSD campus)
Registration (via Google Forms): https://forms.gle/tkWoVeXbs5myz4gx7
Tentative schedule (all times in PT)
- Welcome remarks (Vidya Muthukumar)
- Tutorial (Ankur Moitra)
Title: Algorithmic Aspects of Reinforcement Learning
Abstract: In this survey I will give an algorithmic perspective on modern reinforcement learning. Much of the modern theory aims at finding models where we can prove good bounds on the sample complexity, but is built on top of computationally intractable oracles. What happens when we want end-to-end algorithmic guarantees, and what does it tell us about what are the right models to study in the first place? I will tell you about how trying to answer these questions leads to new twists on topics in information theory, regression and sparsification.
- Boxed lunches and walk on the beach 🙂
- Invited talk (Yuanzhi Li)
Title: Physics of Language Models: Knowledge Storage, Extraction, and Manipulation
Abstract: Large language models (LLMs) can memorize a massive amount of knowledge during pre-training, but can they effectively use this knowledge at inference time? In this work, we show several striking results about this question. Using a synthetic biography dataset, we first show that even if an LLM achieves zero training loss when pre-training on the biography dataset, it sometimes cannot be fine-tuned to answer questions as simple as “What is the birthday of XXX” at all. We show that sufficient data augmentation during pre-training, such as rewriting the same biography multiple times or simply using the person’s full name in every sentence, can mitigate this issue. Using linear probing, we find that such augmentation forces the model to store knowledge about a person in the token embeddings of their name rather than in other locations.
We then show that LLMs are very bad at manipulating knowledge they learn during pre-training unless a chain of thought is used at inference time. We pre-trained an LLM on the synthetic biography dataset so that it could answer “What is the birthday of XXX” with 100% accuracy. Even so, it could not be further fine-tuned to answer questions like “Is the birthday of XXX even or odd?” directly. Even including chain-of-thought training data only helps the model answer such questions in a CoT manner, not directly.
We will also discuss preliminary progress on understanding the scaling law of how large a language model needs to be to store X pieces of knowledge and extract them efficiently. For example, is a 1B parameter language model enough to store all the knowledge of a middle school student?
- Social break (including board games)
- Professional development panel (organized by WiML-T and Let-All)
Panel topic: Navigating academic careers across learning and information theory
Panelists: Kamalika Chaudhuri, Tara Javidi, and Rashmi Vinayak
- 5:30 pm onwards: Mentorship roundtables (organized by Let-All) and happy hour
Mentors: Vidya Muthukumar, Daniel Hsu, Alon Orlitsky, Ankur Moitra, and more…