Course: STAT 946 — Adv Topics: Mathematical Foundations of Deep Learning
Blurb:
The goal of this course is to introduce and explore some of the recent theoretical advances that
aim to understand modern deep learning methods and training regimes.
Topics may include: universal approximation, uniform convergence, benign overparameterization, functional limit theory/scaling limits,
the neural tangent kernel (NTK) and comparisons to kernel methods, training dynamics and related phenomenology (implicit regularization, training regimes, sample complexity,
mean-field/hydrodynamic limits, DMFT/effective dynamics), loss landscapes and geometry, transformers, and diffusion models.
There will be a heavy focus on the analysis of training dynamics (sample complexity and scaling limits).
The course will be interspersed with primers on important tools and techniques from probability theory such as
concentration of measure, random matrix theory, stochastic analysis, and related ideas inspired by statistical physics.
This will be a fast-paced, research-level, seminar-style course. We will be learning as a group.
Students will play an active role in the course: as it progresses, students will pick up and explore these
(or other) topics and present them to catch the rest of us up!
The following schedule is tentative; comments and suggested readings are welcome!
Lectures
Week 1: Approximation theory
Week 2: Uniform convergence
- Lec 2: Generalization and Rademacher Complexity (sketch below)
- Lec 3: Vacuous bounds and the need for a new approach
- Reading:
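A one-line sketch of the central object of Lec 2 (the standard definition; notation is mine): for a sample S = (x_1, ..., x_n) and a function class F, the empirical Rademacher complexity is
\[
  \hat{\mathcal{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\Big[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \Big],
  \qquad \sigma_1, \dots, \sigma_n \ \text{i.i.d. uniform on } \{\pm 1\},
\]
and, for classes taking values in [0, 1], uniform-convergence bounds of the form
\[
  \sup_{f \in \mathcal{F}} \Big( \mathbb{E}[f(x)] - \frac{1}{n} \sum_{i=1}^{n} f(x_i) \Big)
  \;\le\; 2\, \hat{\mathcal{R}}_S(\mathcal{F}) \;+\; O\!\Big( \sqrt{\log(1/\delta)/n} \Big)
\]
hold with probability at least 1 - δ over the sample.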
Week 3: Implicit regularization and benign overparameterization
- Lec 4: Implicit Regularization (sketch below)
- Lec 5: Interpolation does not imply poor generalization
- Reading:
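A concrete instance of the implicit regularization discussed in Lec 4 (a standard example, stated under the usual assumptions): for linearly separable data and the logistic (or exponential) loss over linear predictors, gradient descent does not converge to an arbitrary separator but aligns in direction with the maximum-margin solution,
\[
  \lim_{t \to \infty} \frac{w_t}{\|w_t\|_2} \;=\; \frac{\hat{w}}{\|\hat{w}\|_2},
  \qquad
  \hat{w} \;=\; \arg\min_{w} \big\{ \|w\|_2 \;:\; y_i \langle w, x_i \rangle \ge 1 \ \text{for all } i \big\},
\]
even though nothing in the loss explicitly penalizes the norm of w.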
Week 4: RMT and double descent
- Lec 6: Random Matrix Theory: a primer
- Lec 7: High-dimensional ridgeless least squares (sketch below)
- Reading:
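A sketch of the estimator behind Lec 7 (under the usual overparameterized setup, p > n with X in R^{n x p} of full row rank): ridgeless least squares is the minimum-norm interpolator, obtained as the ridge estimator's limit as the penalty vanishes,
\[
  \hat{\beta} \;=\; \lim_{\lambda \to 0^+} \big( X^\top X + \lambda I_p \big)^{-1} X^\top y
  \;=\; X^\top \big( X X^\top \big)^{-1} y
  \;=\; \arg\min_{\beta} \big\{ \|\beta\|_2 \;:\; X\beta = y \big\},
\]
and double descent refers to the behaviour of its test risk as the overparameterization ratio p/n varies.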
Week 5: NTK and lazy training
- Lec 8: The Neural Tangent Kernel (sketch below)
- Lec 9: Lazy training
- Reading:
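A minimal statement of the object in Lec 8 (the standard definition): for a parameterized network f_θ, the (empirical) neural tangent kernel is
\[
  \Theta_\theta(x, x') \;=\; \big\langle \nabla_\theta f_\theta(x), \, \nabla_\theta f_\theta(x') \big\rangle,
\]
and in the NTK/infinite-width limit this kernel concentrates at initialization and stays approximately fixed along gradient-flow training, so the network's predictions evolve as kernel regression with Θ; this is the sense in which training is "lazy".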
Week 6: Infinite-width limits
- Lec 10: Guest Lecture
- Lec 11: Neural networks as interacting particle systems
- Reading:
Week 7: Reading Week
Week 8: Sample complexity and scaling limits
- Lec 12: Sample complexity and the information exponent (sketch below)
- Lec 13: Effective dynamics
- Reading:
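A sketch of the quantity in Lec 12 (the standard single-index setup, stated under the usual assumptions): for a target y = σ(⟨w_*, x⟩) with Gaussian inputs in dimension d, the information exponent of the link σ is the index of its first nonvanishing Hermite coefficient,
\[
  k_* \;=\; \min\big\{ k \ge 1 \;:\; \mathbb{E}_{g \sim \mathcal{N}(0,1)}\big[ \sigma(g)\, \mathrm{He}_k(g) \big] \neq 0 \big\},
\]
and, up to logarithmic factors, online SGD needs on the order of d^{max(k_* - 1, 1)} samples to recover the direction w_*.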
Week 9: Spectral alignment and Transformers
- Lec 14: Spectral alignment
- Lec 15: Transformers (sketch below)
- Reading:
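For Lec 15, the basic building block (the standard scaled dot-product attention): given queries and keys Q, K in R^{n x d_k} and values V in R^{n x d_v},
\[
  \mathrm{Attention}(Q, K, V) \;=\; \mathrm{softmax}\!\Big( \frac{Q K^\top}{\sqrt{d_k}} \Big) V,
\]
with the softmax applied row-wise; a transformer block composes (multi-head) attention with a position-wise MLP, residual connections, and normalization.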
Week 10: CANCELLED
Week 11: Diffusion models + student presentations
- Lec 16: Diffusion models (sketch below)
- Presentation 1: Valentio, Learning GMMs using DDPM
- Reading:
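A sketch of the formulation behind Lec 16 (one standard continuous-time presentation): data is corrupted by an Ornstein-Uhlenbeck forward process, and samples are drawn by running its time reversal, which requires (an estimate of) the score ∇ log p_t:
\[
  \mathrm{d}X_t = -X_t\, \mathrm{d}t + \sqrt{2}\, \mathrm{d}B_t, \qquad X_0 \sim p_{\mathrm{data}},
\]
\[
  \mathrm{d}Y_t = \big( Y_t + 2\, \nabla \log p_{T-t}(Y_t) \big)\, \mathrm{d}t + \sqrt{2}\, \mathrm{d}B_t, \qquad Y_0 \sim p_T,
\]
so that Y_T is approximately distributed as p_data; DDPMs can be viewed as a time-discretization of this picture with a learned score network.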
Week 12: Student presentations
- Presentation 2: Theo, Leap Complexity
- Presentation 3: Varnan, Benefits of Reuse
- Reading:
Week 13: Student presentations
- Presentation 4: Parsa, Scaling limits of GLMs
- Presentation 5: Juju, Neural Covariance
- Reading:
Week 14: Student presentations
- Presentation 6: Sammy, Edge of Stability
- Reading: