AI Safety at UCLA Intro Fellowship: Diffusion Track
Table of Contents
- Week 1: Preventing an AI-related catastrophe + Scaling Hypothesis
- Week 2: The future is going to be wild + Image Generation Mathematical Framework
- Week 3: Introducing Autoencoders, KL Divergence, and Unsolved Problems in AI Safety
- Week 4: AI Safety Field Background + Deep dive into VAEs
- Week 5: AI Alignment Failure Modes + Mathematics behind VAEs
- Week 6: Open Problems in AI X-Risk + Diffusion Models
Week 1: Preventing an AI-related Catastrophe + Scaling Hypothesis
Core Readings: (70 mins)
- Intelligence Explosion (20 min)
- Circuits, Distilled (50 min)
Learning Goals:
- Familiarize yourself with the arguments for AI being an existential risk
- Understand the rapid scaling of modern AI models and its implications for our interpretability methods.
Week 2: The future is going to be wild + Image Generation Mathematical Framework
Core Content: (2h 15min)
Theoretical Readings (75 min):
- AI and Compute (5 min)
- The Bitter Lesson (10 min)
- All Possible Views About Humanity’s Future are Wild (15 min)
- “This can’t go on” (25 min)
- Intelligence Explosion: Evidence and Import (20 min)
Practical Readings (60 min):
- (if unfamiliar) 3Blue1Brown Neural Networks, Chapters 1 and 2 (30 min)
- DDPMs, Part 1 - Autoencoders (30 mins).
Learning Goals:
Theoretical
- Understand the relationship between growing compute and general capabilities.
- Gain experience with the types of datasets used in modern AI systems.
- See how AI could impact a wide range of industries.
- Reflect on the radical impact AI can have on the future of humanity
- Reflect on the strange possibilities of our economic future.
- Reflect on the speed with which AI will transition from powerful to superintelligent.
Practical
- Understand the basics of neural networks and deep learning.
- Understand the intuition behind compression in autoencoders (see the sketch below).
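To make the compression intuition concrete, here is a minimal autoencoder sketch in PyTorch (chosen here for illustration; it is not the notebook's code). The encoder squeezes each input through a narrow bottleneck and the decoder has to rebuild the input from that compressed code; the dimensions and layer sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Autoencoder(nn.Module):
    """Minimal autoencoder: compress 784-dim inputs (e.g. flattened 28x28 images) into a 32-dim code."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: squeeze the input down to a narrow bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: rebuild the input from the bottleneck code alone.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)      # compressed representation
        return self.decoder(z)   # reconstruction

# Training minimizes reconstruction error, which forces the bottleneck to keep
# only the information needed to rebuild the input: that is the compression.
model = Autoencoder()
x = torch.rand(8, 784)           # dummy batch of flattened images
loss = F.mse_loss(model(x), x)
loss.backward()
```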
Week 3: Introducing Autoencoders, KL Divergence, and Unsolved Problems in AI Safety
Core Content: (1h 15 min)
Theoretical Readings (30 min):
- Why AI alignment could be hard with modern deep learning (20 mins)
- Intuitively Understanding KL Divergence (10 mins)
Notebook Exercises (45 mins):
- (Optional) Variational Autoencoder (VAE) Intuition (45 mins): this notebook runs slightly ahead of this week's material; work through it if you’d like to dive into the mathematics early!
Learning Goals:
Theoretical
- Understand the issues with using performance alone to evaluate classifiers.
Practical
- Establish an intuition for Kullback-Leibler (KL) divergence and what it measures as a notion of dissimilarity between probability distributions; note that it is not a true distance metric, since it is asymmetric (see the example below).
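As a quick numerical illustration (assuming NumPy and SciPy are available; they are not part of the course materials), the snippet below computes D_KL(p || q) and D_KL(q || p) for two small discrete distributions. The two values differ, which is exactly why KL divergence is not a symmetric distance.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) returns sum_i p_i * log(p_i / q_i) = D_KL(p || q)

p = np.array([0.7, 0.2, 0.1])    # "true" distribution
q = np.array([0.4, 0.4, 0.2])    # approximating distribution

kl_pq = entropy(p, q)            # D_KL(p || q)
kl_qp = entropy(q, p)            # D_KL(q || p)
print(kl_pq, kl_qp)              # roughly 0.18 vs 0.19: close here, but not equal in general
```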
Week 4: AI Safety Field Background + Deep dive into VAEs
Core Readings: (1h 45 min)
Theoretical Readings (1h 15 mins):
Notebook Exercises (30 mins):
- DDPM_part_2_vae.ipynb (30 min).
Learning Goals:
- Understand how ML research is conducted and how it affects AI safety research.
- Be able to evaluate whether a research agenda advances general capabilities.
- Learn about the variety of different research approaches tackling alignment.
- Understand the mechanics of Variational Autoencoder models and practice writing your own.
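Before opening DDPM_part_2_vae.ipynb, it may help to see the two pieces that distinguish a VAE from a plain autoencoder: the encoder outputs a distribution (mean and log-variance) rather than a single code, and the loss adds a KL term to the reconstruction error. The sketch below (PyTorch, with placeholder layer sizes; not the notebook's own code) shows both, including the reparameterization trick.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal VAE: the encoder predicts a Gaussian q(z|x) = N(mu, diag(sigma^2))."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(input_dim, 256)
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_log_var = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = F.relu(self.encoder(x))
        mu, log_var = self.to_mu(h), self.to_log_var(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        return self.decoder(z), mu, log_var

def vae_loss(x, x_hat, mu, log_var):
    # Reconstruction error plus KL(q(z|x) || N(0, I)), which has a closed form for Gaussians.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```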
Week 5: AI Alignment Failure Modes + Mathematics behind VAEs
Core Readings: (1h 30 min)
Theoretical Readings (45 mins):
- X-Risk Analysis for AI Research (Appendix A, pp. 13-14) (10 min)
- What Failure Looks Like (10 min)
- Clarifying What Failure Looks Like (25 mins)
Notebook Exercises (45 mins):
- Diffusion, distilled (45 mins).
Learning Goals:
- Be able to determine how an AI safety project may reduce X-risk.
- Evaluate the failure modes of misaligned AI.
- Understand the factors that lead to value lock-in.
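For reference while working through the VAE mathematics this week, the objective behind the Week 4 sketch is the evidence lower bound (ELBO), written here in its standard form (generic notation, not tied to any particular reading):

```latex
% The VAE maximizes a lower bound on the data log-likelihood:
% a reconstruction term minus a KL penalty that keeps the approximate
% posterior q_phi(z|x) close to the prior p(z).
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
\;-\;
\underbrace{D_{\mathrm{KL}}\!\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)}_{\text{regularization}}
```

The KL term is the divergence introduced in Week 3, and the vae_loss in the Week 4 sketch is, up to constants and scaling, the negative of this bound.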
Week 6: Open Problems in AI X-Risk + Diffusion Models
Core Readings: (2h)
Theoretical Readings (1h 15 min):
- Open Problems in AI X-Risk (60 min)
- AI Governance: Opportunity and Theory of Impact (15 min)
Notebook Exercises (45 mins):
- Full DDPM notebook - complete all exercises (45 mins).
Learning Goals:
- Pick a research agenda you find particularly interesting (perhaps to pursue later).
- Understand the role AI governance plays in the broader field of AI safety.
Final Reflection
If you were to pursue a research question or topic in AI safety, what would it be? Which area of AI safety do you find most interesting, and which do you find most promising?
Note: The DDPM notebook is incomplete (missing its training loop). Can you write it?
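If you want a starting point, below is a generic sketch of the standard DDPM training step, loosely following Ho et al. (2020): sample a random timestep, noise the clean image with the closed-form forward process, and train the network to predict that noise with an MSE loss. It is written in PyTorch with placeholder names (model, optimizer, the linear beta schedule), so it will not line up exactly with the notebook's variables.

```python
import torch
import torch.nn.functional as F

# Linear noise schedule and its cumulative products, precomputed once.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # \bar{alpha}_t

def train_step(model, x0, optimizer):
    """One DDPM training step: teach model(x_t, t) to predict the noise added at step t."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)            # a random timestep per sample
    noise = torch.randn_like(x0)
    a_bar = alpha_bar.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
    # Closed-form forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    loss = F.mse_loss(model(x_t, t), noise)                    # simple noise-prediction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# A full training loop would call train_step(model, batch, optimizer)
# for each batch in the dataloader, over several epochs.
```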