AI Safety at UCLA Intro Fellowship: Diffusion Track
Table of Contents
- Week 1: Preventing an AI-related catastrophe + Scaling Hypothesis
- Week 2: The future is going to be wild + Image Generation mathematical framework
- Week 3: Unsolved Problems in ML Safety, autoencoders, and KL divergence
- Week 4: AI Safety Field Background + Deep dive into VAEs
- Week 5: Failure Modes in AI, understanding VAEs mathematically
- Week 6: Open Problems in AI X-Risk + Diffusion Intro
Week 1: Preventing an AI-related catastrophe + Scaling Hypothesis
Core Readings: (70 mins)
- Intelligence Explosion (20 min)
- Circuits, Distilled (50 min)
Learning Goals:
- Familiarize yourself with the arguments for AI being an existential risk
- Understand the rapid scaling of modern AI models and its implications for our interpretability methods.
Week 2: The future is going to be wild + Image Generation mathematical framework
Core Content: (125 min)
Theoretical Readings (75 min):
- AI and Compute (5 min)
- The Bitter Lesson (10 min)
- All Possible Views About Humanity’s Future are Wild (15 min)
- “This can’t go on” (25 min)
- Intelligence Explosion: Evidence and Import (20 min)
Practical Readings (50 min):
- (if unfamiliar) 3Blue1Brown Neural Networks, Chapters 1 and 2 (30 min)
- Policy Gradient Explanation (20 min)
Learning Goals:
Theoretical
- Understand the relationship between compute and general capabilities.
- Gain experience with the types of datasets used in modern AI systems.
- See how AI could impact a wide range of industries.
- Reflect on the radical impact AI can have on the future of humanity.
- Reflect on the strange possibilities of our economic future.
- Reflect on how quickly AI could transition from powerful systems to superintelligence.
Practical
- Understand Markov Decision Processes (MDPs)
- Understand the intuition behind the policy gradient.
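To make the practical goals concrete, here is a minimal sketch of the REINFORCE policy-gradient estimator on a toy two-armed bandit (a one-state MDP). Everything in it (the names true_means and lr, the reward means, the step count) is an illustrative assumption, not taken from the course notebooks.

```python
# Minimal sketch: REINFORCE policy-gradient estimator on a toy two-armed bandit.
# All names and numbers (true_means, lr, step count) are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                 # policy parameters: one preference per action
true_means = np.array([0.2, 0.8])   # assumed mean reward of each arm
lr = 0.1                            # learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)              # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)      # observe a noisy reward
    grad_log_pi = -probs                    # gradient of log pi(a) w.r.t. theta ...
    grad_log_pi[a] += 1.0                   # ... equals one_hot(a) - probs for a softmax policy
    theta += lr * r * grad_log_pi           # REINFORCE update (no baseline)

print("learned action probabilities:", softmax(theta))  # should strongly favor arm 1
```

The key intuition: the update direction is the gradient of log pi(a) weighted by the reward received, so actions that tend to earn more reward become more probable in expectation.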
Week 3: Unsolved Problems in ML Safety, autoencoders, and KL divergence
Core Readings (30 min):
- Why AI alignment could be hard with modern deep learning (20 mins)
- Intuitively Understanding KL Divergence (10 mins)
Notebook:
- (Optional) Variational Autoencoder (VAE) Intuition (30+ mins): covers material slightly ahead of this week's schedule; read it now if you'd like to dive into the mathematics early!
Learning Goals:
Theoretical
- Understand the issues with evaluating classifiers on performance alone.
Practical
- Establish an intuition for Kullback-Leibler (KL) divergence and what it means as a measure of distance between probability distributions (note that it is not a true metric, since it is not symmetric).
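As a quick companion to the KL-divergence goal above, here is a minimal sketch (not from the course notebook; the distributions p and q are arbitrary) that computes D_KL(p || q) for two small discrete distributions and shows that swapping the arguments gives a different value.

```python
# Minimal sketch: KL divergence between two discrete distributions,
#   D_KL(p || q) = sum_i p_i * log(p_i / q_i)   (in nats when using ln).
# The example distributions are arbitrary and assumed to have nonzero entries.
import numpy as np

def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

print(kl_divergence(p, q))  # ~0.085 nats
print(kl_divergence(q, p))  # ~0.092 nats -- not symmetric, so not a true distance metric
```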
Week 4: AI Safety Field Background + Deep dive into VAEs
Core Readings: (105 min)
Learning Goals:
Theoretical
- Understand how ML research is conducted and how it affects AI safety research.
- Be able to evaluate if a research agenda advances general capabilities.
- Learn about the variety of different research approaches tackling alignment.
Practical
- Understand the mechanics of Variational Autoencoder models before writing your own.
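To preview the mechanics named in the practical goal above, here is a minimal sketch of a VAE forward pass and loss in PyTorch. The architecture (784-dimensional inputs as for flattened MNIST, a 2-dimensional latent space, one hidden layer) and all names such as TinyVAE are illustrative assumptions, not the course's reference implementation.

```python
# Minimal sketch of a VAE: encoder -> (mu, logvar) -> reparameterized sample -> decoder.
# Layer sizes and the fake input batch are illustrative assumptions.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=200, z_dim=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # encoder mean
        self.logvar = nn.Linear(h_dim, z_dim)    # encoder log-variance
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)                # reparameterization trick:
        z = mu + torch.exp(0.5 * logvar) * eps    # z = mu + sigma * eps
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # reconstruction term + KL(q(z|x) || N(0, I)), summed over the batch
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

x = torch.rand(8, 784)              # fake batch standing in for flattened images
model = TinyVAE()
x_hat, mu, logvar = model(x)
print(vae_loss(x, x_hat, mu, logvar).item())
```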
Week 5: Failure Modes in AI, understanding VAEs mathematically
Core Readings: (55 min)
Theoretical
- X-Risk Analysis for AI Research (Appendix A pg 13-14) (10 min)
- What Failure Looks Like (10 min)
- Clarifying What Failure Looks Like (25 mins)
Practical
- Full VAE derivation and implementation (try optimizing the model!).
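For reference while working through the derivation, the quantity a VAE optimizes is the evidence lower bound (ELBO) on the data log-likelihood; a standard statement (not copied from the course materials) is:

```latex
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
\;-\;
\underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)}_{\text{regularization toward the prior}}
```

Minimizing the negative of the right-hand side as a loss trades reconstruction quality against keeping the encoder's approximate posterior q(z|x) close to the prior p(z); the second term is the KL divergence introduced in Week 3.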
Learning Goals:
- Be able to determine how an AI safety project may reduce X-risk.
- Evaluate the failure modes of misaligned AI.
- Understand the factors that lead to value lock-in.
Week 6: Open Problems in AI X-Risk + Diffusion Intro
Core Readings:
Theoretical
- Open Problems in AI X-Risk (60 min)
- AI Governance: Opportunity and Theory of Impact (15 min)
Practical
- RL Connect4 (Stage 4)
Learning Goals:
- Pick a research agenda you find particularly interesting (perhaps to pursue later).
- Understand the role AI governance plays in the broader field of AI safety.