AI Safety at UCLA Intro Fellowship: Diffusion Track
Table of Contents
- Week 1: Preventing an AI-related catastrophe + Scaling Hypothesis
- Week 2: The future is going to be wild + Image Generation Mathematical Framework
- Week 3: Introducing Autoencoders, KL Divergence, and Unsolved Problems in AI Safety
- Week 4: AI Safety Field Background + Deep dive into VAEs
- Week 5: AI Alignment Failure Modes + Mathematics behind VAEs
- Week 6: Open Problems in AI X-Risk + Diffusion Models
Week 1: Preventing an AI-related Catastrophe + Scaling Hypothesis
Core Readings: (70 mins)
- Intelligence Explosion (20 min)
- Circuits, Distilled (50 min)
Learning Goals:
- Familiarize yourself with the arguments for AI being an existential risk
- Understand the rapid scaling of modern AI models and its implications for our interpretability methods.
Week 2: The future is going to be wild + Image Generation Mathematical Framework
Core Content: (2h 15min)
Theoretical Readings (75 min):
- AI and Compute (5 min)
- The Bitter Lesson (10 min)
- All Possible Views About Humanity’s Future are Wild (15 min)
- “This can’t go on” (25 min)
- Intelligence Explosion: Evidence and Import (20 min)
Practical Readings (60 min):
- (if unfamiliar) 3Blue1Brown Neural Networks, Chapters 1 and 2 (30 min)
- DDPMs, Part 1 - Autoencoders (30 mins).
Learning Goals:
Theoretical
- Understand the relationship between growing compute and general capabilities.
- Gain experience with the types of datasets used in modern AI systems.
- See how AI could impact a wide range of industries.
- Reflect on the radical impact AI can have on the future of humanity
- Reflect on the strange possibilities of our economic future.
- Reflect on the speed with which AI will transition from powerful to superintelligent.
Practical
- Understand the basics of neural networks and deep learning.
- Understand the intuition behind compression in autoencoders (see the sketch below).
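To make the compression intuition concrete, here is a minimal autoencoder sketch in PyTorch (chosen here for illustration; it is not the notebook's code). The encoder squeezes each input through a narrow bottleneck and the decoder has to rebuild the input from that compressed code; the dimensions and layer sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Autoencoder(nn.Module):
    """Minimal autoencoder: compress 784-dim inputs (e.g. flattened 28x28 images) into a 32-dim code."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: squeeze the input down to a narrow bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: rebuild the input from the bottleneck code alone.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)      # compressed representation
        return self.decoder(z)   # reconstruction

# Training minimizes reconstruction error, which forces the bottleneck to keep
# only the information needed to rebuild the input: that is the compression.
model = Autoencoder()
x = torch.rand(8, 784)           # dummy batch of flattened images
loss = F.mse_loss(model(x), x)
loss.backward()
```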
Week 3: Introducing Autoencoders, KL Divergence, and Unsolved Problems in AI Safety
Core Content: (1h 15 min)
Theoretical Readings (30 min):
- Why AI alignment could be hard with modern deep learning (20 mins)
- Intuitively Understanding KL Divergence (10 mins)
Notebook Exercises (45 mins):
- (Optional) Variational Autoencoder (VAE) Intuition (45 mins): this notebook runs slightly ahead of this week's material; work through it if you’d like to dive into the mathematics early!
Learning Goals:
Theoretical
- Understand the issues with using performance alone to evaluate classifiers.
Practical
- Establish an intuition for Kullback-Leibler (KL) divergence and what it measures as a notion of dissimilarity between probability distributions; note that it is not a true distance metric, since it is asymmetric (see the example below).
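As a quick numerical illustration (assuming NumPy and SciPy are available; they are not part of the course materials), the snippet below computes D_KL(p || q) and D_KL(q || p) for two small discrete distributions. The two values differ, which is exactly why KL divergence is not a symmetric distance.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) returns sum_i p_i * log(p_i / q_i) = D_KL(p || q)

p = np.array([0.7, 0.2, 0.1])    # "true" distribution
q = np.array([0.4, 0.4, 0.2])    # approximating distribution

kl_pq = entropy(p, q)            # D_KL(p || q)
kl_qp = entropy(q, p)            # D_KL(q || p)
print(kl_pq, kl_qp)              # roughly 0.18 vs 0.19: close here, but not equal in general
```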
Week 4: AI Safety Field Background + Deep dive into VAEs
Core Readings: (1h 45 min)
Theoretical Readings (1h 15 mins):
Notebook Exercises (30 mins):
- DDPM_part_2_vae.ipynb (30 min).
Learning Goals:
- Understand how ML research is conducted and how it affects AI safety research.
- Be able to evaluate whether a research agenda advances general capabilities.
- Learn about the variety of different research approaches tackling alignment.
- Understand the mechanics of Variational Autoencoder models and practice writing your own.
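Before opening DDPM_part_2_vae.ipynb, it may help to see the two pieces that distinguish a VAE from a plain autoencoder: the encoder outputs a distribution (mean and log-variance) rather than a single code, and the loss adds a KL term to the reconstruction error. The sketch below (PyTorch, with placeholder layer sizes; not the notebook's own code) shows both, including the reparameterization trick.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal VAE: the encoder predicts a Gaussian q(z|x) = N(mu, diag(sigma^2))."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(input_dim, 256)
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_log_var = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = F.relu(self.encoder(x))
        mu, log_var = self.to_mu(h), self.to_log_var(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        return self.decoder(z), mu, log_var

def vae_loss(x, x_hat, mu, log_var):
    # Reconstruction error plus KL(q(z|x) || N(0, I)), which has a closed form for Gaussians.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```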
Week 5: AI Alignment Failure Modes + Mathematics behind VAEs
Core Readings: (1h 30 min)
Theoretical Readings (45 mins):
- X-Risk Analysis for AI Research (Appendix A, pp. 13-14) (10 min)
- What Failure Looks Like (10 min)
- Clarifying What Failure Looks Like (25 mins)
Notebook Exercises (45 mins):
- Diffusion, distilled (45 mins).
Learning Goals:
- Be able to determine how an AI safety project may reduce X-risk.
- Evaluate the failure modes of misaligned AI.
- Understand the factors that lead to value lock-in.
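For reference while working through the VAE mathematics this week, the objective behind the Week 4 sketch is the evidence lower bound (ELBO), written here in its standard form (generic notation, not tied to any particular reading):

```latex
% The VAE maximizes a lower bound on the data log-likelihood:
% a reconstruction term minus a KL penalty that keeps the approximate
% posterior q_phi(z|x) close to the prior p(z).
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
\;-\;
\underbrace{D_{\mathrm{KL}}\!\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)}_{\text{regularization}}
```

The KL term is the divergence introduced in Week 3, and the vae_loss in the Week 4 sketch is, up to constants and scaling, the negative of this bound.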
Week 6: Open Problems in AI X-Risk + Diffusion Models
Core Readings: (2h)
Theoretical Readings (1h 15 min):
- Open Problems in AI X-Risk (60 min)
- AI Governance: Opportunity and Theory of Impact (15 min)
Notebook Exercises (45 mins):
- Full DDPM notebook - complete all exercises (45 mins).
Learning Goals:
- Pick a research agenda you find particularly interesting (perhaps to pursue later).
- Understand the role AI governance plays in the broader field of AI safety.
Final Reflection
If you were to pursue a research question or topic in AI safety, what would it be? Which area of AI safety do you find most interesting, and which do you find most promising?
Note: The DDPM notebook is incomplete (missing its training loop). Can you write it?
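If you want a starting point, below is a generic sketch of the standard DDPM training step, loosely following Ho et al. (2020): sample a random timestep, noise the clean image with the closed-form forward process, and train the network to predict that noise with an MSE loss. It is written in PyTorch with placeholder names (model, optimizer, the linear beta schedule), so it will not line up exactly with the notebook's variables.

```python
import torch
import torch.nn.functional as F

# Linear noise schedule and its cumulative products, precomputed once.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # \bar{alpha}_t

def train_step(model, x0, optimizer):
    """One DDPM training step: teach model(x_t, t) to predict the noise added at step t."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)            # a random timestep per sample
    noise = torch.randn_like(x0)
    a_bar = alpha_bar.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
    # Closed-form forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    loss = F.mse_loss(model(x_t, t), noise)                    # simple noise-prediction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# A full training loop would call train_step(model, batch, optimizer)
# for each batch in the dataloader, over several epochs.
```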