AI Safety at UCLA Intro Fellowship: Diffusion Track

Table of Contents

  1. Week 1: Preventing an AI-related catastrophe + Scaling Hypothesis
  2. Week 2: The future is going to be wild + Image Generation mathematical framework
  3. Week 3: Unsolved Problems in ML Safety, autoencoders, and KL divergence
  4. Week 4: AI Safety Field Background + Deep dive into VAEs
  5. Week 5: Failure Modes in AI, understanding VAEs mathematically
  6. Week 6: Open Problems in AI X-Risk + Diffusion Intro

Week 1: Preventing an AI-related catastrophe + Scaling Hypothesis

Core Readings: (70 min)

  1. Intelligence Explosion (20 min)
  2. Circuits, Distilled (50 min)

Learning Goals:

  1. Familiarize yourself with the arguments that AI poses an existential risk.
  2. Understand the rapid scaling of modern AI models and its implications for our interpretability methods.

Week 2: The future is going to be wild + Image Generation mathematical framework

Core Content: (125 min)

Theoretical Readings (75 min):

  1. AI and Compute (5 min)
  2. The Bitter Lesson (10 min)
  3. All Possible Views About Humanity’s Future Are Wild (15 min)
  4. “This can’t go on” (25 min)
  5. Intelligence Explosion: Evidence and Import (20 min)

Practical Readings (50 min):

  1. (if unfamiliar) 3Blue1Brown Neural Networks series, Chapters 1 and 2 (30 min)
  2. Policy Gradient Explanation (20 min)

Learning Goals:

Theoretical

  1. Understand the relationship between compute and general capabilities.
  2. Gain experience with the types of datasets used in modern AI systems.
  3. See how AI could impact a wide range of industries.
  4. Reflect on the radical impact AI can have on the future of humanity.
  5. Reflect on the strange possibilities of our economic future.
  6. Reflect on the speed with which AI could transition from powerful to superintelligent.

Practical

  1. Understand Markov Decision Processes (MDPs)
  2. Understand the intuition behind the policy gradient (a minimal code sketch follows below).
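To ground the MDP and policy-gradient goals above, here is a minimal REINFORCE sketch on a toy corridor MDP. It assumes PyTorch; the environment, network sizes, and hyperparameters are illustrative choices, not taken from the readings.

```python
# Minimal REINFORCE (vanilla policy gradient) sketch on a toy corridor MDP.
# PyTorch is assumed; the environment and hyperparameters are illustrative.
import torch
import torch.nn as nn

N_STATES, GOAL, MAX_STEPS = 5, 4, 20

def env_step(state, action):
    """Toy MDP: action 0 moves left, action 1 moves right; reward 1.0 on reaching the goal."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

policy = nn.Sequential(nn.Linear(N_STATES, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def one_hot(s):
    x = torch.zeros(N_STATES)
    x[s] = 1.0
    return x

for episode in range(200):
    log_probs, rewards, state = [], [], 0
    for _ in range(MAX_STEPS):
        dist = torch.distributions.Categorical(logits=policy(one_hot(state)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env_step(state, int(action))
        rewards.append(reward)
        if done:
            break
    # Discounted returns G_t, computed backwards through the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # REINFORCE objective: -sum_t log pi(a_t | s_t) * G_t, so gradient descent on this
    # loss is gradient ascent on expected return.
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key line is the loss: each action's log-probability is weighted by the return that followed it, which is exactly the intuition the policy-gradient reading builds up.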

Week 3: Unsolved Problems in ML Safety, autoencoders, and KL divergence

Core Readings (30 min):

  1. Why AI alignment could be hard with modern deep learning (20 min)
  2. Intuitively Understanding KL Divergence (10 min)

Notebook:

  1. (Optional) Variational Autoencoder (VAE) Intuition (30+ min): covers material a little ahead of this week’s schedule; read on if you’d like to dive into the mathematics!

Learning Goals:

Theoretical

  1. Understand the issues with using performance alone to evaluate classifiers.

Practical

  1. Establish an intuition for Kullback-Leibler (KL) divergence and what it measures as a (non-symmetric) notion of distance between probability distributions; see the sketch below.
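As a companion to the KL reading, here is a minimal NumPy sketch. The two discrete distributions are hand-picked illustrative values, not from the reading; it also shows why KL is not a symmetric distance.

```python
# Minimal sketch of KL divergence between two discrete distributions.
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i).

    Assumes q_i > 0 wherever p_i > 0 (true for the example below)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.6, 0.3, 0.1]
q = [0.4, 0.4, 0.2]

# Note the asymmetry: D_KL(p || q) != D_KL(q || p), so KL is not a true metric.
print(kl_divergence(p, q))  # ~0.088
print(kl_divergence(q, p))  # ~0.092
```

The same quantity reappears in the VAE objective in later weeks, where it is computed in closed form for Gaussian distributions.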

Week 4: AI Safety Field Background + Deep dive into VAEs

Core Readings: (105 min)

Theoretical

  1. A Bird’s Eye View of the ML Field (45 min)
  2. Paul Christiano: Current work in AI alignment (30 min)

Practical

  1. Understand the mechanics of Variational Autoencoder models before writing your own; a minimal sketch follows below.
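Before the notebook, it may help to see the moving parts of a VAE in one place. This is a minimal PyTorch sketch; the flattened 784-dimensional input and layer sizes are illustrative assumptions, and the fellowship notebook's architecture may differ.

```python
# Minimal VAE sketch: encoder -> (mu, log_var), reparameterization, decoder.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.to_mu = nn.Linear(hidden_dim, latent_dim)       # mean of q(z|x)
        self.to_log_var = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),  # pixel intensities in [0, 1]
        )

    def reparameterize(self, mu, log_var):
        # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu and log_var.
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * log_var) * eps

    def forward(self, x):
        h = self.encoder(x)
        mu, log_var = self.to_mu(h), self.to_log_var(h)
        z = self.reparameterize(mu, log_var)
        return self.decoder(z), mu, log_var

# Shape check on a random batch (stand-in for real image data).
x = torch.rand(8, 784)
recon, mu, log_var = VAE()(x)
print(recon.shape, mu.shape)  # torch.Size([8, 784]) torch.Size([8, 16])
```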

Learning Goals:

  1. Understand how ML research is conducted and how it affects AI safety research.
  2. Be able to evaluate if a research agenda advances general capabilities.
  3. Learn about the variety of different research approaches tackling alignment.

Week 5: Failure Modes in AI, understanding VAEs mathematically

Core Readings: (55 min)

Theoretical

  1. X-Risk Analysis for AI Research (Appendix A, pp. 13-14) (10 min)
  2. What Failure Looks Like (10 min)
  3. Clarifying What Failure Looks Like (25 min)

Practical

  1. Full VAE derivation and implementation (try optimizing the model!); the objective you will derive is summarized below.
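The derivation in the notebook should arrive at some form of the evidence lower bound (ELBO); for reference, a standard statement of it is

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right).
$$

With a diagonal-Gaussian encoder and a standard normal prior, the KL term has the closed form

$$
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \sigma^2 I)\,\|\,\mathcal{N}(0, I)\right) \;=\; \tfrac{1}{2}\sum_{j}\left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right),
$$

so training minimizes a reconstruction loss (e.g. binary cross-entropy for a sigmoid decoder) plus this KL penalty.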

Learning Goals:

  1. Be able to determine how an AI safety project may reduce X-risk.
  2. Evaluate the failure modes of misaligned AI.
  3. Understand the factors that lead to value lock-in.

Week 6: Open Problems in AI X-Risk + Diffusion Intro

Core Readings:

Theoretical

  1. Open Problems in AI X-Risk (60 min)
  2. AI Governance: Opportunity and Theory of Impact (15 min)

Practical

  1. RL Connect4 (Stage 4)
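Since this week's title also introduces diffusion models, here is a minimal sketch of the forward (noising) process they are built around. The linear beta schedule, step count, and image shape are illustrative DDPM-style defaults, not taken from the fellowship materials.

```python
# Minimal sketch of the forward (noising) process used by diffusion models.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule beta_1 .. beta_T
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # alpha_bar_t = prod_{s<=t} (1 - beta_s)

def noise_image(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t]
    return torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps, eps

x0 = torch.rand(1, 28, 28)             # stand-in for a training image
x_noisy, eps = noise_image(x0, t=500)  # halfway through the schedule: mostly noise
# A diffusion model is trained to predict eps from (x_noisy, t);
# image generation then runs this noising process in reverse.
```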

Learning Goals:

  1. Pick a research agenda you find particularly interesting (perhaps to pursue later).
  2. Understand the role AI governance plays in the broader field of AI safety.