AI Safety at UCLA Intro Fellowship: Reinforcement Learning Track

Table of Contents

  1. Week 1: Preventing an AI-related catastrophe
  2. Week 2: The future is going to be wild + Policy Gradient
  3. Week 3: PyTorch Intro + Unsolved Problems in ML Safety
  4. Week 4: AI Safety Field Background
  5. Week 5: Failure Modes in AI
  6. Week 6: Open Problems in AI X-Risk

Week 1: Preventing an AI-related catastrophe

Core Readings (150 min):

  1. Intelligence Explosion (20 min)
  2. AlphaGo - The Movie (90 min)

Learning Goals:

  1. Familiarize yourself with the arguments for AI being an existential risk

  2. Understand why RL enables superhuman performance

Week 2: The future is going to be wild + Policy Gradient

Core Readings (85 min):

Theoretical

  1. AI and Compute (5 min)
  2. The Bitter Lesson (10 min)
  3. All Possible Views About Humanity’s Future are Wild (15 min)
  4. “This can’t go on” (25 min)

Practical

  1. (if unfamiliar) Neural Networks, Chapters 1 and 2 (30 min)
  2. Policy Gradient Explanation (20 min)

Learning Goals:

Theoretical

  1. Understand the relationship between compute and general capabilities.
  2. Gain experience with the types of datasets used in modern AI systems.
  3. See how AI could impact a wide range of industries.
  4. Reflect on the radical impact AI can have on the future of humanity.
  5. Reflect on the strange possibilities of our economic future.
  6. Reflect on the speed with which AI could transition from powerful to superintelligent.

Practical

  1. Understand Markov Decision Processes (MDPs).
  2. Understand the intuition behind the policy gradient (see the reference equations after this list).
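
For quick reference, the standard formalism behind these two goals (textbook definitions, not tied to any particular assigned reading) can be stated compactly:

```latex
% An MDP is a tuple of states, actions, transition kernel, reward, and discount:
M = (\mathcal{S}, \mathcal{A}, P, R, \gamma)

% Objective: expected discounted return of a parameterized policy \pi_\theta
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t \ge 0} \gamma^t R(s_t, a_t)\right]

% Policy gradient theorem (REINFORCE form): the gradient never passes through
% the environment dynamics, only through the policy's log-probabilities
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t \ge 0} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right],
\qquad G_t = \sum_{k \ge t} \gamma^{\,k-t} R(s_k, a_k)
```

The key intuition: actions followed by high returns get their log-probability pushed up, and the expectation can be estimated from sampled trajectories alone, with no gradient through the environment.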

Week 3: PyTorch Intro + Unsolved Problems in ML Safety

Core Readings:

  1. Why AI alignment could be hard with modern deep learning (20 min)
  2. Policy Gradient Discrete Exercise (60 min)
  3. Policy Gradient Continuous Exercise (60 min)

Learning Goals:

Theoretical

  1. Understand the issues with using performance alone to evaluate classifiers.

Practical

  1. Implement both the discrete and continuous versions of the policy gradient (see the loss sketch below).
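
A minimal sketch of the loss both exercises build toward, assuming an episodic REINFORCE setup where the discounted returns-to-go have already been computed; the function names and tensor shapes here are illustrative, not the exercises' actual scaffolding:

```python
# REINFORCE loss sketch (illustrative; not the official exercise scaffold).
# Discrete actions use a Categorical policy head; continuous ones a Normal head.
import torch
from torch.distributions import Categorical, Normal

def discrete_pg_loss(logits, actions, returns):
    """logits: (T, n_actions); actions: (T,) ints; returns: (T,) returns-to-go G_t."""
    log_probs = Categorical(logits=logits).log_prob(actions)  # log pi(a_t | s_t)
    return -(log_probs * returns).mean()  # minimize the negative of E[log pi * G]

def continuous_pg_loss(mean, log_std, actions, returns):
    """mean, log_std, actions: (T, act_dim); returns: (T,)."""
    log_probs = Normal(mean, log_std.exp()).log_prob(actions).sum(-1)  # independent dims
    return -(log_probs * returns).mean()

# Shape smoke test on random tensors (placeholder sizes)
T, n_actions, act_dim = 8, 4, 2
print(discrete_pg_loss(torch.randn(T, n_actions),
                       torch.randint(n_actions, (T,)),
                       torch.randn(T)))
print(continuous_pg_loss(torch.randn(T, act_dim),
                         torch.zeros(T, act_dim),
                         torch.randn(T, act_dim),
                         torch.randn(T)))
```

In practice a baseline (e.g. the mean return) is usually subtracted from the returns to reduce variance; the exercises may or may not ask for one.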

Week 4: AI Safety Field Background

Core Readings (105 min):

Theoretical

  1. A Bird’s Eye View of the ML Field (45 min)
  2. Paul Christiano: Current work in AI alignment (30 min)

Practical

  1. Connect4 (Stages 1 and 2)

Learning Goals:

  1. Understand how ML research is conducted and how it affects AI safety research.
  2. Be able to evaluate if a research agenda advances general capabilities.
  3. Learn about the variety of different research approaches tackling alignment.

Week 5: Failure Modes in AI

Core Readings (55 min):

Theoretical

  1. X-Risk Analysis for AI Research (Appendix A, pp. 13-14) (10 min)
  2. What Failure Looks Like (10 min)
  3. Clarifying What Failure Looks Like (25 min)

Practical

  1. Connect4 (Stage 3)

Learning Goals:

  1. Be able to determine how an AI safety project may reduce X-risk.
  2. Evaluate the failure modes of misaligned AI.
  3. Understand the factors that lead to value lock-in.

Week 6: Open Problems in AI X-Risk

Core Readings:

Theoretical

  1. Open Problems in AI X-Risk (60 min)
  2. AI Governance: Opportunity and Theory of Impact (15 min)

Learning Goals:

  1. Pick a research agenda you find particularly interesting (perhaps to pursue later).
  2. Understand the role AI governance plays in the broader field of AI safety.

Before the next meeting, think through (or write down) your answers to these questions:

  1. If you were to pursue a research question/topic in AI safety, what would it be?
  2. Which area of AI safety do you find most interesting? Which do you find most promising?