AI Safety at UCLA Intro Fellowship: Reinforcement Learning Track

Table of Contents

Week 1: Preventing an AI-related catastrophe
Week 2: The future is going to be wild
Week 3: Why AI Safety?
Week 4: AI Safety Field Background
Week 5: Failure Modes in AI
Week 6: Open Problems in AI X-Risk

Week 1: Preventing an AI-related catastrophe

Core Readings: (150 min)

Intelligence Explosion (20 min)
AlphaGo - The Movie (1 hr 30 min)

Learning Goals:

Familiarize yourself with the arguments for AI being an existential risk.
Understand why RL can enable superhuman performance.

Week 2: The future is going to be wild + Policy Gradient

Core Readings: (85 min)

Theoretical

AI and Compute (5 min)
The Bitter Lesson (10 min)
All Possible Views About Humanity’s Future are Wild (15 min)
“This can’t go on” (25 min)

Practical

(if unfamiliar) Neural Networks, Chapters 1 and 2 (30 min)
Policy Gradient Explanation (20 min)

Learning Goals:

Theoretical

Understand the relationship between compute and general capabilities.
Gain experience with the types of datasets used in modern AI systems.
See how AI could impact a wide range of industries.
Reflect on the radical impact AI can have on the future of humanity.
Reflect on the strange possibilities of our economic future.
Reflect on the speed with which AI could transition from powerful to superintelligent.

Practical

Understand Markov Decision Processes (MDPs).
Understand the intuition behind the policy gradient.
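
As a minimal sketch of both goals (not taken from the readings), the following treats a two-action bandit as a one-state, one-step MDP and applies the REINFORCE form of the policy gradient to a softmax policy. The reward values, learning rate, step count, and the names REWARDS, softmax, and train are illustrative assumptions.

```python
import math
import random

# Hypothetical one-state, one-step MDP: action 0 pays 0.0, action 1 pays 1.0.
REWARDS = [0.0, 1.0]


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1, seed=0):
    """Run REINFORCE updates and return the final action probabilities."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # one preference per action
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = 0 if rng.random() < probs[0] else 1
        reward = REWARDS[action]
        # For a softmax policy, d/d theta[k] of log pi(action)
        # equals 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log
    return softmax(theta)


if __name__ == "__main__":
    print(train())
```

The intuition is visible in the update line: actions followed by higher reward have their log-probability pushed up in proportion to that reward, so the policy shifts toward the better action over time.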

Week 3: PyTorch Intro + Unsolved Problems in ML Safety

Core Readings:

Why AI alignment could be hard with modern deep learning (20 min)
Policy Gradient Discrete Exercise (60 min)
Policy Gradient Continuous Exercise (60 min)

Learning Goals:

Theoretical

Understand issues with only using performance to evaluate classifiers.

Practical

Implement both the discrete and continuous versions of the Policy Gradient.
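
One way to see the difference between the two versions, sketched here under the assumption that the discrete policy is a softmax over logits and the continuous policy is a Gaussian with a given mean and standard deviation (the exercises may be set up differently): only the log-probability of the chosen action changes, while the policy gradient update itself keeps the same form. The function names are hypothetical, and plain Python is used instead of PyTorch to keep the sketch dependency-free.

```python
import math


def categorical_log_prob(logits, action):
    """Log-probability of a discrete action under a softmax policy."""
    m = max(logits)  # log-sum-exp with the max subtracted for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[action] - log_z


def gaussian_log_prob(mean, std, action):
    """Log-density of a continuous (scalar) action under a Gaussian policy."""
    var = std * std
    return -0.5 * ((action - mean) ** 2 / var + math.log(2 * math.pi * var))


if __name__ == "__main__":
    print(categorical_log_prob([0.0, 0.0], 0))
    print(gaussian_log_prob(0.0, 1.0, 0.0))
```

In either case the quantity that gets scaled by the return and differentiated is the log-probability returned here.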

Week 4: AI Safety Field Background

Core Readings: (105 min)

Theoretical

A Bird’s Eye View of the ML Field (45 min)
Paul Christiano: Current work in AI alignment (30 min)

Practical

(Stages 1 and 2): Connect4

Learning Goals:

Understand how ML research is conducted and how it affects AI safety research.
Be able to evaluate if a research agenda advances general capabilities.
Learn about the variety of different research approaches tackling alignment.

Week 5: Failure Modes in AI

Core Readings: (55 min)

Theoretical

X-Risk Analysis for AI Research (Appendix A pg 13-14) (10 min)
What Failure Looks Like (10 min)
Clarifying What Failure Looks Like (25 min)

Practical

(Stage 3): Connect4

Learning Goals:

Be able to determine how an AI safety project may reduce X-risk.
Evaluate the failure modes of misaligned AI.
Understand the factors that lead to value lock-in.

Week 6: Open Problems in AI X-Risk

Core Readings:

Theoretical

Open Problems in AI X-Risk (60 min)
AI Governance: Opportunity and Theory of Impact (15 min)

Practical

PPO Notebook
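
Assuming PPO here refers to Proximal Policy Optimization, the core idea can be previewed with a minimal sketch of its clipped surrogate objective. This is not the notebook's code: the function name, the default clip range of 0.2, and the plain-list inputs are illustrative assumptions.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch.

    ratios: new_policy_prob / old_policy_prob for each sampled action.
    advantages: advantage estimate for each sampled action.
    eps: clip range (0.2 is an assumed, commonly used value).
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        # Keep the probability ratio within [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    print(ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is the main change relative to the plain policy gradient from earlier weeks.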

Learning Goals:

Pick a research agenda you find particularly interesting (perhaps to pursue later).
Understand the role AI governance plays in the broader field of AI safety.

Before the next meeting, think about (or write down) your answers to these questions:

If you were to pursue a research question/topic in AI safety, what would it be?
What area of AI safety do you find most interesting? What area of AI safety do you find most promising?