AI Safety at UCLA Intro Fellowship: Transformers Track

Table of Contents

  1. Week 1: Preventing an AI-related catastrophe
  2. Week 2: The future is going to be wild + The Bigram Model
  3. Week 3: Unsolved Problems in ML Safety + Positional Encoding
  4. Week 4: The AI Safety Landscape + Self-Attention
  5. Week 5: Failure Modes in AI + Multi-Headed Attention
  6. Week 6: Open Problems in AI X-Risk + Transformers & GPT-2 from scratch

Week 1: Preventing an AI-related catastrophe

Core Content (< 200 min):

  1. Preventing an AI-related catastrophe (120 min)
  2. Review the neural network architecture, forward & backward propagation, and weights & biases (Neural Networks Chapters 1-5, 3Blue1Brown)

Optional Additional Practice (< 180 min):

  1. Implementing Micrograd (Andrej Karpathy’s NN-Zero-to-Hero) (120 min); a sketch of the core idea follows this list
  2. PyTorch Tutorial (PyTorch)
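
As a preview of the Micrograd exercise, here is a minimal sketch of reverse-mode autodiff over scalars, the idea the video builds up. This `Value` class is an illustrative reduction (addition and multiplication only), not Karpathy’s exact API.

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad               # d(a+b)/da = 1
            other.grad += out.grad              # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(-3.0)
loss = a * b + a          # loss = ab + a
loss.backward()
print(a.grad, b.grad)     # dloss/da = b + 1 = -2.0, dloss/db = a = 2.0
```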

Learning Goals:

  1. Recognize the arguments for AI being an existential risk
  2. Understand the neural network architecture and how models learn
  3. Apply the PyTorch API to implement backpropagation (see the sketch after this list)
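
For goal 3, here is a minimal sketch of the same forward/backward structure through the PyTorch API, assuming a recent PyTorch install; the one-neuron model, input, and target are arbitrary illustrations, not part of the assigned material.

```python
import torch

x = torch.tensor([1.0, 2.0])              # fixed input
w = torch.randn(2, requires_grad=True)    # trainable weights
b = torch.zeros(1, requires_grad=True)    # trainable bias

y_hat = torch.sigmoid(w @ x + b)          # forward pass: one-neuron "network"
loss = (y_hat - 1.0).pow(2).sum()         # squared error against a target of 1

loss.backward()                           # backward pass fills in .grad
print(w.grad, b.grad)                     # gradients for a gradient-descent step
```

Marking a tensor with `requires_grad=True` is what makes it trainable; these are the same gradients the 3Blue1Brown chapters derive by hand.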

Week 2: The future is going to be wild + The Bigram Model

AI progress has been rapid: today’s systems are highly capable and can solve problems that other methods cannot.

Core Content (< 200 min):

Conceptual Readings (70 min):

  1. AI and Compute (5 min)
  2. The Bitter Lesson (10 min)
  3. All Possible Views About Humanity’s Future are Wild (15 min)
  4. This can’t go on (20 min)
  5. Intelligence Explosion: Evidence and Import (20 min)

Practical (< 150 min):

  1. The Bigram Model (Andrej Karpathy’s NN-Zero-to-Hero) (120 min)
  2. You may need to watch Part 1 first to understand the MLP task (75 min)

Additional Optional Content (50 min):

  1. CS M146 - Generative AI by Prof. Aditya Grover (50 min)

Learning Goals:

  1. Recognize the relationship between compute and capabilities
  2. Recognize the radical impact AI can have on the future of humanity
  3. Understand how quickly AI could transition from powerful to superintelligent
  4. Understand probability distributions in generative language models
  5. Recognize the trainable parameters of a small-scale language-generation task (see the bigram sketch after this list)
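
As a reference for goals 4 and 5, here is a minimal sketch of a counting-based character bigram model; the three-name corpus and names like `stoi` and `P` are illustrative stand-ins for what the video builds.

```python
import torch

words = ["emma", "olivia", "ava"]                 # stand-in training corpus
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}    # char -> integer index
stoi["."] = 0                                     # '.' marks word start/end
itos = {i: c for c, i in stoi.items()}

# Count every (current char -> next char) transition in the corpus.
N = torch.zeros(len(stoi), len(stoi))
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Normalize each row into a probability distribution over the next character.
P = (N + 1) / (N + 1).sum(dim=1, keepdim=True)    # +1 smoothing avoids zeros

# Sample a new "name" one character at a time from those distributions.
g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:                                    # sampled the end marker
        break
    out.append(itos[ix])
print("".join(out))
```

In the neural version from the video, the counts table is replaced by a trained weight matrix, which is where goal 5’s trainable parameters come in.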

Week 3: Unsolved Problems in ML Safety + Positional Encoding

Core Content (150 min):

Conceptual Readings (80 min):

  1. Unsolved Problems in ML Safety (60 min)
  2. Why AI alignment could be hard with modern deep learning (20 min)

Technical Readings (70 min):

  1. Intro to Positional Encoding (30 min)
  2. Code Emporium’s Positional Encoding Video (10 min)
  3. PyTorch’s Transformer Example (30 min)

Learning Goals:

  1. Recognize the unsolved problems in AI safety and how current research groups are tackling them
  2. Understand the issues with using only performance metrics to evaluate classifiers
  3. Understand how positional encoding works in the transformer architecture
  4. Recognize the application of positional encoding using the PyTorch API
  5. Apply positional encoding and embeddings in your bigram model (see week 6’s Karpathy video and the sketch after this list)
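
For goals 3 to 5, here is a minimal sketch of the sinusoidal scheme the readings cover, assuming PyTorch; the sizes (10 tokens, width 16) are arbitrary toy values.

```python
import math
import torch

def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """(max_len, d_model) table with PE[pos, 2i] = sin(pos / 10000^(2i/d_model))
    and PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    pos = torch.arange(max_len).unsqueeze(1).float()        # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))       # 10000^(-2i/d_model)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)                      # even dimensions
    pe[:, 1::2] = torch.cos(pos * div)                      # odd dimensions
    return pe

# Attention by itself is order-blind, so positions are injected by addition.
tokens = torch.randn(10, 16)                # 10 token embeddings of width 16
x = tokens + positional_encoding(10, 16)    # same shape, element-wise sum
```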

Week 4: The AI Safety Landscape + Self-Attention

Core Readings (< 200 min):

Conceptual Readings (80 min):

  1. A Bird’s Eye View of the ML Field (45 min)
  2. Paul Christiano: Current work in AI alignment (30 min)

Technical Content (< 130 min):

  1. The Transformer by Mohammed Terry-Jack
  2. Language Modeling with Transformers, read from the Attention section to the end (45 min)
  3. The Annotated Transformer, read through Part 1 (60 min)
  4. Rasa Algorithm Whiteboard - Transformers & Attention 1: Self Attention (10 min)
  5. Rasa Algorithm Whiteboard - Transformers & Attention 2: Keys, Values, Queries (10 min)

Learning Goals:

  1. Understand how ML research is conducted and how it affects AI safety research
  2. Recognize whether a research agenda advances general capabilities
  3. Understand the variety of research approaches tackling alignment
  4. Understand how attention changed the landscape of natural language processing
  5. Recognize how attention is implemented with query, key, and value (QKV) vectors (see the sketch after this list)
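
To make goal 5 concrete, here is a minimal sketch of single-head scaled dot-product self-attention; the dimensions are toy values and the projection matrices are random stand-ins for learned weights.

```python
import math
import torch
import torch.nn.functional as F

T, d_model, d_head = 6, 32, 16           # sequence length and widths (toy sizes)
x = torch.randn(T, d_model)              # one sequence of token embeddings

Wq = torch.randn(d_model, d_head)        # learned projections in a real model,
Wk = torch.randn(d_model, d_head)        # random stand-ins here
Wv = torch.randn(d_model, d_head)

Q, K, V = x @ Wq, x @ Wk, x @ Wv         # queries, keys, values: (T, d_head)

scores = Q @ K.T / math.sqrt(d_head)     # (T, T): query-key similarity
weights = F.softmax(scores, dim=-1)      # each row is a distribution over tokens
out = weights @ V                        # weighted mixture of value vectors
print(out.shape)                         # torch.Size([6, 16])
```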

Week 5: Failure Modes in AI + Multi-Headed Attention

Core Readings (< 200 min):

Conceptual Readings (55 min):

  1. X-Risk Analysis for AI Research, Appendix A only (10 min)
  2. What Failure Looks Like (10 min)
  3. Clarifying What Failure Looks Like (25 min)

Technical Content (140 min):

  1. The Annotated Transformer, read the rest (100 min)
  2. Rasa Algorithm Whiteboard - Transformers & Attention 3: Multi Head Attention (10 min)
  3. The BERT Paper, skim to understand the architecture of real-world models (30 min)

Learning Goals:

  1. Recognize how an AI safety project may reduce X-risk
  2. Understand the failure modes of misaligned AI
  3. Understand the factors that lead to value lock-in
  4. Understand multi-headed attention (see the sketch after this list)
  5. Apply self-attention in your bigram model (take a look at week 6’s Karpathy video)
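
To make goal 4 concrete, here is a minimal sketch of multi-headed attention: several heads attend in parallel over slices of the model width and their outputs are concatenated. The sizes are toy values, and masking and dropout are omitted.

```python
import math
import torch
import torch.nn.functional as F

T, d_model, n_heads = 6, 32, 4
d_head = d_model // n_heads                # each head attends in a smaller space

x = torch.randn(T, d_model)                # one sequence of token embeddings
Wq = torch.randn(d_model, d_model)         # learned projections in a real model,
Wk = torch.randn(d_model, d_model)         # random stand-ins here
Wv = torch.randn(d_model, d_model)
Wo = torch.randn(d_model, d_model)         # output projection mixing the heads

def split_heads(t):
    # (T, d_model) -> (n_heads, T, d_head): each head gets a slice of the width
    return t.view(T, n_heads, d_head).transpose(0, 1)

Q, K, V = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)

scores = Q @ K.transpose(-2, -1) / math.sqrt(d_head)   # (n_heads, T, T)
weights = F.softmax(scores, dim=-1)                    # attention per head
heads = weights @ V                                    # (n_heads, T, d_head)

out = heads.transpose(0, 1).reshape(T, d_model) @ Wo   # concatenate, then mix
print(out.shape)                                       # torch.Size([6, 32])
```

Splitting one large projection across heads keeps the total computation comparable to a single head while letting each head learn a different attention pattern.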

Week 6: Open Problems in AI X-Risk + Transformers & GPT-2 from scratch

Core Content (< 200 min):

Conceptual Readings (75 min):

  1. Open Problems in AI X-Risk (60 min)
  2. AI Governance: Opportunity and Theory of Impact (15 min)

Technical Content (120 min):

  1. Implementing GPT-2 (Andrej Karpathy’s NN-Zero-to-Hero) (120 min)

Learning Goals:

  1. Recognize the open problems in AI X-risk and identify topics for self-study and possible ART projects
  2. Understand the issues with lobbying for AI governance and policy to “maintain” capabilities research alongside safety research
  3. Apply embeddings, positional encoding, and multi-headed attention in a transformer model (see the sketch after this list)
  4. Understand how large language models work internally at each step of the process
  5. Recognize the challenges of creating and training large-scale language models
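
As a capstone reference for goal 3, here is a minimal sketch that assembles token embeddings, learned positional embeddings, and masked multi-headed attention into one decoder-style block using `torch.nn`; the sizes are toy values, and the full training loop from Karpathy’s video is omitted.

```python
import torch
import torch.nn as nn

vocab_size, block_size, d_model, n_heads = 65, 8, 32, 4    # toy sizes

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos_emb = nn.Embedding(block_size, d_model)   # learned positions
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.ffwd = nn.Sequential(                         # position-wise MLP
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model))
        self.ln2 = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size)         # next-token logits

    def forward(self, idx):                                # idx: (B, T) token ids
        B, T = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T))
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(x, x, x, attn_mask=causal)        # masked self-attention
        x = self.ln1(x + a)                                # residual + norm
        x = self.ln2(x + self.ffwd(x))
        return self.head(x)

idx = torch.randint(0, vocab_size, (2, block_size))        # a batch of 2 contexts
logits = TinyBlock()(idx)
print(logits.shape)                                        # torch.Size([2, 8, 65])
```

The causal mask is what makes this a language model: each position may attend only to earlier positions, so the logits at each step predict the next token. GPT-2 stacks many such blocks.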