AI Safety at UCLA Intro Fellowship: Transformers Track

Table of Contents

  1. Week 1: Preventing an AI-related catastrophe
  2. Week 2: The future is going to be wild + The Bigram Model
  3. Week 3: Unsolved Problems in ML Safety + Positional Encoding
  4. Week 4: The AI Safety Landscape + Self-Attention
  5. Week 5: Failure Modes in AI + Multi-Headed Attention
  6. Week 6: Open Problems in AI X-Risk + Transformers & GPT-2 from scratch

Week 1: Preventing an AI-related catastrophe

Core Content (< 200 min):

  1. Preventing an AI-related catastrophe (120 min)
  2. Review the neural network architecture, forward & backward propagation, and weights & biases (Neural Networks Chapters 1-5, 3Blue1Brown)

Optional Additional Practice (< 180 min):

  1. Implementing Micrograd (Andrej Karpathy’s NN-Zero-to-Hero) (120 min); a sketch of the core idea follows this list
  2. PyTorch Tutorial (PyTorch)
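
As a preview of the Micrograd exercise, here is a minimal sketch of reverse-mode autodiff over scalars, the idea the video builds up. This `Value` class is an illustrative reduction (addition and multiplication only), not Karpathy’s exact API.

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad               # d(a+b)/da = 1
            other.grad += out.grad              # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(-3.0)
loss = a * b + a          # loss = ab + a
loss.backward()
print(a.grad, b.grad)     # dloss/da = b + 1 = -2.0, dloss/db = a = 2.0
```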

Learning Goals:

  1. Recognize the arguments for AI being an existential risk
  2. Understand the neural network architecture and how models learn
  3. Apply the PyTorch API to implement backpropagation (see the sketch after this list)
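
For goal 3, here is a minimal sketch of the same forward/backward structure through the PyTorch API, assuming a recent PyTorch install; the one-neuron model, input, and target are arbitrary illustrations, not part of the assigned material.

```python
import torch

x = torch.tensor([1.0, 2.0])              # fixed input
w = torch.randn(2, requires_grad=True)    # trainable weights
b = torch.zeros(1, requires_grad=True)    # trainable bias

y_hat = torch.sigmoid(w @ x + b)          # forward pass: one-neuron "network"
loss = (y_hat - 1.0).pow(2).sum()         # squared error against a target of 1

loss.backward()                           # backward pass fills in .grad
print(w.grad, b.grad)                     # gradients for a gradient-descent step
```

Marking a tensor with `requires_grad=True` is what makes it trainable; these are the same gradients the 3Blue1Brown chapters derive by hand.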

Week 2: The future is going to be wild + The Bigram Model

AI progress has been rapid: today’s systems are highly capable and can solve problems that other methods cannot.

Core Content (< 200 min):

Conceptual Readings (70 min):

  1. AI and Compute (5 min)
  2. The Bitter Lesson (10 min)
  3. All Possible Views About Humanity’s Future are Wild (15 min)
  4. This can’t go on (20 min)
  5. Intelligence Explosion: Evidence and Import (20 min)

Practical (< 150 min):

  1. The Bigram Model (Andrej Karpathy’s NN-Zero-to-Hero) (120 min)
  2. You may need to watch Part 1 first to understand the MLP task (75 min)

Additional Optional Content (50 min):

  1. CS M146 - Generative AI by Prof. Aditya Grover (50 min)

Learning Goals:

  1. Recognize the relationship between compute and capabilities
  2. Recognize the radical impact AI can have on the future of humanity
  3. Understand how quickly AI could transition from powerful to superintelligent
  4. Understand probability distributions in generative language models
  5. Recognize the trainable parameters of a small-scale language-generation task (see the bigram sketch after this list)
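
As a reference for goals 4 and 5, here is a minimal sketch of a counting-based character bigram model; the three-name corpus and names like `stoi` and `P` are illustrative stand-ins for what the video builds.

```python
import torch

words = ["emma", "olivia", "ava"]                 # stand-in training corpus
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}    # char -> integer index
stoi["."] = 0                                     # '.' marks word start/end
itos = {i: c for c, i in stoi.items()}

# Count every (current char -> next char) transition in the corpus.
N = torch.zeros(len(stoi), len(stoi))
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Normalize each row into a probability distribution over the next character.
P = (N + 1) / (N + 1).sum(dim=1, keepdim=True)    # +1 smoothing avoids zeros

# Sample a new "name" one character at a time from those distributions.
g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:                                    # sampled the end marker
        break
    out.append(itos[ix])
print("".join(out))
```

In the neural version from the video, the counts table is replaced by a trained weight matrix, which is where goal 5’s trainable parameters come in.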

Week 3: Unsolved Problems in ML Safety + Positional Encoding

Core Content (150 min):

Conceptual Readings (80 min):

  1. Unsolved Problems in ML Safety (60 min)
  2. Why AI alignment could be hard with modern deep learning (20 min)

Technical Readings (70 min):

  1. Intro to Positional Encoding (30 min)
  2. Code Emporium’s Positional Encoding Video (10 min)
  3. PyTorch’s Transformer Example (30 min)

Learning Goals:

  1. Recognize the unsolved problems in AI safety and how current research groups are tackling them
  2. Understand the issues with using only performance metrics to evaluate classifiers
  3. Understand how positional encoding works in the transformer architecture
  4. Recognize the application of positional encoding using the PyTorch API
  5. Apply positional encoding and embeddings in your bigram model (see week 6’s Karpathy video and the sketch after this list)
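
For goals 3 to 5, here is a minimal sketch of the sinusoidal scheme the readings cover, assuming PyTorch; the sizes (10 tokens, width 16) are arbitrary toy values.

```python
import math
import torch

def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """(max_len, d_model) table with PE[pos, 2i] = sin(pos / 10000^(2i/d_model))
    and PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    pos = torch.arange(max_len).unsqueeze(1).float()        # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))       # 10000^(-2i/d_model)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)                      # even dimensions
    pe[:, 1::2] = torch.cos(pos * div)                      # odd dimensions
    return pe

# Attention by itself is order-blind, so positions are injected by addition.
tokens = torch.randn(10, 16)                # 10 token embeddings of width 16
x = tokens + positional_encoding(10, 16)    # same shape, element-wise sum
```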

Week 4: The AI Safety Landscape + Self-Attention

Core Readings (< 200 min):

Conceptual Readings (80 min):

  1. A Bird’s Eye View of the ML Field (45 min)
  2. Paul Christiano: Current work in AI alignment (30 min)

Technical Content (< 130 min):

  1. The Transformer by Mohammed Terry-Jack
  2. Language Modeling with Transformers, read from the Attention section to the end (45 min)
  3. The Annotated Transformer, read through Part 1 (60 min)
  4. Rasa Algorithm Whiteboard - Transformers & Attention 1: Self Attention (10 min)
  5. Rasa Algorithm Whiteboard - Transformers & Attention 2: Keys, Values, Queries (10 min)

Learning Goals:

  1. Understand how ML research is conducted and how it affects AI safety research
  2. Recognize whether a research agenda advances general capabilities
  3. Understand the variety of research approaches tackling alignment
  4. Understand how attention changed the landscape of natural language processing
  5. Recognize how attention is implemented with query, key, and value (QKV) vectors (see the sketch after this list)
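
To make goal 5 concrete, here is a minimal sketch of single-head scaled dot-product self-attention; the dimensions are toy values and the projection matrices are random stand-ins for learned weights.

```python
import math
import torch
import torch.nn.functional as F

T, d_model, d_head = 6, 32, 16           # sequence length and widths (toy sizes)
x = torch.randn(T, d_model)              # one sequence of token embeddings

Wq = torch.randn(d_model, d_head)        # learned projections in a real model,
Wk = torch.randn(d_model, d_head)        # random stand-ins here
Wv = torch.randn(d_model, d_head)

Q, K, V = x @ Wq, x @ Wk, x @ Wv         # queries, keys, values: (T, d_head)

scores = Q @ K.T / math.sqrt(d_head)     # (T, T): query-key similarity
weights = F.softmax(scores, dim=-1)      # each row is a distribution over tokens
out = weights @ V                        # weighted mixture of value vectors
print(out.shape)                         # torch.Size([6, 16])
```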

Week 5: Failure Modes in AI + Multi-Headed Attention

Core Readings (< 200 min):

Conceptual Readings (55 min):

  1. X-Risk Analysis for AI Research, Appendix A only (10 min)
  2. What Failure Looks Like (10 min)
  3. Clarifying What Failure Looks Like (25 min)

Technical Content (140 min):

  1. The Annotated Transformer, read the rest (100 min)
  2. Rasa Algorithm Whiteboard - Transformers & Attention 3: Multi Head Attention (10 min)
  3. The BERT Paper, skim to understand the architecture of real-world models (30 min)

Learning Goals:

  1. Recognize how an AI safety project may reduce X-risk
  2. Understand the failure modes of misaligned AI
  3. Understand the factors that lead to value lock-in
  4. Understand multi-headed attention (see the sketch after this list)
  5. Apply self-attention in your bigram model (take a look at week 6’s Karpathy video)
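
To make goal 4 concrete, here is a minimal sketch of multi-headed attention: several heads attend in parallel over slices of the model width and their outputs are concatenated. The sizes are toy values, and masking and dropout are omitted.

```python
import math
import torch
import torch.nn.functional as F

T, d_model, n_heads = 6, 32, 4
d_head = d_model // n_heads                # each head attends in a smaller space

x = torch.randn(T, d_model)                # one sequence of token embeddings
Wq = torch.randn(d_model, d_model)         # learned projections in a real model,
Wk = torch.randn(d_model, d_model)         # random stand-ins here
Wv = torch.randn(d_model, d_model)
Wo = torch.randn(d_model, d_model)         # output projection mixing the heads

def split_heads(t):
    # (T, d_model) -> (n_heads, T, d_head): each head gets a slice of the width
    return t.view(T, n_heads, d_head).transpose(0, 1)

Q, K, V = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)

scores = Q @ K.transpose(-2, -1) / math.sqrt(d_head)   # (n_heads, T, T)
weights = F.softmax(scores, dim=-1)                    # attention per head
heads = weights @ V                                    # (n_heads, T, d_head)

out = heads.transpose(0, 1).reshape(T, d_model) @ Wo   # concatenate, then mix
print(out.shape)                                       # torch.Size([6, 32])
```

Splitting one large projection across heads keeps the total computation comparable to a single head while letting each head learn a different attention pattern.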

Week 6: Open Problems in AI X-Risk + Transformers & GPT-2 from scratch

Core Content (< 200 min):

Conceptual Readings (75 min):

  1. Open Problems in AI X-Risk (60 min)
  2. AI Governance: Opportunity and Theory of Impact (15 min)

Technical Content (120 min):

  1. Implementing GPT-2 (Andrej Karpathy’s NN-Zero-to-Hero) (120 min)

Learning Goals:

  1. Recognize the open problems in AI X-risk and identify topics for self-study and possible ART projects
  2. Understand the issues with lobbying for AI governance and policy to “maintain” capabilities research alongside safety research
  3. Apply embeddings, positional encoding, and multi-headed attention in a transformer model (see the sketch after this list)
  4. Understand how large language models work internally at each step of the process
  5. Recognize the challenges of creating and training large-scale language models
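
As a capstone reference for goal 3, here is a minimal sketch that assembles token embeddings, learned positional embeddings, and masked multi-headed attention into one decoder-style block using `torch.nn`; the sizes are toy values, and the full training loop from Karpathy’s video is omitted.

```python
import torch
import torch.nn as nn

vocab_size, block_size, d_model, n_heads = 65, 8, 32, 4    # toy sizes

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos_emb = nn.Embedding(block_size, d_model)   # learned positions
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.ffwd = nn.Sequential(                         # position-wise MLP
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model))
        self.ln2 = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size)         # next-token logits

    def forward(self, idx):                                # idx: (B, T) token ids
        B, T = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T))
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(x, x, x, attn_mask=causal)        # masked self-attention
        x = self.ln1(x + a)                                # residual + norm
        x = self.ln2(x + self.ffwd(x))
        return self.head(x)

idx = torch.randint(0, vocab_size, (2, block_size))        # a batch of 2 contexts
logits = TinyBlock()(idx)
print(logits.shape)                                        # torch.Size([2, 8, 65])
```

The causal mask is what makes this a language model: each position may attend only to earlier positions, so the logits at each step predict the next token. GPT-2 stacks many such blocks.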