Transformer Resources

books

articles

Understanding Transformers, Interpretability of Transformers, Mathematical Models of Transformers

Embeddings

In-context learning with Transformers

Cross-Layer Attention in Transformers

Reinforcement Learning in Transformers

Hyper-Networks, MotherNet and PFNs (Prior-Data Fitted Networks)

Sequential Decision Modeling and Predictive Sequence Models

... More articles on Transformers

Vision Transformers

Long Short-Term Memory (the precursor of Transformers)

State Space Models (an alternative to Transformers)

Time Series Forecasting

Medium

Classes and Lectures on Transformers

Stanford CS 25

Stanford CS 25 Home url

Stanford CS 25 Transformers United: 25-lecture set, YouTube playlist

Stanford CS 25: Lecture 1 Transformers United: DL Models that have revolutionized NLP, CV, RL

Stanford CS 25: Lecture 2 Transformers in Language: The development of GPT Models, GPT3

Stanford CS 25: Lecture 3 Transformers in Vision: Tackling problems in Computer Vision

Stanford CS 25: Lecture 4 Decision Transformer: Reinforcement Learning via Sequence Modeling

Stanford CS 25: Lecture 5 Mixture of Experts (MoE) paradigm and the Switch Transformer

Stanford CS 25: Lecture 6 DeepMind's Perceiver and Perceiver IO: new data family architecture

Stanford CS 25: Lecture 7 Self Attention and Non-parametric transformers (NPTs)

Stanford CS 25: Lecture 8 Transformer Circuits, Induction Heads, In-Context Learning

Stanford CS 25: Lecture 9 Audio Research: Transformers for Applications in Audio, Speech, Music

Stanford CS 25: Lecture 10 Represent part-whole hierarchies in a neural network, Geoff Hinton

Stanford CS 25: Lecture 11 Introduction to Transformers w/ Andrej Karpathy

Stanford CS 25: Lecture 12 Language and Human Alignment

Stanford CS 25: Lecture 13 Emergent Abilities and Scaling in LLMs

Stanford CS 25: Lecture 14 Strategic Games

Stanford CS 25: Lecture 15 Robotics and Imitation Learning

Stanford CS 25: Lecture 16 Common Sense Reasoning

Stanford CS 25: Lecture 17 Biomedical Transformers

Stanford CS 25: Lecture 18 Neuroscience-Inspired Artificial Intelligence

Stanford CS 25: Lecture 19 Low-level Embodied Intelligence w/ Foundation Models

Stanford CS 25: Lecture 20 Generalist Agents in Open-Ended Worlds

Stanford CS 25: Lecture 21 How I Learned to Stop Worrying and Love the Transformer

Stanford CS 25: Lecture 22 Recipe for Training Helpful Chatbots

Stanford CS 25: Lecture 23 No Language Left Behind: Scaling Human-Centered Machine Translation

Stanford CS 25: Lecture 24 Beyond LLMs: Agents, Emergent Abilities, Intermediate-Guided Reasoning, BabyLM

Stanford CS 25: Lecture 25 Retrieval Augmented Language Models

YouTube videos and presentations

Attention is all you need (Transformer) - Model explanation (including math), Inference and Training, Umar Jamil, 2023, YouTube video

GPT - DIY

GPT in 60 Lines of NumPy: https://jaykmody.com/blog/gpt-from-scratch/