Large Language Models Resources

articles

LLM Pruning and Distillation in Practice: The Minitron Approach, ST Sreenivas et al, NVIDIA, 2024
To Believe or Not to Believe Your LLM, Yasin Abbasi Yadkori et al, Google DeepMind, 2024
States as Strings as Strategies: Steering Language Models with Game-Theoretic Solvers, Ian Gemp et al, Google DeepMind, 2024
The Consensus Game: Language Model Generation via Equilibrium Search, Athul Paul Jacob et al, MIT, 2024
RewardBench: Evaluating Reward Models for Language Modeling, N. Lambert et al, U. Washington, 2024
Jamba: A Hybrid Transformer-Mamba Language Model, Opher Lieber et al, 2024
Better & Faster Large Language Models via Multi-token Prediction, Fabian Gloeckle et al, 2024
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text, Y. Guo et al, EP Paris, 2024
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, Sachin Mehta et al, 2024
FACtual enTailment fOr hallucInation Detection, Vipula Rawte et al, 2024
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks, Kim et al, Korea U., Imperial College, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, T. Munkhdalai et al, Google, 2024
Formal Aspects of Language Modeling, Ryan Cotterell, Anej Svete, Clara Meister, Tianyu Liu and Li Du, Lecture Notes, 2023
ReALM: Reference Resolution As Language Modeling, Joel Ruben Antony Moniz et al, Apple, 2024
A Neural Probabilistic Language Model, Y. Bengio et al, 2003
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey, X. Liu et al, U of Maryland College Park, 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate, Katie Kang et al, 2024
Demystifying Embedding Spaces using Large Language Models, G. Tennenholtz et al, Google Research, 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training, B. McKinzie et al, Apple, 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, Eric Zelikman et al, Stanford U., 2024
Self-Discover: Large Language Models Self-Compose Reasoning Structures, P. Zhou et al, 2024
DeLLMa: A Framework for Decision Making Under Uncertainty with Large Language Models, Ollie Liu et al, USC, 2024
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models, Y. Liu et al, Lehigh U., Microsoft Research, 2024
Solving olympiad geometry without human demonstrations, TH Trinh et al, DeepMind, 2023
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, E. Hubinger et al, Anthropic, 2024
Self-Rewarding Language Models, W. Yuan et al, Meta, NYU, 2024
Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts, M. Besta et al, ETH Zurich, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, S. Ma et al, 2024
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding, X. Ning et al, Microsoft, Tsinghua U., 2023
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator, Li et al, DeepMind, Stanford U., UC Berkeley, 2023

related project page: https://sites.google.com/view/chain-of-code
Llama 2: Open Foundation and Fine-Tuned Chat Models, Hugo Touvron, Louis Martin, et al, 2023
GPT-4 Technical Report, OpenAI, 2023
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision), Yang et al, Microsoft, 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4, Bubeck et al, Microsoft Research, 2023
Large Language Models can learn rules, Zhu et al, DeepMind, 2023
LLaMA: Open and Efficient Foundation Language Models, Touvron et al, Meta AI, 2023
Physics of Large Language Models (Part 1), Context-free Grammar, Meta FAIR Labs, 2023
Evaluating Large Language Models is a minefield, A. Narayanan, S. Kapoor, Princeton U., 2023, online blog
Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks, M. Mitchell et al, Santa Fe Institute, 2023
Understanding LLMs: A Comprehensive Overview from Training to Inference, Liu et al, 2024
Multimodality and Large Multimodal Models (LMMs), Chip Huyen, 2023, online article
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation, S. Saha et al, UNC Chapel Hill, 2023
Introduction to Transformers: an NLP Perspective, T. Xiao et al, 2023
Transformers Learn In-Context by Gradient Descent, von Oswald et al, 2023
Transformers as Algorithms: Generalization and Stability in In-context Learning, Li et al., 2023
Hyena Hierarchy: Towards Larger Convolutional Language Models, Poli et al., 2023
Toward Understanding Why Adam Converges Faster Than SGD for Transformers, Pan et al., CMU, 2023
Can GPT-3 Perform Statutory Reasoning?, A. Blair-Stanek et al., 2023
An Explanation of In-context Learning as Implicit Bayesian Inference, Xie et al., Stanford, 2022
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions, S. Bhattamishra, Oxford U., 2023
Efficient Transformers: A Survey, Tay et al., Google Research, 2022
Emergent Abilities of Large Language Models, Wei et al., Google Research, 2022
A Path Towards Autonomous Machine Intelligence, Yann LeCun, 2022
Holistic Evaluation of Language Models, Center for Research on Foundation Models, Stanford, 2022
A Systematic Evaluation of Large Language Models of Code, Xu et al., CMU, 2022
Evaluating Large Language Models Trained on Code, Chen et al., OpenAI, 2021
Language Models are Few-Shot Learners, Brown et al., OpenAI, 2020
Program Synthesis, Gulwani et al., Microsoft Research, 2017
Adam: A Method for Stochastic Optimization, D. Kingma et al, 2014
Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning, Roemmele et al, 2011
Catastrophic Interference In Connectionist Networks: The Sequential Learning Problem, McCloskey, Cohen, 1989
Attention Is All You Need, Vaswani et al, Google Brain, 2017
HyperAttention: Long-context Attention in Near-Linear Time, Insu Han et al, 2023
The Annotated Transformer, 2018
The Illustrated Transformer, Jay Alammar's blog, 2021
Attention in Natural Language Processing, Galassi et al., 2020
Vision Language Transformers: A Survey, Clayton Fields, Casey Kennington, 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization, Jin et al, Peking U., 2023
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., Google AI, 2019
FAIRSEQ: A Fast, Extensible Toolkit for Sequence Modeling, Ott et al., 2019
Autoencoders, Dor Bank et al, 2021
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, Chung et al., 2014
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, Cho et al., U de Montreal, 2014
Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network, A. Sherstinsky, 2021
A Decomposable Attention Model for Natural Language Inference, Parikh et al., Google Research, 2016
Sequence to Sequence Learning with Neural Networks, Sutskever et al, Google Research, 2014
Transforming Auto-encoders, G. Hinton, A. Krizhevsky, et al., 2011
Long Short-Term Memory, Sepp Hochreiter et al., 1997
Understanding LSTM: a tutorial into Long Short-Term Memory, R. Staudemeyer et al., 2019
What Can Transformers Learn in Context? A Case Study of Simple Function Classes, S. Garg et al, 2023
How Is ChatGPT's Behavior Changing over Time?, L. Chen et al, Stanford, UC Berkeley, 2023
What Is ChatGPT Doing and Why Does It Work?, S. Wolfram, Feb 2023, online article
Retentive Network: A Successor to Transformer for Large Language Models, Sun, Y., Microsoft Research, 2023
Meta-Transformer: A Unified Framework for Multi-Modal Learning, Zhang, Y., et al, 2023
Improving Language Understanding by Generative Pre-Training, A. Radford, OpenAI, 2018
Scaling Language Models: Methods, Analysis and Insights from Training Gopher, J. W. Rae et al, DeepMind, 2021
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, Tal Ridnik et al, CodiumAI, 2024
Pengi: An Audio Language Model for Audio Tasks, S. Deshmukh et al, 2024
A Comprehensive Overview of Large Language Models, H. Naveed et al, 2024
Are Long-LLMs A Necessity For Long-Context Tasks?, H. Qian et al, 2024
... More articles on Transformers
... More LLM articles on this repo

Human-like Reasoning and Representation Learning

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, Eric Zelikman et al, Stanford U., 2024
STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning, E. Zelikman et al, 2022
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold, A. Setlur et al, CMU, 2024

Theorem Proving

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, Z. Shao et al, Tsinghua U., 2024
Solving Olympiad Geometry without Human Demonstrations, TH Trinh et al, Google DeepMind, 2023
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models, K. Yang et al, 2023
NeurIPS Tutorial on Machine Learning for Theorem Proving, video
DeepMath - Deep Sequence Models for Premise Selection, Alexander Alemi, Francois Chollet et al, 2016

LLM Tokenization

Let's build the GPT Tokenizer with Andrej Karpathy (February 2024)
Minimal Byte Pair Encoding Algorithm (Andrej Karpathy repo)
Language Models are Unsupervised Multitask Learners, Alec Radford et al, 2019
Neural Machine Translation of Rare Words with Subword Units, Rico Sennrich et al, 2016

Context Window representations and implementations

Towards infinite LLM context windows, Towards Data Science, Krzysztof K. Zdeb, 2024
RoFormer: Enhanced Transformer with Rotary Position Embedding, Jianlin Su et al, 2023
YaRN: Efficient Context Window Extension of Large Language Models, B. Peng et al, U of Geneva, 2023
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training, D. Zhu et al, Peking U., 2024
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, Y. Ding et al, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, T. Munkhdalai et al, Google, 2024

Time-series forecasting and classification tasks

iTransformer: The Latest Breakthrough in Time Series Forecasting, Marco Peixeiro, Towards Data Science, April 2024

relevant paper: iTransformer: Inverted Transformers Are Effective for Time Series Forecasting, Yong Liu et al, 2023
MOMENT: A Foundation Model for Time Series Forecasting, Classification, Anomaly Detection, Nikos Kafritsas, Apr 27, 2024, Medium
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Colin Raffel et al, Google, 2020

relevant repo: https://github.com/google-research/text-to-text-transfer-transformer
TimesFM: Google's Foundation Model For Time-Series Forecasting, Nikos Kafritsas, 2023, AI Horizon Forecast
MOIRAI: Salesforce's Foundation Transformer For Time-Series Forecasting, Nikos Kafritsas, 2023, AI Horizon Forecast

relevant paper: Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting, K. Rasul et al, 2023

relevant paper: A decoder-only foundation model for time-series forecasting, A. Das et al, 2023

relevant paper: Chronos: Learning the Language of Time Series, AF Ansari et al, 2024

relevant paper: Unified Training of Universal Time Series Forecasting Transformers, Woo, G et al, 2024
How to Effectively Forecast Time Series with Amazon's New Time Series Forecasting Model, Eivind Kjosbakken, April 9, 2024, Towards Data Science

relevant paper: Chronos: Learning the Language of Time Series, AF Ansari et al, 2024
TimeGPT: The First Foundation Model for Time Series Forecasting, Marco Peixeiro, October 2023, Towards Data Science
TimeGPT-1, Azul Garza, Max Mergenthaler-Canseco, Nixtla, 2023
Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series, Vijay Ekambaram et al, 2024

TTM model and source code: https://huggingface.co/ibm/TTM, https://github.com/IBM/tsfm/tree/main/tsfm_public/models/tinytimemixer
Are Language Models Actually Useful for Time Series Forecasting?, M. Tan et al, U. of Virginia, U. of Washington, 2024

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis et al, Facebook AI, UCL, 2021
Retrieval-Augmented Generation for Large Language Models: A Survey, Y. Gao et al, 2024
From Local to Global: A Graph RAG Approach to Query-Focused Summarization, Darren Edge et al, Microsoft Research, 2024

relevant repo: https://github.com/microsoft/graphrag

relevant online article: https://www.microsoft.com/en-us/research/blog/graphrag-new-tool-for-complex-data-discovery-now-on-github/
Searching for Best Practices in Retrieval-Augmented Generation, X. Wang et al, 2024
Graph Retrieval-Augmented Generation: A Survey, B. Peng et al, 2024

Retrieval-Augmented Fine Tuning (RAFT)

RAFT: A new way to teach LLMs to be better at RAG, Cedric Vidal, 2024
RAFT: Adapting Language Model to Domain Specific RAG, T. Zhang et al, 2024 (blog)
RAFT: Adapting Language Model to Domain Specific RAG, T. Zhang et al, 2024 (paper)

relevant repos:

https://github.com/ShishirPatil/gorilla/tree/main/raft
https://github.com/ShishirPatil/gorilla
How to Build a Local Open-Source LLM Chatbot With RAG: Talking to PDF documents with Google’s Gemma-2b-it, LangChain, and Streamlit, Dr. Leon Eversberg, Medium

The Attention Mechanism in Large Language Models

Contextual Position Encoding: Learning to Count What's Important, O. Golovneva et al, Meta, 2024
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, T. Dao et al, Stanford U., 2022
HyperAttention: Long-context Attention in Near-Linear Time, Insu Han et al, Yale, Google Research, 2023
Augmenting Language Models with Long Term Memory, W. Wang et al, UC Santa Barbara, 2023

Compiler Optimization using LLM

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization, C. Cummins et al, Meta AI, 2024

Hugging Face repo: LLM Compiler

Evaluation of LLMs

Evaluating LLM Systems: Essential Metrics, Benchmarks, and Best Practices, Jeffrey Ip, online article, 2024
LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide, Jeffrey Ip, online article, 2024
confident-ai repo for LLM evaluation: https://github.com/confident-ai/deepeval
Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions, T. Hu et al, 2024

online videos and blogs

GPT in 60 Lines of NumPy with Jay Mody, blog (January 2023)
How ChatGPT is Trained with Ari Seff (February, 2023)
Let's build GPT: from scratch, in code, spelled out with Andrej Karpathy (February 2023)
Let's build the GPT Tokenizer with Andrej Karpathy (February 2024)

Resource on LLM visualization

The resource below visualizes what happens inside an LLM and is a helpful tool for understanding how decoder-only Transformer-based LLMs work. Its author, Brendan Bycroft, visualizes these structures to clarify how they operate. The webpage covers a family of GPT models, presented as 3D animations with a walkthrough, and offers a step-by-step guide to single-token inference with interactive elements for hands-on exploration.

https://bbycroft.net/llm

Articles on LLMs in Cornell University's Advancing AI for Humanity blog

The blog: https://thegenerality.com/agi/

some of the articles:

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, S. Ma et al, 2024
BitNet: Scaling 1-bit Transformers for Large Language Models
Retentive Network: A Successor to Transformer for Large Language Models, Sun et al, 2023
Large Language Model for Science: A Study on P vs. NP, Q. Dong et al, 2023
Augmenting Language Models with Long-Term Memory, W. Wang et al, 2023
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers, Dai et al, 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens, J. Ding et al, 2023
A Length-Extrapolatable Transformer, Sun et al, 2022

medium

The Transformer Architecture of GPT Models with Beatriz Stollnitz
Learning Transformers Code First Part 1 - The Setup with Lily Hughes-Robinson
Learning Transformers Code First Part 2 - GPT Up Close and Personal with Lily Hughes-Robinson
Understanding Large Language Models: The Physics of ChatGPT and BERT with Tim Lou
Transformer Architectures and the Rise of BERT, GPT, and T5: A Beginner's Guide with Manas Joshi
Inside GPT - I: Understanding the text generation with Fatih Demirci
Platypus: Quick, Cheap and Powerful LLM with Salvatore Raieli
Configuring Nemo-Guardrails Your Way: An Alternative Method for LLM with Masatake Hirono
ChatGPT stories compiled by Mateusz Wasalski
RetNet: Transformer killer is here with Vishal Rajput
Fine-Tuning Large Language Models (LLMs) with Shawhin Talebi
How to Build an LLM from Scratch with Shawhin Talebi
Conversations as Directed Graphs with LangChain with Daniel Warfield
Mastering Language Models with Samuel Montgomery
Self-Supervised Learning Using Projection Heads with Daniel Warfield
Summing Coin Values in Images using Lang-SAM and Deep Learning with Gamze Zorlubas
‘Talk’ to Your SQL Database Using LangChain and Azure OpenAI with Satwiki De
RLHF: Reinforcement Learning from Human Feedback with Ms Aerin

related paper: Training language models to follow instructions with human feedback, Ouyang et al, OpenAI, 2022

related code: Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture
How to Convert Any Text Into a Graph of Concepts with Rahul Nayak

related repo: Knowledge_Graph
LLMs for Everyone: Running LangChain and a MistralAI 7B Model in Google Colab with Dmitrii Eliuseev
LLMs for Everyone: Running the LLaMA-13B model and LangChain in Google Colab with Dmitrii Eliuseev

related repo: https://github.com/ggerganov/llama.cpp

related repo: https://github.com/langchain-ai/langchain

related site: https://colab.research.google.com/
Is Mamba the End of ChatGPT As We Know It? Ignacio de Gregorio

related paper: Mamba: Linear-Time Sequence Modeling with Selective State Spaces, A. Gu et al, CMU, 2024
RLAIF: Reinforcement Learning from AI Feedback with Cameron R. Wolfe, Jan, 2024

related paper: RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback, Harrison Lee et al, 2023

related paper: Constitutional AI: Harmlessness from AI Feedback, Y. Bai et al, Anthropic, 2022

related paper: PaLM: Scaling Language Modeling with Pathways, A. Chowdhery et al, 2022

related paper: PaLM 2 Technical Report, Google, 2023

related paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al, Google Research, 2022

related paper: Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al, Google Research, ICLR 2023

related paper: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al, Anthropic, 2022

related paper: A General Language Assistant as a Laboratory for Alignment, A. Askell et al, Anthropic, 2021

related paper: Learning to summarize from human feedback, N. Stiennon et al, OpenAI, 2020
Mistral AI vs. Meta: Comparing Top Open-source LLMs with Luis Roque, Jan 2024
Text Embeddings, Classification, and Semantic Search, Shaw Talebi, March 2024
How to Build a Local Open-Source LLM Chatbot With RAG, Dr. Leon Eversberg, April, 2024
