
Chapter 1 Introduction

What is IPEX-LLM

IPEX-LLM is a low-bit LLM library for Intel XPU (Xeon/Core/Flex/Arc/PVC), featuring broad model support, low latency, and a small memory footprint. It is released under the Apache 2.0 License.

What can you do with IPEX-LLM

You can use IPEX-LLM to run any PyTorch model (e.g., Hugging Face Transformers models). It automatically optimizes and accelerates LLMs using low-bit quantization, modern hardware acceleration, and the latest software optimizations.

Using IPEX-LLM is easy: with just a one-line code change, you can immediately observe a significant speedup.¹

Example: Optimize LLaMA model with optimize_model

```python
from ipex_llm import optimize_model
from transformers import LlamaForCausalLM, LlamaTokenizer

# load the model as usual with Hugging Face Transformers
model = LlamaForCausalLM.from_pretrained(model_path, ...)

# apply IPEX-LLM low-bit optimization; INT4 is used by default
model = optimize_model(model)

...
```
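To show what comes after the one-line change, here is a minimal, self-contained sketch of running inference with the optimized model. It is an illustration rather than the tutorial's own code: model_path is a hypothetical local LLaMA checkpoint, and the prompt and generation settings are arbitrary.

```python
from ipex_llm import optimize_model
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "path/to/llama-checkpoint"  # hypothetical local checkpoint

# load the model as usual, then apply IPEX-LLM's low-bit
# optimization (INT4 by default)
model = LlamaForCausalLM.from_pretrained(model_path)
model = optimize_model(model)

# after optimization, the model is used exactly like any other
# Hugging Face Transformers model
tokenizer = LlamaTokenizer.from_pretrained(model_path)
inputs = tokenizer("What is AI?", return_tensors="pt")
output_ids = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```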

IPEX-LLM provides a variety of low-bit optimizations (e.g., INT3/NF3/INT4/NF4/INT5/INT8), and allows you to run LLMs on low-cost CPU-only PCs, on PCs with a GPU, or in the cloud.
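As a sketch of how a non-default precision can be selected, optimize_model accepts a low_bit argument; the exact strings accepted (e.g. "sym_int4", "nf4", "sym_int8") may vary between IPEX-LLM versions, so the value below is illustrative rather than definitive. This continues the example above, where model is an already-loaded LlamaForCausalLM.

```python
from ipex_llm import optimize_model

# request NF4 quantization instead of the INT4 default;
# check your IPEX-LLM version's documentation for the full
# list of supported low_bit values
model = optimize_model(model, low_bit="nf4")
```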

The demos below show the experience of running 6B and 13B models on a laptop with 16 GB of memory.

6B model running on an Intel 12th Gen Core PC (real-time screen capture):

13B model running on an Intel 12th Gen Core PC (real-time screen capture):

What's Next

The following chapters of this tutorial explain in more detail how to use IPEX-LLM to build LLM applications, covering best practices for setting up your environment, the APIs, Chinese language support, GPU usage, and application development guides with case studies. Most chapters provide runnable notebooks using popular open-source models. Read along to learn more, and run the code on your laptop.

Also, you can check out our GitHub repo for more information and the latest news.

We have already verified many models on IPEX-LLM and provided ready-to-run examples, such as Llama2, Vicuna, ChatGLM, ChatGLM2, Baichuan, MOSS, Falcon, Dolly-v1, Dolly-v2, StarCoder, Mistral, RedPajama, and Whisper. You can find more model examples here.

Footnotes

  1. Performance varies by use, configuration, and other factors. IPEX-LLM may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.