Run RAGFlow with IPEX-LLM on Intel GPU

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. By integrating it with IPEX-LLM, users can easily leverage local LLMs running on Intel GPUs (e.g., a local PC with an iGPU, or discrete GPUs such as Arc, Flex, and Max).

See the demo of RAGFlow running Qwen2:7B on Intel Arc A770 below.

You can also click here to watch the demo video.

Quickstart

0. Prerequisites

CPU >= 4 cores
RAM >= 16 GB
Disk >= 50 GB
Docker >= 24.0.0 & Docker Compose >= v2.26.1
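The thresholds above can be checked from a Linux shell before you continue. What follows is a rough sketch, not an official RAGFlow tool. `version_ge` and `check_prereqs` are hypothetical helper names. The sketch assumes:

- GNU coreutils (`nproc`, `sort -V`, `df`) and `/proc/meminfo` are available.
- Docker keeps its data under /var/lib/docker. It falls back to / if that directory is missing.
- The user can reach the Docker daemon. Otherwise Docker is reported as not found.

```shell
#!/usr/bin/env bash
# Rough prerequisite check for this quickstart (Linux only).
# version_ge and check_prereqs are illustrative helper names, not part of RAGFlow.

# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# check_prereqs: print a warning for each unmet requirement; return 1 if any.
check_prereqs() {
  local status=0

  # CPU >= 4 cores
  local cores
  cores=$(nproc)
  if [ "$cores" -lt 4 ]; then
    echo "WARN: $cores CPU cores found (need >= 4)"; status=1
  fi

  # RAM >= 16 GB; MemTotal is reported in KiB (16e9 bytes = 15625000 KiB)
  local mem_kib
  mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  if [ "$mem_kib" -lt 15625000 ]; then
    echo "WARN: MemTotal is $mem_kib KiB (need >= 16 GB)"; status=1
  fi

  # Disk >= 50 GB free where Docker keeps images (50e9 bytes = 48828125 KiB)
  local dir=/var/lib/docker
  [ -d "$dir" ] || dir=/
  local free_kib
  free_kib=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
  if [ "$free_kib" -lt 48828125 ]; then
    echo "WARN: $free_kib KiB free on $dir (need >= 50 GB)"; status=1
  fi

  # Docker >= 24.0.0 (server version; needs permission to talk to the daemon)
  local docker_ver
  docker_ver=$(docker version --format '{{.Server.Version}}' 2>/dev/null)
  if [ -z "$docker_ver" ] || ! version_ge "$docker_ver" 24.0.0; then
    echo "WARN: Docker ${docker_ver:-not found} (need >= 24.0.0)"; status=1
  fi

  # Docker Compose >= v2.26.1 (strip a leading "v" if present)
  local compose_ver
  compose_ver=$(docker compose version --short 2>/dev/null)
  compose_ver=${compose_ver#v}
  if [ -z "$compose_ver" ] || ! version_ge "$compose_ver" 2.26.1; then
    echo "WARN: Docker Compose ${compose_ver:-not found} (need >= v2.26.1)"; status=1
  fi

  return "$status"
}
```

After sourcing the script, running `check_prereqs && echo "All prerequisites met"` prints one warning per unmet requirement. The memory and disk thresholds use decimal gigabytes, so treat borderline results as approximate.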

1. Install and Start `Ollama` Service on Intel GPU

Follow the steps in the Run Ollama with IPEX-LLM on Intel GPU guide to install and run Ollama on Intel GPU. Make sure `ollama serve` is running and can be reached through a local URL (e.g., http://127.0.0.1:11434) or a remote URL (e.g., http://your_ip:11434).

Important

If RAGFlow is not deployed on the same machine as Ollama (that is, RAGFlow needs to connect to a remote Ollama service), you must configure the Ollama service to accept connections from any IP address. To do this, set or export the environment variable OLLAMA_HOST=0.0.0.0 before running `ollama serve`.
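One way to confirm that Ollama is reachable at the URL you plan to give RAGFlow is a quick HTTP probe. This sketch assumes `curl` is installed. It relies on the Ollama server answering `GET /` with the text `Ollama is running`. `check_ollama` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# For a remote setup, start Ollama on its own machine listening on all interfaces:
#   export OLLAMA_HOST=0.0.0.0
#   ./ollama serve

# check_ollama [BASE_URL]: succeed if an Ollama server answers at BASE_URL.
# check_ollama is an illustrative helper, not part of Ollama or IPEX-LLM.
check_ollama() {
  local base_url="${1:-http://127.0.0.1:11434}"
  # --noproxy '*' bypasses any HTTP proxy, with the same intent as no_proxy in this guide.
  # Ollama replies to GET / with the plain-text body "Ollama is running".
  curl --noproxy '*' -fsS --max-time 5 "${base_url%/}/" 2>/dev/null \
    | grep -q "Ollama is running"
}
```

For the remote case, run `check_ollama http://your_ip:11434 && echo reachable` from the machine that will host RAGFlow.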

2. Pull Model

Next, pull a model for RAG using Ollama. This guide uses the Qwen/Qwen2-7B model as an example. Open a new terminal window and run the following commands to pull qwen2:latest.

For Linux users:

```
export no_proxy=localhost,127.0.0.1
./ollama pull qwen2:latest
```

For Windows users:

Run the following commands in Miniforge Prompt or Anaconda Prompt.
```
set no_proxy=localhost,127.0.0.1
ollama pull qwen2:latest
```
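To check that the pull succeeded, you can list local models with `ollama list` (on Windows, without the `./` prefix). The helper below is a hypothetical sketch. It assumes `ollama list` prints a header row, then one model per line with the model name in the first column.

```shell
#!/usr/bin/env bash
# model_pulled MODEL: read `ollama list` output on stdin and succeed if MODEL
# appears as an exact name in the first column (the header row is skipped).
# model_pulled is an illustrative helper, not an Ollama command.
model_pulled() {
  awk 'NR > 1 {print $1}' | grep -Fxq -- "$1"
}
```

On Linux, for example: `./ollama list | model_pulled qwen2:latest && echo "qwen2:latest is available"`.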

Tip

Besides Qwen2, there are other models you might want to explore, such as Llama3, Phi3, and Mistral. You can find all available models in the Ollama model library. Search for a model, pull it the same way, and give it a try.

3. Start `RAGFlow` Service

Note

The steps in section 3 have been verified on Linux only.

3.1 Download `RAGFlow`

Clone the repository, or download the source as a zip archive from GitHub:

```
$ git clone https://github.com/infiniflow/ragflow.git
```

3.2 Environment Settings

Ensure vm.max_map_count is set to at least 262144. To check the current value of vm.max_map_count, use:

```
$ sysctl vm.max_map_count
```

Changing `vm.max_map_count`

To set the value temporarily, use:

```
$ sudo sysctl -w vm.max_map_count=262144
```

To make the change permanent and ensure it persists after a reboot, add or update the following line in /etc/sysctl.conf:

```
vm.max_map_count=262144
```
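The check, the temporary change, and the permanent change above can be combined into one idempotent script. This is a sketch assuming a Linux host with `sysctl` and GNU `sed`. The function names are hypothetical. Writing /etc/sysctl.conf and calling `sysctl -w` both require root.

```shell
#!/usr/bin/env bash
MIN_MAP_COUNT=262144   # minimum value required by this guide

# max_map_count_ok [VALUE]: succeed if VALUE (default: the live kernel value) is large enough.
max_map_count_ok() {
  local current="${1:-$(sysctl -n vm.max_map_count)}"
  [ "$current" -ge "$MIN_MAP_COUNT" ]
}

# persist_max_map_count [FILE]: add or update vm.max_map_count in FILE
# (default /etc/sysctl.conf) so the setting survives a reboot.
persist_max_map_count() {
  local conf="${1:-/etc/sysctl.conf}"
  local pattern='^[[:space:]]*vm\.max_map_count[[:space:]]*='
  if grep -Eq "$pattern" "$conf"; then
    # Rewrite the existing (uncommented) entry in place.
    sed -i -E "s/${pattern}.*/vm.max_map_count=${MIN_MAP_COUNT}/" "$conf"
  else
    # No entry yet: append one.
    echo "vm.max_map_count=${MIN_MAP_COUNT}" >> "$conf"
  fi
}

# ensure_max_map_count: raise the live value if needed and persist it (run as root).
ensure_max_map_count() {
  if ! max_map_count_ok; then
    sysctl -w vm.max_map_count="$MIN_MAP_COUNT"
  fi
  persist_max_map_count /etc/sysctl.conf
}
```

If the functions are saved in a file, say max_map_count.sh (a hypothetical name), run `sudo bash -c 'source ./max_map_count.sh && ensure_max_map_count'`. Commented-out entries in the file are left untouched.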

3.3 Start the `RAGFlow` server using Docker

Pull the pre-built Docker images and start the server:

Note

Running the following commands automatically downloads the dev version of the RAGFlow Docker image. To download and run a specific Docker version instead, update RAGFLOW_VERSION in docker/.env to that version (for example, RAGFLOW_VERSION=v0.7.0) before running the following commands.
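The version pin described in this note can also be applied from the shell. `set_env_value` is a hypothetical helper, and this is only a sketch assuming GNU `sed`. It refuses to add a key that is not already present, because docker/.env variable names may differ between RAGFlow releases.

```shell
#!/usr/bin/env bash
# set_env_value FILE KEY VALUE: replace the existing KEY=... line in FILE.
# Refuses to add a new key, since docker/.env variable names can differ
# between RAGFlow releases. set_env_value is an illustrative helper.
set_env_value() {
  local file=$1 key=$2 value=$3
  if ! grep -q "^${key}=" "$file"; then
    echo "error: ${key} not found in ${file}" >&2
    return 1
  fi
  # '|' is used as the sed delimiter so values may contain '/'.
  sed -i "s|^${key}=.*|${key}=${value}|" "$file"
}
```

From the directory where you cloned the repository, run `set_env_value ragflow/docker/.env RAGFLOW_VERSION v0.7.0`. Then run the commands that follow this note.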

```
$ export no_proxy=localhost,127.0.0.1
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d
```

Note

The core image is about 9 GB in size and may take a while to load.

Once the server is up and running, check its status:

```
$ docker logs -f ragflow-server
```

Upon successful deployment, you will see logs in the terminal similar to the following:

    ____                 ______ __
   / __ \ ____ _ ____ _ / ____// /____  _      __
  / /_/ // __ `// __ `// /_   / // __ \| | /| / /
 / _, _// /_/ // /_/ // __/  / // /_/ /| |/ |/ /
/_/ |_| \__,_/ \__, //_/    /_/ \____/ |__/|__/
              /____/

* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:9380
* Running on http://x.x.x.x:9380
INFO:werkzeug:Press CTRL+C to quit
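The first start can take a while because the image is large. Instead of watching the logs by hand, you can poll them. This is a hedged sketch with hypothetical function names. It assumes the container is named ragflow-server, as in the command above, and that a successful start logs the "Running on all addresses" line shown in the sample output.

```shell
#!/usr/bin/env bash
# wait_until SECONDS CMD [ARGS...]: re-run CMD every 5 seconds until it
# succeeds; give up (return 1) once SECONDS have elapsed.
wait_until() {
  local timeout=$1 waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}

# ragflow_ready: succeed once the ragflow-server container has logged
# the "Running on all addresses" startup line.
ragflow_ready() {
  docker logs ragflow-server 2>&1 | grep -q "Running on all addresses"
}
```

For example, `wait_until 600 ragflow_ready && echo "RAGFlow server is up"` waits up to ten minutes. That timeout is an arbitrary choice; raise it on slow connections.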

You can now open a browser and access the RAGFlow web portal. With the default settings, simply enter http://IP_OF_YOUR_MACHINE without a port number, since the default HTTP port 80 can be omitted. If RAGFlow is deployed on the same machine as your browser, you can also access the web portal at http://127.0.0.1 or http://localhost.

4. Using `RAGFlow`

Note

For detailed information about using RAGFlow, see the README in the official RAGFlow repository.

Log-in

If this is your first time using RAGFlow, you will need to register. After registering, log in with your new account to access the portal.

Configure `Ollama` service URL

Open the Ollama settings through Settings -> Model Providers in the menu. Fill in the Base URL, then click the OK button at the bottom.

If the connection is successful, you will see the model listed under Show more models, as illustrated below.

Note

If you want to use an Ollama server hosted at a different URL, simply update the Ollama Base URL to the new URL and press the OK button again to re-confirm the connection to Ollama.
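RAGFlow runs inside Docker containers. A loopback Base URL such as http://127.0.0.1:11434 therefore typically points at the RAGFlow container itself, not at the machine running Ollama. If the connection fails with a loopback address, try the host's LAN IP address instead. This also requires Ollama to listen on that address, for example with OLLAMA_HOST=0.0.0.0 as described in step 1.

The sketch below prints such a candidate URL. `ollama_base_url` is a hypothetical helper. It assumes `hostname -I` is available (it is on most Linux distributions) and takes the first address it reports. On hosts with several interfaces that may not be the right one, so pass the IP explicitly.

```shell
#!/usr/bin/env bash
# ollama_base_url [IP] [PORT]: print a candidate Ollama Base URL for RAGFlow.
# Defaults: the first address reported by `hostname -I`, and port 11434.
# ollama_base_url is an illustrative helper, not part of RAGFlow or Ollama.
ollama_base_url() {
  local ip="${1:-$(hostname -I | awk '{print $1}')}"
  local port="${2:-11434}"
  printf 'http://%s:%s\n' "$ip" "$port"
}
```

Before entering the URL in RAGFlow, you can test it with `curl "$(ollama_base_url)/api/tags"`. If the URL works, this returns a JSON list of the models pulled in step 2.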

Create Knowledge Base

Click Knowledge Base in the top bar, then click the +Create knowledge base button on the right. You will be prompted to enter a name for the knowledge base.

Edit Knowledge Base

After entering a name, you will be directed to edit the knowledge base. Click on Dataset on the left, then click + Add file -> Local files. Upload your file in the pop-up window and click OK.

After the upload is successful, you will see a new record in the dataset. The Parsing Status column will show UNSTARTED. Click the green start button in the Action column to begin file parsing. Once parsing is finished, the Parsing Status column will change to SUCCESS.

Next, click Configuration in the left menu, then click Save at the bottom to save the changes.

Chat with the Model

Start new conversations by clicking Chat in the top navbar.

On the left side, create a conversation by clicking Create an Assistant. Under Assistant Setting, give it a name and select your knowledge bases.

Next, go to Model Setting, choose the model you added through Ollama, and disable the Max Tokens toggle. Finally, click OK to start.

Tip

Enabling the Max Tokens toggle may result in very short answers.

Type your questions into the message textbox at the bottom (labeled Message Resume Assistant in this example), and click the button on the right to get responses.

Exit

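The server was started in detached mode (`docker compose up -d`). Pressing Ctrl+C in the terminal running `docker logs -f ragflow-server` therefore only stops following the logs. To shut down the RAGFlow server, run `docker compose down` from the ragflow/docker directory, then close your browser tab.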
