Localized Multimodal Large Language Model (MLLM) integrated with Streamlit and Ollama for text and image processing tasks.

Multimodal-Large-Language-Model (MLLM)

Thank you for checking out the Multimodal-Large-Language-Model project. Please note that this project was created for research purposes.

For a more robust and well-developed solution, you may consider using open-webui/open-webui with ollama/ollama.

Demo image

Documentation

You can access the project documentation at [GitHub Pages].

Host requirements

  • Docker: [Installation Guide]
  • Docker Compose: [Installation Guide]
  • Compatible with Linux and Windows hosts
  • Ensure ports 8501 and 11434 are not already in use (a quick check is shown after this list)
  • You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models. [Source]
  • The project can be run on either CPU or GPU
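
A minimal check on a Linux host to confirm nothing is already listening on those ports (Windows hosts can use netstat -an instead); no output means the ports are free:

ss -ltn | grep -E ':(8501|11434)'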

Running on GPU
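
GPU passthrough with Docker Compose generally assumes the NVIDIA driver and the NVIDIA Container Toolkit are installed on the host. A quick way to confirm Docker can see the GPU (the CUDA image tag below is only an example):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi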

Tested Model(s)

Model Name | Size  | Link
llava:7b   | 4.7GB | Link
llava:34b  | 20GB  | Link

Llava is pulled and loaded by default; other models from Ollama can be added in ollama/ollama-build.sh, as sketched below.
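
For example, assuming the build script issues ollama pull commands, another tested model could be added with a line like:

ollama pull llava:34b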

Usage

Note

The project runs on GPU by default. To run on CPU, use docker-compose.cpu.yml instead (see the example after this note).
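
A minimal sketch, assuming the standard Docker Compose -f file selection:

docker-compose -f docker-compose.cpu.yml build
docker-compose -f docker-compose.cpu.yml up -d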

  1. Clone this repository and navigate to the project folder:
git clone https://github.com/NotYuSheng/Multimodal-Large-Language-Model.git
cd Multimodal-Large-Language-Model
  2. Build the Docker images:
docker-compose build
  3. Run the images:
docker-compose up -d
  4. Access the Streamlit webpage from the host (a reachability check is shown after these steps):
<host-ip>:8501
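
If the page does not load, a quick reachability check from the host (replace <host-ip> with the machine's address):

curl -I http://<host-ip>:8501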

API calls to the Ollama server can be made to:

<host-ip>:11434
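
As a sketch, a simple generation request against the Ollama HTTP API could look like this (llava:7b is the model pulled by default; the images field accepts base64-encoded image data for multimodal prompts):

curl http://<host-ip>:11434/api/generate -d '{
  "model": "llava:7b",
  "prompt": "Describe what this project does.",
  "stream": false
}'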