Gollama

Gollama is a macOS / Linux tool for managing Ollama models.

It provides a TUI (Text User Interface) for listing, inspecting, deleting, copying, and pushing Ollama models, as well as optionally linking them to LM Studio.

The application allows users to interactively select models, sort, filter, edit, run, unload and perform actions on them using hotkeys.

Features

The project started off as a rewrite of my llamalink project, but I decided to expand it to include more features and make it more user-friendly.

It's in active development, so there are some bugs and missing features. However, I'm finding it useful for managing my models every day, especially for cleaning up old models.

List available models
Display metadata such as size, quantisation level, model family, and modified date
Edit / update a model's Modelfile
Sort models by name, size, modification date, quantisation level, family etc
Select and delete models
Run and unload models
Inspect model for additional details
Calculate approximate vRAM usage for a model
Link models to LM Studio (Note: this is currently broken on the latest LM Studio versions, see #82)
Copy / rename models
Push models to a registry
Show running models
Has some cool bugs

Installation

From go:

go install github.com/sammcj/gollama@HEAD

From GitHub:

Download the most recent release from the releases page and extract the binary to a directory in your PATH.

e.g. unzip gollama*.zip -d gollama && mv gollama/gollama /usr/local/bin

Usage

To run the gollama application, use the following command:

gollama

Tip: I like to alias gollama to g for quick access:

echo "alias g=gollama" >> ~/.zshrc

Key Bindings

Space: Select
Enter: Run model (Ollama run)
i: Inspect model
t: Top (show running models)
D: Delete model
e: Edit model (new)
c: Copy model
U: Unload all models
p: Pull an existing model (new)
g: Pull (get) new model (new)
P: Push model
n: Sort by name
s: Sort by size
m: Sort by modified
k: Sort by quantisation
f: Sort by family
l: Link model to LM Studio (Note: this is currently broken on the latest LM Studio versions, see #82)
L: Link all models to LM Studio (same caveat as l)
r: Rename model (Work in progress)
q: Quit

Top

Top (t)

Inspect

Inspect (i)

Command-line Options

-l: List all available Ollama models and exit
-L: Link all available Ollama models to LM Studio and exit (Note: this is currently broken on the latest LM Studio versions, see #82)
-s <search term>: Search for models by name
- OR operator ('term1|term2') returns models that match either term
- AND operator ('term1&term2') returns models that match both terms
-e <model>: Edit the Modelfile for a model
-ollama-dir: Custom Ollama models directory
-lm-dir: Custom LM Studio models directory
-cleanup: Remove all symlinked models and empty directories and exit
-no-cleanup: Don't cleanup broken symlinks
-u: Unload all running models
-v: Print the version and exit
-h, or --host: Specify the host for the Ollama API. If you provide l as the host, it will automatically use http://localhost:11434
--vram: Estimate vRAM usage for an existing (pulled) Ollama model name (e.g. llama3.1:8b-instruct-q6_K) or a Hugging Face model ID (e.g. NousResearch/Hermes-2-Theta-Llama-3-8B). You can also set GOLLAMA_DEFAULT_MODEL in your environment and provide default as the model name (I'll add a config option for this soon)
- --fits: Available memory in GB for context calculation (e.g. 6 for 6GB)

Simple model listing

Gollama can also be called with -l to list models without the TUI.

gollama -l

List (gollama -l):

Edit

Gollama can be called with -e to edit the Modelfile for a model.

gollama -e my-model

Search

Gollama can be called with -s to search for models by name.

gollama -s my-model # returns models that contain 'my-model'

gollama -s 'my-model|my-other-model' # returns models that contain either 'my-model' or 'my-other-model'

gollama -s 'my-model&instruct' # returns models that contain both 'my-model' and 'instruct'
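The matching rules above can be sketched in Go roughly as follows. This is an illustration of the described behaviour, not gollama's actual code: the function name matchesSearch is hypothetical, and the case-insensitive substring matching and the precedence when | and & are combined (| splits first, then &) are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSearch reports whether name satisfies term.
// '|' separates alternatives (OR); inside each alternative, '&' separates
// substrings that must all be present (AND).
// Assumption: matching is a case-insensitive substring test.
func matchesSearch(name, term string) bool {
	name = strings.ToLower(name)
	for _, alt := range strings.Split(strings.ToLower(term), "|") {
		all := true
		for _, part := range strings.Split(alt, "&") {
			if !strings.Contains(name, strings.TrimSpace(part)) {
				all = false
				break
			}
		}
		if all {
			return true
		}
	}
	return false
}

func main() {
	// Made-up model names, used only to exercise the three query shapes.
	models := []string{"my-model:latest", "my-model:instruct", "my-other-model:7b", "llama3.1:8b"}
	for _, term := range []string{"my-model", "my-model|my-other-model", "my-model&instruct"} {
		var hits []string
		for _, m := range models {
			if matchesSearch(m, term) {
				hits = append(hits, m)
			}
		}
		fmt.Printf("%-26s -> %v\n", term, hits)
	}
}
```

Remember to quote terms containing | or & on the command line, as in the examples above, so the shell does not interpret them.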

vRAM Estimation

Gollama includes a comprehensive vRAM estimation feature:

Calculate vRAM usage for a given Hugging Face model ID (e.g. author/name) or pulled Ollama model (e.g. my-model:mytag)
Determine maximum context length for a given vRAM constraint
Find the best quantisation setting for a given vRAM and context constraint
Show estimates for different k/v cache quantisation options (fp16, q8_0, q4_0)
Automatically detect available CUDA vRAM (coming soon!) or system RAM

To estimate (v)RAM usage:

gollama --vram llama3.1:8b-instruct-q6_K

📊 VRAM Estimation for Model: llama3.1:8b-instruct-q6_K

| QUANT/CTX | BPW  | 2K  | 8K  | 16K             | 32K             | 49K             | 64K             |
| --------- | ---- | --- | --- | --------------- | --------------- | --------------- | --------------- |
| IQ1_S   | 1.56 | 2.2 | 2.8 | 3.7(3.7,3.7)    | 5.5(5.5,5.5)    | 7.3(7.3,7.3)    | 9.1(9.1,9.1)    |
| IQ2_XXS | 2.06 | 2.6 | 3.3 | 4.3(4.3,4.3)    | 6.1(6.1,6.1)    | 7.9(7.9,7.9)    | 9.8(9.8,9.8)    |
| IQ2_XS  | 2.31 | 2.9 | 3.6 | 4.5(4.5,4.5)    | 6.4(6.4,6.4)    | 8.2(8.2,8.2)    | 10.1(10.1,10.1) |
| IQ2_S   | 2.50 | 3.1 | 3.8 | 4.7(4.7,4.7)    | 6.6(6.6,6.6)    | 8.5(8.5,8.5)    | 10.4(10.4,10.4) |
| IQ2_M   | 2.70 | 3.2 | 4.0 | 4.9(4.9,4.9)    | 6.8(6.8,6.8)    | 8.7(8.7,8.7)    | 10.6(10.6,10.6) |
| IQ3_XXS | 3.06 | 3.6 | 4.3 | 5.3(5.3,5.3)    | 7.2(7.2,7.2)    | 9.2(9.2,9.2)    | 11.1(11.1,11.1) |
| IQ3_XS  | 3.30 | 3.8 | 4.5 | 5.5(5.5,5.5)    | 7.5(7.5,7.5)    | 9.5(9.5,9.5)    | 11.4(11.4,11.4) |
| Q2_K    | 3.35 | 3.9 | 4.6 | 5.6(5.6,5.6)    | 7.6(7.6,7.6)    | 9.5(9.5,9.5)    | 11.5(11.5,11.5) |
| Q3_K_S  | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7)    | 7.7(7.7,7.7)    | 9.7(9.7,9.7)    | 11.7(11.7,11.7) |
| IQ3_S   | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7)    | 7.7(7.7,7.7)    | 9.7(9.7,9.7)    | 11.7(11.7,11.7) |
| IQ3_M   | 3.70 | 4.2 | 5.0 | 6.0(6.0,6.0)    | 8.0(8.0,8.0)    | 9.9(9.9,9.9)    | 12.0(12.0,12.0) |
| Q3_K_M  | 3.91 | 4.4 | 5.2 | 6.2(6.2,6.2)    | 8.2(8.2,8.2)    | 10.2(10.2,10.2) | 12.2(12.2,12.2) |
| IQ4_XS  | 4.25 | 4.7 | 5.5 | 6.5(6.5,6.5)    | 8.6(8.6,8.6)    | 10.6(10.6,10.6) | 12.7(12.7,12.7) |
| Q3_K_L  | 4.27 | 4.7 | 5.5 | 6.5(6.5,6.5)    | 8.6(8.6,8.6)    | 10.7(10.7,10.7) | 12.7(12.7,12.7) |
| IQ4_NL  | 4.50 | 5.0 | 5.7 | 6.8(6.8,6.8)    | 8.9(8.9,8.9)    | 10.9(10.9,10.9) | 13.0(13.0,13.0) |
| Q4_0    | 4.55 | 5.0 | 5.8 | 6.8(6.8,6.8)    | 8.9(8.9,8.9)    | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_S  | 4.58 | 5.0 | 5.8 | 6.9(6.9,6.9)    | 8.9(8.9,8.9)    | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_M  | 4.85 | 5.3 | 6.1 | 7.1(7.1,7.1)    | 9.2(9.2,9.2)    | 11.4(11.4,11.4) | 13.5(13.5,13.5) |
| Q4_K_L  | 4.90 | 5.3 | 6.1 | 7.2(7.2,7.2)    | 9.3(9.3,9.3)    | 11.4(11.4,11.4) | 13.6(13.6,13.6) |
| Q5_K_S  | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8)    | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_0    | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8)    | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_M  | 5.69 | 6.1 | 6.9 | 8.0(8.0,8.0)    | 10.2(10.2,10.2) | 12.4(12.4,12.4) | 14.6(14.6,14.6) |
| Q5_K_L  | 5.75 | 6.1 | 7.0 | 8.1(8.1,8.1)    | 10.3(10.3,10.3) | 12.5(12.5,12.5) | 14.7(14.7,14.7) |
| Q6_K    | 6.59 | 7.0 | 8.0 | 9.4(9.4,9.4)    | 12.2(12.2,12.2) | 15.0(15.0,15.0) | 17.8(17.8,17.8) |
| Q8_0    | 8.50 | 8.8 | 9.9 | 11.4(11.4,11.4) | 14.4(14.4,14.4) | 17.4(17.4,17.4) | 20.3(20.3,20.3) |

To find the best quantisation type for a given memory constraint (e.g. 6GB) you can provide --fits <number of GB>:

gollama --vram NousResearch/Hermes-2-Theta-Llama-3-8B --fits 6

📊 VRAM Estimation for Model: NousResearch/Hermes-2-Theta-Llama-3-8B

| QUANT/CTX | BPW  | 2K  | 8K   | 16K             | 32K             | 49K             | 64K             |
| --------- | ---- | --- | ---- | --------------- | --------------- | --------------- | --------------- |
| IQ1_S     | 1.56 | 2.4 | 3.8  | 5.7(4.7,4.2)    | 9.5(7.5,6.5)    | 13.3(10.3,8.8)  | 17.1(13.1,11.1) |
| IQ2_XXS   | 2.06 | 2.9 | 4.3  | 6.3(5.3,4.8)    | 10.1(8.1,7.1)   | 13.9(10.9,9.4)  | 17.8(13.8,11.8) |
...

This will display a table showing vRAM usage for various quantisation types and context sizes.

The vRAM estimator works by:

Fetching the model configuration from Hugging Face (if not cached locally)
Calculating the memory requirements for model parameters, activations, and KV cache
Adjusting calculations based on the specified quantisation settings
Performing binary and linear searches to optimize for context length or quantisation settings

Note: The estimator will attempt to use CUDA vRAM if available, otherwise it will fall back to system RAM for calculations.
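To make the search steps more concrete, here is a minimal Go sketch of how a binary search over context length and a linear scan over quantisation types could work. It is not gollama's implementation: the memory formula is deliberately simplified (weights plus KV cache only, with no activations or runtime overhead), so it will not reproduce the tables above, and all type and function names (modelShape, estimateGiB, maxContext, bestQuant) as well as the model values in main are assumptions for illustration.

```go
package main

import "fmt"

// modelShape holds the few values this sketch needs (hypothetical type).
type modelShape struct {
	Params  float64 // total parameter count
	Layers  int     // number of transformer layers
	KVHeads int     // number of key/value heads
	HeadDim int     // dimension of each head
}

type quant struct {
	Name string
	BPW  float64 // bits per weight
}

const gib = 1024 * 1024 * 1024

// estimateGiB is a simplified estimate: quantised weights plus KV cache.
// kvBits is the width of one cached element (e.g. 16 for fp16).
func estimateGiB(m modelShape, bpw, kvBits float64, ctx int) float64 {
	weights := m.Params * bpw / 8
	kv := 2 * float64(m.Layers*m.KVHeads*m.HeadDim) * float64(ctx) * kvBits / 8
	return (weights + kv) / gib
}

// maxContext binary-searches the largest context in [1, limit] that fits
// within budgetGiB. This works because the estimate grows with ctx.
// It returns 0 if not even a context of 1 fits.
func maxContext(m modelShape, bpw, kvBits, budgetGiB float64, limit int) int {
	lo, hi, best := 1, limit, 0
	for lo <= hi {
		mid := lo + (hi-lo)/2
		if estimateGiB(m, bpw, kvBits, mid) <= budgetGiB {
			best = mid
			lo = mid + 1
		} else {
			hi = mid - 1
		}
	}
	return best
}

// bestQuant scans every candidate linearly and returns the one with the
// highest bits per weight that still fits at the given context length.
func bestQuant(m modelShape, quants []quant, kvBits, budgetGiB float64, ctx int) (quant, bool) {
	var best quant
	found := false
	for _, q := range quants {
		if estimateGiB(m, q.BPW, kvBits, ctx) <= budgetGiB && (!found || q.BPW > best.BPW) {
			best, found = q, true
		}
	}
	return best, found
}

func main() {
	// Illustrative shape for an 8B-class model; not read from a real config.
	m := modelShape{Params: 8.03e9, Layers: 32, KVHeads: 8, HeadDim: 128}
	quants := []quant{{"Q4_K_M", 4.85}, {"Q6_K", 6.59}, {"Q8_0", 8.50}}
	const budget, fp16 = 6.0, 16.0

	fmt.Println("max context for Q4_K_M in 6 GiB:", maxContext(m, 4.85, fp16, budget, 131072))
	if q, ok := bestQuant(m, quants, fp16, budget, 8192); ok {
		fmt.Println("best quant at 8K context:", q.Name)
	}
}
```

Binary search suits the context-length question because memory use only increases with context, while the list of quantisation types is short enough that a plain scan is sufficient.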

Configuration

Gollama uses a JSON configuration file located at ~/.config/gollama/config.json. The configuration file includes options for sorting, columns, API keys, log levels, etc.

Example configuration:

{
  "default_sort": "modified",
  "columns": [
    "Name",
    "Size",
    "Quant",
    "Family",
    "Modified",
    "ID"
  ],
  "ollama_api_key": "",
  "ollama_api_url": "http://localhost:11434",
  "lm_studio_file_paths": "",
  "log_level": "info",
  "log_file_path": "/Users/username/.config/gollama/gollama.log",
  "sort_order": "Size",
  "strip_string": "my-private-registry.internal/",
  "editor": "",
  "docker_container": ""
}

strip_string: removes a prefix from model names as they are displayed in the TUI. This is useful if you have a common prefix, such as a private registry, that you want to hide for display purposes.
docker_container (experimental): if set, gollama will attempt to perform any run operations inside the specified container.
editor (experimental): if set, gollama will use this editor to open the Modelfile for editing.
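As a rough illustration of how a file like this could be read and how strip_string affects display names, here is a small Go sketch. The JSON keys come from the example configuration above, but the config struct and the parseConfig and displayName helpers are hypothetical and are not taken from gollama's source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// config mirrors a subset of the keys in the example configuration.
// The struct itself is a hypothetical illustration.
type config struct {
	DefaultSort  string   `json:"default_sort"`
	Columns      []string `json:"columns"`
	OllamaAPIURL string   `json:"ollama_api_url"`
	LogLevel     string   `json:"log_level"`
	StripString  string   `json:"strip_string"`
}

// parseConfig decodes the JSON document into a config value.
func parseConfig(data []byte) (config, error) {
	var c config
	err := json.Unmarshal(data, &c)
	return c, err
}

// displayName removes the strip_string prefix for display purposes.
// Names without the prefix are returned unchanged.
func displayName(c config, model string) string {
	return strings.TrimPrefix(model, c.StripString)
}

func main() {
	raw := []byte(`{
	  "default_sort": "modified",
	  "columns": ["Name", "Size", "Quant"],
	  "ollama_api_url": "http://localhost:11434",
	  "log_level": "info",
	  "strip_string": "my-private-registry.internal/"
	}`)

	c, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(displayName(c, "my-private-registry.internal/llama3.1:8b"))
	fmt.Println(displayName(c, "llama3.1:8b"))
}
```

Only the display name changes in this sketch; the full model name would still be needed for any operation against the registry.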

Installation and build from source

Clone the repository:

git clone https://github.com/sammcj/gollama.git
cd gollama

Build:
```
go get
make build
```
Run:
```
./gollama
```

Logging

Logs are written to $HOME/.config/gollama/gollama.log by default. The log level can be set in the configuration file.

Contributing

Contributions are welcome! Please fork the repository and create a pull request with your changes.

Sam
Jose Almaraz
Jose Roberto Almaraz
anrgct

Acknowledgements

Thank you to folks such as Matt Williams, Fahd Mirza and AI Code King for giving this a shot and providing feedback.

License

This project is licensed under the MIT License. See the LICENSE file for details.
