This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Develop #442

Closed
wants to merge 63 commits into from

Conversation

@philpax (Collaborator) commented Nov 12, 2023

The pending PRs were interrelated, but I didn't want to leave main in a half-working state, so I've merged all the PRs into a new develop branch. The plan is to work on this branch and leave main in maintenance mode until this is ready.

Closes #365, closes #403, closes #439, closes #77.

This integrates:

  • a GGML version upgrade
  • GGUF support
  • BERT support
  • APIs for context-shuffling

This is the to-do list:

  • Update to the latest GGML
  • Fix CUDA inference
  • Fix OpenCL inference
  • Fix Metal inference
  • Fix the embedded tokenizer
  • Re-add quantisation
  • Modularize the model definitions (i.e. move block inference to the block struct)
  • Fix models (ensure they're uncommented in llm):
    • Fix BLOOM
    • Fix GPT-NeoX
    • Fix Falcon
    • Fix GPT-2
    • Fix GPT-J
    • Fix MPT
    • Fix BERT
  • Remove the expects
  • Fix the TODOs
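
For the "Remove the expects" item, here is a minimal sketch of the general pattern, assuming "expects" refers to `.expect(...)` calls that panic on bad input. Every name in it (`LoadError`, `context_length`, the `"context_length"` key) is hypothetical and chosen for illustration only; none of it is taken from the crate's actual code.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type for illustration; not the crate's real error enum.
#[derive(Debug, PartialEq)]
enum LoadError {
    MissingKey(String),
    InvalidValue { key: String, value: String },
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadError::MissingKey(key) => write!(f, "missing metadata key `{key}`"),
            LoadError::InvalidValue { key, value } => {
                write!(f, "invalid value `{value}` for metadata key `{key}`")
            }
        }
    }
}

impl std::error::Error for LoadError {}

/// Reads a numeric metadata value without panicking.
/// The panicking equivalent would chain two `.expect(...)` calls:
/// one on the lookup and one on the parse.
fn context_length(metadata: &HashMap<String, String>) -> Result<usize, LoadError> {
    let key = "context_length"; // hypothetical key name
    let raw = metadata
        .get(key)
        .ok_or_else(|| LoadError::MissingKey(key.to_string()))?;
    raw.parse::<usize>().map_err(|_| LoadError::InvalidValue {
        key: key.to_string(),
        value: raw.clone(),
    })
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("context_length".to_string(), "2048".to_string());

    // The caller decides how to react to a failure instead of the process aborting.
    match context_length(&metadata) {
        Ok(n) => println!("context length: {n}"),
        Err(e) => eprintln!("failed to load model metadata: {e}"),
    }
}
```

The idea is that each former panic site becomes an error variant, so callers can surface the failure or recover from it.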

oppiliappan and others added 30 commits August 7, 2023 14:55
Co-authored-by: Lukas Kreussel <lukaskreussel@gmail.com>
Co-authored-by: Philpax <me@philpax.me>
* with some heavy caveats, see the PR
Build against newer GGML version
Add "context swap" functions to session and add "decoded_tokens" to snapshot read/write
@philpax closed this Jun 24, 2024
4 participants