chore: update readme and add changelog
aallam committed Oct 8, 2023
1 parent c87cb4c commit 4c649b1
Showing 2 changed files with 17 additions and 11 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,8 @@
# 0.1.0
> 08 Oct 2023
Initial release.

### Added
- Encodings: `CL100K_BASE` , `R50K_BASE`, `P50K_BASE` and `P50K_EDIT`
- Custom encoding support
20 changes: 9 additions & 11 deletions README.md
@@ -3,7 +3,7 @@
[![Maven Central](https://img.shields.io/maven-central/v/com.aallam.ktoken/ktoken?color=blue&label=Download)](https://central.sonatype.com/namespace/com.aallam.ktoken)
[![License](https://img.shields.io/github/license/aallam/ktoken?color=yellow)](LICENSE.md)

-**Kt**oken, a BPE tokeniser for use with OpenAI's models.
+**Ktoken**, a BPE tokenizer for use with OpenAI's models.

## ⚡️ Getting Started

@@ -19,11 +19,11 @@ dependencies {
}
```

-It is possible to use the library in two modes: **Remote** (*default*) and **Local**.
+The library can be used in two modes: **Remote** (*default*) and **Local**.

### Local

-Use `LocalPbeLoader` to load encoding from local files:
+Use `LocalPbeLoader` to load encodings from local files:

```kotlin
val tokenizer = Tokenizer.getEncoding(encodingName = EncodingName.CL100K_BASE, loader = LocalPbeLoader(FileSystem.SYSTEM))
@@ -36,17 +36,16 @@ val text = tokenizer.decode(listOf(15339, 1917))

#### JVM

-JVM artifacts include encoding files, you can use `LocalPbeLoader` with `FileSystem.RESOURCES` to load them:
+JVM artifacts include encoding files. You can use `LocalPbeLoader` with `FileSystem.RESOURCES` to load them:

```kotlin
val tokenizer = Tokenizer.getEncoding(encodingName = EncodingName.CL100K_BASE, loader = LocalPbeLoader(FileSystem.RESOURCES))
```

### Remote (default)

-1. Choose and add to your dependencies one of [Ktor's engines](https://ktor.io/docs/http-client-engines.html) to your `build.gradle` file.
-
-2. Use `LocalPbeLoader` to load encoding from local files:
+1. Choose and add one of [Ktor's engines](https://ktor.io/docs/http-client-engines.html) to your dependencies in the `build.gradle` file.
+2. Use `RemoteBpeLoader` to load encoding from remote sources:

```kotlin
val tokenizer = Tokenizer.getEncoding(encodingName = EncodingName.CL100K_BASE, loader = RemoteBpeLoader())
@@ -59,7 +58,7 @@ val text = tokenizer.decode(listOf(15339, 1917))

#### BOM

-Alternatively, you can use [ktoken-bom](/ktoken-bom) by adding the following dependency to your `build.gradle` file
+Alternatively, you can use [ktoken-bom](/ktoken-bom) by adding the following dependency to your `build.gradle` file:

```groovy
dependencies {
@@ -74,10 +73,9 @@ dependencies {

#### Multiplaform

-In multiplatform projects, add **ktoken** dependency to `commonMain`, and choose
-an [engine](https://ktor.io/docs/http-client-engines.html) for each target.
+In multiplatform projects, add the **ktoken** dependency to `commonMain`, and choose an [engine](https://ktor.io/docs/http-client-engines.html) for each target.

## 📄 License

-Ktoken is an open-sourced software licensed under the [MIT license](LICENSE.md).
+Ktoken is open-source software licensed under the [MIT license](LICENSE.md).
**This is not affiliated with nor endorsed by OpenAI**.
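
The multiplatform instructions in this README change (ktoken in `commonMain`, one Ktor engine per target) could look roughly like the sketch below. It is not taken from the repository and rests on several assumptions: a Groovy `build.gradle`, a project that already declares JVM, JS, and iOS targets so that the `jvmMain`, `jsMain`, and `iosMain` source sets exist, and a hypothetical `ktorVersion` property defined elsewhere (for example in `gradle.properties`). The ktoken version `0.1.0` is the one named in the changelog added by this commit; the OkHttp, JS, and Darwin engines are illustrative choices only.

```groovy
// Sketch only: assumes jvm, js and iOS targets are already declared in this
// kotlin { } block and that `ktorVersion` is defined as a project property.
kotlin {
    sourceSets {
        commonMain {
            dependencies {
                // Shared tokenizer dependency, available to every target
                implementation "com.aallam.ktoken:ktoken:0.1.0"
            }
        }
        jvmMain {
            dependencies {
                // Ktor engine for the JVM target (illustrative choice)
                implementation "io.ktor:ktor-client-okhttp:$ktorVersion"
            }
        }
        jsMain {
            dependencies {
                // Ktor engine for the JS target
                implementation "io.ktor:ktor-client-js:$ktorVersion"
            }
        }
        iosMain {
            dependencies {
                // Ktor engine for Apple targets
                implementation "io.ktor:ktor-client-darwin:$ktorVersion"
            }
        }
    }
}
```

An engine is only needed for the Remote mode; a project that loads encodings exclusively through `LocalPbeLoader` may not need the per-target engine entries.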
