Mismatch to OpenAI's tokenizer? #29
Unanswered. hitsthings asked this question in Q&A. Replies: 0 comments.
I was trying to compare for correctness, and it seems OpenAI counts an extra token in their tokenizer: the first `<h` is two tokens for them.
Is that related to #19, or is there something else going on? Interestingly, the other models that gpt-tokenizer supports seem to match what's on the Tokenizer page (even though cl100k_base is listed as the gpt-3.5-turbo tokenizer).
As someone new to the repo, I'm sure I'm just missing something and this is expected. It would be great to get help understanding the gotchas around when the counts might differ.