gpt3_tokenizer

Encoder/decoder and token counter for GPT-3


Keywords
openai, gpt, gpt-3, gpt3, gpt4, gpt-4, tokenizer
License
MIT
Install
pip install gpt3_tokenizer==0.1.5

Documentation

gpt3_tokenizer

An OpenAI GPT-3 helper library for encoding/decoding strings and counting tokens.
Counting tokens gives the same output as OpenAI's tokenizer.

Tested with Python versions 2.7.12, 2.7.18, and all 3.x versions

Installing

pip install gpt3_tokenizer

Examples

Encoding/decoding a string

import gpt3_tokenizer

a_string = "That's my beautiful and sweet string"
encoded = gpt3_tokenizer.encode(a_string) # outputs [2504, 338, 616, 4950, 290, 6029, 4731]
decoded = gpt3_tokenizer.decode(encoded) # outputs "That's my beautiful and sweet string"

Counting tokens

import gpt3_tokenizer

a_string = "That's my beautiful and sweet string"
tokens_count = gpt3_tokenizer.count_tokens(a_string) # outputs 7