count-tokens

Count the number of tokens in a text file using the tiktoken tokenizer from OpenAI.


Keywords
count, tokens, toktoken, openai, tokenizer, tiktoken, token-count, tokenization, tokenizer-nlp
License
MIT
Install
pip install count-tokens==0.7.0

Documentation

Count tokens


A simple tool with one purpose: counting tokens in a text file.

Requirements

This package uses the tiktoken library for tokenization.
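
Under the hood, counting tokens amounts to encoding the text with a tiktoken encoding and taking the length of the resulting token list. A minimal sketch of that idea, using tiktoken directly rather than this package's own code, could look like this:

```python
import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens by encoding the text with the given tiktoken encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

# Example: count tokens in a file read as UTF-8 text.
with open("document.txt", encoding="utf-8") as f:
    print(count_tokens(f.read()))
```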

Installation

For command-line usage, install the package in an isolated environment with pipx:

$ pipx install count-tokens

or install it in your current environment with pip.

Usage

Open a terminal and run:

$ count-tokens document.txt

You should see something like this:

File: document.txt
Encoding: cl100k_base
Number of tokens: 67

If you want to see just the token count, run:

$ count-tokens document.txt --quiet

and the output will be:

67

NOTE: tiktoken supports three encodings used by OpenAI models:

| Encoding name       | OpenAI models                                     |
|---------------------|---------------------------------------------------|
| cl100k_base         | gpt-4, gpt-3.5-turbo, text-embedding-ada-002      |
| p50k_base           | Codex models, text-davinci-002, text-davinci-003  |
| r50k_base (or gpt2) | GPT-3 models like davinci                         |

To use count-tokens with an encoding other than the default cl100k_base, use the additional input argument -e or --encoding:

$ count-tokens document.txt -e r50k_base
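
If you are not sure which encoding a given model uses, tiktoken can look it up for you. The snippet below is an illustrative use of tiktoken itself, not a feature of count-tokens:

```python
import tiktoken

# Ask tiktoken which encoding a given model uses, then tokenize with it.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(encoding.name)  # cl100k_base

tokens = encoding.encode("Hello, world!")
print(len(tokens))
```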

Approximate number of tokens

If you need the results a bit faster and don't need the exact number of tokens, you can use the --approx parameter with w for an approximation based on the number of words, or c for an approximation based on the number of characters.

$ count-tokens document.txt --approx w

The approximation is based on the assumption that there are 4/3 (1 and 1/3) tokens per word and 4 characters per token.
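
A rough sketch of how such an approximation can be computed from those assumptions (the helper functions below are illustrative, not the package's internal implementation):

```python
def approx_tokens_from_words(text: str) -> int:
    """Approximate the token count as 4/3 tokens per whitespace-separated word."""
    return round(len(text.split()) * 4 / 3)

def approx_tokens_from_chars(text: str) -> int:
    """Approximate the token count as one token per 4 characters."""
    return round(len(text) / 4)

sample = "Counting tokens approximately is faster than exact tokenization."
print(approx_tokens_from_words(sample))  # word-based estimate
print(approx_tokens_from_chars(sample))  # character-based estimate
```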

Programmatic usage

```python
from count_tokens import count_tokens_in_file

num_tokens = count_tokens_in_file("document.txt")
```

```python
from count_tokens import count_tokens_in_string

num_tokens = count_tokens_in_string("This is a string.")
```

Related projects

  • tiktoken - tokenization library used by this package

Credits

Thanks to the authors of the tiktoken library for open sourcing their work.

License

MIT © Krystian Safjan.