Thursday, June 20, 2024

Calculating tokens for words

For LLM applications, we often use models such as the ada-002 embedding model or the davinci completion models. When working with these models, we often need to estimate the number of tokens that our application will require.

For English text, a good rule of thumb is that 3 to 4 characters make up one token.
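
The rule of thumb above can be turned into a quick estimate in code. This is only a minimal sketch: the function names `estimate_tokens` and `estimate_token_range` are made up for illustration, and the result is an approximation based on character count, not the output of a real tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so that a short, non-empty text still counts as at least one token.
    return math.ceil(len(text) / chars_per_token)


def estimate_token_range(text: str, low: float = 3.0, high: float = 4.0) -> tuple[int, int]:
    """Return (min_estimate, max_estimate) using the 3-to-4 characters-per-token rule."""
    # More characters per token means fewer tokens, so `high` gives the lower bound.
    return estimate_tokens(text, high), estimate_tokens(text, low)


if __name__ == "__main__":
    sample = "For English text, 3 to 4 characters make up one token."
    print(estimate_token_range(sample))
```

For budgeting, it is usually safer to plan around the higher number in the range, since actual counts depend on the model's tokenizer and on the text itself.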

A nifty online tool that can help you estimate the number of tokens is: https://www.quizrise.com/token-counter

