For LLM applications, we often use OpenAI models such as the ada-002 embedding model or the davinci models. When using these models, we often need to estimate the number of tokens our application will require.
For English text, a good rule of thumb is that one token corresponds to roughly 3 to 4 characters.
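To make the rule of thumb concrete, here is a minimal Python sketch that turns a character count into a rough token range. The helpers `estimate_tokens` and `estimate_token_range` are hypothetical names used only for illustration, not part of any library. Real tokenizers (such as the one behind ada-002) split text into learned subword units, so actual counts can differ noticeably, especially for code, numbers, or non-English text.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for English text using a characters-per-token heuristic.

    This is an approximation only, not a real tokenizer.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so that any non-empty text counts as at least one token.
    return math.ceil(len(text) / chars_per_token)


def estimate_token_range(text: str) -> tuple[int, int]:
    """Return (low, high) token estimates using the 3-4 characters-per-token rule."""
    # More characters per token means fewer tokens, so 4 chars/token gives the low end
    # and 3 chars/token gives the high end.
    low = estimate_tokens(text, chars_per_token=4.0)
    high = estimate_tokens(text, chars_per_token=3.0)
    return low, high


if __name__ == "__main__":
    sample = "For LLM applications, we often need to estimate token counts."
    low, high = estimate_token_range(sample)
    print(f"{len(sample)} characters -> roughly {low} to {high} tokens")
```

Using the upper end of the range (3 characters per token) is the safer choice when budgeting against a model's token limit; for exact counts, run the text through the model's actual tokenizer.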
A nifty online tool that can help you estimate the number of tokens is: https://www.quizrise.com/token-counter