Question for ChatGPT: How do ChatGPT tokens work?

June 6, 2023 Vernon Budinger

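Tokens are a fundamental concept in ChatGPT and other language models. A token is a discrete unit of text. It can be as short as a single character or as long as a whole word, and longer or less common words are often split into several subword pieces. As a rough rule of thumb, one token corresponds to about four characters, or about three-quarters of a word, of ordinary English text. The model reads its input and writes its output as sequences of tokens, and it can only work with a fixed maximum number of tokens at a time.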
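To make the idea of subword tokens concrete, here is a toy sketch. It is an assumption-laden simplification: it uses greedy longest-match against a tiny, made-up vocabulary (`TOY_VOCAB` and `toy_tokenize` are hypothetical names), whereas the real GPT tokenizers use byte pair encoding (BPE) with a learned vocabulary of tens of thousands of entries. It only illustrates how a word can map to one token or to several pieces.

```python
# Toy illustration only: greedy longest-match against a tiny, made-up vocabulary.
# Real GPT tokenizers use byte pair encoding (BPE) with a large learned
# vocabulary; this is NOT that algorithm, just a way to see subword splitting.

TOY_VOCAB = {"How", " are", " you", "?", "token", "ization", " un", "believ", "able"}  # hypothetical


def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text into the longest vocabulary pieces, falling back to single characters."""
    max_len = max(len(piece) for piece in vocab)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until a match is found.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary entry matched: emit one character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("How are you?", TOY_VOCAB))    # → ['How', ' are', ' you', '?']
print(toy_tokenize("tokenization", TOY_VOCAB))    # → ['token', 'ization']
print(toy_tokenize(" unbelievable!", TOY_VOCAB))  # → [' un', 'believ', 'able', '!']
```

Note how the leading space travels with the word (" are", " you") and how an unknown character such as "!" still becomes a token of its own.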

In ChatGPT, both input and output consume tokens. The total number of tokens affects the cost (when using the API), the response time, and whether a conversation fits within the model's maximum limit. The main details are as follows:

Token count: Each message, including user messages and assistant responses, is converted into a sequence of tokens. For example, the question "How are you?" is split into four tokens: ["How", " are", " you", "?"]. With OpenAI's tokenizers, the space before a word is usually part of that word's token.
Token limit: Each model has a maximum number of tokens, often called its context window, that the prompt (including the earlier conversation) and the response must share. If a conversation exceeds this limit, you will need to truncate or shorten the text to fit. Very long conversations are therefore more likely to lose early context or to receive responses that are cut off.
Cost: When using the OpenAI API, you are billed for the total number of tokens in both the input (everything sent to the API, including any earlier conversation turns you resend) and the output (the generated reply). Depending on the model, input and output tokens may be priced at different rates.
Response time: The model generates its reply one token at a time, so longer responses require more computation and take longer to arrive.
Interaction constraints: Because the whole conversation counts toward the limit, a long back-and-forth uses up a growing share of the available tokens, leaving less room for new context and for the model's replies.
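The token-limit and cost points above come down to simple arithmetic. This is a minimal sketch under stated assumptions: the 4,096-token context window and the per-1,000-token prices are placeholders chosen for illustration, not OpenAI's published figures, and `max_reply_tokens` and `request_cost` are hypothetical helper names.

```python
# Sketch of the arithmetic behind the token limit and billing.
# CONTEXT_WINDOW and the prices are ASSUMED placeholder values; check the
# current model documentation and pricing page for real numbers.

CONTEXT_WINDOW = 4_096        # assumed limit: prompt + reply must fit together
PRICE_PER_1K_INPUT = 0.0015   # hypothetical USD per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.002   # hypothetical USD per 1,000 output tokens


def max_reply_tokens(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the reply once the prompt (whole conversation) is counted."""
    return max(window - prompt_tokens, 0)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Billed cost: input and output tokens both count, each at its own rate."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT


print(max_reply_tokens(3_500))  # → 596 tokens left for the answer
# 3.5 * 0.0015 + 0.5 * 0.002 = 0.00625
print(f"${request_cost(3_500, 500):.5f}")
```

A 3,500-token conversation leaves only a few hundred tokens for the answer, which is why long conversations are more prone to truncated replies.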

To manage tokens effectively, keep the token limit in mind, prioritize the essential information, and be concise. This keeps conversations within the limit, responses faster, and API costs lower.
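One common way to stay within the limit is to drop the oldest messages first. The sketch below assumes token counts can be estimated with the rough "about four characters per token" rule of thumb; `estimate_tokens` and `trim_history` are hypothetical helpers, and an exact count would require the model's own tokenizer (for example, OpenAI's tiktoken library).

```python
# Minimal sketch: keep the most recent messages that fit a token budget.
# Token counts are ESTIMATED with the rough "about 4 characters per token"
# rule of thumb for English text; an exact count needs the model's tokenizer.
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (approximation)."""
    return math.ceil(len(text) / 4)


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the estimated total fits within `budget`."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the most recent context is preserved first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "x" * 400,  # oldest message, about 100 tokens
    "y" * 200,  # about 50 tokens
    "z" * 120,  # newest message, about 30 tokens
]
print([len(m) for m in trim_history(history, budget=90)])  # → [200, 120]
```

Trimming from the oldest end is a simple design choice; real applications often also pin a system prompt or summarize dropped turns instead of discarding them outright.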