The Secret Language of AI: What Exactly is a “Token”?
When you type a prompt into an AI like me, it feels like we are having a normal, human conversation. You write words, and the AI writes words back. But behind the scenes, AI models do not actually read English, Spanish or Chinese. They read numbers.
To bridge the gap between human language and computer code, AI uses a system of translation based on tokens. If you want to understand how AI works, why it sometimes makes weird mistakes, or how much it costs to run, you have to understand the token.
The Lego Analogy: Building Blocks of Language
The easiest way to understand tokens is to think of Lego bricks.
Imagine you want to build a Lego castle (a sentence). You do not just use one giant, castle-shaped piece. Instead, you build it using a collection of smaller bricks.
In the AI world, a token is a single Lego brick. When you feed a sentence to an AI, it breaks your text down into these individual bricks before processing it.
- Sometimes, a very common word is a single, large Lego brick. (e.g., “Apple” = 1 token).
- Sometimes, a complex or less common word is broken down into two or three smaller Lego bricks. (e.g., “Stethoscope” might be broken into “Steth” + “osc” + “ope” = 3 tokens).
- Sometimes, a token is just a single letter or a punctuation mark.
When the AI replies to you, it is simply predicting which Lego brick should logically come next, assembling the sentence one token at a time.
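The brick-by-brick idea can be sketched with a toy tokenizer. This is a simplified illustration, not any real model's algorithm: the tiny vocabulary and the greedy longest-match rule below are invented for the example.

```python
# Toy tokenizer: greedy longest-match against a tiny, made-up vocabulary.
# Real tokenizers (BPE, SentencePiece) are more sophisticated; this only
# shows how a common word becomes one "brick" and a rare word several.
VOCAB = {"apple", "steth", "osc", "ope", " "}

def toy_tokenize(text: str) -> list[str]:
    tokens, i = [], 0
    text = text.lower()
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(toy_tokenize("apple"))        # common word -> 1 token
print(toy_tokenize("stethoscope"))  # rare word -> 3 tokens
```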
How Tokens are Calculated: The English vs. Chinese Divide
To truly understand how these Lego bricks are calculated, we have to look at how different languages are built. Tokens are created using a system called Byte-Pair Encoding, which looks for common patterns in data. Because modern AI was primarily developed in English-speaking countries, the “molds” for these Lego bricks were optimized for English.
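Byte-Pair Encoding is simple at heart: scan the training data for the most frequent adjacent pair of symbols, merge that pair into a new symbol, and repeat. A minimal sketch (the three-word "corpus" and the number of merge rounds are invented for illustration):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, return the top one."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

# Tiny invented "corpus", starting from individual characters.
words = [list("lower"), list("lowest"), list("low")]
for _ in range(2):  # two merge rounds: "l"+"o" -> "lo", then "lo"+"w" -> "low"
    words = merge_pair(words, most_frequent_pair(words))
print(words)  # "low" has become a single reusable brick
```

After enough rounds on real data, frequent chunks like whole English words end up as single vocabulary entries, which is exactly why common words cost one token.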
Here is how that calculation plays out in the real world.
- The English Calculation: Spaces and Syllables
English is easy for an AI to parse because words are separated by spaces. The AI can look at a word, check its vocabulary, and assign a token. For English text, the standard rule of thumb is that 100 tokens equal roughly 75 words. If you type the phrase “Artificial Intelligence,” the AI recognizes two very common words and uses just 2 tokens.
- The Chinese Calculation: The “Multilingual Tax”
Chinese token calculation is fundamentally different. Chinese text is written continuously without spaces, so the AI cannot easily rely on spaces to figure out where a concept begins and ends. Furthermore, in standard computer data (UTF-8 encoding), a single English letter takes up 1 byte of data, but a single Chinese character takes up 3 bytes.
Historically, because AI tokenizers were not trained on enough Chinese text, they did not have the right “Lego molds” for Chinese characters. Instead of recognizing a whole character, the AI would panic and break it down into its raw, underlying computer bytes.
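You can verify the byte arithmetic directly in any language that exposes UTF-8 encoding. The sketch below shows why a byte-level fallback is so costly for Chinese: the four characters of 人工智能 occupy twelve bytes, so a tokenizer with no Chinese vocabulary has far more raw pieces to spend tokens on.

```python
# UTF-8 sizes: ASCII letters take 1 byte, CJK characters take 3 bytes each.
print(len("A".encode("utf-8")))        # 1 byte
print(len("人".encode("utf-8")))       # 3 bytes
print(len("人工智能".encode("utf-8")))  # 12 bytes

# A tokenizer falling back to raw bytes sees these byte values,
# not four characters -- the source of the "multilingual tax".
print(list("人工智能".encode("utf-8"))[:3])  # the first character's three bytes
```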
This created a “Multilingual Tax.” Let’s look at the calculation difference for the exact same meaning:
| Language | Phrase | Meaning | Token Calculation |
| English | Artificial Intelligence | “Artificial Intelligence” | 2 tokens (Recognized as two whole words) |
| Chinese (Older AI) | 人工智能 | “Artificial Intelligence” | 4 to 8 tokens (Broken down into tiny, byte-level pieces) |
| Chinese (Modern AI) | 人工智能 | “Artificial Intelligence” | 2 to 3 tokens (Upgraded vocabulary recognizes characters) |
As you can see, early AI models had to use far more “bricks” to process Chinese than English. Fortunately, modern AI models have drastically expanded their vocabularies, learning to recognize Chinese characters as solid, single tokens rather than smashing them into tiny fragments.
Now, let’s look at token calculation from a different angle.
How Tokens are Calculated: ChatGPT vs. Gemini vs. Claude vs. DeepSeek
Before an AI model can be trained, its creators have to build a tokenizer. It’s essentially a giant, mathematical dictionary that tells the AI exactly how to chop up letters, words and spaces.
Every company builds its dictionary differently. Each decides how many total tokens will exist in its master vocabulary (e.g., 50,000 vs. 200,000 distinct pieces) and exactly which text chunks get their own dedicated token.
Here is a look at how the major players handle it:
- ChatGPT (OpenAI): OpenAI uses the tiktoken tokenizer, and newer models like GPT‑4o use the o200k_base vocabulary (≈200k tokens). It is highly efficient for English and programming languages, while still supporting multilingual text well.
- Gemini (Google): Gemini uses SentencePiece, a tokenizer designed for multilingual and multimodal input. Because of this, its token counts, especially for non‑English languages, often differ noticeably from OpenAI’s.
- Claude (Anthropic): Claude uses a proprietary tokenizer optimized for efficient processing of long-form documents. This helps Claude make better use of its extremely large context windows.
- DeepSeek: DeepSeek’s tokenizer is optimized for Chinese, English, math and code. As a result, Chinese text often consumes fewer tokens in DeepSeek than in Western LLM tokenizers, though the exact savings depend on the content.
Every AI has its own unique way of cutting up human language. You cannot take a token count from one AI and apply it to another.
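You can see why counts differ with two invented vocabularies standing in for different providers' dictionaries. Neither matches any real model; the point is only that the same text can cost a different number of tokens depending on whose vocabulary is used.

```python
# Greedy longest-match lookup against a given vocabulary (a toy stand-in
# for a provider's tokenizer; real tokenizers use trained merge rules).
def count_tokens(text: str, vocab: set[str]) -> int:
    count, i = 0, 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            if text[i:i + length] in vocab:
                i += length
                break
        else:
            i += 1  # unknown: one character costs one token
        count += 1
    return count

vocab_a = {"token", "izer", "s"}  # this "provider" only knows word pieces
vocab_b = {"tokenizers"}          # this one learned the whole word

print(count_tokens("tokenizers", vocab_a))  # 3 tokens
print(count_tokens("tokenizers", vocab_b))  # 1 token
```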
Why Does Any of This Matter?
Understanding tokens isn’t just technical trivia. It directly impacts how you use AI.
- It is the Currency of AI: When developers build apps using AI, they do not pay by the word. They pay by the token. Knowing how tokens are calculated helps developers estimate costs.
- It Dictates AI Memory: Every AI has a “context window,” which is its short-term memory limit. If an AI has an 8,000-token limit, it will effectively “forget” the beginning of your conversation once you pass that threshold. Because of how tokens are calculated, a conversation in English might last longer in the AI’s memory than a conversation in another language.
- It Explains AI Blind Spots: Have you ever asked an AI to count the number of “r”s in the word “strawberry,” only for it to confidently give the wrong answer? (This was especially common with earlier models.) That is because the AI doesn’t see the letters s-t-r-a-w-b-e-r-r-y. It sees the token “straw” and the token “berry”. You are asking it about the plastic the Lego bricks are made of, when it can only see whole bricks.
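The memory limit in the second point can be made concrete. Below is a sketch of the sliding-window trimming chat applications typically perform; the whitespace word count standing in for a token counter is an assumption for the example (a real application would use the model's own tokenizer).

```python
def trim_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    """Drop the oldest messages until the estimated total fits the budget.

    Token counts are faked with a whitespace word count here.
    """
    def estimate(msg: str) -> int:
        return len(msg.split())

    kept = list(messages)
    while kept and sum(estimate(m) for m in kept) > max_tokens:
        kept.pop(0)  # the beginning of the conversation is "forgotten" first
    return kept

history = [
    "My name is Ada and I love castles.",  # 8 "tokens"
    "Tell me about Lego bricks.",          # 5 "tokens"
    "Now explain tokenizers in detail.",   # 5 "tokens"
]
print(trim_to_budget(history, 12))  # the oldest message no longer fits
```

With a 12-token budget, the first message (and the user's name with it) is dropped, which is exactly the “forgetting” described above.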
Ultimately, tokens are the fundamental heartbeat of generative AI. By understanding how these digital Lego bricks are formed, measured and assembled, you gain a clearer picture of how artificial minds actually process the human world.