What are tokens in AI Models?
In AI language models, a token is a small piece of text, often a word, part of a word, or even punctuation. Different models tokenize differently, but the idea is the same: tokens are the units AI models read and write. For example:
Why did the banana go to the hospital? (If you know the answer, comment :D)
When you type text into a language model, it doesn't understand raw text like humans do. It works with numbers.
Text → "Why did the banana go to the hospital?"
The tokenizer breaks it down into tokens
Each token is mapped to a token ID (a number). We will explore how tokenizers work, and how each token gets its ID, later in this series.
These token IDs are what the model actually sees.
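To see this in action, here is a minimal sketch in Python using the tiktoken library (the tokenizer used by several OpenAI models; other models such as Claude or Llama use different tokenizers, so the exact splits and IDs will differ):

```python
# Minimal sketch using the tiktoken library (pip install tiktoken).
# Other models use different tokenizers, so the exact token splits
# and IDs are illustrative, not universal.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Why did the banana go to the hospital?"

token_ids = encoding.encode(text)                        # text -> token IDs (the numbers the model sees)
tokens = [encoding.decode([tid]) for tid in token_ids]   # each ID -> the text piece it stands for

print(token_ids)  # a list of integers
print(tokens)     # e.g. ['Why', ' did', ' the', ' banana', ' go', ' to', ' the', ' hospital', '?']
```

Running it prints the token IDs alongside the text piece each ID stands for, which is exactly the numeric view the model works with.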
What does it mean when you read "Claude can work with 200K tokens"?
It means Claude (an AI model by Anthropic) can handle up to 200,000 tokens in one conversation or prompt.
To put that in perspective:
1,000 tokens ≈ 750 words
200,000 tokens ≈ 150,000 words (Roughly the length of a 500-page book!)
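If you want to sanity-check that rough ratio on your own text, a small sketch like this works (again assuming tiktoken; the ~0.75 words-per-token figure is an English-language rule of thumb and varies by tokenizer and language):

```python
# Sketch: compare token count to word count for a piece of text.
# The ~0.75 words-per-token ratio is a rough English-only heuristic.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Large language models read and write text as tokens, not words."

num_tokens = len(encoding.encode(text))
num_words = len(text.split())

print(f"{num_words} words -> {num_tokens} tokens "
      f"({num_words / num_tokens:.2f} words per token)")
```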
Key Points to Remember
API Usage: Most LLM APIs (like OpenAI's, Anthropic's, Google's) charge per token (both input and output). Sending a small question and getting a small answer through a model designed for massive contexts still incurs costs (a quick cost sketch follows this list).
Specialization: Smaller models or models with moderate context windows can be more easily and cheaply fine-tuned on specific QA datasets, potentially outperforming a general-purpose large-context model on those specific tasks.
Avoid Irrelevant Information: If you're tempted to "fill up" the large context window with a lot of text just because you can, you might introduce noise or irrelevant information that could confuse the model or lead it to generate less precise answers for a simple question.
Latency (Speed): Processing more tokens takes more time. Even if your actual question and answer are short, a model designed for and handling larger context windows might have inherent architectural overheads that make it slower than a smaller, more specialized model.
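To make the pricing point concrete, here is a back-of-the-envelope cost sketch. The per-million-token prices below are placeholders, not real rates; always check your provider's current pricing page:

```python
# Sketch: back-of-the-envelope cost estimate for one API call.
# The prices below are hypothetical placeholders, NOT real rates.
PRICE_PER_MILLION_INPUT = 3.00    # hypothetical USD per 1M input tokens
PRICE_PER_MILLION_OUTPUT = 15.00  # hypothetical USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated cost in USD for one request/response pair."""
    return ((input_tokens / 1_000_000) * PRICE_PER_MILLION_INPUT
            + (output_tokens / 1_000_000) * PRICE_PER_MILLION_OUTPUT)

# A short question with a short answer is cheap...
print(f"${estimate_cost(200, 300):.4f}")
# ...but filling a 200K-token context window costs real money on every call.
print(f"${estimate_cost(200_000, 1_000):.4f}")
```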
What are Parameters in AI Models?
At their core, Parameters are the adjustable internal numbers that an AI model learns from data during training. These numbers store the model's knowledge and determine how it makes predictions or decisions.
Think of them like neurons and connections in a brain.
During a process called model training, the model looks at tons of data. It makes guesses, gets feedback, and then adjusts these 'knobs' (parameters) slightly. Technically, parameters are also called "weights".
After training, the final settings of these parameters represent everything the AI model has 'learned' from the data; they store its patterns and relationships. For example, specific parameter values (the settings of particular knobs) might help the model distinguish a cat's pointy ears from a dog's floppy ones.
When you give the trained AI model new input, it uses these learned parameter settings to process that input and produce an output.
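To make the 'knob adjusting' idea concrete, here is a toy sketch in which a single parameter w learns the rule y = 2x from made-up data by guessing, measuring the error, and nudging itself slightly. Real training does exactly this, just across billions of parameters at once:

```python
# Toy sketch: one parameter ("knob") w learning y = 2 * x from made-up data
# by making a guess, measuring the error (feedback), and nudging w slightly.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs

w = 0.0              # the parameter, starting with no knowledge
learning_rate = 0.05

for epoch in range(200):
    for x, target in data:
        prediction = w * x              # the model's guess
        error = prediction - target     # feedback: how wrong was it?
        w -= learning_rate * error * x  # adjust the knob slightly

print(f"learned w = {w:.3f}")  # converges toward 2.0
```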
When someone says, "GPT-3 has 175 billion parameters," they’re talking about the model’s brainpower.
Based on Size/Scale (Parameter Count)
Small LLMs: (e.g., < 7 Billion parameters) - Can run on consumer hardware, good for specific tasks. Examples: DistilBERT, some smaller Llama/Mistral variants.
Medium LLMs: (e.g., 7B - 70B parameters) - Offer a good balance of capability and resource requirements. Examples: Llama 2 7B/13B/70B, Mistral 7B.
Large LLMs: (e.g., >100B parameters) - State-of-the-art performance, require significant computational resources. Examples: GPT-4, PaLM 2, Gemini Ultra.
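If you have a model available locally, you can count its parameters yourself. This sketch assumes PyTorch and Hugging Face Transformers are installed and uses DistilBERT as a small, downloadable example:

```python
# Sketch: counting the parameters of a locally loaded model.
# Assumes PyTorch + Hugging Face Transformers are installed and that
# "distilbert-base-uncased" (a small model) can be downloaded.
from transformers import AutoModel

model = AutoModel.from_pretrained("distilbert-base-uncased")

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")  # roughly 66M for DistilBERT
```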
Key Points to Remember
Larger Parameter Count = Higher API Costs & More Resources. Generally, models with more parameters are more expensive to use via an API. You might pay more per token processed or have higher base fees for accessing the more capable model.
Larger Parameter Count = Broader & More Nuanced Capabilities.
Larger models can generally handle more complex, open-ended, and nuanced tasks. They excel at deep reasoning, creative generation, understanding intricate instructions, and synthesizing information from larger contexts.
Larger Parameter Count = Generally Slower Inference (Higher Latency).
Processing information through hundreds of billions of parameters takes more computational time than processing it through a few billion. This means responses from larger models will typically take longer to generate.
Smaller models are quicker and can provide near real-time responses, which is crucial for applications like interactive chatbots, quick data extraction, or autocomplete features.
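If latency matters for your application, measure it rather than guessing. In this sketch, call_model is a hypothetical placeholder; swap in your own API client or local inference call:

```python
# Sketch: timing a model call. `call_model` is a hypothetical placeholder --
# replace its body with a real API request or local model inference.
import time

def call_model(prompt: str) -> str:
    time.sleep(0.5)  # simulate network + generation time
    return "example response"

start = time.perf_counter()
response = call_model("Summarize this paragraph in one sentence.")
elapsed = time.perf_counter() - start

print(f"response in {elapsed:.2f}s: {response!r}")
```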