LLM Training Stages: Pre-Training → Mid-Training → Post-Training
Large Language Model (LLM) development happens across three major stages, each adding different capabilities.
Post-training is the final stage, and it is where fine-tuning and reinforcement learning come in.
1️⃣ Pre-Training — Learning Raw Intelligence
Goal: Learn the patterns of language by predicting the next token
Method: Train on vast, internet-scale text datasets to predict the next token
How it works
- Input: a huge corpus of text
- Task: next-token prediction
- Example: “The sky is …” → blue
- Example: “The sun is setting, the sky is …” → orange
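In training terms, this objective is just cross-entropy on the next token: the model learns to maximize log p(next token | preceding tokens) across the corpus. As a concrete illustration, here is a minimal sketch of what a pre-trained model actually computes, assuming the Hugging Face `transformers` library and the small `gpt2` checkpoint as a stand-in for any base model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Ask the model for a distribution over the *next* token after the prompt.
inputs = tokenizer("The sky is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: [batch, seq_len, vocab_size]

# Probabilities for the token that would follow the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id):>10s}  p={prob:.3f}")
```

A well pre-trained model assigns high probability to plausible continuations like “blue”, but nothing in this objective makes it follow instructions.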
Key points
- Starts from randomly initialized weights
- Training is very expensive (months of compute)
- Learns concepts, associations, and world knowledge
- Still only predicts the next token; this is not yet helpful assistant behavior
2️⃣ Mid-Training — Targeted Expansion
Goal: Continue training, but on curated datasets that refine and extend specific abilities
What mid-training improves
| Capability | Example |
|---|---|
| New languages | teach Chinese |
| New modalities | add audio or images |
| Longer context windows | expand attention range |
Mid-training is essentially continued pre-training, and in many labs it is handled by a separate internal team.
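Mechanically, mid-training looks just like pre-training with a different data mix: resume from the pre-trained weights and keep optimizing the same next-token objective on the curated corpus. A minimal sketch, again assuming `transformers` and `gpt2` as stand-ins; `curated_texts` is a hypothetical placeholder for the curated dataset:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # resume from pre-trained weights
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # typically lower LR than pre-training

# Hypothetical placeholder for a curated corpus (new language, domain, long-context data, etc.).
curated_texts = ["Curated domain text goes here."]

model.train()
for text in curated_texts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Same next-token objective as pre-training: passing labels=input_ids makes
    # the model compute the shifted cross-entropy loss internally.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```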
3️⃣ Post-Training — Making Models Helpful
Goal: Shape behavior so the model becomes usable, aligned, and reliable
Primary post-training methods
| Technique | Purpose |
|---|---|
| Fine-Tuning (SFT) | Teach exact desired outputs |
| Reinforcement Learning (RL/RLHF) | Reward good behavior & discourage bad responses |
Fine-Tuning (Supervised Fine-Tuning / SFT)
- Provide an input plus the correct output
- The model learns how it should respond
- Can be highly efficient using LoRA adapters (small trainable layers added instead of updating the whole model)
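A minimal SFT-with-LoRA sketch, assuming the Hugging Face `peft` and `transformers` libraries; the checkpoint, target modules, and training example are illustrative placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap the frozen base model with small trainable low-rank adapters.
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                      lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically a tiny fraction of the full model

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

# One SFT step: the input and the exact desired output, trained jointly.
example = "Q: What is 2 + 2?\nA: 4"
batch = tokenizer(example, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```

Because only the low-rank adapter weights receive gradients, this updates a small set of parameters while the base model stays frozen.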
Reinforcement Learning (RL / RLHF)
- Provide scores or rewards for generated responses
- Often uses multiple models (a reward model, a policy model, etc.)
- Enables models to:
  - follow human preferences
  - reason in multiple steps
  - improve based on feedback
RL is powerful but expensive.
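As a toy illustration of the core idea (a REINFORCE-style sketch, not a production RLHF pipeline), the code below samples a response from the policy, scores it with a hypothetical reward function standing in for a reward model, and nudges the policy toward higher-reward outputs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")  # the model being trained
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)

def reward_fn(text: str) -> float:
    # Hypothetical stand-in for a learned reward model or human preference score.
    return 1.0 if "4" in text else -1.0

prompt = tokenizer("Q: What is 2 + 2?\nA:", return_tensors="pt")
prompt_len = prompt["input_ids"].shape[1]

# Sample a response from the current policy.
sampled = policy.generate(**prompt, do_sample=True, max_new_tokens=8,
                          pad_token_id=tokenizer.eos_token_id)
reward = reward_fn(tokenizer.decode(sampled[0, prompt_len:]))

# REINFORCE: scale the log-probability of the sampled response by its reward.
logits = policy(sampled).logits[0, :-1]      # prediction for each next token
log_probs = torch.log_softmax(logits, dim=-1)
token_log_probs = log_probs[torch.arange(sampled.shape[1] - 1), sampled[0, 1:]]
loss = -reward * token_log_probs[prompt_len - 1:].sum()  # higher reward → raise probability
loss.backward()
optimizer.step()
```

Real RLHF pipelines add a learned reward model, a KL penalty against the base policy, and algorithms like PPO, which is part of why RL is so much more expensive than SFT.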
Comparison Across Training Stages
| Stage | What the model is doing | What it gains |
|---|---|---|
| Pre-Training | Reads everything | raw intelligence |
| Mid-Training | Reads curated, advanced sources | languages, domains, modalities |
| Post-Training | Practices being a helpful assistant | aligned, safe, useful behavior |
Lab Exercise Context
In your lab, you will compare:
- Base model (pre-trained)
- Fine-tuned model
- Reinforcement-trained model
You will observe:
- Behavior change
- Reasoning depth
- Response helpfulness
- Performance on math prompts & datasets
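A minimal sketch of this kind of side-by-side comparison, assuming `transformers`; the three checkpoint names are hypothetical placeholders for the lab's actual models:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint names; substitute the models provided in your lab.
checkpoints = {
    "base": "org/base-model",
    "sft": "org/sft-model",
    "rl": "org/rl-model",
}
prompt = "What is 17 * 24? Explain your reasoning."

for name, path in checkpoints.items():
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=128)
    print(f"--- {name} ---")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```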
Stage Metaphor
| Stage | Analogy |
|---|---|
| Pre-Training | Reading an entire library |
| Mid-Training | Reading advanced curated textbooks |
| Post-Training | Becoming a polite, skilled tutor who can answer questions reliably |
What Comes Next
You’ll learn:
- Why fine-tuning & RL work
- How they differ
- How to apply them efficiently
- How to get practical control over LLM behavior