LLM Training Stages: Pre-Training → Mid-Training → Post-Training
Large Language Model (LLM) development happens across three major stages, each adding different capabilities.
Post-training is the final stage, and it is where fine-tuning and reinforcement learning come in.
1️⃣ Pre-Training — Learning Raw Intelligence
Goal: Learn the patterns of language by predicting the next token
Method: Train on vast, internet-scale text datasets to predict the next token
How it works
- Input: a huge corpus of text
- Task: next-token prediction
- Example: “The sky is …” → blue
- Example: “The sun is setting, the sky is …” → orange
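In training terms, this objective is just cross-entropy on the next token: the model learns to maximize log p(next token | preceding tokens) across the corpus. As a concrete illustration, here is a minimal sketch of what a pre-trained model actually computes, assuming the Hugging Face `transformers` library and the small `gpt2` checkpoint as a stand-in for any base model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Ask the model for a distribution over the *next* token after the prompt.
inputs = tokenizer("The sky is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: [batch, seq_len, vocab_size]

# Probabilities for the token that would follow the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id):>10s}  p={prob:.3f}")
```

A well pre-trained model assigns high probability to plausible continuations like “blue”, but nothing in this objective makes it follow instructions.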
Key points
- Starts from randomly initialized weights
- Training is very expensive (months of compute)
- Learns concepts, associations, and world knowledge
- Still only predicts the next token; this is not yet helpful assistant behavior
2️⃣ Mid-Training — Targeted Expansion
Goal: Continue training, but on curated datasets that refine and extend specific abilities
What mid-training improves
| Capability | Example |
|---|---|
| New languages | teach Chinese |
| New modalities | add audio or images |
| Longer context windows | expand attention range |
Mid-training is essentially continued pre-training, and in many labs it is handled by a separate internal team.
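Mechanically, mid-training looks just like pre-training with a different data mix: resume from the pre-trained weights and keep optimizing the same next-token objective on the curated corpus. A minimal sketch, again assuming `transformers` and `gpt2` as stand-ins; `curated_texts` is a hypothetical placeholder for the curated dataset:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # resume from pre-trained weights
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # typically lower LR than pre-training

# Hypothetical placeholder for a curated corpus (new language, domain, long-context data, etc.).
curated_texts = ["Curated domain text goes here."]

model.train()
for text in curated_texts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Same next-token objective as pre-training: passing labels=input_ids makes
    # the model compute the shifted cross-entropy loss internally.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```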
3️⃣ Post-Training — Making Models Helpful
Goal: Shape behavior so the model becomes usable, aligned, and reliable
Primary post-training methods
| Technique | Purpose |
|---|---|
| Fine-Tuning (SFT) | Teach exact desired outputs |
| Reinforcement Learning (RL/RLHF) | Reward good behavior & discourage bad responses |
Fine-Tuning (Supervised Fine-Tuning / SFT)
- Provide an input plus the correct output
- The model learns how it should respond
- Can be highly efficient using LoRA adapters (small trainable layers added instead of updating the whole model)
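A minimal SFT-with-LoRA sketch, assuming the Hugging Face `peft` and `transformers` libraries; the checkpoint, target modules, and training example are illustrative placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap the frozen base model with small trainable low-rank adapters.
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                      lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically a tiny fraction of the full model

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

# One SFT step: the input and the exact desired output, trained jointly.
example = "Q: What is 2 + 2?\nA: 4"
batch = tokenizer(example, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```

Because only the low-rank adapter weights receive gradients, this updates a small set of parameters while the base model stays frozen.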
Reinforcement Learning (RL / RLHF)
- Provide scores or rewards for generated responses
- Often uses multiple models (a reward model, a policy model, etc.)
- Enables models to:
  - follow human preferences
  - reason in multiple steps
  - improve based on feedback
RL is powerful but expensive.
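As a toy illustration of the core idea (a REINFORCE-style sketch, not a production RLHF pipeline), the code below samples a response from the policy, scores it with a hypothetical reward function standing in for a reward model, and nudges the policy toward higher-reward outputs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")  # the model being trained
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)

def reward_fn(text: str) -> float:
    # Hypothetical stand-in for a learned reward model or human preference score.
    return 1.0 if "4" in text else -1.0

prompt = tokenizer("Q: What is 2 + 2?\nA:", return_tensors="pt")
prompt_len = prompt["input_ids"].shape[1]

# Sample a response from the current policy.
sampled = policy.generate(**prompt, do_sample=True, max_new_tokens=8,
                          pad_token_id=tokenizer.eos_token_id)
reward = reward_fn(tokenizer.decode(sampled[0, prompt_len:]))

# REINFORCE: scale the log-probability of the sampled response by its reward.
logits = policy(sampled).logits[0, :-1]      # prediction for each next token
log_probs = torch.log_softmax(logits, dim=-1)
token_log_probs = log_probs[torch.arange(sampled.shape[1] - 1), sampled[0, 1:]]
loss = -reward * token_log_probs[prompt_len - 1:].sum()  # higher reward → raise probability
loss.backward()
optimizer.step()
```

Real RLHF pipelines add a learned reward model, a KL penalty against the base policy, and algorithms like PPO, which is part of why RL is so much more expensive than SFT.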
Comparison Across Training Stages
| Stage | What the model is doing | What it gains |
|---|---|---|
| Pre-Training | Reads everything | raw intelligence |
| Mid-Training | Reads curated, advanced sources | languages, domains, modalities |
| Post-Training | Practices being a helpful assistant | aligned, safe, useful behavior |
Lab Exercise Context
In your lab, you will compare:
- Base model (pre-trained)
- Fine-tuned model
- Reinforcement-trained model
You will observe:
- Behavior change
- Reasoning depth
- Response helpfulness
- Performance on math prompts & datasets
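A minimal sketch of this kind of side-by-side comparison, assuming `transformers`; the three checkpoint names are hypothetical placeholders for the lab's actual models:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint names; substitute the models provided in your lab.
checkpoints = {
    "base": "org/base-model",
    "sft": "org/sft-model",
    "rl": "org/rl-model",
}
prompt = "What is 17 * 24? Explain your reasoning."

for name, path in checkpoints.items():
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=128)
    print(f"--- {name} ---")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```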
Stage Metaphor
| Stage | Analogy |
|---|---|
| Pre-Training | Reading an entire library |
| Mid-Training | Reading advanced curated textbooks |
| Post-Training | Becoming a polite, skilled tutor who can answer questions reliably |
What Comes Next
You’ll learn:
- Why fine-tuning & RL work
- How they differ
- How to apply them efficiently
- How to get practical control over LLM behavior