LLM Training Stages: Pre-Training → Mid-Training → Post-Training

Large Language Model (LLM) development happens across three major stages, each adding different capabilities.
Post-training is the final stage, and it is where fine-tuning and reinforcement learning come in.


1️⃣ Pre-Training — Learning Raw Intelligence

Goal: Learn patterns of language by predicting the next token
Method: Train on vast text datasets (internet-scale) to predict the next word

How it works

  • Input: huge corpus of text

  • Task: next-token prediction
    Example:

    “The sky is …” → blue
    “The sun is setting, the sky is …” → orange
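
To make the objective concrete, here is a minimal, illustrative PyTorch sketch of next-token prediction. The embedding-plus-linear "model" is a stand-in (a real LLM would put a transformer between those two layers), and the token IDs are arbitrary:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)   # token IDs -> vectors
lm_head = nn.Linear(d_model, vocab_size)    # vectors -> scores over the vocab

# A toy "sentence" as token IDs; real pre-training uses internet-scale text.
tokens = torch.tensor([[5, 12, 47, 9, 31]])

hidden = embed(tokens)                      # (1, seq_len, d_model)
logits = lm_head(hidden)                    # (1, seq_len, vocab_size)

# Shift by one: the prediction at position t is scored against token t+1.
loss = nn.functional.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()   # one gradient step of "learning to guess the next word"
```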

Key points

  • Starts with random weights

  • Training is very expensive (months of compute)

  • Learns concepts, associations, world knowledge

  • But still only predicts the next token — not helpful behavior


2️⃣ Mid-Training — Targeted Expansion

Goal: Continue training, but with curated datasets to refine abilities

What mid-training improves

| Capability | Example |
|---|---|
| New languages | teach Chinese |
| New modalities | add audio or images |
| Longer context windows | expand attention range |

Mid-training = continued pre-training; in many labs it is done by a different internal team than pre-training.
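
As a hedged sketch of what continued pre-training can look like in practice (here with Hugging Face Transformers): the next-token objective is unchanged, only the data changes. The gpt2 checkpoint and the curated_corpus.txt file are placeholder assumptions, not what any particular lab uses:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

model = AutoModelForCausalLM.from_pretrained("gpt2")   # existing pre-trained weights
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token              # GPT-2 has no pad token

# Hypothetical curated corpus, e.g. Chinese text or long documents.
data = load_dataset("text", data_files={"train": "curated_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train = data["train"].map(tokenize, batched=True, remove_columns=["text"])

# mlm=False -> the collator builds labels for next-token prediction.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="midtrained", num_train_epochs=1),
    train_dataset=train,
    data_collator=collator,
)
trainer.train()
```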


3️⃣ Post-Training — Making Models Helpful

Goal: Shape behavior so the model becomes usable, aligned, and reliable

Primary post-training methods

| Technique | Purpose |
|---|---|
| Fine-Tuning (SFT) | Teach exact desired outputs |
| Reinforcement Learning (RL/RLHF) | Reward good behavior & discourage bad responses |

Fine-Tuning (Supervised Fine-Tuning / SFT)

  • Provide input + correct output

  • Model learns how it should respond

  • Can be made highly parameter-efficient with LoRA adapters
    (small trainable layers added to a frozen model, instead of updating every weight; see the sketch below)
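
Here is a minimal sketch of attaching LoRA adapters with the peft library, assuming a GPT-2 checkpoint; the rank, alpha, and target module are illustrative choices, not tuned recommendations:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_cfg = LoraConfig(
    r=8,                        # rank of the small trainable matrices
    lora_alpha=16,              # scaling factor for the adapter output
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

The adapted model can then be trained with an ordinary SFT loop on input/output pairs; only the small adapter matrices receive gradient updates while the base weights stay frozen.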


Reinforcement Learning (RL / RLHF)

  • Provide scores or rewards for generated responses

  • Often uses multiple models (reward model, policy model, etc.)

  • Enables models to:

    • follow human preferences

    • reason in multiple steps

    • improve based on feedback

RL is powerful but expensive.
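
To show the core loop without a full RLHF stack, here is a conceptual, single-step REINFORCE-style sketch: sample a response, score it, and raise the log-probability of rewarded tokens. The string-match reward is a stand-in for a learned reward model, and real systems (e.g. PPO) add a KL penalty against a reference model and per-token advantages:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
policy = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(policy.parameters(), lr=1e-5)

prompt = tok("Q: What is 2 + 2?\nA:", return_tensors="pt").input_ids

# 1) Sample a response from the current policy.
response = policy.generate(prompt, do_sample=True, max_new_tokens=8,
                           pad_token_id=tok.eos_token_id)
new_tokens = response[:, prompt.shape[1]:]

# 2) Score it. Stand-in reward: 1 if the answer mentions "4", else 0.
#    In practice this comes from a reward model or human preference data.
reward = 1.0 if "4" in tok.decode(new_tokens[0]) else 0.0

# 3) Increase the log-probability of the response, scaled by the reward.
logits = policy(response).logits[:, prompt.shape[1] - 1:-1, :]
logp = torch.log_softmax(logits, dim=-1)
token_logp = logp.gather(-1, new_tokens.unsqueeze(-1)).squeeze(-1)
loss = -(reward * token_logp.sum())

opt.zero_grad()
loss.backward()
opt.step()
```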


Comparison Across Training Stages

| Stage | What the model is doing | What it gains |
|---|---|---|
| Pre-Training | Reads everything | raw intelligence |
| Mid-Training | Reads curated, advanced sources | languages, domains, modalities |
| Post-Training | Practices being a helpful assistant | aligned, safe, useful behavior |

Lab Exercise Context

In your lab, you will compare:

  • Base model (pre-trained)

  • Fine-tuned model

  • Reinforcement-trained model

You will observe:

  • Behavior change

  • Reasoning depth

  • Response helpfulness

  • Performance on math prompts & datasets
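
A hedged sketch of that comparison: run the same math prompt through each checkpoint and inspect the outputs side by side. The checkpoint paths are placeholders for whatever your lab provides:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoints = {
    "base (pre-trained)": "your-base-model",   # placeholder paths
    "fine-tuned (SFT)":   "your-sft-model",
    "RL-trained":         "your-rl-model",
}
prompt = "What is 17 * 24? Show your reasoning."

for label, path in checkpoints.items():
    tok = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path)
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=64,
                         pad_token_id=tok.eos_token_id)
    print(f"--- {label} ---")
    print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```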


Stage Metaphor

| Stage | Analogy |
|---|---|
| Pre-Training | Reading an entire library |
| Mid-Training | Reading advanced, curated textbooks |
| Post-Training | Becoming a polite, skilled tutor who can answer questions reliably |

What Comes Next

You’ll learn:

  • Why fine-tuning & RL work

  • How they differ

  • How to apply them efficiently

  • How to get practical control over LLM behavior