Getting started: practical fine-tuning steps
- Define the task → collect input–output pairs; if data is scarce, synthesize examples with prompts/templates.
- Start small → try a 400M–1B parameter model to gauge baseline performance.
- Vary the dataset size to measure the returns from additional examples.
- Evaluate → inspect what works and what doesn’t; iterate by collecting targeted new data.
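The synthesis step above can be sketched minimally. This is an illustrative assumption, not code from the lesson: all names (`templates`, `seed_examples`, `synthesize`) are hypothetical, and a real pipeline would use an LLM rather than string templates to diversify examples.

```python
import random

# Hypothetical prompt/completion templates for a summarization task.
templates = [
    ("Summarize: {text}", "{summary}"),
    ("TL;DR of '{text}'?", "{summary}"),
]

# A tiny seed set; in practice these are the scarce hand-labeled examples.
seed_examples = [
    {"text": "The meeting moved to 3pm on Friday.", "summary": "Meeting now Fri 3pm."},
]

def synthesize(seed, n=4):
    """Expand scarce labeled data into input-output training pairs."""
    pairs = []
    for _ in range(n):
        src = random.choice(seed)
        prompt_t, target_t = random.choice(templates)
        pairs.append({"input": prompt_t.format(**src),
                      "output": target_t.format(**src)})
    return pairs

dataset = synthesize(seed_examples)
```

Each resulting pair is ready to feed into a supervised fine-tuning loop as a prompt/target example.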
Scaling difficulty
- Reading vs. writing tasks: writing tasks (chat, emails, code) are harder because they produce more output tokens, so they usually need larger models.
- Composite/agentic tasks (multiple abilities in one go) further increase difficulty → consider bigger models.
Compute & hardware realities
- Training needs far more memory than inference (gradients + optimizer states on top of the weights).
- Example: a single V100 (16 GB) can run inference on a ~7B model, but can typically only train around ~1B parameters without tricks.
- To fit larger models or speed up training: use more GPUs or efficiency methods.
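A back-of-the-envelope memory estimate makes the V100 example concrete. The bytes-per-parameter figures below are a common rule of thumb (fp16 weights for inference; weights + gradients + two Adam moments ≈ 16 bytes/param for mixed-precision training), not numbers given in the lesson, and they ignore activation memory:

```python
# Rough per-parameter memory cost (rule-of-thumb assumption, excludes activations):
#   inference in fp16  -> ~2 bytes/param
#   mixed-precision training with Adam -> ~16 bytes/param
#   (fp16 weights + fp16 grads + fp32 master weights + 2 fp32 Adam moments)
BYTES_PER_PARAM = {"infer_fp16": 2, "train_mixed": 16}

def memory_gb(n_params, mode):
    """Estimate GPU memory in GB for a model of n_params parameters."""
    return n_params * BYTES_PER_PARAM[mode] / 1e9

print(memory_gb(7e9, "infer_fp16"))   # 14.0 -> a 7B model fits a 16 GB V100 for inference
print(memory_gb(7e9, "train_mixed"))  # 112.0 -> training 7B is far out of reach
print(memory_gb(1e9, "train_mixed"))  # 16.0 -> ~1B params is roughly the training ceiling
```

This is why the same card that serves a 7B model can only train ~1B parameters without tricks like gradient checkpointing, offloading, or PEFT.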
PEFT (Parameter-Efficient Fine-Tuning) & LoRA
- PEFT adapts big models by training small add-on weights while freezing the base weights.
- LoRA (Low-Rank Adaptation):
  - Replaces full weight updates with trainable low-rank matrices in selected layers.
  - Reported effects (example cited in the lesson): up to 10,000× fewer trainable parameters and ~3× less GPU memory vs. full fine-tuning.
  - Slight accuracy drop vs. full fine-tuning, but the same inference latency (adapters can be merged into the base weights).
  - Great for multi-tenant setups: train a separate LoRA adapter per customer/task and swap or merge them at inference.
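The low-rank update and the merge trick can be shown in a few lines of NumPy. This is an illustrative sketch of the math, not the API of any LoRA library; the dimensions and init choices are assumptions (though zero-initializing `B` so the adapter starts as a no-op mirrors the LoRA paper):

```python
import numpy as np

d_out, d_in, r = 512, 512, 8                 # r << d: the low-rank bottleneck
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))       # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, init 0
                                             # so the delta starts at zero

x = rng.standard_normal(d_in)
y_adapter = W @ x + B @ (A @ x)              # forward pass with adapter attached

# Merging for inference: fold B @ A into W, so latency matches the base model.
W_merged = W + B @ A
y_merged = W_merged @ x
assert np.allclose(y_adapter, y_merged)

# Trainable params: 2*r*d for the adapter vs d*d for a full update.
print((A.size + B.size) / W.size)            # 0.03125 -> 32x fewer at this size
```

The parameter savings grow with layer width: the adapter costs `2*r*d` parameters against `d*d` for a full update, so larger layers (and smaller `r`) push the ratio toward the extreme reductions quoted above.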
- Bottom line: start with a smaller model and simple tasks, iterate with data and evaluation, then scale model size and complexity. Use PEFT/LoRA to train efficiently when hardware is tight or tasks multiply.