Getting started: practical fine-tuning steps

  • Define the task → collect input–output pairs; if data is scarce, synthesize examples with prompts/templates.

  • Start small → try a 400M–1B parameter model to gauge baseline performance.

  • Vary data size to see returns from more examples.

  • Evaluate → inspect what works/doesn’t; iterate by collecting targeted new data.
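The data-synthesis step above can be sketched as a small template expander. This is a minimal illustration, not a method from the lesson: the template strings, seed texts, and function name are all hypothetical.

```python
import random

# Hypothetical prompt/target templates for synthesizing input-output
# pairs when real task data is scarce. Purely illustrative.
TEMPLATES = [
    ("Summarize: {text}", "Summary: {text}"),
    ("Rewrite politely: {text}", "Polite version: {text}"),
]

def synthesize_pairs(seed_texts, n, seed=0):
    """Generate n synthetic (input, output) training pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        text = rng.choice(seed_texts)
        prompt_t, target_t = rng.choice(TEMPLATES)
        pairs.append({"input": prompt_t.format(text=text),
                      "output": target_t.format(text=text)})
    return pairs

pairs = synthesize_pairs(["the cat sat on the mat", "stocks rose today"], 4)
```

In practice the "templates" are often prompts to a stronger model, but the shape of the output — a list of input/output records ready for supervised fine-tuning — is the same.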

Scaling difficulty

  • Reading vs. writing tasks: writing tasks (chat, emails, code) are harder; they generate more output tokens and usually need larger models.

  • Composite/agentic tasks (multiple abilities in one go) further increase difficulty → consider bigger models.

Compute & hardware realities

  • Training needs far more memory than inference (gradients + optimizer states).

  • Example: a single V100 (16 GB) can run inference on a ~7B model, but can typically only train models around ~1B parameters without memory-saving tricks.

  • To fit larger models or speed up: use more GPUs or efficiency methods.
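The V100 example above follows from back-of-envelope memory math. A rough sketch, assuming fp16 weights for inference and a mixed-precision Adam setup for training (the per-parameter byte counts are common rules of thumb, not figures from the lesson):

```python
def inference_gb(params_billion, bytes_per_param=2):
    # Weights only, fp16 (2 bytes/param); ignores activations and KV cache.
    return params_billion * bytes_per_param

def training_gb(params_billion):
    # Rough mixed-precision Adam budget per parameter:
    # fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
    # + Adam first/second moments (4 + 4) = ~16 bytes/param.
    return params_billion * 16

print(f"7B inference: ~{inference_gb(7):.0f} GB")  # ~14 GB, fits a 16 GB V100
print(f"1B training:  ~{training_gb(1):.0f} GB")   # ~16 GB, already at the limit
```

This is why training needs far more memory than inference: gradients and optimizer states multiply the per-parameter cost several times over, before activations are even counted.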

PEFT (Parameter-Efficient Fine-Tuning) & LoRA

  • PEFT adapts big models by training small add-on weights while freezing base weights.

  • LoRA (Low-Rank Adaptation):

    • Replaces full weight updates with low-rank matrices in selected layers.

    • Reported effects (example cited in lesson): up to 10,000× fewer trainable params and ~3× less GPU memory vs full fine-tune.

    • Slight accuracy drop vs full FT, but same inference latency (adapters can be merged into base weights).

    • Great for multi-tenant setups: train separate LoRA adapters per customer/task and swap/merge at inference.
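The parameter savings from LoRA can be made concrete with a small accounting sketch. Assumptions (mine, not the lesson's): a square d×d weight matrix and rank r; the hidden size and rank below are typical but illustrative values.

```python
def lora_trainable_params(d_in, d_out, r):
    # LoRA replaces the full update with W' = W + B @ A, where
    # A has shape (r, d_in) and B has shape (d_out, r).
    # Only A and B are trained; the base weight W stays frozen.
    return r * d_in + d_out * r

def full_ft_params(d_in, d_out):
    # Full fine-tuning updates every entry of W.
    return d_in * d_out

d = 4096   # hidden size typical of a ~7B model (assumption)
r = 8      # a common LoRA rank (assumption)
lora = lora_trainable_params(d, d, r)
full = full_ft_params(d, d)
print(f"full: {full:,}  lora: {lora:,}  reduction: {full // lora}x")
```

Per matrix this gives a 256× reduction at these sizes; the much larger reductions reported in the lesson come from larger models, smaller ranks, and adapting only a few matrices. Because B @ A has the same shape as W, the adapter can be merged into the base weights after training, which is why inference latency is unchanged.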

Bottom line: Start with a smaller model and simple tasks, iterate with data and evaluation, then scale model size and complexity. Use PEFT/LoRA to train efficiently when hardware is tight or tasks multiply.