Fine-tuning massive LLMs used to be painfully slow, but not anymore!
4 open-source libraries that accelerate fine-tuning of large language models:
1. Unsloth AI
• Fine-tune models like Qwen3, Llama 4, and Gemma 3 up to 2× faster with 70% less VRAM
• Uses optimized Triton kernels and a hand-derived backward pass, with no approximations, so accuracy is unchanged
• Supports low-resource setups and runs on consumer GPUs or even Colab/Kaggle with ~3 GB VRAM
GitHub repo →
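To make the Unsloth workflow concrete, here is a minimal sketch of loading a 4-bit base model and attaching LoRA adapters. The helper `load_for_lora` and the `LORA_KWARGS` values are illustrative assumptions rather than recommended settings, and the target module names assume a Llama-style architecture.

```python
# Hypothetical LoRA settings; tune r and lora_alpha for your own task.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "bias": "none",
    # Projection names used by Llama-style models (an assumption).
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "use_gradient_checkpointing": "unsloth",
}


def load_for_lora(model_name: str, max_seq_length: int = 2048):
    """Load a 4-bit base model and wrap it with LoRA adapters via Unsloth."""
    # Imported lazily so this file can be read without Unsloth or a GPU.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(model, **LORA_KWARGS)
    return model, tokenizer
```

Call `load_for_lora` with the Hugging Face ID of the model you want to tune, then hand the returned model and tokenizer to your usual trainer.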
2. LLaMA Factory
• Fine-tune over 100 models (LLaMA, Mistral, Gemma, etc.) using a simple CLI or WebUI
• Supports full, freeze, and LoRA fine-tuning, plus QLoRA with 2- to 8-bit quantization
• Includes built-in dataset templates, training monitors, and model export options
GitHub repo →
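Why the bit width matters: a rough back-of-the-envelope estimate of the memory taken by the model weights alone. The 7-billion-parameter size is a hypothetical example, and the figures ignore activations, optimizer state, adapter weights, and quantization overhead.

```python
def weight_memory_gb(n_params: float, bits: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9


# Hypothetical 7B-parameter model at several precisions.
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {weight_memory_gb(7e9, bits):.2f} GB")
```

Going from 16-bit to 4-bit cuts the weight footprint by 4×, which is what lets QLoRA fit models on a single consumer GPU.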
3. DeepSpeed
• Built for large-scale distributed fine-tuning with ZeRO (stages 1–3) and CPU/NVMe offloading
• Optimized for multi-GPU and multi-node training with advanced memory management
• Trusted in production environments for scalable LLM training
GitHub repo →
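DeepSpeed is driven by a JSON config. Below is a minimal sketch that builds a ZeRO stage 3 config with optional CPU offloading. The helper `zero3_config` and its default batch settings are illustrative assumptions, not tuned values.

```python
import json


def zero3_config(micro_batch_size: int = 1,
                 grad_accum_steps: int = 8,
                 offload_to_cpu: bool = True) -> dict:
    """Build a DeepSpeed config dict that enables ZeRO stage 3."""
    zero = {
        "stage": 3,                    # shard params, grads, optimizer state
        "overlap_comm": True,
        "contiguous_gradients": True,
    }
    if offload_to_cpu:
        # Move optimizer state and parameters to CPU RAM to save VRAM.
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": True},
        "gradient_clipping": 1.0,
        "zero_optimization": zero,
    }


print(json.dumps(zero3_config(), indent=2))
```

Save the printed JSON to a file and pass it to a training script that accepts a DeepSpeed config, for example one built on the Hugging Face Trainer through its `deepspeed` training argument.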
4. Axolotl
• YAML-based setup for fine-tuning, LoRA/QLoRA, DPO, GRPO, and multimodal workflows
• Includes kernel optimizations for memory-efficient training
• Actively maintained with support for Hugging Face, model export, and inference
GitHub repo →
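For a sense of what an Axolotl config looks like, here is a minimal sketch that renders a QLoRA config as YAML text. The helper `render_axolotl_config`, the placeholder IDs, and the hyperparameter values are all illustrative assumptions. Check the key names against the Axolotl docs for your installed version.

```python
def render_axolotl_config(base_model: str, dataset_path: str,
                          output_dir: str) -> str:
    """Render a small QLoRA config as YAML text (hyperparameters are examples)."""
    lines = [
        f"base_model: {base_model}",
        "load_in_4bit: true",
        "adapter: qlora",
        "lora_r: 16",
        "lora_alpha: 32",
        "lora_dropout: 0.05",
        "lora_target_linear: true",
        "datasets:",
        f"  - path: {dataset_path}",
        "    type: alpaca",
        "sequence_len: 2048",
        "micro_batch_size: 1",
        "gradient_accumulation_steps: 8",
        "num_epochs: 1",
        "learning_rate: 0.0002",
        f"output_dir: {output_dir}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder IDs; substitute a real model and dataset before training.
print(render_axolotl_config("<base-model-id>", "<dataset-id>", "./outputs/qlora"))
```

Write the output to a `.yml` file and start training with `axolotl train` followed by the file path.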