The developers of Unsloth, a fine-tuning library, enable:

  • 5x faster fine-tuning
  • with 50% less memory usage
  • and 0% loss in accuracy
  • all locally on NVIDIA GPUs… for free.

They do this by hand-deriving backpropagation steps, writing kernels in lower-level languages, and applying various mathematical optimizations. They not only share the repository as open source; they also maintain extensive documentation on their website and provide Jupyter notebook guides.
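
To give a feel for what "hand-deriving a backpropagation step" means, here is a minimal sketch in plain Python. It is not Unsloth's code, and the function names are made up for illustration. It takes the softmax cross-entropy loss, uses the well-known closed-form gradient (softmax(z) minus the one-hot target) instead of relying on automatic differentiation, and checks the result against a finite-difference estimate.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Loss for a single example: -log(probability of the target class).
    return -math.log(softmax(logits)[target])

def cross_entropy_grad(logits, target):
    # Hand-derived gradient: dL/dz_i = softmax(z)_i - 1[i == target].
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def numerical_grad(logits, target, eps=1e-6):
    # Central finite differences, used only to sanity-check the derivation.
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += eps
        down[i] -= eps
        diff = cross_entropy(up, target) - cross_entropy(down, target)
        grads.append(diff / (2 * eps))
    return grads

# Illustrative inputs (hypothetical values, chosen arbitrarily).
logits = [2.0, -1.0, 0.5]
target = 0

analytic = cross_entropy_grad(logits, target)
numeric = numerical_grad(logits, target)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print("largest gap between hand-derived and numerical gradient:", max_diff)
```

The closed-form version needs one softmax pass and no stored intermediate graph, which is the general kind of saving a hand-written backward pass can offer. How Unsloth applies the idea to full transformer layers is described in its own documentation.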