Benchmarking effects of quantization

The post begins with “Like many of you” and addresses readers who want to understand how much model quality they are giving up by using quants. They find that FP is significantly better, but that Q4 can outperform Q8, Q6 and Q5, and that there is a major drop-off in performance below Q4.

They pose some questions regarding their research back to the community, like “does this trend hold true for Llama3-70B?” or “Can this test be formalized into an automatic script?”

So this kind of post can work as a flywheel to get the community building on top of one contribution.
