This is similar to Llama 3 quants comparison, exl2 quantization, and Benchmarking effects of quantization, but differs in that the same group both quantized the model and documented the benchmark results.
Interestingly, they share two more links, https://github.com/OpenGVLab/EfficientQAT and https://arxiv.org/abs/2407.11062, which are the codebase and paper for a new quantization method. They are not only using an open-source tool to quantize new models, but also providing transparency about the tools they used.