Optimize Whisper for fast inference

OP, from the Open Source Audio team at Hugging Face, shares a set of tips and tricks for speeding up Whisper inference.

One commenter's thanks points to a tangible benefit: “Thanks I’ve been throwing Whisper on Runpod without taking the effort to optimise properly, this could save some money”

To a commenter asking “could you cobble together a standalone program I can drop an LLM into and interact with?”, OP replies “Yes! That’s on my list of projects for this week haha!”

Sometimes, problems completely unrelated to the original post get solved in the comments. For example, one commenter asks how to use the OP’s tool to process live audio, and OP helps them discover the ffmpeg_microphone_live() function. Similarly, although the post relies on optimizations that require a GPU, another commenter asks whether OP has any pointers to the fastest Whisper for low-end CPUs, and OP had an answer for that too.
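For reference, here is a minimal sketch of how ffmpeg_microphone_live() from transformers.pipelines.audio_utils can feed a Whisper pipeline. It is not the OP's code: the function name transcribe_live, the checkpoint openai/whisper-base.en, and the chunk lengths are assumptions chosen for illustration, and it needs the transformers package, the ffmpeg binary, and a working microphone.

```python
def transcribe_live(model_name="openai/whisper-base.en",
                    chunk_length_s=5.0, stream_chunk_s=1.0):
    """Transcribe one chunk of microphone audio and return the text.

    Hypothetical helper for illustration; the model name and chunk
    sizes are assumed defaults, not values from the original thread.
    """
    # Imported here so the sketch can be loaded without transformers installed.
    from transformers import pipeline
    from transformers.pipelines.audio_utils import ffmpeg_microphone_live

    transcriber = pipeline("automatic-speech-recognition", model=model_name)

    # Whisper expects audio at the feature extractor's sampling rate.
    sampling_rate = transcriber.feature_extractor.sampling_rate

    # Generator that yields audio chunks captured through ffmpeg.
    mic = ffmpeg_microphone_live(
        sampling_rate=sampling_rate,
        chunk_length_s=chunk_length_s,
        stream_chunk_s=stream_chunk_s,
    )

    text = ""
    for item in transcriber(mic):
        text = item["text"]
        print(text, end="\r")
        # "partial" marks an unfinished chunk; it may arrive wrapped in a list.
        partial = item.get("partial", False)
        if isinstance(partial, (list, tuple)):
            partial = partial[0]
        if not partial:
            break
    return text
```

Calling transcribe_live() starts listening immediately and returns once the first full chunk has been transcribed; wrapping the call in a loop would give continuous transcription.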
