OP shared a comparison of open-source Whisper packages that support long-form transcription. Some comments:

One commenter’s reason for being in the thread is interesting: “I research whisper for a company project I work. We use it for subtitling.”

OP also does a good job of updating the thread with more models: “Update: I benchmarked large-v3 and distill-large-v2. Here are the updated results with color formatting https://preview.redd.it/iv60rvqa1qrc1.png?width=1337&format=png&auto=webp&s=4954ababfbd98bffea555285bc048b437e513f98 You can find all the results as a csv file in the blog post.”

The most notable comment in this thread is from a Hugging Face Transformers maintainer, whose benchmarks found that the chunked algorithm can come within 1.5% absolute WER of OpenAI’s sequential algorithm, and that OP may have set the hyperparameters `chunk_length_s` and `return_timestamps` incorrectly. They are looking out for the community, while also thanking the OP for providing a useful resource.
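For reference, here is a minimal sketch of how those two hyperparameters are passed to the Transformers ASR pipeline; the model choice and audio filename are illustrative, not from the thread:

```python
# Chunked long-form decoding settings: chunk_length_s=30 matches Whisper's
# 30-second receptive field, and return_timestamps=True enables timestamped
# output across chunk boundaries.
CHUNKED_KWARGS = {
    "chunk_length_s": 30,
    "return_timestamps": True,
}

if __name__ == "__main__":
    # Requires `pip install transformers` and downloads the model weights.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",  # illustrative model choice
        **CHUNKED_KWARGS,
    )
    print(asr("audio.mp3")["text"])
```

Leaving either parameter at its default silently falls back to short-form behavior, which is the kind of misconfiguration the maintainer suspected in OP’s benchmark.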