OP shared a comparison of open-source Whisper packages that support long-form transcription. Some comments:

One commenter’s reason for being in the thread is interesting: “I research whisper for a company project I work. We use it for subtitling.”

OP also does a good job of updating the thread with more models: “Update: I benchmarked large-v3 and distill-large-v2. Here are the updated results with color formatting https://preview.redd.it/iv60rvqa1qrc1.png?width=1337&format=png&auto=webp&s=4954ababfbd98bffea555285bc048b437e513f98 You can find all the results as a csv file in the blog post.”

The most notable comment in this thread is from a Hugging Face Transformers maintainer, whose benchmarks found that the chunked algorithm can come within 1.5% absolute WER of OpenAI’s sequential algorithm, and that OP may have set the hyperparameters `chunk_length_s` and `return_timestamps` incorrectly. They are looking out for the community, while also thanking the OP for providing a useful resource.
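For reference, here is a minimal sketch of how those two hyperparameters are passed to the Transformers ASR pipeline; the model choice and audio filename are illustrative, not from the thread:

```python
# Chunked long-form decoding settings: chunk_length_s=30 matches Whisper's
# 30-second receptive field, and return_timestamps=True enables timestamped
# output across chunk boundaries.
CHUNKED_KWARGS = {
    "chunk_length_s": 30,
    "return_timestamps": True,
}

if __name__ == "__main__":
    # Requires `pip install transformers` and downloads the model weights.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",  # illustrative model choice
        **CHUNKED_KWARGS,
    )
    print(asr("audio.mp3")["text"])
```

Leaving either parameter at its default silently falls back to short-form behavior, which is the kind of misconfiguration the maintainer suspected in OP’s benchmark.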