Nice! This is from the author of "Uncensor any LLM with abliteration", who is also the post-training lead at LiquidAI.
The repo is a curated list of "good LLM datasets" for fine-tuning, judged on three criteria:
- Accuracy: factual correctness
- Diversity: coverage of many use cases
- Complexity: multi-turn, multilingual, well-written samples
The call to action is to contribute if you find it interesting. The repo currently has 6 contributors and 4k stars.
People recommend adding a column specifying each dataset's license, and are asking how the author evaluates models after fine-tuning.
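
Since Hub-hosted datasets usually declare their license on the dataset card, a license column could in principle be auto-populated rather than maintained by hand. Here is a minimal sketch assuming the datasets live on the Hugging Face Hub; the dataset IDs and the `get_license` helper are illustrative, not part of the repo.

```python
# Sketch: pull the declared license for a few Hub-hosted datasets.
# Dataset IDs below are example placeholders, not an official list.
from huggingface_hub import HfApi

api = HfApi()

def get_license(dataset_id: str) -> str:
    """Return the license tag declared on the Hub, or 'unknown'."""
    info = api.dataset_info(dataset_id)
    # Licenses are typically exposed as "license:<id>" tags on the dataset card.
    for tag in info.tags or []:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return "unknown"

if __name__ == "__main__":
    for ds in ["Open-Orca/OpenOrca", "HuggingFaceH4/ultrachat_200k"]:
        print(f"{ds}: {get_license(ds)}")
```

Datasets without a license tag (or mirrored from outside the Hub) would still need a manual entry, so the column would likely end up partly automated, partly hand-curated.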