Open source alternatives to AI products
13 tools (filtered)
Hugging Face: Platform for sharing ML models and datasets
Hugging Face datasets library
Common Crawl: Open repository of web crawl data
EleutherAI: Large-scale diverse text dataset
LAION: Large-scale image-text datasets
Together: Open reproduction of LLaMA training data
AI2: Open corpus for training LLMs
Hugging Face's 15T-token web dataset
Open-source assistant training data
Stanford: Instruction-following dataset
WizardLM: Evolved instruction dataset for LLMs
Community: Shared ChatGPT conversations dataset
LMSYS: Large-scale real-world LLM conversations