appvoid's Collections
cool datasets
Updated 28 days ago
Some interesting datasets to use for language modeling.
54rt1n/wikipedia-summary-dataset • Updated Sep 10, 2024 • 5.32M • 52 • 3
appvoid/raw-corpus • Updated Feb 23, 2025 • 1.6M • 33 • 1
pszemraj/simple_wikipedia • Updated Dec 29, 2025 • 238k • 387 • 10
common-pile/youtube • Updated Jun 6, 2025 • 1.13M • 93 • 12
srinivasbilla/self-instruct-base • Updated Jan 24, 2023 • 82.6k • 23 • 5
agentlans/high-quality-english-sentences • Updated Oct 1, 2024 • 1.71M • 347 • 37
agentlans/note-taking-v2 • Updated Sep 22, 2025 • 17.6k • 51 • 1
PleIAs/SYNTH • Updated May 6 • 68M • 11.6k • 269
wikimedia/structured-wikipedia • Updated May 19 • 10.5M • 18.2k • 384