@ClementDelangue: AI teams shouldn’t have to choose between expensive object storage and painful git workflows. @huggingface Storage is b…
Summary
Hugging Face launches Storage Buckets, a purpose-built storage solution for AI teams offering per-TB pricing, built-in CDN, and Xet deduplication for model weights, datasets, and checkpoints.
Cached at: 05/17/26, 03:27 AM
AI teams shouldn’t have to choose between expensive object storage and painful git workflows.
@huggingface Storage is built for model weights, datasets, checkpoints and artifacts:
- simple per-TB pricing
- built-in CDN
- Xet deduplication
- private by default when needed
Store your AI data where your AI work already happens:
Storage - Hugging Face
Source: https://huggingface.co/storage
Hugging Face Storage Buckets
Store models, datasets, and artifacts with simple per-TB pricing. Built-in CDN, Xet deduplication, and no git overhead.
Trusted by more than 10,000 AI teams
Storage
Storage built for AI teams
Store models, datasets, and artifacts with simple per-TB pricing. Xet deduplication. Included CDN. No git overhead.
- Per-TB pricing with built-in CDN and deduplication speedups.
- No Git constraints: commit-free sync and fast object updates.
- Designed for ML workflows: datasets, checkpoints, model artifacts.
Xet Technology
Next-gen large-scale storage for AI
Xet uses content-defined chunking to break files into byte-level chunks and deduplicates across your entire bucket. When you retrain a model and only 5% of weights change, only that 5% is re-uploaded.
- Raw + processed dataset: stored once, billed once*
- 4x less data per upload, verified with real-world workloads
*Requires Enterprise or Enterprise Plus plan
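The idea behind content-defined chunking can be illustrated with a toy sketch. This is not Xet's implementation: the Gear-style rolling hash, the chunk-size parameters, the SHA-256 chunk keys, and the names `split_chunks` and `upload` are all assumptions chosen for illustration. The example also assumes the changed 5% is one contiguous region, which is the most favorable case; changes scattered across a file would touch more chunks.

```python
import hashlib
import random

# Fixed pseudo-random table mapping each byte value to a 32-bit number (Gear-style hash).
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def split_chunks(data, avg_bits=10, min_size=64, max_size=8192):
    """Cut data wherever a rolling hash of the most recent bytes hits a fixed pattern."""
    mask = ((1 << avg_bits) - 1) << (32 - avg_bits)  # test the top avg_bits bits
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each byte's contribution is shifted out after 32 steps,
        # so the hash only reflects the last 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def upload(data, store):
    """Add unseen chunks to the store; return the number of new bytes stored."""
    new_bytes = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
            new_bytes += len(chunk)
    return new_bytes


# Simulate a 200 kB "checkpoint" and a retrained copy with one 5% region replaced.
data_rng = random.Random(1)
original = bytes(data_rng.getrandbits(8) for _ in range(200_000))
patch = bytes(data_rng.getrandbits(8) for _ in range(10_000))
retrained = original[:90_000] + patch + original[100_000:]

store = {}
first_upload = upload(original, store)    # every chunk is new
second_upload = upload(retrained, store)  # only chunks touching the patch are new
print(first_upload, second_upload)
```

Because cut points depend only on nearby bytes, the chunks before and after the changed region hash to the same values as before, so the second upload stores roughly the changed region plus the chunks straddling its edges.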
Pricing
Transparent, volume-based pricing
Simple per-TB pricing that scales with usage. Egress and CDN are included at no extra cost.
Data Storage
Assemble training data at any scale
Pour raw data from every source into a single bucket: crawls, annotations, synthetic outputs, partner datasets. No git overhead, no commit queues, no file-count limits. When training begins, your data is already there, streamed to GPUs via the included CDN.
- Immediate availability on upload, no queued commits
- Batch API processes thousands of files in a single call
- Raw + processed datasets with dedup = no double billing*
*Requires Enterprise or Enterprise Plus plan
CDN
Built-in CDN for blazing fast access
Every bucket includes a CDN. Warm localized cache close to where you compute for ultra fast streaming and downloads. Egress is included up to a generous 8:1 ratio of your total storage.
- Pre-warm cache in any cloud region you need
- Our CDN is deployed inside GCP and AWS networks
- Egress included up to an 8:1 ratio of your storage
More providers coming soon
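As a rough worked example of the 8:1 egress allowance, the arithmetic can be sketched as below. The function names are hypothetical, and the page does not state the billing period the ratio applies to or how overage is charged, so this only computes volumes, not costs.

```python
def included_egress_tb(storage_tb, ratio=8.0):
    """Egress volume covered by the plan, assuming a flat ratio of total storage."""
    return storage_tb * ratio


def egress_overage_tb(storage_tb, egress_tb, ratio=8.0):
    """Egress volume beyond the included allowance (never negative)."""
    return max(0.0, egress_tb - included_egress_tb(storage_tb, ratio))


print(included_egress_tb(5))     # 40.0: 5 TB stored covers 40 TB of egress
print(egress_overage_tb(5, 50))  # 10.0: 50 TB of egress exceeds the allowance by 10 TB
print(egress_overage_tb(5, 30))  # 0.0: 30 TB of egress stays within the allowance
```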
Coding Agents
Give your coding agents persistent storage
Coding agents run in ephemeral environments, but their outputs shouldn’t vanish. Checkpoints, benchmark results, generated datasets: one hf sync command in your agent’s bash tool is all it takes.
- Pre-warmed CDN and no git overhead for fast reads and writes
- Persist artifacts across ephemeral CI runs and terminal sessions
- Install the official HF CLI skill and your agent knows every command
AES-256 Encryption
End-to-end encryption at rest and in transit
Audit Logs
Full visibility into every access event
SSO & RBAC
Enterprise SSO with role-based access control
US & EU Regions
Choose where your data lives
Get started with HF Storage Buckets
Start with buckets, sync your AI data, and unlock object storage built for ML workflows.