@yacinelearning: okay folks buckle up because this thursday we have @joelniklaus from @huggingface that will join us on stream to teach …
Summary
Joel Niklaus from Hugging Face will join @yacinelearning's live stream this Thursday to teach how synthetic data is pushing pretraining forward. The team has also published a playbook on the topic, recommended as a pre-read.
Cached at: 06/16/26, 03:15 AM
okay folks buckle up because this thursday we have @joelniklaus from @huggingface that will join us on stream to teach about how synthetic data is pushing pretraining forward
the team published a whole playbook on this topic so grab that as a pre-read and shoot your questions https://t.co/Uf0qGm4810