@Azure: Three open-source image models, one platform. Microsoft Foundry and Hugging Face bring developers the largest catalog f…

X AI KOLs Following 05/19/26, 07:00 PM Tools

open-source image-generation microsoft-azure hugging-face foundry platform

Summary

Microsoft Foundry integrates three open-source image models (SDXL, FLUX.1-schnell, and Z-Image-Turbo) via Hugging Face, offering developers a unified platform for AI image generation.

Three open-source image models, one platform. Microsoft Foundry and Hugging Face bring developers the largest catalog for AI innovation. Build with Stability AI's SDXL, Black Forest Labs' FLUX.1-schnell, and Tongyi-MAI's Z-Image-Turbo in Foundry today: https://t.co/ceTA6AF0we https://t.co/p7ioLwIch8


Now in Foundry: Tongyi-MAI Z-Image-Turbo, with FLUX.1-schnell and SDXL base 1.0

Source: https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/now-in-foundry-tongyi-mai-z-image-turbo-with-flux-1-schnell-and-sdxl-base-1-0/4520199

This week’s Model Mondays edition pairs three models available through the Hugging Face collection in Microsoft Foundry: **Tongyi-MAI’s Z-Image-Turbo**, a new model designed for lower latency on a single GPU and native bilingual text rendering; **Black Forest Labs’ FLUX.1-schnell**, a 12B rectified flow transformer distilled to 1–4 step inference and one of the most adopted open-weight image models since its 2024 release; and **Stability AI’s stable-diffusion-xl-base-1.0 (SDXL)**, a latent diffusion research model that can be used to generate and modify images based on text prompts.

Model Specs (Z-Image-Turbo)

Parameters / size: 6B (BF16)
Resolution: Up to 1024×1024 native
Primary task: Text-to-image generation (English and Chinese)
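As a rough sanity check on the parameter count above, weight memory can be estimated from the number of parameters and the bytes stored per parameter. The sketch below is back-of-the-envelope arithmetic only. The helper name `weight_memory_gib` is made up for illustration, and the estimate covers the 6B BF16 weights alone; any text encoder, VAE, or activation memory is not sized in this post and is left out.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GiB.

    bytes_per_param defaults to 2 because BF16 stores each value in 16 bits.
    """
    return num_params * bytes_per_param / 2**30


# 6B parameters in BF16, as listed in the spec above.
z_image_weights = weight_memory_gib(6e9)
print(f"Approximate weight memory: {z_image_weights:.1f} GiB")
```

Under these assumptions the weights come to a little over 11 GiB, which leaves some headroom on a 16 GB card before other components are counted.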

Why it’s interesting (Spotlight)

**Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture:** Z-Image concatenates text tokens, visual semantic tokens, and image VAE tokens into a single unified input stream rather than running text and image through separate branches. This single-stream design can improve parameter efficiency relative to dual-stream DiT architectures at the same capacity. See the Z-Image technical report for details.
**8-step inference at sub-second latency, fits in 16GB VRAM:** Z-Image-Turbo is distilled with Decoupled Distribution Matching Distillation (Decoupled-DMD) and further refined with DMDR, a method that fuses DMD with reinforcement learning during post-training. The result is a model that runs 8 function evaluations (NFE, number of function evaluations) per image with no Classifier-Free Guidance (CFG), which roughly halves the per-step compute compared to CFG-based inference. See the Decoupled-DMD and DMDR papers.
**Native bilingual text rendering and strong instruction adherence:** Unlike most open-weight image models, which struggle with legible in-image text, Z-Image-Turbo renders complex English and Chinese text accurately, which is useful for posters, signage, packaging mockups, and marketing creative.
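The compute saving from dropping CFG comes down to counting network evaluations. The sketch below assumes the usual CFG setup, where each sampling step runs the network twice (once with the prompt, once without), and compares that with a CFG-free sampler at the same step count. The function name `forward_passes` is hypothetical and is not part of any Z-Image tooling.

```python
def forward_passes(steps: int, use_cfg: bool) -> int:
    """Count network evaluations for one image.

    With CFG, every step evaluates the model twice: a conditional pass and
    an unconditional pass. Without CFG, each step is a single pass.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    return steps * (2 if use_cfg else 1)


without_cfg = forward_passes(8, use_cfg=False)  # the 8-NFE setting described above
with_cfg = forward_passes(8, use_cfg=True)      # same step count, CFG enabled
print(without_cfg, with_cfg)
```

At equal step counts the CFG-free sampler needs half the evaluations, which is the "roughly halves" figure quoted above.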

Try it

Figure 1. Cherry cake generated by Z-Image-Turbo

Figure 2. Using the original image to create a poster for marketing material

Imagine you’re a community programs coordinator at your city’s parks department, planning a new summer event series, a “Cake Picnic in the Park,” designed to bring neighbors together over food in shared green space. The event is a few weeks out. You haven’t booked bakery partners yet, so no actual cake exists, and you need marketing assets this week to start driving sign-ups: a hero image for the registration page, a flyer for community centers and libraries, and social tiles for the city’s channels. Use the prompt below to generate a photorealistic image that can then be adapted into additional assets, like printed flyers or social images, in minutes using image editing tools (or another model).

Prompt: A round layered cake displayed on a white ceramic cake stand, topped with glossy fresh red cherries and smooth pastel pink buttercream frosting piped in delicate rosettes around the edge. One generous slice has been cleanly cut and removed from the front, revealing a perfect cross-section: four distinct horizontal layers alternating between soft pink sponge cake and fluffy white vanilla cream frosting. Professional bakery photography, soft natural window light from the left, shallow depth of field, marble countertop, warm and inviting atmosphere, photorealistic detail on the cake texture, cherry highlights, and frosting swirls.

Model Specs (FLUX.1-schnell)

Parameters / size: 12B (rectified flow transformer)
Resolution: Flexible up to 2 megapixels
Primary task: Text-to-image generation
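When resolution is flexible within a pixel budget, a common chore is picking a width and height for a given aspect ratio. The sketch below is one way to do that, under two assumptions that are not stated in this post: "2 megapixels" is read as 2,000,000 pixels, and dimensions are rounded down to a multiple of 16. The helper `fit_resolution` is hypothetical; check the model card for the actual size constraints.

```python
import math
from typing import Tuple


def fit_resolution(aspect_w: int, aspect_h: int,
                   max_pixels: int = 2_000_000,
                   multiple: int = 16) -> Tuple[int, int]:
    """Largest (width, height) near the aspect ratio that stays under max_pixels.

    Both sides are rounded down to a multiple of `multiple`, so the result
    never exceeds the pixel budget.
    """
    ratio = aspect_w / aspect_h
    height = math.sqrt(max_pixels / ratio)
    width = height * ratio

    def snap(value: float) -> int:
        # Round down to the grid, but never below one grid cell.
        return max(multiple, int(value // multiple) * multiple)

    return snap(width), snap(height)


print(fit_resolution(16, 9))  # widescreen hero image
print(fit_resolution(1, 1))   # square social tile
```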

Why it’s interesting (Spotlight)

**Rectified flow transformer with adversarial distillation for 1–4 step inference:** FLUX.1-schnell is the distilled, Apache 2.0 sibling of the FLUX.1 family. It uses a rectified flow formulation (a diffusion variant that learns straight-line probability paths between noise and data, reducing the number of solver steps needed) and is further compressed with latent adversarial diffusion distillation. The model generates high-quality images in 1 to 4 steps, which suits latency-sensitive workloads.
**Permissive licensing for commercial use:** Released under Apache 2.0, FLUX.1-schnell can be used for personal, scientific, and commercial purposes. This has driven broad adoption across product features that need an open, redistributable image backbone.
**Strong prompt adherence at its parameter range:** At 12B parameters, FLUX.1-schnell sits between the SDXL family and frontier proprietary image models. Roughly two years after its initial release, it remains a common reference point for evaluating prompt following in open image generation, particularly for complex compositional prompts and longer captions.
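To see why straight-line paths reduce the number of solver steps, consider a toy one-dimensional version of the sampling loop. This is a minimal sketch and not FLUX.1-schnell's implementation: the trained network is replaced by a hand-written velocity field that points straight at a single known target, and the names `straight_line_velocity` and `euler_sample` are made up for illustration.

```python
from typing import Callable


def straight_line_velocity(target: float) -> Callable[[float, float], float]:
    """Toy stand-in for a trained model: velocity of the straight path to `target`.

    On the path x_t = (1 - t) * x0 + t * target, this velocity is constant
    and equal to target - x0.
    """
    def velocity(x: float, t: float) -> float:
        return (target - x) / (1.0 - t)
    return velocity


def euler_sample(x0: float, velocity: Callable[[float, float], float],
                 steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays below 1, so the toy velocity never divides by zero
        x = x + dt * velocity(x, t)
    return x


field = straight_line_velocity(target=2.0)
for n in (1, 4):
    print(n, euler_sample(x0=-1.5, velocity=field, steps=n))
```

Because the toy path is perfectly straight, a single Euler step already lands on the target. A real model's paths are only approximately straight, which is consistent with the few-step (rather than always one-step) inference described above.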

Try it

Hugging Face Spaces let developers experiment with new models before deploying them. Test out a few prompts here: https://black-forest-labs-flux-1-schnell.hf.space

When you are ready, deploy the model in Microsoft Foundry.

Figure 2. Architectural diagram available here: stabilityai/stable-diffusion-xl-base-1.0 · Hugging Face

Model Specs (SDXL base 1.0)

Parameters / size: 2.6B UNet (≈3.5B total with text encoders)
Resolution: 1024×1024 native
Primary task: Text-to-image generation

Why it’s interesting (Spotlight)

**Dual text encoder design and an ensemble-of-experts pipeline:** SDXL uses two pretrained text encoders, OpenCLIP-ViT/G and CLIP-ViT/L, whose outputs are concatenated to capture both broad semantic alignment and finer-grained token-level cues. It can be run standalone or paired with the SDXL refiner in an ensemble-of-experts pipeline where the base model handles early denoising and the refiner specializes in the final steps. See the SDXL report for the original training and architecture details.
**CreativeML Open RAIL++-M licensing for managed deployments:** SDXL is distributed under the CreativeML Open RAIL++-M license, which permits commercial use and downstream fine-tuning with documented use restrictions.
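The base-plus-refiner handoff can be pictured as splitting one denoising schedule into two contiguous ranges. The sketch below shows only that bookkeeping, not an SDXL pipeline. The function `split_steps` is hypothetical, and the 40-step schedule and 0.8 handoff fraction are illustrative assumptions rather than values from this post.

```python
from typing import Tuple


def split_steps(total_steps: int, base_fraction: float) -> Tuple[range, range]:
    """Split a denoising schedule between a base model and a refiner.

    The base model takes the first `base_fraction` of the steps (the
    high-noise part); the refiner takes the remaining low-noise steps.
    """
    if total_steps < 2:
        raise ValueError("need at least two steps to split")
    if not 0.0 < base_fraction < 1.0:
        raise ValueError("base_fraction must be strictly between 0 and 1")
    handoff = round(total_steps * base_fraction)
    # Keep at least one step on each side of the handoff.
    handoff = min(max(handoff, 1), total_steps - 1)
    return range(0, handoff), range(handoff, total_steps)


base_steps, refiner_steps = split_steps(total_steps=40, base_fraction=0.8)
print(f"base: steps {base_steps.start}-{base_steps.stop - 1}")
print(f"refiner: steps {refiner_steps.start}-{refiner_steps.stop - 1}")
```

With these illustrative numbers the base model runs 32 steps and the refiner runs the last 8. Running the base model standalone corresponds to skipping the split entirely.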

Try it

To go deeper on SDXL, take a look at Stability AI’s generative-models GitHub repository, which implements the most popular diffusion frameworks for both training and inference and continues to expand with new capabilities like distillation.

You can deploy open-source Hugging Face models directly in Microsoft Foundry in two ways. The first is to browse the Hugging Face collection in the Foundry model catalog and deploy to managed endpoints in just a few clicks. The second is to go through the Hugging Face Hub: select any supported model and then choose “Deploy on Microsoft Foundry”, which brings you straight into Azure. To learn how to discover and deploy models, see the Microsoft Foundry documentation.
