Google open-sources speedy DiffusionGemma text diffusion model

June 11, 2026


Google LLC today released DiffusionGemma, a large language model based on an emerging machine learning approach known as text diffusion.

The company says the model can generate text four times faster than traditional LLMs while using less RAM. That memory efficiency enables it to run on high-end consumer graphics cards, which usually struggle to support LLMs.

DiffusionGemma’s text diffusion architecture is derived from a method that AI models use to generate images. The image generation workflow begins with an image made up of random Gaussian noise. An AI model removes a small portion of the noise, analyzes the partly cleaned-up image and uses its findings to remove another portion. It then repeats the process until it arrives at a usable image.

When DiffusionGemma receives a prompt, it generates a placeholder response that consists of random words. It then replaces a subset of the random text with words that will form part of its answer to the user’s prompt. DiffusionGemma reviews the edits, generates a few more words and repeats the process until its response is ready.
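
The loop described above can be pictured with a toy sketch. It is not Google’s implementation: the function name `toy_text_diffusion`, the word list and the step count are all hypothetical, and the "model" here is a stand-in oracle that already knows the answer. The point is only to show a draft of random words being overwritten a few positions at a time.

```python
import random

def toy_text_diffusion(target, vocab, steps=4, seed=0):
    """Toy illustration only: refine a random draft a few positions per step."""
    rng = random.Random(seed)
    n = len(target)
    # Placeholder response made of random words.
    draft = [rng.choice(vocab) for _ in range(n)]
    # Positions that still hold random text, visited in random order.
    pending = list(range(n))
    rng.shuffle(pending)
    per_step = -(-n // steps)  # ceiling division
    history = [list(draft)]
    while pending:
        chosen, pending = pending[:per_step], pending[per_step:]
        for i in chosen:
            # Assumption: a real model would predict this word from the prompt
            # and the current draft. This hypothetical oracle just copies it.
            draft[i] = target[i]
        history.append(list(draft))
    return draft, history

target = "diffusion models refine every position of a draft in parallel".split()
vocab = ["apple", "river", "seven", "cloud", "green", "stone"]
final, history = toy_text_diffusion(target, vocab)
for step, draft in enumerate(history):
    print(step, " ".join(draft))
```

Each printed line is one refinement pass, with more of the random placeholder replaced than in the line before it.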

LLMs usually generate their responses one token at a time. DiffusionGemma’s text diffusion architecture, by contrast, enables it to produce 256 tokens at once. That parallelization is what makes the model faster than standard LLMs.
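
A rough way to see why this helps is to count how many times the model has to run. The sketch below assumes the 256 tokens form a block that is refined over a fixed number of passes. The figure of 16 passes per block is a hypothetical placeholder, since the article does not say how many refinement steps DiffusionGemma uses.

```python
import math

def autoregressive_passes(num_tokens):
    # One model run per generated token.
    return num_tokens

def diffusion_passes(num_tokens, block_size=256, denoise_steps=16):
    # block_size comes from the article; denoise_steps is an assumed value.
    blocks = math.ceil(num_tokens / block_size)
    return blocks * denoise_steps

n = 1024
print("autoregressive:", autoregressive_passes(n))
print("diffusion:", diffusion_passes(n))
```

Under these assumptions, a 1,024-token response takes 1,024 model runs the traditional way and 64 with block-wise diffusion. The real ratio depends on the actual step count and on how expensive each pass is.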

Google says that DiffusionGemma can generate more than 1,000 tokens per second when running on a single H100, a server-grade GPU that Nvidia Corp. launched in 2022. The model can generate over 700 tokens per second on the chipmaker’s desktop-grade GeForce RTX 5090 chip.

One reason DiffusionGemma can run on consumer GPUs is that it’s based on a mixture-of-experts architecture. The model includes 26 billion parameters but activates only 3.8 billion of them to answer the prompt, which lowers memory usage. DiffusionGemma further lowers RAM consumption by keeping information in a compact 4-bit data format called NVFP4.
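
Some back-of-the-envelope arithmetic shows what those numbers imply. The sketch below is an estimate, not a figure from Google: it assumes every parameter is stored at the stated width, uses 16-bit weights as a hypothetical baseline for comparison, and ignores any scaling-factor metadata that a 4-bit format stores alongside the weights.

```python
def weight_memory_gb(num_params, bits_per_param):
    # bits -> bytes -> gigabytes (1 GB = 1e9 bytes)
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 26e9    # from the article
ACTIVE_PARAMS = 3.8e9  # from the article

print("16-bit weights:", weight_memory_gb(TOTAL_PARAMS, 16), "GB")
print("4-bit weights: ", weight_memory_gb(TOTAL_PARAMS, 4), "GB")
print("active share:  ", round(ACTIVE_PARAMS / TOTAL_PARAMS * 100, 1), "%")
```

On these assumptions the full set of weights shrinks from roughly 52 GB to roughly 13 GB, and about 15% of the parameters are in use at any one time.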

DiffusionGemma is based on an LLM called Gemma 4 26B A4B that Google released in April. To facilitate text diffusion, the search giant replaced that model’s attention mechanism, the software module it uses to interpret prompts. The original mechanism inferred the meaning of each word in a prompt by analyzing the preceding text. The new attention module also reviews the text that follows a given word.
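
The difference between the two approaches is commonly expressed as an attention mask, a grid that says which positions each word is allowed to look at. The sketch below illustrates that general idea and is not taken from DiffusionGemma’s code. The function names are hypothetical, and a 1 in row i, column j means word i may attend to word j.

```python
def causal_mask(n):
    # Each position sees itself and the positions before it.
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    # Each position sees every position, including later ones.
    return [[1] * n for _ in range(n)]

for name, mask in (("causal", causal_mask(4)), ("bidirectional", bidirectional_mask(4))):
    print(name)
    for row in mask:
        print(" ", row)
```

The causal grid is a lower triangle, matching a model that reads only the preceding text, while the bidirectional grid is filled in completely.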

“While the AI research community has explored diffusion-based text generation for years, applying it to large models has remained a challenge,” Google research scientists Brendan O’Donoghue and Sebastian Flennerhag wrote in a blog post today. “DiffusionGemma changes this by shifting how models use hardware.”

DiffusionGemma is available on Hugging Face under an open-source license.

Image: Google


