Google unveiled two major AI advances: Gemini 3.5 Live Translate, a near-real-time speech translation system that supports more than 70 languages, and DiffusionGemma, an experimental text-generation model that uses diffusion methods to produce output very quickly.
Real-time speech translation across 70+ languages
Gemini 3.5 Live Translate moves beyond traditional turn-by-turn translators by processing and translating speech continuously. The system stays only a few seconds behind a speaker, preserving natural features such as intonation, pacing, and pitch to make translations sound more fluid and conversational.
The feature is rolling out in three places: the Google Translate app (Android and iOS), worldwide; the Gemini Live API and Google AI Studio, in public preview for developers; and Google Meet, in private preview for select Workspace customers. Meet’s support expands speech translation from five languages to more than 70, enabling over 2,000 language combinations in a single meeting.
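The "over 2,000 language combinations" figure is consistent with counting unordered pairs of distinct languages. A quick check, assuming exactly 70 languages (the article says "more than 70", so the real count may be higher); the helper name `language_pairs` is made up for illustration:

```python
from math import comb


def language_pairs(n_languages: int, directed: bool = False) -> int:
    """Count pairs of distinct languages among n_languages.

    directed=False treats English<->Spanish as one combination;
    directed=True counts each translation direction separately.
    """
    unordered = comb(n_languages, 2)
    return unordered * 2 if directed else unordered


print(language_pairs(70))        # 2415
print(language_pairs(70, True))  # 4830
```

With 70 languages, the unordered count of 2,415 matches the "over 2,000" claim; counting each direction separately would give 4,830.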
A new “listening mode” on Android lets users hear translations through the phone’s earpiece without headphones by holding the device to the ear like a call. Google also embeds SynthID watermarks in generated audio to help identify AI-produced speech.
DiffusionGemma: speeding up text generation
Separately, Google DeepMind introduced DiffusionGemma, a 26-billion-parameter mixture-of-experts model that generates text using diffusion techniques. Instead of predicting one token at a time, DiffusionGemma begins with noise and refines whole blocks of up to 256 tokens in parallel, an approach that produces text far more quickly than token-by-token decoding.
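To make the contrast with one-token-at-a-time decoding concrete, here is a toy sketch of block-parallel refinement. It is not DiffusionGemma's actual algorithm, which the article does not describe in detail. The `toy_denoiser` function is a hypothetical stand-in that simply peeks at a known target, and the "commit the most confident positions each step" schedule is an assumption borrowed from common masked-diffusion decoders.

```python
import random


def toy_denoiser(block, target, rng):
    """Hypothetical stand-in for a model. It proposes a (token, confidence)
    pair for every position of the block in a single call. A real model would
    predict these from the noisy block; this toy just peeks at the target."""
    return [(target[i], rng.random()) for i in range(len(block))]


def diffusion_decode(target, vocab, steps=4, seed=0):
    """Refine a whole block from random noise in a fixed number of steps."""
    rng = random.Random(seed)
    n = len(target)
    block = [rng.choice(vocab) for _ in range(n)]  # start from pure noise
    settled = [False] * n
    for step in range(steps):
        proposals = toy_denoiser(block, target, rng)  # one call covers all positions
        open_positions = [i for i in range(n) if not settled[i]]
        # Commit an even share of the remaining positions at each step
        # (ceiling division), so the final step settles whatever is left.
        quota = -(-len(open_positions) // (steps - step))
        most_confident = sorted(
            open_positions, key=lambda i: proposals[i][1], reverse=True
        )[:quota]
        for i in most_confident:
            block[i] = proposals[i][0]
            settled[i] = True
    return block


target = "the quick brown fox jumps over the lazy dog".split()
print(" ".join(diffusion_decode(target, vocab=["foo", "bar", "baz"], steps=3)))
```

The point of the sketch is the call count: the denoiser runs `steps` times regardless of block length, whereas an autoregressive decoder needs one sequential model call per token.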
The model activates about 3.8 billion of its 26 billion parameters during inference. It achieves more than 1,000 tokens per second on a single NVIDIA H100 GPU and roughly 700 tokens per second on a GeForce RTX 5090. The model weights are available on Hugging Face under the Apache 2.0 license.
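A back-of-the-envelope calculation puts these figures in perspective. This sketch uses only the numbers quoted in this article and treats the throughput figures as exact, which they are not ("more than" and "roughly"); real latency also depends on prompt length, batch size, and hardware configuration. The function names are illustrative.

```python
TOTAL_PARAMS_B = 26.0   # total parameters, in billions
ACTIVE_PARAMS_B = 3.8   # parameters active per inference step, in billions
BLOCK_TOKENS = 256      # maximum block size quoted for the model


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used during inference."""
    return active_b / total_b


def seconds_per_block(tokens: int, tokens_per_second: float) -> float:
    """Time to emit `tokens` tokens at a steady throughput."""
    return tokens / tokens_per_second


print(f"active share: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")   # 14.6%
print(f"H100, 256 tokens: {seconds_per_block(BLOCK_TOKENS, 1000):.3f} s")         # 0.256 s
print(f"RTX 5090, 256 tokens: {seconds_per_block(BLOCK_TOKENS, 700):.3f} s")      # 0.366 s
```

In other words, roughly one parameter in seven is active at a time, and a full 256-token block would take about a quarter of a second on an H100 at the quoted rate.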
Trade-offs and intended use
Google cautions that DiffusionGemma is experimental and currently lags its standard Gemma 4 models on some quality benchmarks. The company recommends the model for speed-sensitive, local tasks such as rapid editing, interactive agent loops, and other workflows where latency matters more than perfect output quality.
Industry support and availability
NVIDIA optimized DiffusionGemma across its hardware, and the model has day-one support in prominent inference libraries including vLLM, Hugging Face Transformers, and Unsloth. Gemini 3.5 Live Translate is available to consumers via the Translate app today, with developer and enterprise integrations rolling out through previews.
Gemini 3.5 Live Translate could significantly lower language barriers in real-world conversations by making translated speech feel more natural and immediate. DiffusionGemma points to a new direction for language models, trading some quality for dramatic speed gains that unlock fast, local AI experiences on both consumer and data-center GPUs.
