Google Launches Gemma 3n AI for Phones


Google debuts Gemma 3n, a new AI model that can perform complex multimodal tasks on phones without internet access. Revealed in early May 2025, the model is built for on-device use and can run in as little as 2GB of RAM, making it viable for low-power hardware.

Gemma 3n is a major leap forward for AI on mobile and edge devices. It accepts multiple types of input (audio, image, video and text) and does not rely on any cloud-based system, which makes it possible to build efficient AI capabilities for offline, memory-constrained devices.

New MatFormer Architecture Enhances Flexibility
Central to Gemma 3n is MatFormer (short for Matryoshka Transformer), an architecture that nests smaller, fully functional sub-models inside larger ones, much like Russian dolls. This approach lets developers tune performance to the capabilities of the hardware. Two versions are available: E2B, which runs in roughly 2GB of memory, and E4B, which needs about 3GB.
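As a rough illustration of how those two footprints might be targeted in practice, the sketch below picks a variant based on the memory available on the device. The Hugging Face model IDs and the psutil-based memory check are assumptions for illustration, not part of Google's announcement.

```python
# Illustrative sketch only: choose between the assumed Gemma 3n variants
# based on available memory, mirroring the ~2GB (E2B) and ~3GB (E4B)
# footprints described above. The model IDs are assumed Hugging Face names.
import psutil

def pick_gemma_3n_variant() -> str:
    available_gb = psutil.virtual_memory().available / 1e9
    if available_gb >= 3:
        return "google/gemma-3n-E4B-it"  # assumed ID for the larger ~3GB variant
    return "google/gemma-3n-E2B-it"      # assumed ID for the ~2GB variant

if __name__ == "__main__":
    print(pick_gemma_3n_variant())
```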

Despite having 5 to 8 billion parameters, both versions of Gemma 3n remain efficient. Techniques such as Per-Layer Embeddings (PLE) offload a portion of the model's parameters from the device's GPU onto the CPU, saving accelerator memory, increasing performance and enabling deeper and wider models. Another capability, known as KV Cache Sharing, speeds up the processing of long audio and video inputs, roughly halving latency and enabling real-time use cases such as voice assistants.

Speech and Vision Capabilities
Gemma 3n includes an integrated audio encoder built on Google's Universal Speech Model, which enables functionality such as speech-to-text and language translation even without an internet connection. Initial tests indicate strong translation between English and other European languages, including Spanish, French, Italian and Portuguese.

For visual processing, Gemma 3n uses MobileNet-V5, a lightweight vision encoder that handles video streams at up to 60 frames per second on a Google Pixel, delivering smooth and accurate real-time video analysis.

Developers can access Gemma 3n via Hugging Face Transformers, Ollama, MLX, and llama.cpp. To showcase its offline utility, Google has launched the "Gemma 3n Impact Challenge", with $150,000 on offer for applications built with the model.
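As a minimal, hedged example of one of those routes, the snippet below loads the model through Hugging Face Transformers' text-generation pipeline. The model ID and chat usage are assumptions to verify against the official model card, not confirmed details from the announcement.

```python
# Minimal sketch of running Gemma 3n locally with Hugging Face Transformers.
# The model ID below is an assumed Hugging Face name; confirm it on the hub,
# along with the exact pipeline task the model is registered for.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3n-E2B-it",  # assumed ID for the smaller variant
    device_map="auto",               # use a GPU if present, otherwise CPU
)

messages = [{"role": "user", "content": "Translate 'good morning' into Spanish."}]
result = generator(messages, max_new_tokens=64)
print(result[0]["generated_text"])
```

Ollama and llama.cpp offer equivalent command-line routes for fully offline use; the exact model tags should be taken from their respective libraries rather than assumed here.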

With text support for more than 140 languages and multimodal understanding of content in 35 languages, Gemma 3n represents a real leap toward accessible, private and efficient AI on mobile.