Gemini 3.1 Flash-Lite: Built for intelligence at scale

Source: Google DeepMind News

Gemini 3.1 Flash-Lite is now available in preview to developers via the Gemini API in Google AI Studio and to enterprises via Vertex AI. Built for high-volume developer workloads, it is the fastest and most cost-efficient model in the Gemini 3 series. Priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens, 3.1 Flash-Lite delivers enhanced performance at a fraction of the cost of larger models.
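To make the pricing concrete, here is a minimal cost-estimation sketch based only on the two per-token prices quoted above. The helper name `estimate_cost_usd` and the example workload (one million requests at 500 input and 100 output tokens each) are illustrative assumptions, not figures from the announcement. Actual billing may differ from this simple linear estimate.

```python
from decimal import Decimal

# Preview prices quoted in the article, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("0.25")
OUTPUT_USD_PER_M = Decimal("1.50")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: linear cost estimate in USD for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / TOKENS_PER_M


# Assumed workload (not from the article): 1M requests,
# each with 500 input tokens and 100 output tokens.
requests = 1_000_000
cost = estimate_cost_usd(500 * requests, 100 * requests)
print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $275.00"
```

In this assumed workload the output tokens account for $150 of the $275 total even though they are a fifth of the input volume, because output is priced six times higher per token.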

According to the Artificial Analysis benchmark, 3.1 Flash-Lite delivers a 2.5x faster Time to First Answer Token and 45% higher output speed than 2.5 Flash while maintaining similar or better quality. That gives it the low latency needed for high-frequency workflows and responsive, real-time experiences.
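As a rough illustration of how those two speedups combine, the sketch below models response time as time to first token plus generation time. The baseline values are placeholders chosen for easy arithmetic, not measurements of 2.5 Flash, and reading "2.5x faster" as dividing the time to first token by 2.5 is an assumption. The function name is hypothetical.

```python
def estimated_response_seconds(
    ttft_s: float, tokens_per_s: float, output_tokens: int
) -> float:
    """Hypothetical model: time to first token plus time to stream the output."""
    return ttft_s + output_tokens / tokens_per_s


# Placeholder baseline, NOT measured values for any real model.
baseline_ttft_s = 1.0
baseline_tokens_per_s = 200.0
output_tokens = 400

# Ratios quoted in the article (interpretation is an assumption).
TTFT_SPEEDUP = 2.5        # time to first token divided by 2.5
OUTPUT_SPEED_GAIN = 1.45  # 45% more tokens per second

baseline_s = estimated_response_seconds(
    baseline_ttft_s, baseline_tokens_per_s, output_tokens
)
faster_s = estimated_response_seconds(
    baseline_ttft_s / TTFT_SPEEDUP,
    baseline_tokens_per_s * OUTPUT_SPEED_GAIN,
    output_tokens,
)
print(f"baseline: {baseline_s:.2f}s, with quoted speedups: {faster_s:.2f}s")
```

Under this model, short responses benefit mostly from the faster first token, while long responses benefit mostly from the higher output speed.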

It scores 1432 Elo on the Arena.ai Leaderboard and posts 86.9% on GPQA Diamond and 76.8% on MMMU Pro, outperforming some prior-generation Gemini models.
