Gemini 3.1 Flash-Lite: Built for intelligence at scale
Gemini 3.1 Flash-Lite is now available in preview to developers via the Gemini API in Google AI Studio and to enterprises via Vertex AI. Built for high-volume developer workloads, it is the fastest and most cost-efficient model in the Gemini 3 series. Priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens, 3.1 Flash-Lite delivers enhanced performance at a fraction of the cost of larger models.
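To make the pricing concrete, here is a minimal sketch of a cost estimate built only from the two list prices quoted above. The function name and the sample workload (request count and tokens per request) are hypothetical, chosen for illustration. The estimate does not account for any discounts, caching, or other billing details not covered here.

```python
# List prices quoted in the announcement, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token volumes."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 1M requests per day,
# each with 800 input tokens and 150 output tokens.
requests_per_day = 1_000_000
daily_cost = estimate_cost(requests_per_day * 800, requests_per_day * 150)
print(f"Estimated daily cost: ${daily_cost:,.2f}")
```

At these prices, output tokens cost six times as much as input tokens, so workloads with short responses benefit most.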
According to Artificial Analysis benchmarks, 3.1 Flash-Lite delivers a 2.5x faster Time to First Answer Token and a 45% higher output speed than 2.5 Flash while maintaining similar or better quality. This low latency suits high-frequency workflows and responsive, real-time experiences.
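As a rough illustration of what these two ratios mean for end-to-end response time, the sketch below applies them to a baseline. It assumes that "2.5X faster" means the time to first answer token is divided by 2.5, and that total response time is simply that time plus output tokens divided by output speed. The baseline figures are hypothetical placeholders, not measured values for any model.

```python
# Ratios quoted in the announcement, relative to 2.5 Flash.
TTFT_SPEEDUP = 2.5        # assumption: "2.5X faster" means TTFT / 2.5
OUTPUT_SPEED_GAIN = 1.45  # 45% increase in output tokens per second


def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Simplified model: time to first token plus time to stream the output."""
    return ttft_s + output_tokens / tokens_per_s


def projected_response_time(
    baseline_ttft_s: float, baseline_tokens_per_s: float, output_tokens: int
) -> float:
    """Apply the quoted ratios to a baseline measurement."""
    return response_time(
        baseline_ttft_s / TTFT_SPEEDUP,
        baseline_tokens_per_s * OUTPUT_SPEED_GAIN,
        output_tokens,
    )


# Hypothetical baseline: 1.0 s to first token, 200 tokens/s, 290-token reply.
before = response_time(1.0, 200.0, 290)
after = projected_response_time(1.0, 200.0, 290)
print(f"baseline: {before:.2f} s, projected: {after:.2f} s")
```

Under this model, short replies gain mostly from the faster first token, while long replies gain mostly from the higher output speed.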
3.1 Flash-Lite achieves an Elo score of 1432 on the Arena.ai Leaderboard and scores 86.9% on GPQA Diamond and 76.8% on MMMU Pro, outperforming some prior-generation Gemini models.