Gemini 3.1 Flash-Lite: Adjustable Thinking Levels, 1M Token Context, and Batch API for Cost-Efficient AI Workloads
Google DeepMind released Gemini 3.1 Flash-Lite, a low-cost model in the Gemini API with a 1M-token context window and adjustable reasoning depth (minimal to high), letting developers trade off cost versus accuracy per request. It supports code execution, function calling, structured outputs, batch processing, and caching, making it practical for high-volume production pipelines. It is currently in preview via Google AI Studio and does not yet support audio/image generation or the Live API.