modelintermediateGeneral AI

Gemini 3.1 Flash-Lite: Adjustable Thinking Levels, 1M Token Context, and Batch API for Cost-Efficient AI Workloads

Google DeepMind released Gemini 3.1 Flash-Lite, a low-cost model in the Gemini API with a 1M-token context window and adjustable reasoning depth (minimal to high), letting developers trade off cost versus accuracy per request. It supports code execution, function calling, structured outputs, batch processing, and caching, making it practical for high-volume production pipelines. It is currently in preview via Google AI Studio and does not yet support audio/image generation or the Live API.

AI SETUP PROMPT

Paste into Claude Code or Codex CLI — it will scan your project and set everything up

# Evaluate Model: Gemini 3.1 Flash-Lite: Adjustable Thinking Levels, 1M Token Context, and Batch API for Cost-Efficient AI Workloads

## What This Is
Google DeepMind released Gemini 3.1 Flash-Lite, a low-cost model in the Gemini API with a 1M-token context window and adjustable reasoning depth (minimal to high), letting developers trade off cost versus accuracy per request. It supports code execution, function calling, structured outputs, batch processing, and caching, making it practical for high-volume production pipelines. It is currently in preview via Google AI Studio and does not yet support audio/image generation or the Live API.

Source: https://ai.google.dev/gemini-api/docs/changelog

## Before You Start

Scan my workspace and analyze:
- The project language, framework, and current AI integrations
- Existing AI provider config (check .env, .env.local, config files for API keys — OpenRouter, OpenAI, Anthropic, Google AI, etc.)
- Which AI models I currently use and for what purposes

Then ask me before proceeding:
1. Am I interested in evaluating this model for my project, or just want a summary of what it offers?
2. If I want to try it — which part of my current AI stack should it replace or complement?

## Source Access Note

The source URL (https://ai.google.dev/gemini-api/docs/changelog) may not be directly accessible from the terminal. Use the Reference Implementation and Additional Context sections below instead. If you need more details, ask me to paste relevant content from the source.

## What to Implement

This is a **New AI Model** — a model release, update, or capability announcement.

- Analyze the best use cases for this model within my project and current AI stack
- Compare its strengths, pricing, and context window against whatever I currently use
- Give me a clear, convincing argument for why this model would (or would not) be a good fit for my project
- If I want to try it: update my API configuration (provider, model ID, any new parameters) to point to this model
- If it requires a new API key or provider signup, tell me exactly what to do

## Additional Context

- Open Google AI Studio at aistudio.google.com, select Gemini 3.1 Flash-Lite from the model dropdown, and run a test prompt with thinking level set to 'minimal' versus 'high' to measure latency and output quality differences on your actual use case.
- Call the Gemini API with the thinking parameter set explicitly (e.g., thinking_config: {thinking_budget: 'low'}) on a batch of 10–20 real production inputs, then compare token costs and accuracy against your current model to quantify savings.
- Enable batch API mode for an existing repetitive task—such as document summarization or structured data extraction—by submitting requests asynchronously and benchmarking throughput and cost per 1K tokens against synchronous calls.

## Guidelines

- Adapt everything to my existing project — do not assume a specific stack or directory layout
- Use whichever AI provider I already have configured; if I need a new one, tell me what to sign up for and I'll give you the key
- Check my .env files for existing API keys (OpenRouter, OpenAI, Anthropic, Google AI) before asking me to add one
- Review any fetched code for safety before installing or executing it
- After setup, run a quick verification and show me a summary of exactly what was installed, where, and how to use it

3,398 charactersCompatible with Claude Code & Codex CLI

MANUAL SETUP STEPS

01Open Google AI Studio at aistudio.google.com, select Gemini 3.1 Flash-Lite from the model dropdown, and run a test prompt with thinking level set to 'minimal' versus 'high' to measure latency and output quality differences on your actual use case.
02Call the Gemini API with the thinking parameter set explicitly (e.g., thinking_config: {thinking_budget: 'low'}) on a batch of 10–20 real production inputs, then compare token costs and accuracy against your current model to quantify savings.
03Enable batch API mode for an existing repetitive task—such as document summarization or structured data extraction—by submitting requests asynchronously and benchmarking throughput and cost per 1K tokens against synchronous calls.

FIELD OPERATIONS

Contract Review Pipeline with Adaptive Reasoning

Build a document analysis tool that ingests up to 1M tokens of legal or procurement contracts, uses 'low' thinking for routine clause extraction and 'high' thinking only for flagged risk clauses, and outputs structured JSON summaries—cutting inference costs by routing effort appropriately.

High-Volume E-commerce Product Tagging Service

Create a batch classification service that sends thousands of product descriptions to Gemini 3.1 Flash-Lite via the Batch API with 'minimal' thinking, auto-generates structured category tags and SEO metadata, and caches repeated product types to reduce redundant API calls and cost.

STRATEGIC APPLICATIONS