Key Highlights
- Google introduced Flex and Priority as new Gemini API inference service tiers
- Flex tier provides 50% cost reduction for tasks that can tolerate delayed responses
- Priority tier charges a 75–100% premium for mission-critical, real-time operations
- Batch API continues offering 50% savings with potential 24-hour processing delays
- Caching tier uses token-based pricing tied to storage duration
On April 2, 2026, Google rolled out a comprehensive update to its Gemini API pricing structure, introducing five separate service tiers: Standard, Flex, Priority, Batch, and Caching. The expansion gives developers greater flexibility to optimize their applications by choosing the right balance of cost, performance, and reliability.
The newly introduced Flex tier targets workloads that operate in the background and don’t require immediate responses. By leveraging underutilized computing resources during off-peak hours, it delivers a 50% price reduction compared to standard rates. Response latency varies between 1 and 15 minutes without service guarantees. Ideal applications include customer relationship management updates, academic research simulations, and autonomous agent workflows.
What sets Flex apart from the previously available Batch API is its synchronous endpoint design. Developers avoid the complexity of managing separate input/output file systems or checking job status repeatedly. The simplified architecture delivers identical cost benefits with less operational overhead.
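The contrast can be sketched with a stand-in client: a Batch workflow means uploading input files, creating a job, and polling for completion, while a Flex request is an ordinary synchronous call with one extra field. Everything below (the client class, method names, and field values) is an illustrative assumption, not the official Gemini SDK surface.

```python
class FakeClient:
    """Stand-in client that shows how a synchronous Flex request is shaped.
    This is an illustration, not the real Gemini SDK."""

    def generate_content(self, model, contents, service_tier="standard"):
        # A Flex call is a single synchronous request; the tier is just a field.
        return {"model": model, "text": f"echo: {contents}", "service_tier": service_tier}


client = FakeClient()

# Batch style (before): manage files and poll for job completion, e.g.
#   job = client.batches.create(input_file=...)            # separate file handling
#   while job.status != "done": job = client.batches.get(job.id)   # polling loop

# Flex style (after): one ordinary call with a tier flag -- no files, no polling.
response = client.generate_content(
    model="gemini-model",            # placeholder model name
    contents="Summarize this CRM record",
    service_tier="flex",             # hypothetical value; ~50% cheaper, 1-15 min latency
)
```

The cost profile matches Batch, but the calling code stays identical to an interactive request.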
At the opposite end, the Priority tier caters to applications demanding maximum reliability and speed. Priced at 75% to 100% above standard rates, it delivers response times ranging from milliseconds to a few seconds.
Google positions Priority for use cases like real-time customer service chatbots, fraud prevention systems, and automated content review workflows. When Priority tier usage surpasses allocated quotas, additional requests automatically shift to the Standard tier instead of generating errors.
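Because overflow requests are served on Standard rather than rejected, a caller may still want to detect when a downgrade happened, for example for cost accounting or latency alerting. A minimal sketch, assuming the response reports the serving tier in a field named `service_tier` (an assumption, not a documented field):

```python
def record_tier_downgrade(requested_tier, response, log):
    """Log when the tier that actually served a request differs from the one
    requested (e.g. Priority quota exhausted, request handled on Standard).
    The 'service_tier' response field name is an assumption."""
    served = response.get("service_tier", "standard")
    if served != requested_tier:
        log.append(f"requested={requested_tier} served={served}")
    return served


log = []
# Simulated response for a Priority request that overflowed to Standard quota.
served = record_tier_downgrade("priority", {"service_tier": "standard"}, log)
```

A downgraded request still succeeds; the log entry only flags that it was billed and scheduled at Standard rather than Priority.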
Complete Tier Overview
The pre-existing Batch API continues to operate at 50% below standard pricing, accommodating workloads that can wait up to 24 hours for completion. It remains the optimal choice for extensive offline data processing where immediacy isn’t required.
The Caching tier employs pricing calculated by token volume and content retention period. Google recommends this tier for conversational AI systems with extensive system prompts, recurring analysis of large video content, or search operations across substantial document repositories.
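As a rough model of pricing tied to token volume and retention period, storage cost scales with tokens held times hours held. The rate below is a made-up placeholder, not one of Google's published prices:

```python
def cache_storage_cost(cached_tokens, hours, rate_per_million_token_hours):
    """Estimate cache storage cost: tokens held x hours held x per-unit rate.
    The rate is a hypothetical placeholder, not an actual Gemini price."""
    return (cached_tokens / 1_000_000) * hours * rate_per_million_token_hours


# e.g. a 200k-token system prompt cached for 24 hours at a hypothetical
# $1.00 per million token-hours:
cost = cache_storage_cost(200_000, 24, 1.00)  # 0.2 * 24 * 1.00 = 4.8
```

The arithmetic shows why caching pays off for long, reused contexts (large system prompts, repeated video analysis) but not for one-off requests.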
Both Flex and Priority operate through the same service_tier parameter within API calls. Developers can switch between tiers by adjusting a single configuration setting, and the API returns confirmation of which tier processed each request.
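In practice, that single setting might look like the following. The request shape and field placement are assumptions based on the article's description; only the `service_tier` name itself comes from the source:

```python
def build_request(prompt, tier="standard"):
    """Build a request body where the tier is one configuration field.
    The surrounding structure is illustrative; only 'service_tier' is
    named in the article."""
    allowed = {"standard", "flex", "priority"}
    if tier not in allowed:
        raise ValueError(f"unknown service tier: {tier}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "service_tier": tier,   # switching tiers = changing this one value
    }


flex_req = build_request("Nightly report summary", tier="flex")
priority_req = build_request("Live support reply", tier="priority")
```

The same builder serves both workloads; nothing else about the request changes when the tier does.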
Flex tier access extends to all paid subscription users for GenerateContent and Interactions API calls. Priority tier availability is restricted to Tier 2 and Tier 3 paid accounts on the same API endpoints.
Developer Benefits
The consolidated interface represents the most significant advancement in this release. Previously, managing both background operations and interactive workloads necessitated maintaining separate architectures for synchronous and asynchronous processing. The updated system allows both workflow types to operate through identical synchronous endpoints.
Google positioned this enhancement as part of its broader strategy to enable sophisticated AI agent development, which frequently requires handling both delayed background tasks and time-critical interactive operations simultaneously.
The pricing update was announced by Gemini API product manager Lucia Loher alongside engineering lead Hussein Hassan Harrirou on April 2, 2026.