FIND-20260331-015 · 2026-03-31 · Innovation Veille

Gemini 3.1 Pro available on Vertex AI — 1M context, enhanced reasoning and agentic coding

release HIGH
Google released Gemini 3.1 Pro in preview on Vertex AI (March 2026), alongside Gemini 3.1 Flash-Lite. Gemini 3.1 Pro brings a 1M-token context window, improved software-engineering (SWE) and agentic-coding capabilities, and a new thinking_level parameter (LOW/MEDIUM/HIGH) for trading off cost against reasoning depth. Flash-Lite is positioned as the fastest and most cost-efficient model in the family at $0.25 per 1M input tokens, with 2.5x faster time-to-first-token than Gemini 2.5 Flash. Both models accept multimodal input (text, images, audio, video, PDFs, full code repositories) and are available via Vertex AI, the Gemini CLI, Gemini Enterprise, and Google AI Studio. In parallel, Google announced fractional G4 GPU VMs (1/2, 1/4, and 1/8 slices) and NVIDIA Dynamo integration with the GKE Inference Gateway as part of the AI Hypercomputer architecture unveiled at GTC 2026.
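The thinking_level parameter is the main new API surface here. The sketch below shows how a request carrying that parameter might be assembled; the model ID, JSON field names, and value casing are assumptions based on the announcement, not a verified API reference.

```python
# Sketch: assembling a Gemini 3.1 Pro request with the new thinking_level
# parameter. The announcement lists LOW/MEDIUM/HIGH as valid levels; the
# model ID and the exact generationConfig field names below are assumptions
# and should be checked against the Vertex AI preview docs before use.

MODEL = "gemini-3.1-pro-preview"  # assumed preview model ID

def build_request(prompt: str, thinking_level: str = "MEDIUM") -> dict:
    """Assemble a generateContent-style request body with a thinking level."""
    if thinking_level.upper() not in ("LOW", "MEDIUM", "HIGH"):
        raise ValueError(f"unsupported thinking_level: {thinking_level}")
    return {
        "model": MODEL,
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level.upper()},
        },
    }

request = build_request("Summarize this repository's build pipeline.", "HIGH")
```

The point of the parameter, per the announcement, is that the same model can be dialed down (LOW) for routine tasks and up (HIGH) for agentic coding, instead of switching models.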

Source

https://cloud.google.com/blog/products/ai-machine-learning/gemini-3-1-pro-on-gemini-cli-gemini-enterprise-and-vertex-ai

ODS Impact

Direct relevance on multiple fronts:

1. ODS AI agents (ADLC, veille, spec-writer) run on the Claude API today; Gemini 3.1 Pro on Vertex AI is a credible alternative or fallback, especially since ODS infrastructure already runs on GCP e2-standard-4 VMs.
2. The Gemini CLI could accelerate local development workflows for the team.
3. Fractional G4 GPU VMs open the door to affordable on-GCP GPU inference for future AI-powered ODS features (e.g., document understanding in DocStore, smart PDF processing in PDF Engine) without renting a full GPU instance.
4. Flash-Lite pricing ($0.25 per 1M input tokens) is competitive for high-volume agent tasks such as BA reviews and daily veille scans.

Recommendation: evaluate Gemini 3.1 Flash-Lite as a cost-optimized model for routine pipeline tasks.
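The Flash-Lite pricing claim can be sanity-checked with a quick back-of-envelope sketch. The $0.25 per 1M input tokens rate comes from the announcement; the task counts and token volumes below are illustrative assumptions, not measured ODS numbers.

```python
# Back-of-envelope input-token cost for routing routine pipeline tasks
# (BA reviews, daily veille scans) to Gemini 3.1 Flash-Lite.
# Rate is from the announcement; workload figures are illustrative only.

FLASH_LITE_INPUT_PER_MTOK = 0.25  # USD per 1M input tokens

def monthly_input_cost(tokens_per_task: int, tasks_per_day: int,
                       days: int = 30) -> float:
    """Input-token cost in USD for a month of scheduled agent runs."""
    total_tokens = tokens_per_task * tasks_per_day * days
    return total_tokens / 1_000_000 * FLASH_LITE_INPUT_PER_MTOK

# e.g. 50 veille scans/day at ~20k input tokens each:
# 20_000 * 50 * 30 = 30M tokens -> 30 * $0.25 = $7.50/month (input side only)
cost = monthly_input_cost(tokens_per_task=20_000, tasks_per_day=50)
```

Even at an order of magnitude more volume this stays in the tens of dollars per month for input tokens, which supports the recommendation to trial Flash-Lite on routine tasks (output-token pricing would need to be added for a full estimate).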

Security Review

License: Google Gemini API Terms of Service (proprietary) | Maintenance: ACTIVE | Risk: LOW | Recommendation: SAFE_TO_USE

Tags

google-cloud gemini vertex-ai llm ai-agents gcp gpu inference gke nvidia