Quick reference covering the Gemini models (3.x, 2.5, 1.5), the Gemini API, Google AI Studio, multimodality, pricing, thinking models, grounding, context caching, and the Google AI ecosystem. Data current as of February 2026.
| Model | ID | Context | Strength |
|---|---|---|---|
| Gemini 3 Pro | gemini-3.0-pro | 2M | Advanced reasoning |
| Gemini 3 Flash | gemini-3.0-flash | 1M | Speed + quality |

| Model | ID | Context | Strength |
|---|---|---|---|
| 2.5 Pro | gemini-2.5-pro-preview-05-06 | 1M | Best-in-class coding |
| 2.5 Flash | gemini-2.5-flash-preview-05-20 | 1M | Thinking budget |
| 2.5 Flash-Lite | gemini-2.5-flash-lite | 1M | Ultra-low cost |

| Model | ID | Context | Strength |
|---|---|---|---|
| 1.5 Pro | gemini-1.5-pro | 2M | Long-context pioneer |
| 1.5 Flash | gemini-1.5-flash | 1M | Fast, multimodal |
| 1.5 Flash-8B | gemini-1.5-flash-8b | 1M | High volume, low cost |
Note: 1.5 Pro remains relevant for very long contexts (2M native).
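To check at runtime which model IDs your key can actually use, the SDK exposes a model-listing call; a minimal sketch (the printed fields follow the google-genai Model type, and the output depends on your account):

```python
from google import genai

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# List the models available to this API key, with their token limits
for m in client.models.list():
    print(m.name, m.input_token_limit, m.output_token_limit)
```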
Need deep reasoning? YES -> Gemini 3 Pro (Deep Think)
Otherwise, tight budget / high volume? YES -> 2.5 Flash-Lite
Otherwise, need a configurable thinking budget? YES -> 2.5 Flash
Otherwise -> Gemini 3 Flash
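The same decision tree as a small helper, a hedged sketch (the boolean flags and function name are illustrative; the model IDs come from the tables above):

```python
def pick_model(deep_reasoning: bool = False,
               tight_budget: bool = False,
               configurable_thinking: bool = False) -> str:
    """Map the decision tree above onto model IDs."""
    if deep_reasoning:
        return "gemini-3.0-pro"        # Deep Think
    if tight_budget:
        return "gemini-2.5-flash-lite"
    if configurable_thinking:
        return "gemini-2.5-flash"
    return "gemini-3.0-flash"
```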
Prices in $ per 1M tokens. The >200K columns apply once the prompt exceeds 200K tokens of context.

| Model | Input | Output | Input >200K | Output >200K |
|---|---|---|---|---|
| Gemini 3 Pro | $1.25 | $10.00 | $2.50 | $15.00 |
| Gemini 3 Flash | $0.15 | $0.60 | $0.30 | $1.20 |
| 2.5 Pro | $1.25 | $10.00 | $2.50 | $15.00 |
| 2.5 Flash | $0.15 | $0.60 | $0.30 | $1.20 |
| 2.5 Flash-Lite | $0.075 | $0.30 | $0.15 | $0.60 |
| 1.5 Pro | $1.25 | $5.00 | $2.50 | $10.00 |
| 1.5 Flash | $0.075 | $0.30 | $0.15 | $0.60 |
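A quick way to sanity-check a bill: a small estimator hard-coding the per-million rates from the table above (the >200K surcharge is deliberately ignored here for brevity):

```python
# $/1M tokens, from the pricing table above (<=200K-context rates)
RATES = {
    "gemini-3.0-pro":        (1.25, 10.00),
    "gemini-3.0-flash":      (0.15, 0.60),
    "gemini-2.5-pro":        (1.25, 10.00),
    "gemini-2.5-flash":      (0.15, 0.60),
    "gemini-2.5-flash-lite": (0.075, 0.30),
    "gemini-1.5-pro":        (1.25, 5.00),
    "gemini-1.5-flash":      (0.075, 0.30),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at standard-context rates."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example: 2,000 tokens in, 500 out on 2.5 Flash
print(f"${request_cost('gemini-2.5-flash', 2000, 500):.6f}")  # ~$0.0006
```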
| Model | Thinking Input | Thinking Output |
|---|---|---|
| 2.5 Flash | $0.15/M | $3.50/M |
| 2.5 Pro | $1.25/M | $10.00/M |
| Gemini 3 Pro | $1.25/M | $10.00/M |
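Note that 2.5 Flash thinking output ($3.50/M) is billed well above its normal output ($0.60/M), so thinking tokens can dominate the cost of a reply. A quick check using the rates above (token counts are an illustrative scenario):

```python
# 2.5 Flash: 2K prompt, 6K thinking tokens, 500 visible output tokens
cost = (2000 / 1e6 * 0.15    # input
        + 6000 / 1e6 * 3.50  # thinking output
        + 500 / 1e6 * 0.60)  # visible output
print(f"${cost:.4f}")        # ~$0.0216 - thinking dominates
```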
| Model | Cached Input | Discount | Storage ($/1M tokens/h) |
|---|---|---|---|
| 3 Flash | $0.0375/M | -75% | $0.0025 |
| 2.5 Flash | $0.0375/M | -75% | $0.0025 |
| 2.5 Pro | $0.3125/M | -75% | $0.0625 |
| 1.5 Flash | $0.01875/M | -75% | $0.0025 |
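Explicit caching pays off once the -75% on cached input outweighs the hourly storage fee. A back-of-the-envelope break-even check using the 2.5 Flash rates from the tables above (the helper itself is an illustrative sketch):

```python
# 2.5 Flash rates from the tables above
NORMAL_INPUT = 0.15    # $/1M tokens
CACHED_INPUT = 0.0375  # $/1M tokens (-75%)
STORAGE = 0.0025       # $/1M tokens/hour

def caching_saves(cached_tokens: int, requests_per_hour: float) -> bool:
    """True if explicit caching beats resending the prefix every request."""
    m = cached_tokens / 1e6
    savings_per_req = m * (NORMAL_INPUT - CACHED_INPUT)
    return requests_per_hour * savings_per_req > m * STORAGE

# A 100K-token prefix breaks even at ~0.022 requests/hour:
# caching wins at essentially any real traffic level.
print(caching_saves(100_000, 1))  # True
```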
Asynchronous batch processing with a 50% discount on all models:
```python
# Batch request format
batch_request = {
    "requests": [
        {"id": "req-1", "request": {
            "model": "gemini-2.5-flash",
            "contents": [{"parts": [{"text": "..."}]}]
        }}
    ]
}
```
```python
# Example: 10K requests/day on 2.5 Flash,
# 2,000 input tokens + 500 output tokens each

# Without optimization:
base_cost = 10000 * (2000/1e6 * 0.15 + 500/1e6 * 0.60)
# = $3.00 + $3.00 = $6.00/day

# With implicit caching (80% cache hit rate):
cache_cost = 10000 * (400/1e6 * 0.15 + 1600/1e6 * 0.0375 + 500/1e6 * 0.60)
# = $0.60 + $0.60 + $3.00 = $4.20/day (-30%)

# With the Batch API (-50%):
batch_cost = 10000 * (2000/1e6 * 0.075 + 500/1e6 * 0.30)
# = $1.50 + $1.50 = $3.00/day (-50%)
```
URL: aistudio.google.com
```python
# Code generated by AI Studio (Python)
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain cloud computing"
)
print(response.text)
```
```bash
# Installation
pip install google-genai
```

```python
# Configuration with an API key
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

# Or via an environment variable
# export GOOGLE_API_KEY=AIza...
client = genai.Client()

# Vertex AI (GCP) configuration
client = genai.Client(
    vertexai=True,
    project="my-project",
    location="us-central1"
)
```
```text
Base URL: https://generativelanguage.googleapis.com/v1beta

POST /models/{model}:generateContent?key={API_KEY}        # main endpoint
POST /models/{model}:streamGenerateContent?key={API_KEY}  # streaming

Headers: Content-Type: application/json
```

```bash
# cURL example
curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
```
| Method | Used with | Security |
|---|---|---|
| API key | Google AI Studio | Basic |
| OAuth 2.0 | Vertex AI / GCP | High |
| Service account | Production on GCP | High |
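For the service-account path, the Vertex mode of the google-genai client authenticates via Application Default Credentials; a minimal sketch (the project ID and key path are placeholders):

```bash
# Point ADC at a service-account key
# (or run: gcloud auth application-default login)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```

```python
from google import genai

# Vertex mode picks up Application Default Credentials automatically
client = genai.Client(vertexai=True, project="my-project",
                      location="us-central1")
```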
```python
from google import genai
from google.genai import types

client = genai.Client(api_key="...")
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain design patterns",
    config=types.GenerateContentConfig(
        system_instruction="You are a senior software architect.",
        temperature=0.7,
        top_p=0.95,
        top_k=40,
        max_output_tokens=2048,
        candidate_count=1,
    )
)
print(response.text)
print(response.usage_metadata)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | float | 1.0 | 0 = deterministic, 2 = creative |
| top_p | float | 0.95 | Nucleus sampling |
| top_k | int | 40 | Number of candidate tokens |
| max_output_tokens | int | 8192 | Max output tokens |
| stop_sequences | list | [] | Stop sequences (max 5) |
| candidate_count | int | 1 | Number of responses |
| presence_penalty | float | 0 | -2.0 to 2.0 |
| frequency_penalty | float | 0 | -2.0 to 2.0 |
```python
# generateContent response
response.text        # response text
response.candidates  # list of candidates
response.candidates[0].content.parts[0].text
response.candidates[0].finish_reason
response.candidates[0].safety_ratings

# Usage metadata
response.usage_metadata.prompt_token_count
response.usage_metadata.candidates_token_count
response.usage_metadata.total_token_count
response.usage_metadata.cached_content_token_count
```
```python
# Multi-turn conversation as explicit Content objects
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Content(
            role="user",
            parts=[types.Part(text="What is REST?")]
        ),
        types.Content(
            role="model",
            parts=[types.Part(text="REST is...")]
        ),
        types.Content(
            role="user",
            parts=[types.Part(text="Compare it with GraphQL")]
        ),
    ]
)
```
```python
# Streaming with the google-genai SDK
response = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write an article about the cloud",
)
for chunk in response:
    print(chunk.text, end="")

# Each chunk contains:
# chunk.text                        - partial text
# chunk.candidates[0].content.parts
# chunk.usage_metadata              (final chunk)
```
```text
# Streaming endpoint
POST /models/gemini-2.5-flash:streamGenerateContent?alt=sse&key={API_KEY}

# SSE (Server-Sent Events) response
data: {"candidates":[{"content":{"parts":[{"text":"Here"}],"role":"model"}}]}

data: {"candidates":[{"content":{"parts":[{"text":" is my"}],"role":"model"}}]}

data: {"candidates":[{"content":{"parts":[{"text":" article..."}],"role":"model"},
       "finishReason":"STOP"}],
      "usageMetadata":{"promptTokenCount":8,"candidatesTokenCount":150}}
```
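If you are not using the SDK, the SSE stream can be consumed with any HTTP client; a sketch with the `requests` library (endpoint and payload shape as above, everything else illustrative):

```python
import json
import requests

url = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-2.5-flash:streamGenerateContent")
params = {"alt": "sse", "key": "YOUR_API_KEY"}
body = {"contents": [{"parts": [{"text": "Write an article about the cloud"}]}]}

with requests.post(url, params=params, json=body, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        # SSE frames arrive as "data: {...}" lines
        if not line or not line.startswith("data: "):
            continue
        chunk = json.loads(line[len("data: "):])
        for cand in chunk.get("candidates", []):
            for part in cand.get("content", {}).get("parts", []):
                print(part.get("text", ""), end="", flush=True)
```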
```python
import asyncio
from google import genai

async def stream_response():
    client = genai.Client(api_key="...")
    response = await client.aio.models.generate_content_stream(
        model="gemini-2.5-flash",
        contents="Explain Kubernetes"
    )
    async for chunk in response:
        print(chunk.text, end="")

asyncio.run(stream_response())
```
| Model | RPM | RPD | TPM |
|---|---|---|---|
| Gemini 3 Pro | 5 | 50 | 250K |
| Gemini 3 Flash | 15 | 1500 | 1M |
| 2.5 Pro | 5 | 50 | 250K |
| 2.5 Flash | 10 | 500 | 250K |
| 2.5 Flash-Lite | 30 | 1500 | 1M |
| 1.5 Flash | 15 | 1500 | 1M |
RPM = requests/min | RPD = requests/day | TPM = tokens/min

| Model | RPM | RPD | TPM |
|---|---|---|---|
| Gemini 3 Pro | 1000 | Unlimited | 4M |
| Gemini 3 Flash | 2000 | Unlimited | 4M |
| 2.5 Pro | 1000 | Unlimited | 4M |
| 2.5 Flash | 2000 | Unlimited | 4M |
| 2.5 Flash-Lite | 4000 | Unlimited | 4M |
| 1.5 Flash | 2000 | Unlimited | 4M |
```python
import time
from google.genai import errors

# Note: the google-genai SDK raises google.genai.errors.APIError
# (not google.api_core exceptions); 429 signals a rate limit.
def call_with_retry(prompt, retries=5):
    for i in range(retries):
        try:
            return client.models.generate_content(
                model="gemini-2.5-flash",
                contents=prompt
            )
        except errors.APIError as e:
            if e.code != 429:  # only retry rate-limit errors
                raise
            wait = 2 ** i      # exponential backoff
            print(f"Rate limited, retrying in {wait}s")
            time.sleep(wait)
    raise Exception("Max retries exceeded")
```
```python
# Gemini 3 Pro with Deep Think
response = client.models.generate_content(
    model="gemini-3.0-pro",
    contents="Prove that sqrt(2) is irrational",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=8192
        )
    )
)
```
```python
# Configurable budget: 0 to 24576 tokens
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Optimize this algorithm...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=8192  # 0-24576
        )
    )
)
```
| Budget | Use | Relative cost |
|---|---|---|
| 0 | No thinking | Minimal |
| 1024 | Light reflection | Low |
| 8192 | Medium reasoning | Medium |
| 24576 | Max reasoning | High |
```python
# Read the thinking tokens
# (thought parts are only returned when the request sets
#  ThinkingConfig(include_thoughts=True))
for part in response.candidates[0].content.parts:
    if part.thought:
        print("THINKING:", part.text)
    else:
        print("RESPONSE:", part.text)

# Thinking usage metadata
meta = response.usage_metadata
print(f"Thinking tokens: {meta.thoughts_token_count}")
print(f"Total tokens: {meta.total_token_count}")
```
```python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    config=types.GenerateContentConfig(
        system_instruction="""You are a senior cloud architect
specializing in AWS and GCP.

RULES:
- Always answer in French
- Provide code examples
- Name the specific cloud services
- Format: structured Markdown

PERSONALITY:
- Professional but approachable tone
- Always justify technical choices
"""
    ),
    contents="Design a microservices architecture"
)
```
Note: the parameter is `system_instruction` (singular), not `system`. There is no inline `cache_control`; use the Context Caching API separately.

```python
system = """You are a versatile technical assistant.

TASK 1 - CODE REVIEW:
If the user sends code, analyze:
- Potential bugs
- Performance
- Security
- Improvement suggestions

TASK 2 - ARCHITECTURE:
If the user describes a system, propose:
- A textual diagram
- Recommended components
- Points of attention

TASK 3 - DEBUGGING:
If the user has an error:
- Identify the root cause
- Propose a fix
- Explain how to prevent it

FORMAT: Always use Markdown sections.
LANGUAGE: French only.
"""
```
| Type | Formats | Max size |
|---|---|---|
| Images | PNG, JPEG, WebP, HEIC, HEIF | 20 MB |
| Audio | MP3, WAV, AIFF, AAC, OGG, FLAC | Inline: 20 MB |
| Video | MP4, MPEG, MOV, AVI, FLV, WebM | File API: 2 GB |
| Documents | PDF, TXT, HTML, CSS, JS, PY... | File API: 2 GB |
```python
# Upload a file
uploaded_file = client.files.upload(
    file="document.pdf",
    config={"display_name": "My Document"}
)

# Check its status
print(uploaded_file.name)   # files/abc123
print(uploaded_file.uri)    # URI to reference it
print(uploaded_file.state)  # ACTIVE
print(uploaded_file.expiration_time)

# Use it in a request
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(
            file_uri=uploaded_file.uri,
            mime_type=uploaded_file.mime_type
        ),
        "Summarize this document"
    ]
)
```
| Method | Max size | Persistence | Use |
|---|---|---|---|
| inline_data | 20 MB | Single request | Small files |
| File API | 2 GB | 48h TTL | Large files, reuse |
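A small dispatcher applying the table above: inline bytes for small files, File API beyond the 20 MB inline cap (the helper name and threshold handling are illustrative):

```python
import os
from google.genai import types

INLINE_LIMIT = 20 * 1024 * 1024  # 20 MB inline cap from the table above

def as_part(client, path: str, mime_type: str):
    """Return a Part for `path`: inline bytes if small, File API otherwise."""
    if os.path.getsize(path) <= INLINE_LIMIT:
        with open(path, "rb") as f:
            return types.Part.from_bytes(data=f.read(), mime_type=mime_type)
    uploaded = client.files.upload(file=path)
    return types.Part.from_uri(file_uri=uploaded.uri, mime_type=mime_type)
```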
```python
# Inline data: the SDK base64-encodes the raw bytes for you
# (only the raw REST API requires base64-encoding inline_data yourself)
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=open("image.png", "rb").read(),
            mime_type="image/png"
        ),
        "Describe this image"
    ]
)
```
```python
# List uploaded files
for f in client.files.list():
    print(f"{f.name} | {f.state} | {f.size_bytes}")

# Fetch a specific file
file = client.files.get(name="files/abc123")

# Delete a file
client.files.delete(name="files/abc123")

# Wait until the file is ready (video)
import time
while uploaded_file.state == "PROCESSING":
    time.sleep(5)
    uploaded_file = client.files.get(name=uploaded_file.name)
```
```python
# From a local file
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=open("screenshot.png", "rb").read(),
            mime_type="image/png"
        ),
        "Analyze this user interface. "
        "Identify the UX problems."
    ]
)

# From a URL (via the File API)
import urllib.request
urllib.request.urlretrieve(url, "temp.jpg")
file = client.files.upload(file="temp.jpg")
```
```python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=open("receipt.jpg", "rb").read(),
            mime_type="image/jpeg"
        ),
        """Extract all the information from this receipt:
- Store name
- Date
- Item list with prices
- Total
Format: JSON"""
    ]
)
```
```python
# Compare several images
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=open("design_v1.png", "rb").read(),
            mime_type="image/png"),
        "Image 1: Design V1",
        types.Part.from_bytes(
            data=open("design_v2.png", "rb").read(),
            mime_type="image/png"),
        "Image 2: Design V2",
        "Compare these two designs. "
        "List the differences."
    ]
)
```
Limit: max 3,600 images per request; ~258 tokens per image (standard resolution).
```python
# Upload audio via the File API
audio = client.files.upload(file="meeting.mp3")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(
            file_uri=audio.uri,
            mime_type="audio/mp3"
        ),
        """Transcribe this audio and provide:
1. The full transcript
2. A summary of the key points
3. Any action items identified
4. The participants detected"""
    ]
)
```
Audio tokens: ~32 tokens/second, so 1h of audio ≈ 115K tokens.
```python
# Upload a video (may take a while)
video = client.files.upload(file="demo.mp4")

# Wait for processing
import time
while video.state == "PROCESSING":
    time.sleep(10)
    video = client.files.get(name=video.name)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(
            file_uri=video.uri,
            mime_type="video/mp4"
        ),
        "Describe what happens at 0:30, 1:00 and 2:00"
    ]
)
```
Video tokens: ~263 tokens/second (audio + visual). Max video: 2 GB or ~2h.
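The per-media token rates quoted above make budgeting straightforward; a rough estimator (rates taken from the notes above; the cost line uses the 2.5 Flash input price):

```python
# Approximate token rates quoted above
TOKENS_PER_IMAGE = 258      # standard resolution
TOKENS_PER_SEC_AUDIO = 32
TOKENS_PER_SEC_VIDEO = 263  # audio + visual

def media_tokens(images: int = 0, audio_s: float = 0, video_s: float = 0) -> int:
    return int(images * TOKENS_PER_IMAGE
               + audio_s * TOKENS_PER_SEC_AUDIO
               + video_s * TOKENS_PER_SEC_VIDEO)

# A 10-minute video is ~158K tokens, ~$0.024 of 2.5 Flash input
tokens = media_tokens(video_s=600)
print(tokens, f"${tokens / 1e6 * 0.15:.4f}")
```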
```python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(
            file_uri=video.uri,
            mime_type="video/mp4"
        ),
        """Analyze this video and produce:
1. Chapters with timestamps
2. A summary of each section
3. The key moments
Format: MM:SS - Description"""
    ]
)
```
```python
# PDF via the File API
pdf = client.files.upload(file="rapport.pdf")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(
            file_uri=pdf.uri,
            mime_type="application/pdf"
        ),
        """Analyze this PDF document:
1. Executive summary (5 bullet points)
2. Key figures extracted
3. Main conclusions
4. Recommended action items"""
    ]
)

# Inline PDF (small files < 20 MB)
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=open("small.pdf", "rb").read(),
            mime_type="application/pdf"
        ),
        "Extract the data from the table on page 3"
    ]
)
```
```python
# Compare two documents
pdf1 = client.files.upload(file="contrat_v1.pdf")
pdf2 = client.files.upload(file="contrat_v2.pdf")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(
            file_uri=pdf1.uri, mime_type="application/pdf"),
        "Document V1 above.",
        types.Part.from_uri(
            file_uri=pdf2.uri, mime_type="application/pdf"),
        "Document V2 above.",
        "Compare these two versions. "
        "List every difference."
    ]
)
```
```python
# Simple JSON mode
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="List 3 programming languages",
    config=types.GenerateContentConfig(
        response_mime_type="application/json"
    )
)

import json
data = json.loads(response.text)
print(data)
```
With `response_mime_type="application/json"`, the output is always valid JSON (barring truncation by `max_output_tokens`); no defensive parsing is needed.

```python
# Structured schema with types
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Analyze the sentiment: I love this product!",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral"]
                },
                "score": {"type": "number"},
                "keywords": {
                    "type": "array",
                    "items": {"type": "string"}
                }
            },
            "required": ["sentiment", "score"]
        }
    )
)
# {"sentiment":"positive","score":0.95,"keywords":["love"]}
```
```python
# Enums to constrain values
schema = {
    "type": "object",
    "properties": {
        "priority": {
            "type": "string",
            "enum": ["low", "medium", "high", "critical"]
        },
        "category": {
            "type": "string",
            "enum": ["bug", "feature", "improvement"]
        },
        "description": {"type": "string"}
    },
    "required": ["priority", "category", "description"]
}
```
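The google-genai SDK also accepts a Pydantic model as `response_schema` and exposes the parsed object directly; a sketch (the `Ticket` model is illustrative):

```python
from pydantic import BaseModel
from google import genai
from google.genai import types

class Ticket(BaseModel):
    priority: str
    category: str
    description: str

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="The app crashes on login",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Ticket,  # Pydantic model instead of a raw dict
    )
)
ticket = response.parsed  # a Ticket instance
print(ticket.priority, ticket.category)
```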
```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the latest Google Cloud "
             "announcements in 2026?",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                google_search=types.GoogleSearch()
            )
        ]
    )
)
```
```python
# Access the grounding metadata
candidate = response.candidates[0]
grounding = candidate.grounding_metadata

# Sources used
for chunk in grounding.grounding_chunks:
    print(f"Source: {chunk.web.title}")
    print(f"URL: {chunk.web.uri}")

# Grounding supports
for support in grounding.grounding_supports:
    print(f"Text: {support.segment.text}")
    print(f"Indices: {support.grounding_chunk_indices}")
    print(f"Confidence: {support.confidence_scores}")

# Rendered search entry point
print(grounding.search_entry_point.rendered_content)
```
```python
# Conditional grounding (confidence threshold) via dynamic retrieval -
# the google_search_retrieval tool applies to Gemini 1.5 models
response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents="What is the population of France?",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                google_search_retrieval=types.GoogleSearchRetrieval(
                    dynamic_retrieval_config=types.DynamicRetrievalConfig(
                        mode="MODE_DYNAMIC",
                        dynamic_threshold=0.7  # search only if score >= 0.7
                    )
                )
            )
        ]
    )
)
```
| Aspect | Implicit | Explicit |
|---|---|---|
| Activation | Automatic | cachedContents API |
| Cost | Free | -75% input + storage |
| TTL | Not guaranteed | Configurable (min 1 min) |
| Min tokens | 1024 | 2048 (32K+ recommended) |
| Control | None | Full |
| Guarantee | Best-effort | Guaranteed for the TTL |
```python
# Create an explicit cache
from google.genai import types

# Upload the document to cache (the File API expects a supported
# mime type - e.g. a text dump of the codebase, not a .zip archive)
doc = client.files.upload(file="codebase.txt")

cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        display_name="My codebase",
        system_instruction="You are an expert on this codebase.",
        contents=[
            types.Content(
                role="user",
                parts=[types.Part.from_uri(
                    file_uri=doc.uri,
                    mime_type=doc.mime_type
                )]
            )
        ],
        ttl="3600s"  # 1 hour
    )
)
print(f"Cache: {cache.name}")
print(f"Cached tokens: {cache.usage_metadata}")
```
```python
# Request using the cache
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain the main() function",
    config=types.GenerateContentConfig(
        cached_content=cache.name
    )
)

# Check cache usage (prompt_token_count includes the cached tokens)
meta = response.usage_metadata
print(f"Cached: {meta.cached_content_token_count}")
print(f"Total prompt: {meta.prompt_token_count}")

# Cache management
cache = client.caches.update(
    name=cache.name,
    config={"ttl": "7200s"}  # extend to 2h
)
client.caches.delete(name=cache.name)  # delete
```
```python
# Few-shot examples inside system_instruction
system = """Classify support tickets.

EXAMPLES:
Input: "The app crashes on login" -> bug/critical
Input: "Add a dark mode"          -> feature/low
Input: "Page is slow to load"     -> performance/medium

FORMAT: category/priority"""

# Explicit chain-of-thought
prompt = """Analyze this problem step by step:
1. Identify the main problem
2. List the possible causes
3. Assess each cause (probability)
4. Propose the most likely solution
5. Give an action plan

Problem: [description]"""
```
```python
# BAD
"Analyze this text and do something with it"

# GOOD
"""Analyze the following text and produce:
1. A 3-sentence summary
2. 5 keywords
3. Sentiment (positive/negative/neutral)
4. A confidence score (0-1)
Format: JSON"""
```
| Category | Constant | Description |
|---|---|---|
| Harassment | HARM_CATEGORY_HARASSMENT | Harassment, intimidation |
| Hate Speech | HARM_CATEGORY_HATE_SPEECH | Hate speech |
| Sexually Explicit | HARM_CATEGORY_SEXUALLY_EXPLICIT | Sexual content |
| Dangerous Content | HARM_CATEGORY_DANGEROUS_CONTENT | Dangerous content |
| Civic Integrity | HARM_CATEGORY_CIVIC_INTEGRITY | Election misinformation |
```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Analyze this content...",
    config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HARASSMENT",
                threshold="BLOCK_ONLY_HIGH"
            ),
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_MEDIUM_AND_ABOVE"
            ),
        ]
    )
)
```
| Threshold | Behavior |
|---|---|
| BLOCK_NONE | No blocking (Vertex AI only) |
| BLOCK_ONLY_HIGH | Blocks only high probability |
| BLOCK_MEDIUM_AND_ABOVE | Blocks medium and above (default) |
| BLOCK_LOW_AND_ABOVE | Blocks low and above (strictest) |
```python
# Read the safety ratings
candidate = response.candidates[0]
for rating in candidate.safety_ratings:
    print(f"{rating.category}: {rating.probability}")
    # NEGLIGIBLE, LOW, MEDIUM, HIGH

# Handle a blocked response
if candidate.finish_reason == "SAFETY":
    print("Response blocked by safety filters")
    for rating in candidate.safety_ratings:
        if rating.blocked:
            print(f"Blocked by: {rating.category}")
```
```python
# Use Gemma 3 via Hugging Face
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-27b-it")
tokenizer = AutoTokenizer.from_pretrained(
    "google/gemma-3-27b-it")
```
Workspace AI (Gemini for Workspace): Gemini embedded in Gmail, Docs, Sheets, Slides and Meet.
Jules: Google's asynchronous coding agent that works on GitHub repositories.
Project Mariner: research prototype of an agent that browses and acts on the web on the user's behalf.