When you are evaluating an AI image model for production integration, "supports image generation" is not a specification — it is a category. What you actually need to know is: what inputs does it accept, what outputs does it return, what configuration options exist at generation time, which capabilities are endpoint-specific, and what is explicitly not supported. Getting any of these wrong means weeks of refactoring once you hit the wall in staging.
This article covers every confirmed capability of Nano Banana 2 (gemini-3.1-flash-image-preview) — Google's Gemini 3.1 Flash image generation model, available on Wisdom Gate at $0.058/request with consistent 20-second generation from 0.5K to 4K base64 output. Each feature is explained at three levels: what it is, why it changes your integration architecture, and how to implement it with a production-ready code pattern.
What this article covers: every confirmed feature with spec-accurate detail, endpoint-specific availability notes, and working code for each. What it does not cover: head-to-head model comparisons (see the comparison sub-page), vertical-specific use case walkthroughs (see the architecture, beauty, and gaming guides), or a pricing deep-dive (see the pricing guide). After reading this article, you will have a complete capability map for making an informed integration architecture decision for your specific workload.
🚀 Ready to map these features to your use case? Open Wisdom Gate AI Studio — no API key required — and start testing Nano Banana 2 while you read. Every feature in this article is testable in Studio before you write a single line of integration code.
Nano Banana 2 — Model Foundation & Architecture Context
Before covering individual features, it is worth establishing the architectural foundation that explains why Nano Banana 2's feature set is structurally different from diffusion-based models like Flux, Stable Diffusion, or Adobe Firefly's backend.
Nano Banana 2 is built on a unified transformer architecture. This is not a language model with an image generation module bolted on, and it is not a diffusion model with a text encoder feeding into a denoising loop. It is a single model backbone — the same transformer that processes language also processes image tokens, in one unified pass. Every feature in this article is a consequence of that design.
Two architectural consequences matter directly to developers making integration decisions:
Consequence 1 — Reasoning before generation. Because the model processes the full semantic content of a prompt before generating, it understands instructions the way a human would read a brief. Spatial constraints ("exactly four windows per floor"), multilingual text strings ("render the word SÉRUM in gold serif lettering"), and logical composition rules ("the sofa must be left of the window") are understood and executed — not approximated. Diffusion models hallucinate on these specifications because their CLIP encoders compress the prompt into a fixed embedding before any generation begins.
Consequence 2 — Bidirectional modality. Because the same transformer handles both text and image tokens natively, it can return text output describing the image it just generated — in the same API call. No diffusion model can do this architecturally. This is why Nano Banana 2 supports responseModalities: ["TEXT", "IMAGE"] as a first-class capability rather than a workaround.
The product facts that ground the rest of this article:
| Property | Value |
|---|---|
| Model ID | gemini-3.1-flash-image-preview |
| Platform Name | Nano Banana 2 |
| Speed Tier | Fast |
| Intelligence Tier | Medium |
| Price on Wisdom Gate | $0.058/request |
| Price (Google Official) | $0.068/request |
| Announced | 2026-02-26 |
| Image Edit Rank | #17 (score: 1,825) |
| Image Gen Rank | #5 |
For a complete overview of model positioning, see the [Nano Banana 2 developer overview guide]. For a direct comparison with Nano Banana Pro, see [Nano Banana 2 vs Nano Banana Pro].
With that architectural foundation established, here is every feature this model supports — and exactly how to use each one.
Nano Banana 2 Core Features — Complete Capability Map
This section maps every confirmed capability of Nano Banana 2. Each feature is covered in full detail in the sections that follow — use this table to jump directly to what you need. For [AI model performance & speed] benchmark data, see the dedicated benchmark comparison sub-page.
| Feature | What It Does | Endpoint Availability | Covered In |
|---|---|---|---|
| Bidirectional Multimodal I/O | Text+Image in → Text+Image out | All endpoints | Feature 1 |
| Image Search Grounding | Integrates real-time web data into generation | Gemini-native only | Feature 2 |
| Thinking Support | Pre-generation reasoning pass | Gemini-native only | Feature 3 |
| 256K Context Window | Pass full brand guides, history, multi-image refs | All endpoints | Feature 4 |
| Multi-Resolution Output | 0.5K / 1K / 2K / 4K base64 | Gemini-native only | Feature 5 |
| 10+ Aspect Ratios | Platform-specific output framing | Gemini-native only | Feature 5 |
| Multi-Turn Image Editing | Iterative refinement via conversation | All endpoints | Feature 6 |
| Image-to-Image Generation | Image input → modified image output | All endpoints | Feature 7 |
| Multi-Protocol Endpoints | OpenAI / Claude / Gemini compatible | — | Feature 8 |
| Batch API | Asynchronous bulk processing | Gemini-native only | Feature 9 |
⚠️ Critical integration note: Endpoint availability is not a minor footnote — it determines what your integration can do. Image Search Grounding, Thinking, resolution control (imageConfig), aspect ratio control, and Batch API are only available via the Gemini-native endpoint (/v1beta/models/...). OpenAI-compatible and Claude-compatible endpoints provide SDK convenience for migration — not full feature access. For any new production integration, always lead with the Gemini-native endpoint.
Feature 1 — Bidirectional Multimodal I/O in AI Image Generation
Most AI image generation APIs are output-image-only. You send a prompt, you receive an image. Nano Banana 2 is architecturally different: it can return both an image and text in the same response, and it accepts both text and images as inputs. This collapses multi-step pipelines into single API calls and eliminates entire infrastructure dependencies.
Inputs accepted: Plain text prompts, image files (JPEG, PNG, WebP, GIF), PDF documents as inline_data
Outputs returned: Image (base64-encoded inlineData), Text (JSON text part in the same response). Both can be returned simultaneously in one API call.
Three concrete workflows this enables:
Workflow A — Combined generation + captioning. Generate a product image and receive its SEO alt text in the same response. One call replaces two. The text part returns a description of the generated image automatically when responseModalities includes "TEXT".
# Returns both image and descriptive text in one response
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"]
}
Workflow B — Image editing with structured changelog. Submit a product photo with the instruction "make the background studio white, remove all shadows" — receive the edited image AND a text description of every change applied. The text output is not a caption; it is a structured account of the editing decisions the model made.
Workflow C — Visual question answering (VQA). Submit a user-uploaded floor plan with the question "What rooms are on the north side of this plan?" — receive a text answer AND an annotated version of the plan highlighting the relevant rooms. One API call, two deliverables, no secondary VQA request needed.
Force image-only output for pure generation pipelines where a text response is unwanted and would break downstream parsing:
# Prevents text-only responses on ambiguous prompts
"generationConfig": {
"responseModalities": ["IMAGE"]
}
Without this, the model may return a text-only response on prompts it interprets as ambiguous — causing your base64 image extraction to fail silently. Always set responseModalities explicitly.
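To make that failure mode loud instead of silent, the extraction step can refuse to proceed when no image part is present. A minimal Python sketch (the helper name extract_image_bytes is ours; the response shape follows the Gemini-native generateContent format shown in this article):

```python
import base64

def extract_image_bytes(response: dict) -> bytes:
    """Return decoded image bytes from a generateContent-style response.

    Raises ValueError instead of failing silently when the model returned
    a text-only response (e.g. because responseModalities was not set).
    """
    parts = response["candidates"][0]["content"]["parts"]
    for part in parts:
        if "inlineData" in part:
            return base64.b64decode(part["inlineData"]["data"])
    # No inlineData part: surface the text the model sent back instead
    text = " ".join(p.get("text", "") for p in parts).strip()
    raise ValueError(f"No image in response; model returned text only: {text[:120]!r}")
```

An explicit exception here is far cheaper to debug than a zero-byte PNG discovered downstream.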
Extracting the image from a Gemini-native response:
# Extract base64 image data and decode to PNG
curl -s -X POST \
"https://wisdom-gate.juheapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "A minimalist product photograph of a glass bottle on white marble, studio lighting"}]}],
"generationConfig": {"responseModalities": ["IMAGE"], "imageConfig": {"imageSize": "2K", "aspectRatio": "1:1"}}
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' \
| head -1 | base64 --decode > output.png
Developer implication: Bidirectional I/O reduces API call count in any workflow requiring both a generated asset and its metadata. For e-commerce platforms generating product images with alt text, or content platforms generating illustrations with captions, this is a direct cost and latency reduction — not a convenience feature.
Feature 2 — Image Search Grounding: The Gemini 3.1 Flash Differentiator
Image Search Grounding is the most architecturally distinctive feature of Nano Banana 2, and the one with the clearest competitive moat. No diffusion model — Flux, Stable Diffusion, Midjourney's backend, Adobe Firefly — supports anything functionally equivalent. It is exclusive to the Gemini 3.1 Flash unified transformer design.
The mechanism: When "tools": [{"google_search": {}}] is included in the request, the model executes a real-time web search as part of the generation process — before any pixels are generated. Retrieved visual and textual references are integrated into the generation context at the semantic level. The output is grounded in current web data, not solely in training data from the model's knowledge cutoff of January 2025.
⚠️ Endpoint constraint: Image Search Grounding is only available via the Gemini-native endpoint. It cannot be used through the OpenAI-compatible or Claude-compatible endpoints on Wisdom Gate. If this feature is required, the Gemini-native endpoint is mandatory.
Production-ready request with grounding enabled:
curl -s -X POST \
"https://wisdom-gate.juheapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [{
"text": "Da Vinci style anatomical sketch of a dissected Monarch butterfly. Detailed drawings of the head, wings, and legs on textured parchment with notes in English."
}]
}],
"tools": [{"google_search": {}}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"],
"imageConfig": {
"aspectRatio": "1:1",
"imageSize": "2K"
}
}
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' \
| head -1 | base64 --decode > butterfly.png
In this example, grounding retrieves current scholarly and visual references to Da Vinci's actual anatomical drawings — producing a more contextually accurate output than training data averaging alone.
Four production use cases where grounding changes output quality:
Use Case 1 — Trend-informed marketing creative. A prompt like "generate a campaign image in this season's runway color palette" without grounding produces outputs based on 2024–2025 training data averages. With grounding enabled, the model retrieves current fashion week references and integrates them into the generation. For marketing teams producing seasonal campaign assets, this is the difference between timely and generic — and the difference that justifies the capability over a diffusion-based alternative.
Use Case 2 — Current architectural references. For [Nano Banana 2 for architecture] workflows, grounding enables prompts like "generate a sustainable office facade in 2026 biophilic design style" — the model retrieves current certified project examples rather than approximating from pre-cutoff training data. For visualization tools used by architecture firms, this closes a meaningful accuracy gap.
Use Case 3 — Product category visual conventions. For e-commerce teams generating product imagery, grounding allows the model to reference current product pages, competitor visual conventions, or category photography standards from live web data. Outputs reflect what the current market looks like — not what it looked like 18 months ago.
Use Case 4 — News-adjacent and post-cutoff content. For media and publishing platforms, grounding enables illustrative image generation for current events and topics that postdate the model's January 2025 knowledge cutoff. Without grounding, these prompts produce outputs that reference pre-cutoff information or hallucinate recent context.
Important behavioral note: Grounding is supported both with Thinking mode on and Thinking mode off. The model determines internally whether and how to use search results; developers cannot specify which URLs are retrieved or control the search query. For workflows requiring deterministic, fully controlled generation, disable grounding and provide all context explicitly in the prompt.
Developer decision rule: Enable grounding when the prompt references current events, trends, or real-world visual references that may have evolved since January 2025. Disable it when determinism and prompt-controlled generation are required — grounding introduces controlled variability that is useful for trend-aware content and counterproductive for brand-specified batch generation.
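This decision rule is easy to encode as a request builder, so the grounded/deterministic choice is a single flag rather than two divergent templates. A sketch in Python (build_request is our illustrative name; the body follows the Gemini-native format used above):

```python
def build_request(prompt: str, *, grounded: bool, size: str = "2K",
                  aspect_ratio: str = "1:1") -> dict:
    """Build a Gemini-native generateContent body, toggling search grounding.

    grounded=True adds the google_search tool for trend-aware prompts;
    grounded=False omits it so output depends only on the prompt text.
    """
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["TEXT", "IMAGE"],
            "imageConfig": {"imageSize": size, "aspectRatio": aspect_ratio},
        },
    }
    if grounded:
        body["tools"] = [{"google_search": {}}]
    return body
```

Centralizing the toggle also makes it auditable: a log of which requests ran grounded is a log of which outputs carry web-derived variability.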
Feature 3 — Thinking Support
Before Nano Banana 2 generates any pixels on a complex prompt, it can execute an internal reasoning pass — reading the full prompt, resolving ambiguities, planning the compositional layout, and verifying consistency across all constraints provided. This reasoning is not exposed in the response, but it directly improves output quality on prompts with multiple competing constraints or spatial requirements.
⚠️ Endpoint constraint: Thinking is only available via the Gemini-native endpoint.
When Thinking mode is worth the additional latency:
- Complex multi-element compositions requiring accurate spatial relationships — floor plans, infographic layouts, architectural elevation drawings with labeled components
- Prompts where ambiguity could produce inconsistent results across runs — brand imagery requiring exact compositional rules
- Data visualization prompts where structural accuracy is semantically critical — charts, diagrams, educational explainers
- Prompts with more than five simultaneous constraints — Thinking helps the model prioritize and reconcile them before committing to any image structure
When to skip Thinking and prioritize throughput:
- Simple single-subject generation where the prompt is unambiguous
- High-volume batch pipelines where p95 latency is the primary SLA
- Interactive real-time features where every second of response time is directly user-visible
Note to implementation teams: confirm the exact thinkingConfig parameter name and structure from the Gemini API documentation at ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-image-preview before including in production code. The behavior is confirmed; the exact parameter syntax should be verified against the current API reference before deployment.
Developer decision rule: Add Thinking for prompts with more than five compositional constraints or any prompt requiring verified spatial accuracy. Skip it for single-subject prompts or any pipeline where throughput and consistent latency are the primary engineering requirements.
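The decision rule above can live in code so every pipeline applies it consistently. A deliberately naive sketch (the function and its parameters are ours, not part of any SDK; constraint counting here is a stand-in for however your system represents prompt constraints):

```python
def should_enable_thinking(constraints: list[str], *, spatial: bool = False,
                           latency_critical: bool = False) -> bool:
    """Encode the rule: Thinking for >5 constraints or verified spatial
    accuracy; always skipped when latency is the primary SLA."""
    if latency_critical:
        return False
    return spatial or len(constraints) > 5
```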
Feature 4 — 256K Context Window
256K tokens. Nano Banana Pro's context window is 32K. Most competing image generation APIs offer 4K–16K. Nano Banana 2's 256K window is not a marginal improvement over competitors — it enables a fundamentally different integration architecture, one that removes entire infrastructure dependencies for teams that have been working around context limits.
What 256K tokens actually fits:
| Content Type | Approximate Token Count | Fits in 256K? |
|---|---|---|
| Full brand style guide (detailed) | 3,000–8,000 tokens | ✅ Yes |
| Product catalog (500 items, name + description) | ~25,000 tokens | ✅ Yes |
| Multi-turn conversation (20 turns, text + image refs) | ~10,000 tokens | ✅ Yes |
| Complete system prompt with all negative constraints | 2,000–5,000 tokens | ✅ Yes |
| 10 reference image descriptions (detailed) | ~5,000 tokens | ✅ Yes |
| Full novel (80,000 words) | ~110,000 tokens | ✅ Yes |
| Typical image API context limit (competitor) | 4K–32K tokens | ❌ Would truncate |
Three application-layer implications that change integration architecture:
Implication 1 — Eliminate external context management. With most image generation APIs, teams managing multi-turn editing sessions must maintain an external session store (Redis, DynamoDB, or equivalent) and pass summarized conversation context in each request because the model context is too small for full history. At 256K, the complete conversation history fits natively. Remove the stateful session management infrastructure entirely — or evaluate whether it is still justified.
Implication 2 — Brand consistency without re-prompting. Embed a complete brand style guide in the system prompt — colors in HEX and Pantone, typography rules, prohibited visual elements, composition guidelines, reference image descriptions. At 256K, this guide fits alongside the full generation request without truncation. Every image in a batch session respects it without repeating the style parameters in each individual prompt. For brands with complex visual identity requirements, this is the difference between consistent output and statistically approximate output.
Implication 3 — Single-request product catalog generation. Pass a 500-product catalog with names, descriptions, and a generation template in one request. The model generates images for each item within the session context without chunking or session management overhead. For e-commerce teams with large SKU counts, this changes the architecture of a batch generation pipeline from a stateful orchestration problem to a single-request job.
Practical note on context consumption: All aspect ratios for Nano Banana 2 consume approximately 1,290 tokens per generated image. At 256K, a developer can include a 100K-token brand document and full conversation history and still have 140K+ tokens of working space for generation context and additional turns.
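That budget arithmetic is worth keeping in code rather than in heads. A small Python estimator using the figures above (the per-turn text allowance is our assumption; real usage should be read from usageMetadata):

```python
TOKENS_PER_IMAGE = 1290   # approximate per-image cost, per the note above
CONTEXT_WINDOW = 256_000

def remaining_turns(brand_doc_tokens: int, history_tokens: int,
                    tokens_per_turn_text: int = 200) -> int:
    """Rough estimate of how many more image turns fit in the window."""
    used = brand_doc_tokens + history_tokens
    per_turn = TOKENS_PER_IMAGE + tokens_per_turn_text
    return max(0, (CONTEXT_WINDOW - used) // per_turn)
```

For example, a 100K-token brand document plus 10K tokens of history still leaves room for roughly 97 further image turns under these assumptions.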
Feature 5 — Multi-Resolution Output & Aspect Ratio Control
Resolution and aspect ratio are not cosmetic settings — they determine whether an output is usable for its intended destination without post-processing. An image generated at 1K for a social post cannot be cleanly upscaled to 4K for a print billboard. Generating at the wrong aspect ratio for a mobile Story means cropping and reframing in post. Configure both at generation time and eliminate post-processing entirely.
⚠️ Endpoint constraint: imageConfig (both imageSize and aspectRatio) is only available via the Gemini-native endpoint. OpenAI-compatible endpoint calls default to 1K resolution and cannot control aspect ratio through the imageConfig parameter.
Resolution Tiers — Developer Reference
| Resolution | Approx. Dimensions | Best For | Notes |
|---|---|---|---|
"0.5K" | ~512px short edge | Draft iteration, thumbnails, high-volume testing | Fastest iteration; same token cost as 4K |
"1K" | ~1024px short edge | Social media, web UI, app assets | Default — used when imageConfig is omitted |
"2K" | ~2048px short edge | Marketing collateral, product photography, landing pages | Sweet spot for quality-to-cost ratio |
"4K" | ~4096px short edge | Hero images, print, architectural visualization | Maximum quality; consistent 20 sec on Wisdom Gate |
Critical production note: Wisdom Gate delivers all four resolution tiers — 0.5K through 4K base64 — in a consistent 20-second generation time. This is a platform-level delivery guarantee, not a statistical average. There is no latency penalty for generating at 4K versus 0.5K. Configure resolution based on output quality requirements — not latency budget — because the latency budget is identical across all tiers.
Aspect Ratio Reference
| Platform / Context | Aspect Ratio | imageConfig Value |
|---|---|---|
| Square (social, e-commerce) | 1:1 | "1:1" |
| Landscape (web hero, YouTube thumbnail) | 16:9 | "16:9" |
| Landscape (photography, print) | 3:2 | "3:2" |
| Landscape (standard screen) | 4:3 | "4:3" |
| Landscape (social wide) | 5:4 | "5:4" |
| Portrait (mobile, Stories) | 9:16 | "9:16" |
| Portrait (pin, editorial) | 2:3 | "2:3" |
| Portrait (product) | 3:4 | "3:4" |
| Portrait (social) | 4:5 | "4:5" |
| Ultra-wide (cinema, banner) | 21:9 | "21:9" |
| Extreme portrait (new in NB2) | 1:4 | "1:4" |
| Extreme landscape (new in NB2) | 4:1 | "4:1" |
| Extreme portrait strip (new in NB2) | 1:8 | "1:8" |
| Extreme landscape strip (new in NB2) | 8:1 | "8:1" |
The four extreme aspect ratios — 1:4, 4:1, 1:8, 8:1 — are new in gemini-3.1-flash-image-preview and are not available in predecessor models. They unlock UI strip generation, vertical banner formats, and horizontal timeline graphics that previously required post-generation cropping.
Production imageConfig block:
"generationConfig": {
"responseModalities": ["IMAGE"],
"imageConfig": {
"aspectRatio": "16:9",
"imageSize": "2K"
}
}
Multi-platform asset generation pattern: Generate a primary hero image at 1:1 at 4K, then pass it back as inline_data with prompts requesting each platform variant. This preserves compositional consistency across all platform sizes while eliminating prompt repetition across the batch — and keeps all variants visually coherent because they derive from the same source generation.
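A Python sketch of that pattern, generating one reframing request per target platform from a single hero image (the PLATFORM_VARIANTS mapping and variant_requests name are illustrative; the body follows the Gemini-native format shown in this article):

```python
# Illustrative platform-to-ratio mapping; adjust to your actual channels
PLATFORM_VARIANTS = {
    "story": "9:16",
    "youtube_thumb": "16:9",
    "pin": "2:3",
    "banner": "8:1",
}

def variant_requests(hero_b64: str, base_prompt: str) -> list[dict]:
    """One Gemini-native request per platform, each reframing the same hero."""
    requests = []
    for name, ratio in PLATFORM_VARIANTS.items():
        requests.append({
            "contents": [{"role": "user", "parts": [
                {"text": f"{base_prompt}. Reframe this image for a {ratio} "
                         f"{name} placement without changing the subject."},
                {"inline_data": {"mime_type": "image/png", "data": hero_b64}},
            ]}],
            "generationConfig": {
                "responseModalities": ["IMAGE"],
                "imageConfig": {"imageSize": "2K", "aspectRatio": ratio},
            },
        })
    return requests
```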
Feature 6 — Multi-Turn Image Editing
Multi-turn image editing means maintaining a conversation context across multiple API calls, where each call includes the previously generated image plus a new editing instruction. The model treats the sequence as a coherent editing session — each turn builds on the last, without requiring the developer to re-establish the full visual context in every request.
The mechanism: Pass the previous model response — including the generated image as inlineData — back into the contents array alongside the new editing instruction. The model references the full conversation history when executing each incremental edit.
Why this matters architecturally: Without multi-turn editing, every incremental refinement requires re-specifying the complete visual brief from scratch — which is both token-inefficient and compositionally inconsistent (each regeneration starts fresh from the prompt, not from the previous visual state). Multi-turn editing preserves the visual state across refinements.
Condensed multi-turn code pattern:
# Turn 1: Generate initial image and save the response
curl -s -X POST \
"https://wisdom-gate.juheapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"role": "user",
"parts": [{"text": "A luxury skincare serum bottle on a white marble surface, warm directional studio lighting"}]
}],
"generationConfig": {"responseModalities": ["TEXT", "IMAGE"], "imageConfig": {"imageSize": "2K", "aspectRatio": "1:1"}}
}' > turn1_response.json
# Extract image data from Turn 1
IMAGE_DATA=$(jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' turn1_response.json)
# Turn 2: Pass Turn 1 image + new instruction
curl -s -X POST \
"https://wisdom-gate.juheapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H "Content-Type: application/json" \
-d "{
\"contents\": [
{
\"role\": \"user\",
\"parts\": [{\"text\": \"A luxury skincare serum bottle on a white marble surface, warm directional studio lighting\"}]
},
{
\"role\": \"model\",
\"parts\": [{\"inlineData\": {\"mimeType\": \"image/png\", \"data\": \"$IMAGE_DATA\"}}]
},
{
\"role\": \"user\",
\"parts\": [{\"text\": \"Add a small sprig of eucalyptus to the left of the bottle. Keep everything else identical.\"}]
}
],
\"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"], \"imageConfig\": {\"imageSize\": \"2K\", \"aspectRatio\": \"1:1\"}}
}" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' \
| head -1 | base64 --decode > turn2_output.png
Four production workflows where multi-turn editing is the right architecture:
- Interior design client review ([Nano Banana 2 for architecture]): Generate an initial room render → receive client feedback → iterate without re-establishing the full scene description. The model holds the visual context; the developer holds only the conversation history
- Product photography variants: Base product image → "change background to clean studio white" → "add a soft drop shadow" → "zoom out 20%" — each edit preserves the previous visual state rather than regenerating from scratch
- Logo and brand asset refinement: Generate initial concept → typography iteration → color adjustment → spacing correction — the complete editorial cycle in a single session with coherent visual continuity
- Game concept art ([Nano Banana 2 for gaming]): Base character design → "add armor plating to the shoulders" → "shift the color palette to earth tones" → "add a faction emblem to the chest" — incremental iteration without losing character consistency across turns
Critical developer note: There is no server-side session state. The developer is entirely responsible for passing the full contents array — including all prior turns — in each request. For long editing sessions, monitor total token consumption against the 256K context limit using usageMetadata.totalTokenCount in the response.
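A thin client-side session wrapper makes that responsibility explicit. A sketch in Python (the EditSession class is ours; the contents shape and usageMetadata field match the Gemini-native examples above):

```python
class EditSession:
    """Client-side multi-turn state: the API keeps no session, so the full
    contents array is rebuilt and resent on every turn."""
    CONTEXT_LIMIT = 256_000

    def __init__(self):
        self.contents = []
        self.total_tokens = 0

    def add_user_text(self, text: str):
        self.contents.append({"role": "user", "parts": [{"text": text}]})

    def add_model_image(self, b64_data: str, mime_type: str = "image/png"):
        self.contents.append({"role": "model", "parts": [
            {"inlineData": {"mimeType": mime_type, "data": b64_data}}]})

    def record_usage(self, response: dict):
        # usageMetadata.totalTokenCount already covers the whole history
        self.total_tokens = response["usageMetadata"]["totalTokenCount"]

    def near_limit(self, headroom: int = 10_000) -> bool:
        return self.total_tokens > self.CONTEXT_LIMIT - headroom
```

When near_limit fires, the practical options are to start a new session from the latest image or to summarize early turns before continuing.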
Feature 7 — Image-to-Image Generation
Image-to-image generation means sending an existing image as part of the request alongside a text instruction. The model generates a new image that respects both the visual content of the input and the transformation specified in the text — a fundamentally different workflow from pure text-to-image generation.
Input mechanism: Images are passed as inline_data with mime_type and base64-encoded image data within the parts array.
Production-ready image-to-image request:
curl -s -X POST \
"https://wisdom-gate.juheapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H "Content-Type: application/json" \
-d "{
\"contents\": [{
\"role\": \"user\",
\"parts\": [
{\"text\": \"Transform this architectural sketch into a photorealistic exterior render. Maintain the floor plan layout exactly as shown. Contemporary Scandinavian style, timber and glass facade, overcast northern light.\"},
{
\"inline_data\": {
\"mime_type\": \"image/jpeg\",
\"data\": \"$(base64 -w 0 ./floor_plan_sketch.jpg)\"
}
}
]
}],
\"generationConfig\": {
\"responseModalities\": [\"IMAGE\"],
\"imageConfig\": {
\"aspectRatio\": \"16:9\",
\"imageSize\": \"4K\"
}
}
}" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' \
| head -1 | base64 --decode > render.png
Five production use cases by vertical:
- Floor plan sketch → photorealistic 3D render — [Nano Banana 2 for architecture]: Upload an architect's hand-drawn sketch or exported plan image and receive a photorealistic exterior or interior perspective render in 20 seconds
- Product photo → studio white background variant — e-commerce: Pass a product photo taken against any background and receive a clean studio-white product shot without reshooting
- Reference art → style-consistent variants — [Nano Banana 2 for gaming]: Upload a reference character design and generate 50 consistent variants with different costumes, colorways, or environmental adaptations
- Rough packaging mockup → finished shelf render — [Nano Banana 2 for beauty and fashion]: Pass a flat packaging design and receive a three-dimensional shelf-ready product visualization
- UI wireframe → high-fidelity screen design — product design: Transform a low-fidelity wireframe into a polished screen mockup with realistic UI components and brand color application
Technical note on input image fidelity: The model does not guarantee pixel-perfect preservation of input image structure. For workflows requiring exact pixel-level inpainting (mask-based editing), evaluate the model's editing endpoint behavior separately against your quality requirements. For structure-preserving transformation — style, environment, texture, material changes — image-to-image mode performs well.
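Building the inline_data part from a local file is the one step every image-to-image workflow shares, so it is worth factoring out. A Python sketch (the image_part helper is our name; the part shape matches the curl example above):

```python
import base64
import mimetypes

def image_part(path: str) -> dict:
    """Build an inline_data part from a local file for image-to-image requests."""
    mime = mimetypes.guess_type(path)[0] or "image/png"  # fall back if unknown
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {"inline_data": {"mime_type": mime, "data": data}}
```

Unlike the shell one-liner, this avoids the GNU-specific base64 -w 0 flag, so it behaves the same on macOS and Linux.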
Feature 8 — Multi-Protocol Endpoint Compatibility
Most organizations building on AI APIs have existing infrastructure built on OpenAI's SDK, Anthropic's SDK, or both. Migrating to a new model typically means rewriting application code — not just changing a model string. Nano Banana 2 on Wisdom Gate supports all three protocol families, which means integration requires changing one configuration value, not one codebase.
Complete endpoint reference:
| Protocol | Available Endpoints | Auth Header | Base URL |
|---|---|---|---|
| Gemini-native | /v1beta/models/{model}:generateContent | x-goog-api-key | https://wisdom-gate.juheapi.com |
| OpenAI-compatible | /v1/chat/completions, /v1/images/generations, /v1/images/edits, /v1/responses | Authorization: Bearer | https://wisdom-gate.juheapi.com/v1 |
| Claude-compatible | /v1/messages | Authorization: Bearer | https://wisdom-gate.juheapi.com/v1 |
Capability matrix by endpoint — the most important table in this article:
| Capability | Gemini Native | OpenAI Compatible | Claude Compatible |
|---|---|---|---|
| Image Search Grounding | ✅ | ❌ | ❌ |
| Thinking Support | ✅ | ❌ | ❌ |
| imageConfig (resolution + aspect ratio) | ✅ | ❌ | ❌ |
| Multi-turn image editing | ✅ | ✅ | ✅ |
| Image-to-image via inline_data | ✅ | ✅ | ✅ |
| Bidirectional text + image output | ✅ | ✅ | ✅ |
| Batch API | ✅ | ❌ | ❌ |
Developer decision rule: For new integrations, always use the Gemini-native endpoint. OpenAI-compatible and Claude-compatible endpoints exist for teams migrating from existing codebases who cannot immediately refactor their SDK layer. They provide migration convenience — not full capability. Every new production integration that bypasses the Gemini-native endpoint is deliberately trading capability for SDK familiarity.
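To keep that trade-off explicit rather than accidental, the capability matrix can be checked in code at configuration time. A sketch (the feature keys and required_endpoint function are our shorthand for the table above):

```python
# Shorthand keys for the capability matrix above
GEMINI_NATIVE_ONLY = {"grounding", "thinking", "image_config", "batch"}
ALL_ENDPOINTS = {"multi_turn", "image_to_image", "bidirectional_io"}

def required_endpoint(features: set[str]) -> str:
    """Return the minimum endpoint class supporting every requested feature."""
    unknown = features - GEMINI_NATIVE_ONLY - ALL_ENDPOINTS
    if unknown:
        raise ValueError(f"Unknown features: {sorted(unknown)}")
    return "gemini-native" if features & GEMINI_NATIVE_ONLY else "any"
```

Running this in CI against a service's declared feature list turns a silent capability loss into a build failure.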
OpenAI SDK one-line migration to Wisdom Gate:
import openai
client = openai.OpenAI(
api_key="YOUR_WISDOM_GATE_KEY",
base_url="https://wisdom-gate.juheapi.com/v1" # Only this line changes
)
response = client.images.generate(
model="gemini-3.1-flash-image-preview",
prompt="A luxury product photograph, studio lighting, white background",
n=1,
size="1024x1024" # Maps to approximately 1K resolution
)
Note: when using the OpenAI-compatible endpoint, size maps to approximate resolution equivalent. imageConfig parameters for 2K, 4K, or specific aspect ratios are not available on this endpoint. Plan your migration path to the Gemini-native endpoint for any workload requiring resolution control or grounding.
Feature 9 — Batch API Support
Batch API is designed for asynchronous, high-volume image generation jobs where real-time response is not required. The primary use cases are nightly catalog refreshes, bulk asset generation pipelines, and processing large queues of user-submitted generation requests where results are acceptable within hours rather than seconds.
⚠️ Endpoint constraint: Batch API is only available via the Gemini-native endpoint.
When to use Batch API vs synchronous requests:
| Scenario | Recommended Approach |
|---|---|
| Real-time user-facing generation (live preview, on-demand) | Synchronous requests |
| Nightly catalog refresh (500+ images, overnight) | Batch API |
| Offline processing queue (high volume, no real-time SLA) | Batch API |
| Interactive multi-turn editing session | Synchronous multi-turn |
| Same-day marketing asset delivery required | Synchronous requests |
| 1,000+ image batch with flexible delivery window | Batch API |
Operational model: Instead of sending individual synchronous requests and managing responses one by one, Batch API allows submitting a collection of requests in a single call. The API processes them asynchronously and makes results available for retrieval when complete — eliminating the need for custom queue management, retry logic, and concurrent request throttling in the application layer.
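The submission model above can be sketched as follows. This is a minimal illustration of assembling many `generateContent` requests into one batch payload; the exact batch endpoint path and envelope field names (`requests`, per-request `model`) are assumptions modeled on Gemini-style batch APIs — confirm the real shape against the Wisdom Gate developer documentation before building on it.

```python
import json

def build_batch_request(prompts, model="gemini-3.1-flash-image-preview"):
    """Assemble many generateContent requests into one batch submission.

    ASSUMPTION: the "requests" envelope and per-request "model" field mirror
    Gemini-style batch APIs; verify against Wisdom Gate docs before use.
    """
    return {
        "requests": [
            {
                "model": model,
                "contents": [{"parts": [{"text": prompt}]}],
                "generationConfig": {"responseModalities": ["IMAGE"]},
            }
            for prompt in prompts
        ]
    }

# One call carries the whole nightly queue; no per-request orchestration.
batch = build_batch_request([
    "A red sneaker, studio lighting, white background",
    "A blue sneaker, studio lighting, white background",
])
payload = json.dumps(batch)
```

The point of the sketch is the shape of the workload: the application builds one payload and later polls for results, rather than managing thousands of in-flight synchronous requests.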
For confirmed Batch API pricing on Wisdom Gate, check wisdom-gate.juheapi.com/pricing directly — confirm current rates before building batch cost models into production pricing assumptions.
Developer decision rule: For teams processing more than 1,000 images per run with a delivery window measured in hours rather than seconds, Batch API removes the orchestration burden of managing thousands of concurrent synchronous requests. For anything requiring results in under a minute, synchronous requests remain the correct approach.
Nano Banana 2 Core Features — Integration Architecture Recommendations
Now that every feature has been covered individually, here are the integration architecture decisions that emerge when all features are considered together. These four rules are the distillation of the entire article into actionable engineering decisions.
Rule 1 — Always use the Gemini-native endpoint for new production integrations. The OpenAI-compatible endpoint exists for migration convenience. It does not support Image Search Grounding, Thinking, imageConfig (resolution + aspect ratio control), or Batch API. Any new production integration built on the OpenAI-compatible endpoint as the permanent target is making a deliberate capability trade-off — and that trade-off should be explicit, not accidental.
Rule 2 — Set responseModalities explicitly on every request. If responseModalities is not set, the model may return text-only responses on prompts it interprets as conversational or ambiguous. Setting ["IMAGE"] for pure generation pipelines and ["TEXT", "IMAGE"] for combined workflows eliminates this unpredictability entirely. Make it a mandatory field in every request template, not an optional configuration.
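One way to make the field mandatory rather than optional is to route every request body through a template builder that refuses to emit a body without it. A minimal sketch, assuming the Gemini `generateContent` body shape (`contents`, `generationConfig.responseModalities`); the helper name is illustrative:

```python
def make_request_body(prompt, modalities=("IMAGE",)):
    """Build a generateContent body with responseModalities always set.

    Raising on an empty modality list enforces Rule 2 at the template
    level: no request can leave the application without the field.
    """
    if not modalities:
        raise ValueError("responseModalities must be set explicitly")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"responseModalities": list(modalities)},
    }

# Pure generation pipeline: image only.
gen_body = make_request_body("A product shot on white")

# Combined workflow: text commentary plus image.
combined_body = make_request_body("Explain and illustrate",
                                  modalities=("TEXT", "IMAGE"))
```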
Rule 3 — Match resolution to final output context at generation time. Upscaling a 1K image to 4K introduces quality artifacts. Generating 4K for a thumbnail wastes generation context unnecessarily. The 20-second consistent generation time on Wisdom Gate applies equally to all resolutions — there is no latency penalty for generating at the correct final size. Configure imageConfig.imageSize to match the final delivery destination, not the intermediate processing size.
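In practice this rule becomes a lookup from delivery destination to `imageConfig`, decided once at the template layer. A sketch under stated assumptions: the destination names are examples, and the `"1K"`/`"2K"`/`"4K"` strings assume the `imageConfig.imageSize` values described earlier in this article — confirm the accepted enum values against the API reference.

```python
# Illustrative destination -> imageSize mapping (ASSUMPTION: "1K"/"2K"/"4K"
# are the accepted imageConfig.imageSize values; destinations are examples).
SIZE_BY_DESTINATION = {
    "thumbnail": "1K",
    "web_hero": "2K",
    "print": "4K",
}

def image_config_for(destination, aspect_ratio="1:1"):
    """Pick generation-time resolution from the final delivery context."""
    try:
        size = SIZE_BY_DESTINATION[destination]
    except KeyError:
        raise ValueError(f"Unknown delivery destination: {destination!r}")
    return {"aspectRatio": aspect_ratio, "imageSize": size}
```

Because generation time is flat across resolutions, the only cost of this lookup is deciding the destination up front — which the delivery pipeline already knows.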
Rule 4 — Use the 256K context window to eliminate external context management. If your current architecture includes a Redis session store, a chunking middleware layer, or a summarization step specifically to manage model context for image generation sessions — evaluate whether Nano Banana 2's 256K window eliminates the need for it. For most teams managing brand-consistent batch generation or multi-turn editing sessions, it does. Removing a stateful infrastructure dependency simplifies the stack and reduces operational surface area.
For a developer review of how these architecture decisions play out in production, see [Nano Banana 2 review].
With these architecture decisions made, the next step is access. Here is how to get started on Wisdom Gate in under five minutes.
Nano Banana 2 Core Features — Pricing, Access & Getting Started on Wisdom Gate
Pricing comparison:
| Factor | Wisdom Gate | Google Official |
|---|---|---|
| Price per image | $0.058 | $0.068 |
| Annual saving (10K images/mo) | $1,200 | Baseline |
| Annual saving (100K images/mo) | $12,000 | Baseline |
| Generation time (all resolutions) | Consistent 20 sec | Variable |
| Billing options | Subscription + PAYG | Per-product billing |
| Unified key (50+ models) | Yes | No |
On the free tier question: For the full breakdown of trial credits and what [nano banana 2 free] / [is Nano Banana 2 free] actually means in practice, see the dedicated pricing guide. The short version: new Wisdom Gate accounts receive trial API credits. At $0.058/request, these credits cover a complete technical evaluation of every feature in this article before any payment commitment. There is no permanent unlimited free tier — but trial credits are sufficient for production-quality evaluation.
Integration checklist — for developers finishing this article:
- Gemini-native endpoint: `https://wisdom-gate.juheapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent`
- Auth header: `x-goog-api-key: $WISDOM_GATE_KEY`
- `responseModalities` set explicitly on every request
- `imageConfig` with `aspectRatio` and `imageSize` configured for final output destination
- `tools: [{"google_search": {}}]` enabled for prompts referencing current events or trends
- Full `contents` conversation history passed in multi-turn editing requests
- `inline_data` format confirmed for image-to-image input
- Batch API evaluated for offline, high-volume (1,000+ image) processing jobs
- Request timeout configured at 30–35 seconds (20-second generation + buffer)
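The checklist items above can be pulled together into one request sketch. This uses only the stdlib (`urllib`) for illustration; the endpoint, header, and payload fields come from the checklist, while the default aspect ratio and image size chosen here are example values, not recommendations.

```python
import json
import os
import urllib.request

ENDPOINT = ("https://wisdom-gate.juheapi.com/v1beta/models/"
            "gemini-3.1-flash-image-preview:generateContent")
TIMEOUT_SECONDS = 35  # 20-second generation + buffer, per the checklist

def build_body(prompt, grounded=False):
    """generateContent body covering the checklist fields."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["IMAGE"],  # always set explicitly
            # Example values; match these to the final delivery destination.
            "imageConfig": {"aspectRatio": "1:1", "imageSize": "2K"},
        },
    }
    if grounded:  # only for prompts referencing current events or trends
        body["tools"] = [{"google_search": {}}]
    return body

def generate(prompt, grounded=False):
    """Fire one synchronous generation request against the native endpoint."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_body(prompt, grounded)).encode("utf-8"),
        headers={"x-goog-api-key": os.environ["WISDOM_GATE_KEY"],
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS) as resp:
        return json.load(resp)
```

For multi-turn editing, extend `contents` with the full prior conversation instead of a single prompt part, per the checklist.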
Quick-start resources:
| Resource | Link |
|---|---|
| AI Studio (no-code testing) | wisdom-gate.juheapi.com/studio/image |
| Get API Key | wisdom-gate.juheapi.com/hall/tokens |
| Nano Banana 2 Model Page | wisdom-gate.juheapi.com/models/gemini-3.1-flash-image-preview |
| Nano Banana Pro Model Page | wisdom-gate.juheapi.com/models/gemini-3-pro-image-preview |
| Developer Documentation | wisdom-docs.juheapi.com/api-reference/image/nanobanana |
| Pricing | wisdom-gate.juheapi.com/pricing |
Nano Banana 2 Core Features — Conclusion
The nine features covered in this article form a coherent capability stack, not a feature checklist. The unified transformer foundation enables bidirectional I/O and Image Search Grounding — capabilities that diffusion models cannot replicate architecturally. The 256K context window enables brand-consistent batch production without external state management. The multi-resolution and aspect ratio controls deliver platform-ready output at generation time without post-processing. Multi-turn editing and image-to-image support complete the production lifecycle from initial concept through iterative refinement to final delivery. Together, these features cover every stage of what a production image generation pipeline actually needs to do.
The most important insight from this article is not any individual feature — it is the endpoint architecture decision. Every nano banana 2 core feature that differentiates the model from diffusion alternatives — grounding, Thinking, resolution control, Batch API — requires the Gemini-native endpoint. For full access to the complete capability set, the Gemini-native endpoint is the only correct choice for new production integrations. Every other endpoint path is a deliberate capability trade-off, and that trade-off should be made consciously.
At $0.058/request on Wisdom Gate, consistent 20-second generation across all resolution tiers, and a nano banana 2 core features stack that covers the full image generation lifecycle from single-image generation to asynchronous batch processing, Nano Banana 2 is engineered for exactly the workloads most production teams actually run. Not the edge cases — the 95% of production image generation that requires speed, affordability, context richness, and reliable output at volume.
The technical picture is complete. The only remaining step is to run the first request with your own prompt, your own API key, and your own use case in scope.
🛠️ Ready to build? You now have the full Nano Banana 2 core features reference. Two paths forward: test immediately in Wisdom Gate AI Studio — no API key, no setup, every feature available to explore — or get your production API key at wisdom-gate.juheapi.com/hall/tokens and run your first request in under five minutes. The Gemini-native endpoint is live, the $0.058 rate is active, and the 20-second clock starts the moment you hit send.