How Nano Banana Pro Handles Images: Editing, Captioning, and Visual Reasoning

Why Nano Banana Pro for Images

Visual AI teams need a single API that can caption, reason about, and modify images without juggling multiple vendors. Nano Banana Pro, accessed through Wisdom Gate, offers a unified path for image workflows: captioning, visual Q&A, editing, inpainting, style transfer, and image generation. With the gemini-3-pro-image-preview model, all of these multimodal tasks share one model and one request format.

Nano Banana image API: one endpoint for analysis and generation
Gemini image analysis API: strong perception and reasoning on complex scenes
Nano Banana image generation: creative synthesis with style control

Capabilities at a Glance

Captioning and Description

Generate concise or detailed captions
Extract attributes (objects, colors, brands) and scene context
Multi-caption variants for A/B testing

Visual Reasoning

Answer questions grounded in the image (VQA)
Step-by-step rationale (when requested) while keeping outputs concise
Detect inconsistencies or anomalies

Editing and Inpainting

Text-driven edits (remove/replace objects, adjust colors/lighting)
Inpainting for masked or described regions
Iterative refinement loops: propose edit → preview → improve

Style Transfer via Wisdom Gate

Apply reference styles (artists, genres, brand palettes)
Balance style vs. content preservation
Batch processing for design pipelines

Architecture and Model Choice

Nano Banana Pro uses Wisdom Gate as its hosted runtime.

Base URL: https://wisdom-gate.juheapi.com/v1
Model: gemini-3-pro-image-preview
Endpoint: /chat/completions
Strengths: robust perception, flexible prompting, high-quality generation

The same chat endpoint supports text prompts that refer to images, enabling captioning, reasoning, and edit instructions in one place.

API Basics: Endpoint, Auth, and Request Shape

You can start with a minimal POST to the chat completions endpoint. The example below shows image generation using the Nano Banana image API.

curl --location --request POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
--header 'Authorization: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model":"gemini-3-pro-image-preview",
    "messages": [
      {
        "role": "user",
        "content": "Draw a stunning sea world."
      }
    ]
}'
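
For teams calling the endpoint from application code, the same request can be sketched in Python using only the standard library. The `build_request` helper name is hypothetical; the URL, model name, and header shape mirror the curl example above. The request is built here but not sent.

```python
import json
import urllib.request

BASE_URL = "https://wisdom-gate.juheapi.com/v1"
MODEL = "gemini-3-pro-image-preview"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # same header shape as the curl example
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("Draw a stunning sea world.", "YOUR_API_KEY")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` with a `timeout` value would send it; the shape of the response body is not documented here, so inspect a real response before writing a parser.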

Practical notes:

Use production API keys with scoped permissions
Keep prompts specific, with clear constraints, so outputs are more repeatable
Store response metadata (request ID, timestamps) for tracing
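
The tracing note above can be sketched as a small record builder. The `trace_record` helper is hypothetical, and the `id` and `model` response fields are an assumption based on the common chat-completions response shape; confirm them against an actual Wisdom Gate response.

```python
from typing import Any


def trace_record(response: dict[str, Any], started: float, finished: float) -> dict[str, Any]:
    """Collect the fields worth logging for one API call."""
    return {
        "request_id": response.get("id"),  # assumed field name; None if absent
        "model": response.get("model"),    # assumed field name; None if absent
        "started_at": started,
        "duration_s": round(finished - started, 3),
    }


# Example with a stand-in response dict (not real API output).
record = trace_record({"id": "req-123", "model": "gemini-3-pro-image-preview"}, 100.0, 101.25)
print(record)
```

In real use, `started` and `finished` would come from `time.time()` taken around the HTTP call.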

Image Captioning Workflows

Captioning is foundational for search, accessibility, and analytics.

Single Caption with Key Attributes

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Describe this image: https://example.com/images/street-scene.jpg. Provide one sentence plus 5 key attributes (objects, colors, time of day, mood)."}
  ]
}'

Multi-Caption Variants for A/B Tests

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "From https://example.com/product.jpg, create 3 concise captions targeting e-commerce thumbnails: limit 60 chars, focus on benefits, avoid buzzwords."}
  ]
}'
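
Because a length limit stated in a prompt is a request rather than a guarantee, it may help to check the returned captions before using them. The sketch below assumes the model replies with one caption per line, possibly numbered or bulleted; `select_captions` is a hypothetical helper.

```python
import re

# Matches a leading bullet ("-", "*", "•") or numbering ("1.", "2)") plus whitespace.
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def select_captions(raw: str, max_chars: int = 60) -> list[str]:
    """Return the non-empty captions that fit within max_chars."""
    captions = []
    for line in raw.splitlines():
        text = _LIST_MARKER.sub("", line).strip()
        if text and len(text) <= max_chars:
            captions.append(text)
    return captions


# Stand-in model reply: the second line is 61 characters and gets dropped.
reply = "1. Wireless earbuds with all-day battery\n2. " + "x" * 61 + "\n- 24-hour playback, pocket-size case\n"
print(select_captions(reply))
```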

Accessibility-Friendly Alt Text

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Create descriptive alt text for https://example.com/infographic.png. Prioritize clarity over style; mention axes, units, and trends."}
  ]
}'

Visual Reasoning: Grounded Q&A

Use visual Q&A to validate layouts, detect errors, or assist support workflows.

Direct Questions about an Image

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Answer: Is the warning icon visible in https://example.com/dashboard.png? If yes, where is it located relative to the header?"}
  ]
}'

Validate Brand Guidelines

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Check https://example.com/marketing-banner.jpg for brand compliance: colors within palette (blue/teal), logo clear at 24px minimum, no text below 14pt."}
  ]
}'

Structured Outputs (Text-Only)

When you need structured results (e.g., key-value facts), ask the model to format plain text with labels.

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "From https://example.com/receipt.jpg, extract: Merchant, Date, Total, Currency. Output as lines: Merchant: ..., Date: ..., Total: ..., Currency: ..."}
  ]
}'
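
Labeled plain-text output like this is straightforward to turn into a dictionary. The following is a minimal sketch, assuming the model follows the requested `Label: value` line format; `parse_labeled_lines` is a hypothetical helper, and fields the model omits come back as `None`.

```python
from typing import Optional


def parse_labeled_lines(text: str, fields: tuple[str, ...]) -> dict[str, Optional[str]]:
    """Extract 'Label: value' lines for the requested fields (case-insensitive labels)."""
    wanted = {f.lower(): f for f in fields}
    result: dict[str, Optional[str]] = {f: None for f in fields}
    for line in text.splitlines():
        label, sep, value = line.partition(":")  # split on the first colon only
        key = wanted.get(label.strip().lower())
        if sep and key and result[key] is None:
            result[key] = value.strip() or None
    return result


# Stand-in model reply; "Currency" is missing and "Note" is not a requested field.
reply = "Merchant: Corner Cafe\nDate: 2024-03-01 10:30\nTotal: 12.50\nNote: thanks"
parsed = parse_labeled_lines(reply, ("Merchant", "Date", "Total", "Currency"))
print(parsed)
```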

Editing and Inpainting: Prompt Recipes

Wisdom Gate enables text-driven edits. For inpainting, describe the region and the desired replacement clearly.

Remove an Object (Text-Driven)

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Edit https://example.com/portrait.jpg: remove the microphone in the lower-right corner; reconstruct background naturally; keep lighting and skin tones consistent."}
  ]
}'

Inpaint a Region by Description

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Inpaint https://example.com/beach.jpg: replace the sky area with a warm sunset gradient; keep horizon straight; avoid halos around trees."}
  ]
}'

Color Grading and Style Harmonization

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Edit https://example.com/fashion.jpg: apply teal-orange cinematic grade; reduce highlights by 20%; preserve skin tones; output as a print-ready look."}
  ]
}'

Style Transfer via Wisdom Gate

Style transfer is a powerful way to align visuals with brand or creative goals.

Apply a Named Artistic Style

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Style transfer for https://example.com/city.jpg: emulate impressionist brushwork, soft edges, pastel palette; retain building geometry; avoid heavy distortion."}
  ]
}'

Reference Style by URL

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Transfer the style from https://example.com/style-reference.png onto https://example.com/content.jpg; keep faces recognizable; avoid artifacts on edges."}
  ]
}'

Image Generation with Constraints

The Nano Banana image API supports generation directly from text prompts via /chat/completions.

Controlled Generation

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Generate a product hero shot: matte-black wireless earbuds on neutral backdrop; soft rim light; minimal reflections; format suitable for 1200x800 crop."}
  ]
}'

Creative Scene Synthesis

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Draw a stunning sea world: coral reefs, colorful fish, volumetric light rays, balanced composition, crisp details, natural palette."}
  ]
}'

Prompt Engineering Patterns

Be Explicit about Constraints

Specify size, aspect ratio, lighting, and color intent in prompts
Describe forbidden elements (e.g., no watermarks, no logos)
Use measurable targets (e.g., "reduce highlights by 20%")
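
One way to keep constraints explicit and consistent across requests is to assemble prompts from parts. This is an illustrative sketch only; `build_prompt` and its semicolon-separated layout are assumptions modeled on the prompts used in the examples above.

```python
def build_prompt(task: str, constraints: list[str], forbidden: list[str]) -> str:
    """Join a task, its constraints, and forbidden elements into one prompt string."""
    parts = [task.strip().rstrip(".;")]
    parts.extend(c.strip() for c in constraints if c.strip())
    parts.extend(f"no {f.strip()}" for f in forbidden if f.strip())
    return "; ".join(parts) + "."


prompt = build_prompt(
    "Generate a product hero shot.",
    ["3:2 aspect ratio", "soft rim light"],
    ["watermarks", "logos"],
)
print(prompt)
```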

Iterate with Short Feedback Loops

Request a preview or summary of the planned edit before full synthesis
Ask for 2–3 variations to compare
Capture and version prompts and outputs for reproducibility
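
Versioning prompts can be as simple as deriving a stable identifier from the model name and prompt text, then storing outputs under that identifier. The sketch below is one possible approach; `prompt_version` and the 12-character truncation are arbitrary choices, not part of any API.

```python
import hashlib
import json


def prompt_version(model: str, prompt: str) -> str:
    """Return a short, stable ID for a (model, prompt) pair."""
    canonical = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


v1 = prompt_version("gemini-3-pro-image-preview", "Draw a stunning sea world.")
v2 = prompt_version("gemini-3-pro-image-preview", "Draw a stunning sea world at dusk.")
print(v1, v2)
```

Any change to the prompt wording or the model name produces a new ID, which makes it easy to tell which prompt revision generated a given output.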

Balanced Detail

Give enough detail for control, but not so much that instructions contradict each other
Group style descriptors in a dedicated part of the prompt

Quality, Latency, and Cost Considerations

Throughput and Caching

Cache captions for duplicate images to reduce spend
Reuse embeddings of common assets when applicable
Batch operations where feasible
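
Caption caching for duplicate images can be sketched by keying on a hash of the image bytes. This is a minimal in-memory illustration; `CaptionCache` is hypothetical, `caption_fn` stands in for whatever function calls the API, and only byte-identical files count as duplicates (a re-encoded copy would miss the cache).

```python
import hashlib
from typing import Callable


class CaptionCache:
    """Cache captions by SHA-256 of the image bytes."""

    def __init__(self, caption_fn: Callable[[bytes], str]) -> None:
        self._caption_fn = caption_fn
        self._store: dict[str, str] = {}
        self.misses = 0  # number of times the underlying function was called

    def get(self, image_bytes: bytes) -> str:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._caption_fn(image_bytes)
        return self._store[key]


# Stand-in captioner that just reports the byte length (no API call).
cache = CaptionCache(lambda data: f"image of {len(data)} bytes")
first = cache.get(b"same-image")
second = cache.get(b"same-image")  # served from the cache
print(first, cache.misses)
```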

Latency

Keep prompts short; avoid unnecessary verbosity
Prefer concise attribute lists over long prose
Process non-critical tasks asynchronously
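
Moving non-critical work off the request path can be sketched with a thread pool from the standard library. The `submit_background` helper and the `fake_caption` stand-in are hypothetical; a real task function would call the API instead.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable


def submit_background(
    pool: ThreadPoolExecutor,
    task: Callable[[str], str],
    items: Iterable[str],
) -> dict[str, Future]:
    """Queue one task per item and return immediately with the pending futures."""
    return {item: pool.submit(task, item) for item in items}


def fake_caption(url: str) -> str:
    # Stand-in for an API call.
    return f"caption for {url}"


with ThreadPoolExecutor(max_workers=2) as pool:
    pending = submit_background(pool, fake_caption, ["a.jpg", "b.jpg"])
    # Collect results later, when they are actually needed.
    results = {url: fut.result() for url, fut in pending.items()}

print(results)
```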

Cost Control

Use lightweight descriptions for internal tooling
Reserve high-detail edits for final outputs
Monitor high-volume endpoints with budgets

Safety and Compliance

Content Policies

Detect sensitive content before editing or publishing
Avoid style transfers that mimic protected artwork without rights
Include audit logs for compliance reviews

Privacy

Strip EXIF/location data from outputs as needed
Anonymize faces when required by policy
Document data retention periods and deletion flows

Evaluation and Benchmarking

Metrics to Track

Caption quality: human ratings, CTR on thumbnails
Visual reasoning accuracy: question-level correctness
Editing fidelity: user approval rates and rework counts
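
Question-level correctness for visual reasoning can be computed with a few lines once answers are normalized. The sketch below assumes short answers compared by case-insensitive exact match, which is a simplification; `vqa_accuracy` and the question IDs are hypothetical.

```python
def vqa_accuracy(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of gold questions answered correctly (case-insensitive exact match)."""
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(
        1
        for qid, answer in gold.items()
        if predictions.get(qid, "").strip().lower() == answer.strip().lower()
    )
    return correct / len(gold)


predictions = {"q1": "Yes", "q2": "left of header", "q3": "red"}
gold = {"q1": "yes", "q2": "right of header", "q3": "Red "}
accuracy = vqa_accuracy(predictions, gold)
print(f"{accuracy:.2f}")
```

Questions with no prediction count as incorrect, so the score cannot be inflated by skipping hard items.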

Reproducibility

Fixed prompt templates
Version control for test sets
Regular regression checks on new model releases

Production Tips

Robustness

Add retries for transient errors
Validate URLs before sending to the API
Sanitize prompt text (no hidden characters)
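
The three robustness items above can be sketched with the standard library. All helper names are hypothetical, and which exceptions count as transient is an assumption to adjust for the HTTP client in use. Note that stripping Unicode format characters also removes zero-width joiners, which can alter some emoji sequences.

```python
import time
import unicodedata
from typing import Callable, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")


def is_valid_image_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def sanitize_prompt(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters, keeping newlines and tabs."""
    kept = (
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    return "".join(kept).strip()


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry on timeout/connection errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so tests can replace the real delay with a stub.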

Monitoring

Capture request/response sizes and durations
Alert on elevated error codes
Track output drift against benchmarks

Governance

Role-based access for API keys
Approval workflows for style libraries
Incident playbooks for content issues

Troubleshooting

Common Pitfalls

Overly vague edit instructions → add explicit constraints
Conflicting style tags → pick 2–3 coherent descriptors
Hallucinated attributes → request a confidence note or a second pass

Debugging Steps

Start with a minimal prompt and grow complexity
Use captions first, then guide edits based on the model’s perception
Compare variants and gather feedback from target users

Roadmap and Getting Started

Nano Banana Pro with Wisdom Gate is designed for unified image tasks—from captioning and visual reasoning to editing, inpainting, style transfer, and generation. The gemini-3-pro-image-preview model brings strong perception and creative control.

Next steps for your team:

Pilot captioning and visual Q&A on a subset of assets
Design prompt templates for editing and style transfer
Build an experimentation harness to compare outputs and track metrics

When you’re ready, move the workflows into production with caching, monitoring, and cost controls—and continuously refine prompts using measurable goals.
