JUHE API Marketplace

How Nano Banana Pro Handles Images: Editing, Captioning, and Visual Reasoning

Why Nano Banana Pro for Images

Visual AI teams need a single API that can caption, reason about, and modify images without juggling multiple vendors. Nano Banana Pro, accessed through Wisdom Gate, offers a unified path for image workflows: captioning, visual Q&A, editing, inpainting, style transfer, and image generation. The gemini-3-pro-image-preview model covers both image understanding and image generation behind one endpoint.

  • Nano Banana image API: one endpoint for analysis and generation
  • Image analysis with Gemini: perception and reasoning on complex scenes
  • Image generation with Nano Banana: creative synthesis with style control

Capabilities at a Glance

Captioning and Description

  • Generate concise or detailed captions
  • Extract attributes (objects, colors, brands) and scene context
  • Multi-caption variants for A/B testing

Visual Reasoning

  • Answer questions grounded in the image (VQA)
  • Step-by-step rationale (when requested) while keeping outputs concise
  • Detect inconsistencies or anomalies

Editing and Inpainting

  • Text-driven edits (remove/replace objects, adjust colors/lighting)
  • Inpainting for masked or described regions
  • Iterative refinement loops: propose edit → preview → improve

Style Transfer via Wisdom Gate

  • Apply reference styles (artists, genres, brand palettes)
  • Balance style vs. content preservation
  • Batch processing for design pipelines

Architecture and Model Choice

Nano Banana Pro uses Wisdom Gate as its hosted runtime.

  • Base URL: https://wisdom-gate.juheapi.com/v1
  • Model: gemini-3-pro-image-preview
  • Endpoint: /chat/completions
  • Strengths: robust perception, flexible prompting, high-quality generation

The same chat endpoint supports text prompts that refer to images, enabling captioning, reasoning, and edit instructions in one place.

API Basics: Endpoint, Auth, and Request Shape

You can start with a minimal POST to the chat completions endpoint. The example below shows image generation using the Nano Banana image API.

curl --location --request POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
--header 'Authorization: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model":"gemini-3-pro-image-preview",
    "messages": [
      {
        "role": "user",
        "content": "Draw a stunning sea world."
      }
    ]
}'
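
For teams calling the API from application code, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names build_chat_request and send are hypothetical, the Authorization header format is copied from the curl example, and the assumption that the response body is JSON is not confirmed by this article.

```python
import json
import urllib.request

BASE_URL = "https://wisdom-gate.juheapi.com/v1"
MODEL = "gemini-3-pro-image-preview"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    body = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Header value format mirrors the curl example in this article.
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request; assumes (unverified) that the response body is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to log and unit-test without touching the network.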

Practical notes:

  • Use production API keys with scoped permissions
  • Keep prompts specific, with clear constraints, so outputs are easier to reproduce
  • Store response metadata (request ID, timestamps) for tracing

Image Captioning Workflows

Captioning is foundational for search, accessibility, and analytics.

Single Caption with Key Attributes

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Describe this image: https://example.com/images/street-scene.jpg. Provide one sentence plus 5 key attributes (objects, colors, time of day, mood)."}
  ]
}'

Multi-Caption Variants for A/B Tests

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "From https://example.com/product.jpg, create 3 concise captions targeting e-commerce thumbnails: limit 60 chars, focus on benefits, avoid buzzwords."}
  ]
}'
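
A length limit stated in a prompt is a request, not a guarantee, so it can help to re-check the returned captions before they reach an A/B test. The sketch below assumes the model returns one caption per line, possibly with list markers; that output shape is an assumption, and select_captions is a hypothetical helper name.

```python
import re

# Matches leading list markers such as "1.", "2)", "-", "*", or "•".
_LIST_MARKER = re.compile(r"^\s*(?:[-•*]|\d+[.)])\s+")


def select_captions(response_text: str, max_chars: int = 60) -> list[str]:
    """Return the captions that fit within max_chars, one per input line."""
    captions = []
    for line in response_text.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        if cleaned and len(cleaned) <= max_chars:
            captions.append(cleaned)
    return captions
```

If fewer captions survive than were requested, the prompt can be re-sent rather than truncating text by hand.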

Accessibility-Friendly Alt Text

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Create descriptive alt text for https://example.com/infographic.png. Prioritize clarity over style; mention axes, units, and trends."}
  ]
}'

Visual Reasoning: Grounded Q&A

Use visual Q&A to validate layouts, detect errors, or assist support workflows.

Direct Questions about an Image

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Answer: Is the warning icon visible in https://example.com/dashboard.png? If yes, where is it located relative to the header?"}
  ]
}'

Validate Brand Guidelines

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Check https://example.com/marketing-banner.jpg for brand compliance: colors within palette (blue/teal), logo clear at 24px minimum, no text below 14pt."}
  ]
}'

Structured Outputs (Text-Only)

When you need structured results (e.g., key-value facts), ask the model to format plain text with labels.

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "From https://example.com/receipt.jpg, extract: Merchant, Date, Total, Currency. Output as lines: Merchant: ..., Date: ..., Total: ..., Currency: ..."}
  ]
}'
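
Labeled plain-text output still needs to be parsed and validated on the client side. A minimal sketch, assuming the model follows the "Label: value" line format requested in the prompt above (parse_labeled_lines is a hypothetical helper, not part of any SDK):

```python
def parse_labeled_lines(text: str, expected_keys: list[str]) -> tuple[dict[str, str], list[str]]:
    """Parse 'Key: value' lines; return the found fields and the missing keys."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons (e.g. times).
        key, separator, value = line.partition(":")
        key = key.strip()
        if separator and key in expected_keys:
            fields[key] = value.strip()
    missing = [key for key in expected_keys if key not in fields]
    return fields, missing
```

Returning the missing keys explicitly lets the caller decide whether to retry the request or route the receipt to manual review.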

Editing and Inpainting: Prompt Recipes

Wisdom Gate enables text-driven edits. For inpainting, describe the region and the desired replacement clearly.

Remove an Object (Text-Driven)

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Edit https://example.com/portrait.jpg: remove the microphone in the lower-right corner; reconstruct background naturally; keep lighting and skin tones consistent."}
  ]
}'

Inpaint a Region by Description

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Inpaint https://example.com/beach.jpg: replace the sky area with a warm sunset gradient; keep horizon straight; avoid halos around trees."}
  ]
}'

Color Grading and Style Harmonization

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Edit https://example.com/fashion.jpg: apply teal-orange cinematic grade; reduce highlights by 20%; preserve skin tones; output as a print-ready look."}
  ]
}'

Style Transfer via Wisdom Gate

Style transfer is a powerful way to align visuals with brand or creative goals.

Apply a Named Artistic Style

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Style transfer for https://example.com/city.jpg: emulate impressionist brushwork, soft edges, pastel palette; retain building geometry; avoid heavy distortion."}
  ]
}'

Reference Style by URL

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Transfer the style from https://example.com/style-reference.png onto https://example.com/content.jpg; keep faces recognizable; avoid artifacts on edges."}
  ]
}'

Image Generation with Constraints

The Nano Banana image API supports generation directly from text prompts via /chat/completions.

Controlled Generation

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Generate a product hero shot: matte-black wireless earbuds on neutral backdrop; soft rim light; minimal reflections; format suitable for 1200x800 crop."}
  ]
}'

Creative Scene Synthesis

curl -X POST 'https://wisdom-gate.juheapi.com/v1/chat/completions' \
-H 'Authorization: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
--data-raw '{
  "model": "gemini-3-pro-image-preview",
  "messages": [
    {"role": "user", "content": "Draw a stunning sea world: coral reefs, colorful fish, volumetric light rays, balanced composition, crisp details, natural palette."}
  ]
}'

Prompt Engineering Patterns

Be Explicit about Constraints

  • Specify size, aspect ratio, lighting, and color intent in prompts
  • Describe forbidden elements (e.g., no watermarks, no logos)
  • Use measurable targets (e.g., "reduce highlights by 20%")

Iterate with Short Feedback Loops

  • Request a preview or summary of the planned edit before full synthesis
  • Ask for 2–3 variations to compare
  • Capture and version prompts and outputs for reproducibility
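
Capturing prompts and outputs can be as simple as writing one record per run. A minimal sketch, assuming outputs are stored elsewhere and referenced by a path or ID; the field names and make_run_record are hypothetical choices, not something the API prescribes.

```python
import hashlib
from datetime import datetime, timezone


def make_run_record(prompt: str, model: str, output_ref: str) -> dict:
    """Describe one generation run so it can be compared or replayed later."""
    # A short content hash gives identical prompts the same stable ID.
    prompt_id = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]
    return {
        "prompt_id": prompt_id,
        "prompt": prompt,
        "model": model,
        "output_ref": output_ref,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

Because the ID is derived from the prompt text, two runs of the same prompt group together, which makes variation comparisons straightforward.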

Balanced Detail

  • Give enough detail for control, but not so much that instructions contradict each other
  • Group style descriptors in one dedicated part of the prompt

Quality, Latency, and Cost Considerations

Throughput and Caching

  • Cache captions for duplicate images to reduce spend
  • Reuse embeddings of common assets when applicable
  • Batch operations where feasible
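
Caching captions for duplicate images can be sketched by keying on the image content plus the prompt, so re-uploads of the same file under a different URL still hit the cache. This is an in-memory illustration; CaptionCache and the caption_fn callback are hypothetical, and a production setup would likely use a shared store instead of a dict.

```python
import hashlib
from typing import Callable


class CaptionCache:
    """Memoize captions by (image content, prompt)."""

    def __init__(self, caption_fn: Callable[[bytes, str], str]):
        self._caption_fn = caption_fn  # performs the real API call
        self._store: dict[str, str] = {}
        self.misses = 0

    @staticmethod
    def _key(image_bytes: bytes, prompt: str) -> str:
        # Hash the image first so the key has a fixed-length, unambiguous prefix.
        image_digest = hashlib.sha256(image_bytes).digest()
        return hashlib.sha256(image_digest + prompt.encode("utf-8")).hexdigest()

    def get(self, image_bytes: bytes, prompt: str) -> str:
        key = self._key(image_bytes, prompt)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._caption_fn(image_bytes, prompt)
        return self._store[key]
```

Including the prompt in the key matters: the same image captioned for alt text and for a thumbnail should not share a cache entry.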

Latency

  • Keep prompts short; avoid unnecessary verbosity
  • Prefer concise attribute lists over long prose
  • Process non-critical tasks asynchronously

Cost Control

  • Use lightweight descriptions for internal tooling
  • Reserve high-detail edits for final outputs
  • Set budgets for high-volume workloads and monitor spend against them
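
A budget can be enforced in code before a request is ever sent. The sketch below counts requests instead of currency, because this article does not state per-request pricing; RequestBudget is a hypothetical helper.

```python
class RequestBudget:
    """Allow at most max_requests calls per budgeting period."""

    def __init__(self, max_requests: int):
        self.max_requests = max_requests
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.max_requests - self.used

    def try_spend(self, count: int = 1) -> bool:
        """Reserve capacity for count requests; return False if over budget."""
        if self.used + count > self.max_requests:
            return False
        self.used += count
        return True
```

Callers check try_spend() before each API call and defer or drop low-priority work when it returns False.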

Safety and Compliance

Content Policies

  • Detect sensitive content before editing or publishing
  • Avoid style transfers that mimic protected artwork without rights
  • Include audit logs for compliance reviews

Privacy

  • Strip EXIF/location data from outputs as needed
  • Anonymize faces when required by policy
  • Document data retention periods and deletion flows

Evaluation and Benchmarking

Metrics to Track

  • Caption quality: human ratings, CTR on thumbnails
  • Visual reasoning accuracy: question-level correctness
  • Editing fidelity: user approval rates and rework counts
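
Question-level correctness for visual reasoning can be computed with a simple exact-match score. This sketch assumes short, closed-form answers (yes/no, positions, counts) and a hand-labeled reference set; free-form answers would need human or rubric-based grading instead.

```python
def vqa_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of answers matching the reference, ignoring case and extra spaces."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(answer: str) -> str:
        return " ".join(answer.lower().split())

    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)
```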

Reproducibility

  • Fixed prompt templates
  • Version control for test sets
  • Regular regression checks on new model releases

Production Tips

Robustness

  • Add retries for transient errors
  • Validate URLs before sending to the API
  • Sanitize prompt text (no hidden characters)
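
The three robustness checks above can be sketched with the standard library. The helper names are hypothetical, and which exceptions count as transient depends on the HTTP client in use; TimeoutError and ConnectionError are used here only as placeholders.

```python
import time
import unicodedata
from urllib.parse import urlparse


def is_valid_image_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def sanitize_prompt(text: str) -> str:
    """Drop control and invisible format characters, keeping newlines and tabs."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )


def with_retries(call, attempts=3, base_delay=0.5,
                 retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Run call(), retrying transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter is injectable so the backoff schedule can be tested without waiting.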

Monitoring

  • Capture request/response sizes and durations
  • Alert on elevated error rates
  • Track output drift against benchmarks

Governance

  • Role-based access for API keys
  • Approval workflows for style libraries
  • Incident playbooks for content issues

Troubleshooting

Common Pitfalls

  • Overly vague edit instructions → add explicit constraints
  • Conflicting style tags → pick 2–3 coherent descriptors
  • Hallucinated attributes → request a confidence note or a second pass

Debugging Steps

  • Start with a minimal prompt and grow complexity
  • Use captions first, then guide edits based on the model’s perception
  • Compare variants and gather feedback from target users

Roadmap and Getting Started

Nano Banana Pro with Wisdom Gate is designed for unified image tasks—from captioning and visual reasoning to editing, inpainting, style transfer, and generation. The gemini-3-pro-image-preview model brings strong perception and creative control.

Next steps for your team:

  • Pilot captioning and visual Q&A on a subset of assets
  • Design prompt templates for editing and style transfer
  • Build an experimentation harness to compare outputs and track metrics

When you’re ready, move the workflows into production with caching, monitoring, and cost controls—and continuously refine prompts using measurable goals.
