With Nano Banana Pro, you can generate a strikingly realistic banana image — just describe what you want, and let it do the rest.
But generating the same banana again and again, across variations: different angles, different backgrounds, different ad sizes, different styles. That is where most image pipelines quietly fall apart.
Because the moment you need consistency, real consistency, you hit the usual problems:
- The banana changes shape between shots.
- The sticker disappears or morphs.
- The color shifts from yellow to neon yellow to slightly green.
- The scene looks right, but the subject drifts.
- Or the subject is right, but the scene is a mess.
And in real products this is not a cute edge case. This is catalogs, ad creatives, avatars, mockups, game assets, “try it in my room” previews. Consistency reduces manual editing and review loops, which is basically the hidden tax on every “AI generated images” feature.
Nano Banana Pro is built for ultra-consistent, 4K-quality image generation. Feed it 2 reference images — your subject (the banana) and your scene or style (the kitchen counter, the studio lighting, the illustration look) — and it outputs a single, razor-sharp image that faithfully preserves every detail from both references.
This guide is practical: what it is, how it works, what you can build with it, and how to access Nano Banana Pro on JUHE API, with a simple banana example you can actually run.
What Is Nano Banana Pro? (Multi-image, reference-based image generation)
Nano Banana Pro generates each output from two reference images:
- Subject reference (what must stay consistent)
- Style or scene reference (the look, composition, lighting, environment)
Then you add a text prompt. The model synthesizes an output that tries to preserve identity and shape cues from the subject while adopting the composition and art direction from the second reference.
Developer translation: you stop asking a text-only model to “please keep the same banana.” Instead, you give it the banana.
The core workflow developers care about
You provide:
- Image A: subject reference (your banana photo)
- Image B: style or scene reference (kitchen counter, studio backdrop, illustrated palette)
- Text prompt: the intent and constraints
And you get:
- A new image where the banana is still your banana, but placed and rendered in the second image’s vibe.
Why not just use text-only or single reference?
- Text-only models: great at variety, bad at “keep this exact thing consistent.” Even with seeds, you often get drift.
- Single-reference models: can keep a subject OR match a style, but blending subject + scene reliably is harder than it sounds.
- Nano Banana Pro is designed for the two-image constraint use case. Subject plus scene. Two references, one output.
This two-image conditioning follows recent advances in multi-image generation, producing more nuanced, contextually appropriate outputs.
Banana mental model (keep this in your head)
- Image 1: your banana photo (subject)
- Image 2: a kitchen counter scene (style, lighting)
- Output: your banana, in that kitchen, with that lighting.
What "2 pictures / images" means in practice
It means two separate image inputs. Not a single collage you made in Photoshop. Usually not "two images glued together." The order matters because the roles matter.
You will typically pass something like:
- `subject_image`: banana
- `reference_image`: scene/style
If you swap them, expect weird results. Sometimes interesting. Usually not what you want.
How Multi-Image & Reference Generation works (in developer terms)
No fluff version: the model uses both images as constraints. Your prompt tells it what to change vs what must stay consistent.
Inputs (the three levers)
1. Subject reference image
This provides identity, shape, texture, and details. For a banana: silhouette, curvature, sticker, minor bruising, stem shape.
2. Style or scene reference image
This provides background, lighting direction, contrast, lens look, and composition. For a banana: "moody studio countertop with softbox lighting" or "flat illustration palette".
3. Text prompt
This conveys your intent and constraints:
- Intent ("place the banana into the scene")
- Constraints ("keep sticker", "no extra fruit", "match lighting", "natural shadow")
- Negatives if supported (depends on the endpoint)
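The intent-plus-constraints split above can be kept literal in code: build the prompt from two reusable strings instead of retyping it per call. A minimal sketch; the exact wording is an assumption you would tune per endpoint:

```shell
# Assemble a reusable prompt from an intent plus explicit constraints.
INTENT="Place the banana from Image 1 into the scene from Image 2."
CONSTRAINTS="Match the lighting and perspective. Keep the sticker. Natural shadow. No extra fruit."

PROMPT="$INTENT $CONSTRAINTS"
echo "$PROMPT"
```

Keeping the constraint string separate makes it easy to version independently of the intent.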
Typical output behavior you should expect
- Subject fidelity gets stronger when the subject image is clear, high resolution, and uncluttered.
- Style transfer gets stronger when the style image is distinctive and opinionated.
- Your prompt controls tradeoffs, but it cannot fully rescue bad references.
A simple internal rule that actually works:
Keep the subject reference clean. Keep the style reference opinionated.
Practical constraints to plan for
These vary by API, but in production you should assume:
- Resolution and file size limits (you will need pre-resize)
- Aspect ratio constraints (square outputs are often easiest)
- Content safety filters (you should surface moderation failures clearly)
- Latency and throughput (batching, async jobs, retries)
- Cost per generation (you will want caching and dedupe)
Key Capabilities (what you can reliably build with Nano Banana Pro)
1. Multi-Image & Reference Generation (2-image conditioning)
This is the differentiator. Accepting 2 pictures / images lets you combine subject + style/scene in a single generation step, instead of doing awkward multi-step workflows.
2. High fidelity to reference images
When developers say “fidelity,” they usually mean:
- The banana stays the banana across variants.
- Texture and color don’t drift randomly.
- Fewer artifacts like melted edges, phantom objects, or “almost the same” shapes.
3. Practical compositing for product workflows
This is the bread and butter use case.
- Banana in a lunchbox.
- Banana on a countertop.
- Banana in a clean ecommerce white sweep, but with brand lighting.
- Banana in a “holiday campaign” scene without manually compositing.
4. Creative pipeline acceleration
You keep one stable subject reference and swap only the second image to test art direction fast.
Same banana, different scenes:
- Studio product shot
- Cozy kitchen
- Outdoor picnic
- Minimal flat illustration
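That sweep is just a loop over scene references with a fixed subject. A sketch with hypothetical local file names standing in for your references:

```shell
# One stable subject reference, several scene references, one job per pair.
SUBJECT="banana_subject.jpg"   # hypothetical file name
JOBS=0
for SCENE in studio_shot.jpg cozy_kitchen.jpg picnic.jpg flat_illustration.jpg; do
  # In a real pipeline this line would enqueue an API call for the pair.
  echo "job: subject=$SUBJECT scene=$SCENE"
  JOBS=$((JOBS + 1))
done
echo "queued $JOBS jobs"
```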
5. App-ready integration
This works as:
- A user-facing feature (upload two images, generate)
- Internal tooling (marketing team generating variants)
- Batch jobs (catalog refresh, campaign variants)
Where Nano Banana Pro fits in your architecture (common integration patterns)
Pattern A: User-driven generation
Flow:
- User uploads 2 images (subject + style)
- You validate, resize, normalize
- Call API
- Return result + store output
You will want guardrails here. File size, allowed formats, and basic abuse prevention.
Pattern B: Template-driven generation
This is cleaner for brand consistency.
- You store approved style/scene references (templates).
- User uploads only the subject (banana photo).
- You pair it with a template scene image.
This reduces chaos. And it gives you consistent outputs that actually match your brand lighting and composition rules.
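The template lookup can be a plain mapping from an approved template id to a stored scene reference. A sketch with hypothetical template ids and paths:

```shell
# Map an approved template id to a stored scene reference, so users only
# ever upload the subject image. Names and paths are assumptions.
scene_for_template() {
  case "$1" in
    studio)  echo "templates/studio_softbox.jpg" ;;
    kitchen) echo "templates/kitchen_window_light.jpg" ;;
    *)       echo "unknown" ;;
  esac
}

SCENE=$(scene_for_template kitchen)
echo "pairing user subject with $SCENE"
```

Rejecting unknown template ids at this step keeps unapproved scenes out of the pipeline entirely.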
Pattern C: Batch generation for catalogs and ads
Typical flow:
- Queue jobs per subject image
- Cycle through a set of style references to generate variants
- Store outputs, link to SKU, send to review
Operational notes you will eventually need:
- Cache uploaded images (or store them in your own object storage)
- Store outputs with metadata
- Handle retries and timeouts
- Use idempotency keys if supported (or implement your own job dedupe)
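If the endpoint does not offer idempotency keys, a deterministic job key over the exact inputs is a workable substitute. A sketch assuming `sha256sum` from GNU coreutils:

```shell
# Deterministic job key: the same subject + scene + prompt always hashes the
# same, so duplicate submissions and retries collapse to one generation.
job_key() {
  # $1 = subject file, $2 = scene file, $3 = prompt text
  { cat "$1" "$2"; printf '%s' "$3"; } | sha256sum | cut -d' ' -f1
}

printf 'subject-bytes' > /tmp/banana.jpg    # stand-in input files
printf 'scene-bytes'   > /tmp/kitchen.jpg
job_key /tmp/banana.jpg /tmp/kitchen.jpg "place the banana in the kitchen"
```

Check the key against your job store before submitting; a hit means you already have (or are already generating) that output.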
How to Access Nano Banana Pro via Wisdom Gate API
If you do not want to host models or stitch multiple vendors together, Wisdom Gate is the access layer here.
You can call Nano Banana Pro on Wisdom Gate API via: https://wisdom-gate.juheapi.com/
The basic developer steps look like this:
- Create a Wisdom Gate account
- Find the Nano Banana Pro endpoint in the model list
- Get your API key
- Review pricing, rate limits, and request limits
- Build a small test call with two images and a prompt
What you need before calling:
- Wisdom Gate API Key
- Two input images (subject + style/scene)
- A prompt
Call flow overview:
- Upload or encode images
- POST to the model endpoint
- Parse the response
- Download or store the output image
API Call Example: Generating a New Banana Image Using 2 Input Images
Goal: Use Nano Banana Pro on JUHE API to generate a new image by combining 2 input pictures — a banana subject photo and a scene/style reference.
- Image 1: banana subject photo (the object to place)
- Image 2: scene or style reference (the environment/lighting to match)
All three examples below call the same endpoint on JUHE API. Your $WISDOM_GATE_KEY is the API key you get after registering at Wisdom Gate.
1. Basic Example: Generate a Banana Image from a Text Prompt
The simplest starting point. Describe the banana image you want and the model generates it directly — no input image needed.
curl -s -X POST \
"https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [{
"text": "Da Vinci style anatomical sketch of a dissected Monarch butterfly. Detailed drawings of the head, wings, and legs on textured parchment with notes in English."
}]
}],
"tools": [{"google_search": {}}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"],
"imageConfig": {
"aspectRatio": "1:1",
"imageSize": "1K"
}
}
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | head -1 | base64 --decode > butterfly.png
2. Image-to-Image Generation: Restyle Your Banana Photo
Got an existing banana photo? Pass it as inline_data alongside a text prompt to transform it — change the style, background, or lighting while keeping the banana as the subject.
curl -s -X POST \
"https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"role": "user",
"parts": [
{ "text": "cat" },
{
"inline_data": {
"mime_type": "image/jpeg",
"data": "BASE64_DATA_HERE"
}
}
]
}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"]
}
}'
3. Multi-Image & Reference Generation: Combine 2 Pictures into One
This is the core capability of Nano Banana Pro. Pass 2 pictures as inline_data parts in the same request — one banana subject photo and one scene reference — and the model generates a new image that faithfully places the banana into the reference scene.
gemini-3-pro-image-preview supports up to 14 reference images in a single request:
- Up to 6 images of objects with high fidelity
- Up to 5 images of humans to maintain character consistency
curl -s -X POST \
"https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"role": "user",
"parts": [
{ "text": "An office group photo of these people, they are making funny faces." },
{ "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_1" } },
{ "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_2" } },
{ "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_3" } },
{ "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_4" } },
{ "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_5" } }
]
}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"],
"imageConfig": {
"aspectRatio": "5:4",
"imageSize": "1K"
}
}
}'
For our banana use case: replace `BASE64_IMG_1` with your base64-encoded banana subject photo, and `BASE64_IMG_2` with your scene/style reference image. Update the `text` prompt to describe the desired output, for example: "Place the banana from Image 1 into the scene from Image 2. Match the lighting and perspective. Natural shadow. No additional fruit."
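Rather than hand-pasting base64 blobs, you can build the request body from local files with `jq`. A sketch with stand-in files (swap in your real banana and scene photos before calling the endpoint):

```shell
# Build the two-image request body from local files. The file names and the
# prompt are placeholders; jq and base64 are assumed to be installed.
printf 'fake-banana-bytes' > banana.jpg    # stand-in for your subject photo
printf 'fake-scene-bytes'  > kitchen.jpg   # stand-in for your scene reference

BODY=$(jq -n \
  --arg prompt "Place the banana from Image 1 into the scene from Image 2. Match lighting and perspective. Natural shadow. No extra fruit." \
  --arg img1 "$(base64 < banana.jpg | tr -d '\n')" \
  --arg img2 "$(base64 < kitchen.jpg | tr -d '\n')" \
  '{contents: [{role: "user", parts: [
      {text: $prompt},
      {inline_data: {mime_type: "image/jpeg", data: $img1}},
      {inline_data: {mime_type: "image/jpeg", data: $img2}}
    ]}],
    generationConfig: {responseModalities: ["TEXT", "IMAGE"]}}')

echo "$BODY" | jq -r '.contents[0].parts | length'   # prints 3
```

`$BODY` can then be passed to curl with `-d "$BODY"`, which also sidesteps shell-quoting problems in the prompt text.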
Request format (what to send)
What you will typically set:
Headers
- `Authorization` (or whatever JUHE documents for the specific endpoint)
- `Content-Type` (depends on multipart vs JSON)
Body fields
- `prompt`
- `subject_image` (base64)
- `reference_image` (base64)
- `size` or `aspect_ratio`
- `n` (number of outputs)
- `seed` (optional, if supported)
- Any "strength" or "style weight" controls, if the endpoint exposes them
Validation tips that save you time:
- Enforce a max file size at upload time
- Normalize to JPG or PNG
- Pre-resize to the recommended dimensions so latency and cost are predictable
- Strip metadata if you have privacy requirements
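A minimal pre-flight check along those lines; the 8 MB cap and the allowed extensions are assumptions, so use whatever limits the endpoint actually documents:

```shell
# Cheap validation before spending an API call. Limits are assumptions.
MAX_BYTES=$((8 * 1024 * 1024))

validate_input() {
  f="$1"
  [ -f "$f" ] || { echo "missing: $f"; return 1; }
  size=$(wc -c < "$f" | tr -d ' ')
  [ "$size" -le "$MAX_BYTES" ] || { echo "too large: $f ($size bytes)"; return 1; }
  case "$f" in
    *.jpg|*.jpeg|*.png) ;;                 # normalize everything else upstream
    *) echo "unsupported format: $f"; return 1 ;;
  esac
  echo "ok: $f"
}

printf 'tiny' > subject.jpg
validate_input subject.jpg
```

Run this on both images before encoding; rejecting bad inputs locally is far cheaper than a failed generation.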
Response format (what you get back)
A success response often includes:
- `request_id`
- `status`
- Output image, in base64 or via a URL
- Metadata like dimensions and format
Example shape (illustrative):
json { "request_id": "req_abc123", "status": "succeeded", "outputs": [ { "url": "https://.../output/banana_in_kitchen.png", "width": 1024, "height": 1024, "format": "png" } ] }
Async vs sync: some endpoints return a job id first and you poll until complete. If JUHE's Nano Banana Pro endpoint is async, build a simple poller with a timeout budget.
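A bounded poller can be sketched like this; `fetch_status` is a stub that stands in for a real curl call to a hypothetical job-status URL:

```shell
# Stub: pretends the job finishes on the 3rd poll. In production this would
# curl the job-status endpoint and extract the status field with jq.
fetch_status() {
  POLLS=$((POLLS + 1))
  if [ "$POLLS" -ge 3 ]; then STATUS="succeeded"; else STATUS="running"; fi
}

poll_job() {
  attempts=0
  while [ "$attempts" -lt 10 ]; do      # hard cap = your timeout budget
    fetch_status
    if [ "$STATUS" = "succeeded" ]; then
      echo "done after $POLLS polls"
      return 0
    fi
    attempts=$((attempts + 1))
    # sleep 2   # back off between polls in real use
  done
  echo "timeout"
  return 1
}

POLLS=0
poll_job
```

The attempt cap matters as much as the loop: a poller without a budget turns one stuck job into a stuck worker.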
Error handling you should plan for:
- Invalid image or corrupted file
- Unsupported format
- Prompt blocked by moderation
- Rate limited
- Timeout
Map these to actionable logs. "400 invalid image" is not enough. Log sizes, mime types, and the pipeline step that produced the image.
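One way to keep those logs actionable is a single mapping function; the status codes and messages below are generic assumptions, not the endpoint's documented contract:

```shell
# Turn raw HTTP statuses into actionable log lines. Verify which codes the
# endpoint actually returns before relying on this mapping.
log_api_error() {
  code="$1"; file="$2"
  case "$code" in
    400) echo "invalid request: check image bytes and mime type for $file" ;;
    413) echo "payload too large: pre-resize $file before upload" ;;
    422) echo "blocked by moderation: surface this to the user" ;;
    429) echo "rate limited: retry with backoff" ;;
    504) echo "timeout: retry or move to an async flow" ;;
    *)   echo "unexpected status $code for $file" ;;
  esac
}

log_api_error 429 banana.jpg
```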
Practical use case: “banana 2 pictures images” walkthrough (subject + scene/style)
Here’s the cleanest first test you can run. It tells you immediately if the model is doing what you need.
Step 1: Pick the two references
Image 1 (subject): a clear banana photo on a plain background.
Single banana. No hand. No fruit bowl. No clutter.
Image 2 (style/scene): a moody studio countertop scene. Or a kitchen counter with obvious window light. Something with strong lighting cues.
If you do not have your own images yet, grab placeholders from a stock site just for the test. In production you will obviously use your own.
Step 2: Use a prompt you would actually ship
Prompt template you can start with:
Use Image 1 as the banana subject reference. Use Image 2 as the scene/style reference. Generate a photorealistic image that places the banana into the scene, matching lighting and perspective. Keep the banana’s shape, texture, and color consistent with Image 1. Keep the sticker if present. Add a natural shadow on the surface. No additional objects unless specified. No extra fruit.
That one prompt covers most of what goes wrong.
Step 3: Run variants (same banana, different intent)
Packaging or mockup variant:
Use Image 1 as the banana subject reference. Use Image 2 as the scene/style reference. Place the banana on the cutting board. Add a small brand tag next to it with the text “BANANA CO”. Keep everything else identical. Match lighting and perspective. Natural shadow. No extra fruit.
Creative style variant (useful for games, icons, onboarding screens):
Use Image 1 as the banana subject reference. Use Image 2 as the style reference. Render as a clean flat illustration while preserving the banana silhouette from Image 1 and the color palette and lighting mood from Image 2. No extra objects. Plain background if needed.
If you are evaluating consistency, do one more thing: keep everything the same and change only the scene reference image. That tells you whether you can build a “swap templates, keep subject” workflow.
Why JUHE API
If you are building a product, you usually want to spend time on product logic. Upload UX, storage, queueing, review flows, permissions. Not model hosting.
JUHE is a practical choice because it gives you:
- A single platform to procure and call models
- Unified billing and keys (less vendor sprawl)
- Docs and a stable integration surface
- Faster time to first call than self-hosting
And for this specific case, it means you can access nano banana on JUHE API with minimal setup and focus on the part that matters. Your app.
Implementation notes: getting better results with 2 reference images (without guesswork)
A lot of "bad model output" is just bad inputs. Here are the rules that save you hours.
Choose better subject references
- High resolution, in focus
- Single subject, minimal occlusion
- Consistent angle if you need a series
- Avoid reflections and busy textures until you have the pipeline stable
For a banana: do not start with a banana in a fruit bowl with grapes and apples. You are basically inviting the model to invent extra fruit.
Choose better scene/style references
- Strong lighting cues (directional light, clear shadows)
- Clear perspective
- Avoid clutter if you want clean placement
The scene image is your art director. Pick one that knows what it wants.
Common failure modes and fixes
Subject drift (banana shape changes)
- Fix: cleaner subject reference, tighter prompt, increase subject prominence (crop closer), reduce clutter
Style overpowering (banana turns into something else)
- Fix: tone down stylization words in prompt, choose a less extreme style reference, use a more realistic scene image
Mismatched perspective
- Fix: pick a closer scene reference to your subject camera angle. If your banana is shot top-down, do not use a dramatic low-angle countertop reference.
Production hygiene (boring, but important)
Store:
- Original inputs (both images)
- Prompt text
- Seed and parameters (if available)
- Output image
- Model version or endpoint version
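A sketch of one such metadata record built with `jq`; the field names and paths are assumptions to adapt to whatever your review tooling expects:

```shell
# One metadata record per generation, stored next to the output image.
# All paths and field names here are hypothetical.
RECORD=$(jq -n \
  --arg subject "inputs/banana.jpg" \
  --arg scene "templates/kitchen.jpg" \
  --arg prompt_version "v3" \
  --arg model "gemini-3-pro-image-preview" \
  --arg output "outputs/banana_in_kitchen.png" \
  '{subject: $subject, scene: $scene, prompt_version: $prompt_version,
    model: $model, output: $output}')

echo "$RECORD" | jq -r '.prompt_version'   # prints v3
```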
Version your prompt templates. Seriously. The day you change one adjective and your catalog shifts, you will want to know why.
And for critical outputs, keep a human review step. At least until you have confidence in the failure modes.
To further improve results, consider proven prompting strategies such as a consistent character method, and structured templates like a ChatGPT image prompt template.
Call to action: Try Nano Banana Pro via JUHE API
Your next step is simple:
- Create an account at https://www.juheapi.com/
- Find the Nano Banana Pro image generation endpoint
- Run the banana two-image test with 2 pictures / images
- Evaluate results against a minimal checklist
Minimal evaluation checklist:
- Does the banana stay consistent with the subject reference?
- Does the scene lighting and composition match the style reference?
- What is the throughput and latency for your target size?
- What is your cost per usable image after rejections?
If the answer is “yes, mostly” on the first two, you are already ahead of most text-only pipelines. That is the point of Multi-Image & Reference Generation. You stop fighting randomness and start building a repeatable system.
FAQs (Frequently Asked Questions)
What is Nano Banana Pro and how does it improve image generation?
Nano Banana Pro is a multi-image, reference-based image generation model that creates new images conditioned on two input images: a subject reference (e.g., a banana) and a style or scene reference (e.g., kitchen lighting). This approach ensures high consistency and fidelity across variations, overcoming common issues like shape changes, color shifts, or background inconsistencies in AI-generated images.
Why is using two reference images better than text-only or single-reference models?
Text-only models often struggle to maintain exact subject consistency, resulting in drift across variations. Single-reference models can preserve either the subject or the style but have difficulty blending both reliably. Nano Banana Pro's two-image conditioning allows simultaneous preservation of the subject’s identity and adoption of the desired style or scene, producing more consistent and contextually appropriate outputs.
How does Nano Banana Pro's workflow work for developers?
Developers provide three inputs: Image A as the subject reference (the object to keep consistent), Image B as the style or scene reference (background, lighting, environment), and a text prompt describing intent and constraints. The model synthesizes an output where the subject remains consistent but is rendered within the style or scene of the second image.
What practical considerations should I keep in mind when using Nano Banana Pro via API?
When integrating Nano Banana Pro through APIs like JUHE, consider resolution and file size limits requiring pre-resizing, aspect ratio constraints favoring square outputs, content safety filters necessitating clear moderation handling, latency and throughput factors such as batching and retries, and cost management strategies including caching and deduplication.
What kind of outputs can I expect from Nano Banana Pro regarding fidelity and style transfer?
Outputs typically maintain strong subject fidelity when the subject image is clear and high resolution. Style transfer effectiveness increases with distinctive and opinionated style images. While prompts help control tradeoffs between subject consistency and style adoption, they cannot fully compensate for poor-quality reference images. Clean subject references paired with opinionated style references yield the best results.
What applications can benefit from using Nano Banana Pro's multi-image generation capabilities?
Nano Banana Pro is ideal for creating consistent catalogs, ad creatives, avatars, mockups, game assets, and augmented reality previews like 'try it in my room.' Its ability to maintain subject consistency across various scenes reduces manual editing efforts and review loops commonly associated with AI-generated image features.