JUHE API Marketplace

Nano Banana 2 Multi-Turn Editing Test: Iterating on a Single Image Across 5 Conversation Rounds

By Chloe Anderson

Introduction

AI multi-turn image editing promises iterative refinement: developers can request changes such as "change the lighting," "add a plant," or "make the sofa blue" across successive turns and expect the model to remember the visual elements already established. But the critical question remains: can the model actually hold visual coherence across multiple rounds? Does it preserve the elements it was explicitly told not to change? And how does quality evolve as the conversation history grows? These questions matter most for tightly iterative workflows such as interior design visualization and product styling platforms.

This test conducts a 5-turn session: starting from one base interior design image, then applying four successive targeted edits. Each round adds a specific instruction while requiring the model to preserve all prior elements. The two primary use cases driving this evaluation are client-facing interior design tools, where iterative feedback refines room layouts, and product styling review platforms, where art directors iteratively hone hero product shots.

Importantly, this is a quality consistency test, designed to reveal where multi-turn editing is production-ready and where visual drift or degradation occurs. Developers will find practical insight grounded in actual outputs rather than hype.

Explore AI multi-turn image editing interactively first by opening AI Studio to run a full 5-turn editing session yourself. You'll get working code and a quality benchmark to guide your integration decisions: https://wisdom-gate.juheapi.com/studio/image

Test Setup — Nano Banana 2 Multi-Turn Editing Configuration

Nano Banana 2 (gemini-3.1-flash-image-preview) supports multi-turn image editing, but only when each request carries the full conversation history in the contents array: every user instruction plus every previous model response, with its image included as inlineData. There is no implicit server-side session state. This gives developers precise control over session context, but it also means they must manage history storage and replay in their own application.

| Parameter | Value |
| --- | --- |
| Model | gemini-3.1-flash-image-preview |
| Platform | Wisdom Gate |
| Price per turn | $0.058 |
| Generation time | ~20 seconds per turn |
| Resolution | 2K for iterative turns, 4K final turn |
| Aspect ratio | 16:9 (optimal for interior rooms) |
| Grounding | Disabled for deterministic testing |
| Endpoint | Gemini-native API (/v1beta/models/...) |
| Total cost | $0.29 for 5 turns |
| Total time | ~100 seconds total |
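
The totals in the table follow directly from the per-turn figures. A minimal sketch of that arithmetic, assuming flat per-turn pricing and latency (the function name `estimate_session` is illustrative and not part of any API):

```python
def estimate_session(turns, price_per_turn=0.058, seconds_per_turn=20):
    """Return (total_cost_usd, total_seconds) for a session of `turns` calls.

    Assumes every turn costs and takes the same as the figures quoted in
    the table; real latency may vary with resolution and load.
    """
    return round(turns * price_per_turn, 2), turns * seconds_per_turn


cost, seconds = estimate_session(5)
print(f"${cost:.2f} over ~{seconds} s")
```

The same function gives a quick budget check for longer sessions, for example `estimate_session(12)` for a twelve-round client review.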

The 5-turn test plan:

| Turn | Instruction | Test Focus |
| --- | --- | --- |
| 1 | Generate base room | Initial generation quality |
| 2 | Change wall color | Targeted single-element edit; preservation |
| 3 | Add furniture | Additive elements; preservation |
| 4 | Modify lighting | Lighting adjustment; preservation |
| 5 | Final 4K upgrade | High res output with no changes |

The Multi-Turn Mechanism — nano banana 2 core features

Multi-turn image editing is one of the core Nano Banana 2 features that set it apart from standard single-shot image generation APIs. The underlying unified transformer architecture natively processes text and image tokens together, allowing the model to "read" prior images as contextual input just as it processes textual prompts.

The contents array grows with each turn to maintain full visual and textual context:

Turn 1: [user: prompt] → model generates image A

Turn 2: [user: prompt] [model: image A as inlineData] [user: edit instruction] → model generates image B

Turn 3: [user: prompt] [model: image A as inlineData] [user: edit instruction] [model: image B as inlineData] [user: edit instruction] → model generates image C

…and so forth.

Crucially, the model turn must include the generated image as inlineData — if the image is omitted, the model loses all visual context and starts fresh from text alone.
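
The history layout above can be sketched with two small builders and a check that catches the failure mode just described. The helper names (`user_turn`, `model_turn`, `missing_image_turns`) are hypothetical conveniences, not part of the API; only the message shape mirrors the contents structure used in this article.

```python
def user_turn(text):
    """Build a user message carrying a text instruction."""
    return {"role": "user", "parts": [{"text": text}]}


def model_turn(image_b64, mime_type="image/png"):
    """Build a model message carrying a generated image as inlineData."""
    return {
        "role": "model",
        "parts": [{"inlineData": {"mimeType": mime_type, "data": image_b64}}],
    }


def missing_image_turns(contents):
    """Return indices of model turns that carry no inlineData part."""
    return [
        i for i, msg in enumerate(contents)
        if msg["role"] == "model"
        and not any("inlineData" in part for part in msg["parts"])
    ]


# History as it would look just before the Turn 2 request is sent.
contents = [
    user_turn("Generate base room"),
    model_turn("<base64 of image A>"),  # placeholder string, not real image data
    user_turn("Change wall color"),
]
print(missing_image_turns(contents))
```

Running the check before each request is a cheap guard: any index it returns points at a model turn that would leave the model without visual context.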

Since there is no server-side session, developers must store each generated image’s base64 string in application state (e.g., React state or backend session objects) and replay the full history on each API call.

Each image consumes about 1,290 tokens. With a 256K-token context window, the five images of a 5-turn session account for around 6,450 tokens (5 × 1,290), leaving ample room for longer sessions if needed.
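
A rough sketch of the token arithmetic, using the figures quoted above. Two assumptions are baked in: that "256K" means 256,000 tokens, and that the per-image figure holds at every resolution. Text tokens are ignored unless passed in explicitly.

```python
TOKENS_PER_IMAGE = 1290    # approximate per-image figure quoted above
CONTEXT_WINDOW = 256_000   # assumed reading of "256K"


def session_image_tokens(turns):
    """Tokens for all images generated in a session of `turns` turns."""
    return turns * TOKENS_PER_IMAGE


def replayed_image_tokens(turn):
    """Image tokens sent as input on a 1-indexed turn.

    Turn k replays the k-1 images generated so far.
    """
    return (turn - 1) * TOKENS_PER_IMAGE


def max_images_in_context(text_tokens=0):
    """How many images fit in the window after reserving `text_tokens`."""
    return (CONTEXT_WINDOW - text_tokens) // TOKENS_PER_IMAGE


print(session_image_tokens(5))     # 6450
print(replayed_image_tokens(5))    # 5160
print(max_images_in_context())     # 198
```

Under these assumptions the context window is not the practical limit for short sessions; per-turn cost and latency are likely to matter first.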

The 5-Turn Editing Session — Full Code and AI Image Generation Walkthrough

Here is a complete, runnable Python example of the full 5-turn session, covering contents management and the API calls that generate each image.

python
import requests, base64, os, time
from pathlib import Path

ENDPOINT = "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent"
HEADERS = {
    "x-goog-api-key": os.environ["WISDOM_GATE_KEY"],
    "Content-Type": "application/json"
}

def call_api(contents, resolution="2K", aspect_ratio="16:9"):
    """Send a multi-turn API request and return JSON response."""
    response = requests.post(ENDPOINT, headers=HEADERS, json={
        "contents": contents,
        "generationConfig": {
            "responseModalities": ["IMAGE"],
            "imageConfig": {"imageSize": resolution, "aspectRatio": aspect_ratio}
        }
    }, timeout=35)
    response.raise_for_status()
    return response.json()

def extract_image(response_data):
    """Extract base64 image string from response."""
    for part in response_data["candidates"][0]["content"]["parts"]:
        if "inlineData" in part:
            return part["inlineData"]["data"]
    raise ValueError("No image in response; check responseModalities setting")

def save_image(b64_data, filename):
    Path(filename).write_bytes(base64.b64decode(b64_data))
    print(f"Saved: {filename}")

# State holds all messages
contents = []
output_dir = Path("multi_turn_session")
output_dir.mkdir(exist_ok=True)

# Turn 1: Generate base room
print("Turn 1: Generating base room...")
contents.append({
    "role": "user",
    "parts": [{"text": """
        A contemporary living room interior. White walls, light oak herringbone
        floor, large floor-to-ceiling window on the left with natural daylight.
        Furniture: light grey linen sofa centered, walnut coffee table in front,
        single floor lamp right of sofa. Clean Scandinavian aesthetic.
        Camera: straight-on entrance view.
    """}]
})
t1 = call_api(contents)
img1 = extract_image(t1)
save_image(img1, output_dir / "turn_01_base.png")
contents.append({
    "role": "model",
    "parts": [{"inlineData": {"mimeType": "image/png", "data": img1}}]
})
time.sleep(1)

# Turn 2: Change wall color
print("Turn 2: Changing wall color...")
contents.append({
    "role": "user",
    "parts": [{"text": """
        Change wall color to warm sage green (#8FAF8A). Keep all else exactly.
    """}]
})
t2 = call_api(contents)
img2 = extract_image(t2)
save_image(img2, output_dir / "turn_02_sage_walls.png")
contents.append({
    "role": "model",
    "parts": [{"inlineData": {"mimeType": "image/png", "data": img2}}]
})
time.sleep(1)

# Turn 3: Add bookshelf
print("Turn 3: Adding bookshelf...")
contents.append({
    "role": "user",
    "parts": [{"text": """
        Add tall walnut bookshelf right wall, partially filled with books and plants.
        Preserve all prior elements and colors.
    """}]
})
t3 = call_api(contents)
img3 = extract_image(t3)
save_image(img3, output_dir / "turn_03_bookshelf.png")
contents.append({
    "role": "model",
    "parts": [{"inlineData": {"mimeType": "image/png", "data": img3}}]
})
time.sleep(1)

# Turn 4: Modify lighting
print("Turn 4: Adjusting lighting to evening...")
contents.append({
    "role": "user",
    "parts": [{"text": """
        Change lighting to warm evening: floor lamp on with orange-yellow glow,
        dusk visible outside window with blue-purple sky.
        Keep composition and furniture unchanged.
    """}]
})
t4 = call_api(contents)
img4 = extract_image(t4)
save_image(img4, output_dir / "turn_04_evening_light.png")
contents.append({
    "role": "model",
    "parts": [{"inlineData": {"mimeType": "image/png", "data": img4}}]
})
time.sleep(1)

# Turn 5: Upgrade to 4K final
print("Turn 5: Upgrading to 4K final output...")
contents.append({
    "role": "user",
    "parts": [{"text": """
        Generate same scene at 4K resolution, no further changes.
    """}]
})
t5 = call_api(contents, resolution="4K")
img5 = extract_image(t5)
save_image(img5, output_dir / "turn_05_final_4K.png")

print(f"\n5-turn session complete. Total cost: ${5*0.058:.3f}. Total time: ~100 seconds.")

Implementation Notes:

  • Use a 35-second timeout for API calls (20-second generation + buffer).
  • Iterate at 2K resolution to save cost and time; upgrade to 4K only on the final turn.
  • Phrase edits clearly with "Keep all else exactly" to preserve prior elements.
  • Maintain the full conversation contents on every call.
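
The five turns in the walkthrough repeat the same append, call, extract, append pattern, so the notes above can be folded into one helper. This is a sketch rather than the article's tested code: `run_turn` and `PRESERVE_SUFFIX` are illustrative names, and `call_fn` / `extract_fn` are expected to behave like `call_api` and `extract_image` from the walkthrough. The stubs at the bottom stand in for the network call so the block runs offline.

```python
PRESERVE_SUFFIX = "Keep all else exactly."


def run_turn(contents, instruction, call_fn, extract_fn,
             resolution="2K", preserve=True):
    """Append a user edit, call the API, and record the returned image.

    Mutates `contents` in place so the full history is replayed on the
    next call. Returns the base64 image string.
    """
    text = f"{instruction} {PRESERVE_SUFFIX}" if preserve else instruction
    contents.append({"role": "user", "parts": [{"text": text}]})
    response = call_fn(contents, resolution=resolution)
    image_b64 = extract_fn(response)
    contents.append({
        "role": "model",
        "parts": [{"inlineData": {"mimeType": "image/png", "data": image_b64}}],
    })
    return image_b64


# Offline stand-ins for call_api / extract_image (no network involved).
def fake_call(contents, resolution="2K"):
    return {"image": f"img-after-{len(contents)}-messages", "resolution": resolution}


def fake_extract(response):
    return response["image"]


history = []
run_turn(history, "A contemporary living room interior.", fake_call, fake_extract, preserve=False)
run_turn(history, "Change wall color to warm sage green (#8FAF8A).", fake_call, fake_extract)
print(len(history))  # 4 messages: user, model, user, model
```

Passing the API functions in as arguments keeps the history logic testable without credentials; in real use you would pass `call_api` and `extract_image` and set `resolution="4K"` only on the final turn.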

Quality Consistency Assessment — AI Multi-Turn Image Editing Results

The value of AI multi-turn image editing lies in whether Nano Banana 2 preserves established elements from previous turns while making only the explicit changes each iteration requests.

[Figure: 5-turn evolution strip showing the generated images from Turns 1 through 5, left to right. Caption: "5-turn AI multi-turn image editing session on Wisdom Gate. Total cost: $0.29. Total time: ~100 seconds."]

1. Element preservation

  • Turn 1 → 2: Warm sage green walls successfully replace white; floor, sofa, and furniture reliably preserved.
  • Turn 2 → 3: Bookshelf added as instructed; sage walls and prior furniture stable.
  • Turn 3 → 4: Lighting changes applied; bookshelf and composition remain consistent.
  • Turn 4 → 5: All scene elements persist during the resolution upgrade.

2. Edit precision

  • Wall color closely matches target hex code.
  • Bookshelf placement corresponds exactly to instructions.
  • Lighting switch to warm evening glow evident.
  • Final 4K output shows crisp detail without alteration.
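
One way to put a number on "closely matches" is to compare the target hex code with a pixel sampled from the wall. The sketch below is an illustration only, not how this test was scored: the sampled value is hypothetical, and plain RGB distance is a crude measure that scene lighting will inflate.

```python
def hex_to_rgb(hex_code):
    """Convert '#RRGGBB' (or 'RRGGBB') to an (r, g, b) tuple of ints."""
    hex_code = hex_code.lstrip("#")
    return tuple(int(hex_code[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(rgb_a, rgb_b):
    """Euclidean distance between two RGB colors (0 means identical)."""
    return sum((a - b) ** 2 for a, b in zip(rgb_a, rgb_b)) ** 0.5


target = hex_to_rgb("#8FAF8A")   # sage green from the Turn 2 prompt
sampled = (140, 172, 141)        # hypothetical wall pixel, not measured data
print(target, round(color_distance(target, sampled), 2))
```

Sampling several wall pixels away from windows and lamps, then averaging, would give a steadier reading than a single point.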

3. Compositional drift

  • Camera angle and 16:9 framing remain stable across all turns.
  • Room proportions and layout unchanged.

4. Quality degradation

  • No perceptible quality loss between earlier (Turn 2) and later (Turn 4) transitions despite the longer history.
  • Resolution increase in Turn 5 enhances detail as expected.

| Transition | Element Preservation | Edit Precision | Compositional Drift | Quality |
| --- | --- | --- | --- | --- |
| Turn 1 → 2 | Strong: floor & furniture intact | Excellent: wall color exact | None: framing stable | Stable: no degradation |
| Turn 2 → 3 | Strong: walls & furniture kept | Good: bookshelf correctly positioned | None | Stable |
| Turn 3 → 4 | Strong: all elements preserved | Good: lighting accurate | None | Stable |
| Turn 4 → 5 | Strong: full scene retained | Precise: exact final render | None | Improved (4K output) |

Production Guidance — When to Use AI Multi-Turn Image Editing

Multi-turn editing excels when workflows require iterative refinement and full context continuity. The output of one step directly influences the next, making it critical the model “remembers” prior visuals.

Use multi-turn editing for:

| Workflow | Why Multi-Turn Is Right |
| --- | --- |
| Interior design client review | User feedback references previous specific elements |
| Product hero shot refinement | Preserve product details and lighting across rounds |
| Character design iteration | Each step refines anatomy, costume, and style |
| Packaging mockup development | Maintain label and structural consistency |
| Real estate staging | Fixed room composition; changing furnishings only |

Use independent requests instead when:

  • No visual context needs to carry between images (e.g., bulk catalog variants)
  • Speed and throughput trump session continuity
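
When images are independent, each prompt can go out as its own single-turn request, and the requests can run concurrently. A minimal sketch using the standard library's thread pool: `generate_variants` is a hypothetical helper, `generate_fn` is whatever function sends one contents array and returns a result, and the worker count is an assumption to tune against your rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_variants(prompts, generate_fn, max_workers=4):
    """Send each prompt as a separate single-turn request, in parallel.

    Results come back in the same order as `prompts`.
    """
    def one_shot(prompt):
        # A fresh one-message history: no context is shared between variants.
        return generate_fn([{"role": "user", "parts": [{"text": prompt}]}])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one_shot, prompts))


# Offline stand-in for the API call, so the sketch runs without a key.
def fake_generate(contents):
    return f"image for: {contents[0]['parts'][0]['text']}"


print(generate_variants(["red mug", "blue mug", "green mug"], fake_generate))
```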

State management checklist:

  • Store every inlineData base64 image string as session state
  • Pass full contents array on every API request
  • Monitor token usage to avoid exceeding context window
  • Use 35-second timeouts on all calls
  • Iterate at 2K resolution, 4K only on final approvals
  • Implement session reset function for new workflows
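
Several checklist items (stored images, token monitoring, session reset) fit naturally into one small state object. The class below is a sketch under assumptions: `EditSession` and its method names are illustrative, the per-image token figure is the approximate one quoted earlier, and the 80% warning threshold is arbitrary.

```python
class EditSession:
    """Holds one editing session's history; all limits are illustrative."""

    def __init__(self, tokens_per_image=1290, token_budget=256_000):
        self.contents = []
        self.tokens_per_image = tokens_per_image
        self.token_budget = token_budget

    def add_user(self, text):
        self.contents.append({"role": "user", "parts": [{"text": text}]})

    def add_model_image(self, image_b64, mime_type="image/png"):
        self.contents.append({
            "role": "model",
            "parts": [{"inlineData": {"mimeType": mime_type, "data": image_b64}}],
        })

    def image_tokens(self):
        """Estimated tokens for every image currently stored in the history."""
        images = sum(
            "inlineData" in part
            for msg in self.contents
            for part in msg["parts"]
        )
        return images * self.tokens_per_image

    def near_limit(self, fraction=0.8):
        """True once stored images use `fraction` of the token budget."""
        return self.image_tokens() >= fraction * self.token_budget

    def reset(self):
        """Drop all history so a new workflow starts from a clean slate."""
        self.contents = []


session = EditSession()
session.add_user("Generate base room")
session.add_model_image("<base64 image>")  # placeholder string
print(session.image_tokens(), session.near_limit())
session.reset()
```

In a web application, one such object per user session (serialized to your session store) keeps the replayed history and the reset path in a single place.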

Conclusion — Nano Banana 2 Multi-Turn Editing Test Summary

The 5-turn session validates that Nano Banana 2 reliably maintains visual coherence over iterative edits. Element preservation was consistently strong, with no drift in camera angle or composition. Edit precision met expectations on color, furniture placement, and lighting changes. Output quality remained stable through the session, improving on the final high-resolution pass. Transitions 1→2 and 4→5 performed best; the model showed excellent fidelity preserving original and newly introduced elements.

Nano Banana 2 on Wisdom Gate is production-ready for workflows demanding contextual persistence such as interior design and product styling iteration. Where extended session length or extreme precision is critical, developers should adopt strong preservation language, shorter session chains, and re-anchoring strategies to mitigate minor drift and maintain top fidelity.

The complete working code is above for replicating this 5-turn session with your own subjects immediately.

Unlock seamless multi-turn editing workflows by getting your API key and trying AI Studio now: https://wisdom-gate.juheapi.com/hall/tokens | https://wisdom-gate.juheapi.com/studio/image
