Introduction
AI multi-turn image editing promises iterative refinement: developers can issue instructions such as "change the lighting," "add a plant," or "make the sofa blue" across successive turns and expect the model to remember the visual elements established earlier. The critical questions are whether the model actually holds visual coherence across multiple rounds, whether it preserves elements it was explicitly told not to change, and how quality evolves as the conversation history grows. These questions matter for tightly iterative workflows such as interior design visualization and product styling platforms.
This test runs a 5-turn session: Turn 1 generates a base interior design image, Turns 2 to 4 apply three successive targeted edits, and Turn 5 re-renders the final scene at 4K. Each edit turn adds one specific instruction while requiring the model to preserve all prior elements. Two use cases drive this evaluation: client-facing interior design tools, where iterative feedback refines room layouts, and product styling review platforms, where art directors iteratively hone hero product shots.
Importantly, this is a quality consistency test, designed to reveal where multi-turn editing is production-ready and where visual drift or degradation occurs. The findings are grounded in actual outputs rather than hype.
Explore AI multi-turn image editing interactively first by opening AI Studio to run a full 5-turn editing session yourself. You'll get working code and a quality benchmark to guide your integration decisions: https://wisdom-gate.juheapi.com/studio/image
Test Setup — Nano Banana 2 Multi-Turn Editing Configuration
Nano Banana 2 (gemini-3.1-flash-image-preview) supports multi-turn image editing, but it requires the full conversation history (the contents array) to be sent with each request. That history includes every user instruction plus every previous model response, with images included as inlineData. There is no implicit server-side session state. This gives developers precise control over session context, but it also means the application must store and replay the history itself.
| Parameter | Value |
|---|---|
| Model | gemini-3.1-flash-image-preview |
| Platform | Wisdom Gate |
| Price per turn | $0.058 |
| Generation time | ~20 seconds per turn |
| Resolution | 2K for iterative turns, 4K final turn |
| Aspect ratio | 16:9 (optimal for interior rooms) |
| Grounding | Disabled for deterministic testing |
| Endpoint | Gemini-native API (/v1beta/models/...) |
| Total cost | $0.29 for 5 turns |
| Total time | ~100 seconds total |
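The two totals follow directly from the per-turn figures (5 × $0.058 and 5 × ~20 s). A minimal sketch of that arithmetic, assuming, as the table reports, a flat price and a fixed generation time for every turn including the 4K one; `estimate_session` is a hypothetical helper name:

```python
def estimate_session(turns: int, price_per_turn: float = 0.058,
                     seconds_per_turn: float = 20.0) -> tuple[float, float]:
    """Return (total_cost_usd, total_seconds) for a session of `turns` calls.

    Assumes a flat price and a fixed generation time per turn, as in the
    setup table; real latency varies from call to call.
    """
    total_cost = round(turns * price_per_turn, 4)  # round away float noise
    total_seconds = turns * seconds_per_turn
    return total_cost, total_seconds


cost, seconds = estimate_session(5)
print(f"5 turns: ${cost:.2f}, ~{seconds:.0f} s")  # 5 turns: $0.29, ~100 s
```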
The 5-turn test plan:
| Turn | Instruction | Test Focus |
|---|---|---|
| 1 | Generate base room | Initial generation quality |
| 2 | Change wall color | Targeted single-element edit; preservation |
| 3 | Add furniture | Additive elements; preservation |
| 4 | Modify lighting | Lighting adjustment; preservation |
| 5 | Final 4K upgrade | High-resolution output; no changes |
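The plan above can also be expressed as data that drives a single loop. This is a sketch, not the walkthrough's code: `TURN_PLAN` and `run_plan` are hypothetical names, the prompts are abbreviated, and `generate(contents, resolution)` is an assumed callable that would wrap the walkthrough's `call_api()` and `extract_image()` in practice, so the loop itself can be exercised offline.

```python
from typing import Callable

# (instruction, imageSize) per turn; prompts abbreviated from the walkthrough.
TURN_PLAN = [
    ("A contemporary living room interior, Scandinavian aesthetic.", "2K"),
    ("Change wall color to warm sage green (#8FAF8A). Keep all else exactly.", "2K"),
    ("Add tall walnut bookshelf on the right wall. Preserve all prior elements.", "2K"),
    ("Change lighting to warm evening. Keep composition and furniture unchanged.", "2K"),
    ("Generate same scene at 4K resolution, no further changes.", "4K"),
]


def run_plan(plan: list[tuple[str, str]],
             generate: Callable[[list, str], str]) -> list[str]:
    """Run each planned turn, replaying the growing history on every call."""
    contents: list[dict] = []
    images: list[str] = []
    for instruction, resolution in plan:
        contents.append({"role": "user", "parts": [{"text": instruction}]})
        image_b64 = generate(contents, resolution)
        images.append(image_b64)
        # Feed the image back as the model turn so the next edit can see it.
        contents.append({
            "role": "model",
            "parts": [{"inlineData": {"mimeType": "image/png", "data": image_b64}}],
        })
    return images
```

Keeping the plan as data makes it easy to swap instructions per client or product without touching the request logic.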
The Multi-Turn Mechanism in Nano Banana 2
Multi-turn image editing is one of the core features that set Nano Banana 2 apart from standard single-shot image generation APIs. Its unified transformer architecture natively processes text and image tokens together, which allows the model to "read" prior images as contextual input in the same way it processes text prompts.
The contents array grows with each turn to maintain full visual and textual context:
Turn 1: [user: prompt] → model generates image A
Turn 2: [user: prompt] [model: image A as inlineData] [user: edit instruction] → model generates image B
Turn 3: [user: prompt] [model: image A as inlineData] [user: edit instruction] [model: image B as inlineData] [user: edit instruction] → model generates image C
…and so forth.
Crucially, each model turn must include the generated image as inlineData. If the image is omitted, the model loses all visual context and starts fresh from the text alone.
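Because a missing inlineData part silently drops visual context, it can be worth checking the history before each call. A minimal sketch, assuming the strictly alternating user/model layout shown above (starting and ending with a user turn); `find_history_problems` is a hypothetical helper, not part of any API:

```python
def find_history_problems(contents: list[dict]) -> list[str]:
    """List reasons why a contents array would lose visual context.

    Checks the alternating layout shown above: even indices are "user",
    odd indices are "model", every model turn carries an inlineData image,
    and the history ends with the user instruction about to be sent.
    """
    problems = []
    for i, turn in enumerate(contents):
        expected = "user" if i % 2 == 0 else "model"
        if turn.get("role") != expected:
            problems.append(f"turn {i}: expected role {expected!r}")
        elif expected == "model" and not any(
            "inlineData" in part for part in turn.get("parts", [])
        ):
            problems.append(f"turn {i}: model turn has no inlineData image")
    if contents and contents[-1].get("role") != "user":
        problems.append("history must end with a user instruction")
    return problems
```

An empty list means the history matches the layout; for Turn N it will then contain 2N - 1 entries.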
Since there is no server-side session, developers must store each generated image’s base64 string in application state (e.g., React state or backend session objects) and replay the full history on each API call.
Each image consumes about 1,290 tokens. With a 256K-token context window, a 5-turn session uses around 6,450 image tokens (5 × 1,290), leaving ample room for longer sessions if needed.
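The budget arithmetic can be sketched as follows. It assumes the ~1,290 tokens-per-image figure applies at every resolution, reads "256K" as 256,000, and ignores text tokens; all three are assumptions to verify. It also separates context size from the total image tokens sent, since every call replays all earlier images.

```python
def context_image_tokens(images: int, tokens_per_image: int = 1290) -> int:
    """Image tokens held in the history once `images` images are stored."""
    return images * tokens_per_image


def total_image_tokens_sent(turns: int, tokens_per_image: int = 1290) -> int:
    """Image tokens sent as input across a whole session.

    Call k replays the k - 1 images generated before it, so the total is
    tokens_per_image * (0 + 1 + ... + (turns - 1)).
    """
    return tokens_per_image * turns * (turns - 1) // 2


def max_images_in_context(context_window: int = 256_000,
                          tokens_per_image: int = 1290) -> int:
    """Rough ceiling on stored images, ignoring text and output tokens."""
    return context_window // tokens_per_image


print(context_image_tokens(5))      # 6450, the figure quoted above
print(total_image_tokens_sent(5))   # 12900
print(max_images_in_context())      # 198
```

The quadratic growth of tokens sent is why long sessions get heavier per call even when they still fit comfortably in the window.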
The 5-Turn Editing Session — Full Code and AI Image Generation Walkthrough
Here is a complete, runnable Python example of the full 5-turn session, including contents management and the API call for each turn.
```python
import base64
import os
import time
from pathlib import Path

import requests

ENDPOINT = "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent"
HEADERS = {
    "x-goog-api-key": os.environ["WISDOM_GATE_KEY"],
    "Content-Type": "application/json",
}


def call_api(contents, resolution="2K", aspect_ratio="16:9"):
    """Send a multi-turn API request and return the JSON response."""
    response = requests.post(ENDPOINT, headers=HEADERS, json={
        "contents": contents,  # full history, replayed on every call
        "generationConfig": {
            "responseModalities": ["IMAGE"],
            "imageConfig": {"imageSize": resolution, "aspectRatio": aspect_ratio},
        },
    }, timeout=35)  # ~20 s generation plus buffer
    response.raise_for_status()
    return response.json()


def extract_image(response_data):
    """Extract the base64 image string from a response."""
    for part in response_data["candidates"][0]["content"]["parts"]:
        if "inlineData" in part:
            return part["inlineData"]["data"]
    raise ValueError("No image in response; check responseModalities setting")


def save_image(b64_data, filename):
    """Decode base64 image data and write it to disk."""
    Path(filename).write_bytes(base64.b64decode(b64_data))
    print(f"Saved: {filename}")


# State holds all messages
contents = []
output_dir = Path("multi_turn_session")
output_dir.mkdir(exist_ok=True)

# Turn 1: Generate base room
print("Turn 1: Generating base room...")
contents.append({
    "role": "user",
    "parts": [{"text": """
A contemporary living room interior. White walls, light oak herringbone
floor, large floor-to-ceiling window on the left with natural daylight.
Furniture: light grey linen sofa centered, walnut coffee table in front,
single floor lamp right of sofa. Clean Scandinavian aesthetic.
Camera: straight-on entrance view.
"""}]
})
t1 = call_api(contents)
img1 = extract_image(t1)
save_image(img1, output_dir / "turn_01_base.png")
# Feed the image back as the model turn so the next edit can "see" it
contents.append({
    "role": "model",
    "parts": [{"inlineData": {"mimeType": "image/png", "data": img1}}]
})
time.sleep(1)

# Turn 2: Change wall color
print("Turn 2: Changing wall color...")
contents.append({
    "role": "user",
    "parts": [{"text": """
Change wall color to warm sage green (#8FAF8A). Keep all else exactly.
"""}]
})
t2 = call_api(contents)
img2 = extract_image(t2)
save_image(img2, output_dir / "turn_02_sage_walls.png")
contents.append({
    "role": "model",
    "parts": [{"inlineData": {"mimeType": "image/png", "data": img2}}]
})
time.sleep(1)

# Turn 3: Add bookshelf
print("Turn 3: Adding bookshelf...")
contents.append({
    "role": "user",
    "parts": [{"text": """
Add tall walnut bookshelf right wall, partially filled with books and plants.
Preserve all prior elements and colors.
"""}]
})
t3 = call_api(contents)
img3 = extract_image(t3)
save_image(img3, output_dir / "turn_03_bookshelf.png")
contents.append({
    "role": "model",
    "parts": [{"inlineData": {"mimeType": "image/png", "data": img3}}]
})
time.sleep(1)

# Turn 4: Modify lighting
print("Turn 4: Adjusting lighting to evening...")
contents.append({
    "role": "user",
    "parts": [{"text": """
Change lighting to warm evening: floor lamp on with orange-yellow glow,
dusk visible outside window with blue-purple sky.
Keep composition and furniture unchanged.
"""}]
})
t4 = call_api(contents)
img4 = extract_image(t4)
save_image(img4, output_dir / "turn_04_evening_light.png")
contents.append({
    "role": "model",
    "parts": [{"inlineData": {"mimeType": "image/png", "data": img4}}]
})
time.sleep(1)

# Turn 5: Upgrade to 4K final (last turn, so no model turn is appended)
print("Turn 5: Upgrading to 4K final output...")
contents.append({
    "role": "user",
    "parts": [{"text": """
Generate same scene at 4K resolution, no further changes.
"""}]
})
t5 = call_api(contents, resolution="4K")
img5 = extract_image(t5)
save_image(img5, output_dir / "turn_05_final_4K.png")

print(f"\n5-turn session complete. Total cost: ${5*0.058:.3f}. Total time: ~100 seconds.")
```
Implementation Notes:
- Use a 35-second timeout for API calls (20-second generation + buffer).
- Iterate at 2K resolution to save cost and speed; upgrade to 4K only on final.
- Phrase edits clearly with "Keep all else exactly" to preserve prior elements.
- Maintain the full conversation `contents` array on every call.
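A 35-second timeout will occasionally fire, so a small retry wrapper can help. This is a hedged sketch with exponential backoff, not part of the walkthrough; `with_retries` is a hypothetical name, and the default exception types are Python built-ins so the block runs on its own. With `requests`, pass its own types instead.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T],
                 retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
                 attempts: int = 3,
                 base_delay: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Waits base_delay, then 2 * base_delay, ... between attempts and re-raises
    the last error once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Against the walkthrough this might look like `with_retries(lambda: call_api(contents), retry_on=(requests.Timeout, requests.ConnectionError))`. Because the user turn is appended before the call, a retry simply resends the same history.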
Quality Consistency Assessment — AI Multi-Turn Image Editing Results
The value of AI multi-turn image editing lies in whether Nano Banana 2 preserves established elements from previous turns while making only the explicit changes each iteration requests.
[Figure: 5-turn evolution strip showing the outputs of Turns 1 through 5 side by side. Caption: "5-turn AI multi-turn image editing session on Wisdom Gate. Total cost: $0.29. Total time: ~100 seconds."]
1. Element preservation
- Turn 1 → 2: Warm sage green walls successfully replace white; floor, sofa, and furniture reliably preserved.
- Turn 2 → 3: Bookshelf added as instructed; sage walls and prior furniture stable.
- Turn 3 → 4: Lighting changes applied; bookshelf and composition remain consistent.
- Turn 4 → 5: All scene elements persist during the resolution upgrade.
2. Edit precision
- Wall color closely matches target hex code.
- Bookshelf placement corresponds exactly to instructions.
- Lighting switch to warm evening glow evident.
- Final 4K output shows crisp detail without alteration.
3. Compositional drift
- Camera angle and 16:9 framing remain stable across all turns.
- Room proportions and layout unchanged.
4. Quality degradation
- No perceptible quality loss between an earlier turn (Turn 2) and a later one (Turn 4), despite the longer history.
- Resolution increase in Turn 5 enhances detail as expected.
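The color-match result above is a visual judgment. One hypothetical way to put a number on it is to sample a wall pixel from the Turn 2 image (for example with Pillow's `Image.getpixel`) and measure its distance from the target hex. The sketch below uses plain Euclidean RGB distance, which is not perceptually uniform (a CIE ΔE metric would be more faithful). The sampled value is illustrative, not a measured result, and any lighting change such as Turn 4's will shift the sampled color.

```python
import math


def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple of 0-255 ints."""
    h = hex_color.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def rgb_distance(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
    """Euclidean distance in RGB space: 0 is identical, ~441.7 is max."""
    return math.dist(a, b)


target = hex_to_rgb("#8FAF8A")   # (143, 175, 138)
sampled = (140, 172, 135)        # hypothetical pixel sampled from a wall
print(round(rgb_distance(target, sampled), 2))  # 5.2
```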
| Transition | Element Preservation | Edit Precision | Compositional Drift | Quality |
|---|---|---|---|---|
| Turn 1 → 2 | Strong: floor & furniture intact | Excellent: wall color exact | None: framing stable | Stable: no degradation |
| Turn 2 → 3 | Strong: walls & furniture kept | Good: bookshelf correctly positioned | None | Stable |
| Turn 3 → 4 | Strong: all elements preserved | Good: lighting accurate | None | Stable |
| Turn 4 → 5 | Strong: full scene retained | Precise: exact final render | None | Improved (4K output) |
Production Guidance — When to Use AI Multi-Turn Image Editing
Multi-turn editing excels when workflows require iterative refinement and continuity of context. The output of one step directly shapes the next, so it is critical that the model “remembers” prior visuals.
Use multi-turn editing for:
| Workflow | Why Multi-Turn Is Right |
|---|---|
| Interior design client review | User feedback references previous specific elements |
| Product hero shot refinement | Preserve product details and lighting across rounds |
| Character design iteration | Each step refines anatomy, costume, and style |
| Packaging mockup development | Maintain label and structural consistency |
| Real estate staging | Fixed room composition; changing furnishings only |
Use independent requests instead when:
- No visual context needs to carry between images (e.g., bulk catalog variants)
- Speed and throughput trump session continuity
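For the independent-request case, each prompt gets its own one-element contents array and the calls can run concurrently. A minimal sketch using the standard-library thread pool; `generate_independent` is a hypothetical helper and `generate(contents)` is an assumed callable (for example `lambda c: extract_image(call_api(c))`). `max_workers` should be tuned to whatever rate limits apply to your account.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def generate_independent(prompts: list[str], generate: Callable[[list], str],
                         max_workers: int = 4) -> list[str]:
    """Send each prompt as its own single-turn request, in parallel.

    No history is shared: every call gets a fresh one-element contents
    array. Results are returned in the same order as `prompts`.
    """
    def one(prompt: str) -> str:
        contents = [{"role": "user", "parts": [{"text": prompt}]}]
        return generate(contents)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, prompts))
```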
State management checklist:
- Store every `inlineData` base64 image string as session state
- Pass the full `contents` array on every API request
- Monitor token usage to avoid exceeding the context window
- Use 35-second timeouts on all calls
- Iterate at 2K resolution, 4K only on final approvals
- Implement session reset function for new workflows
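The checklist maps naturally onto a small session object. A hedged sketch: `EditSession` is a hypothetical class, the 1,290 tokens-per-image and 256,000-token window mirror the figures quoted earlier, and text tokens are ignored, so treat the budget check as approximate.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """In-memory history for one multi-turn editing workflow.

    tokens_per_image and context_window mirror the figures quoted earlier
    and ignore text tokens; treat both as tunable assumptions.
    """
    context_window: int = 256_000
    tokens_per_image: int = 1290
    contents: list = field(default_factory=list)

    def add_user(self, text: str) -> list:
        """Append an instruction; return the full history to send."""
        self.contents.append({"role": "user", "parts": [{"text": text}]})
        return self.contents

    def add_model_image(self, b64_data: str, mime_type: str = "image/png") -> None:
        """Store the generated image as the model turn (base64 inlineData)."""
        self.contents.append({
            "role": "model",
            "parts": [{"inlineData": {"mimeType": mime_type, "data": b64_data}}],
        })

    def image_count(self) -> int:
        return sum(1 for turn in self.contents if turn["role"] == "model")

    def estimated_tokens(self) -> int:
        return self.image_count() * self.tokens_per_image

    def has_room_for_another_image(self) -> bool:
        return self.estimated_tokens() + self.tokens_per_image <= self.context_window

    @staticmethod
    def resolution(final: bool) -> str:
        """Iterate at 2K; use 4K only for the approved final render."""
        return "4K" if final else "2K"

    def reset(self) -> None:
        """Clear the history so a new workflow starts from scratch."""
        self.contents.clear()
```

In a web backend you would keep one such object per user session. Base64 images can be several megabytes at high resolution, so persisting them outside process memory may be preferable for long-lived sessions.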
Conclusion — Nano Banana 2 Multi-Turn Editing Test Summary
The 5-turn session shows Nano Banana 2 maintaining visual coherence over iterative edits. Element preservation was consistently strong, with no drift in camera angle or composition. Edit precision met expectations for color, furniture placement, and lighting changes. Output quality remained stable throughout the session and improved on the final high-resolution pass. Transitions 1→2 and 4→5 performed best, with the model preserving both original and newly introduced elements with excellent fidelity.
Nano Banana 2 on Wisdom Gate is production-ready for workflows that demand contextual persistence, such as interior design and product styling iteration. Where longer sessions or extreme precision are critical, developers should use strong preservation language, shorter session chains, and re-anchoring strategies to mitigate any drift that does appear and maintain top fidelity.
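Re-anchoring is not defined further here. One possible interpretation, offered as an assumption rather than the author's method, is to replace a long history with a single user turn that carries the latest approved image as inlineData, restates what must stay fixed, and gives the next edit. `reanchor` is a hypothetical helper. The trade-off is that the model no longer sees earlier instructions, only the written summary, but each call resends far fewer image tokens.

```python
def reanchor(contents: list, keep: str, next_edit: str) -> list:
    """Build a one-turn history seeded with the latest generated image.

    Hypothetical re-anchoring sketch: rather than replaying every earlier
    turn, send the most recent image back as user-supplied inlineData,
    restate what must stay fixed, and give the next edit in the same turn.
    """
    for turn in reversed(contents):
        if turn.get("role") != "model":
            continue
        for part in turn.get("parts", []):
            if "inlineData" in part:
                return [{
                    "role": "user",
                    "parts": [
                        {"inlineData": part["inlineData"]},
                        {"text": f"Current approved image. Preserve exactly: {keep}. "
                                 f"Edit: {next_edit}"},
                    ],
                }]
    raise ValueError("history contains no model image to re-anchor on")
```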
The complete working code above lets you replicate this 5-turn session with your own subjects right away.
Unlock seamless multi-turn editing workflows by getting your API key and trying AI Studio now: https://wisdom-gate.juheapi.com/hall/tokens | https://wisdom-gate.juheapi.com/studio/image