JuheAPI Blog

Build a CLI AI Chatbot in 20 Lines: WisGate API Tutorial

11 min read
By Olivia Bennett


If you want to see how little code it takes to build a multi-turn CLI chatbot, this tutorial walks you from terminal input to streamed output in one clean Python script. You'll also see how to switch models without changing the app structure.

What You'll Build

This tutorial guides you through building a Python CLI chatbot that reads prompts directly from your terminal, maintains conversation history across multiple turns, and streams responses token by token as they arrive. The chatbot remembers what you've said before, so context flows naturally from one exchange to the next. You'll type a question, hit enter, and watch the assistant's reply appear in real time on your screen.

The final script is roughly 20 lines of Python—small enough to understand at a glance, yet functional enough to handle real multi-turn conversations. No complex frameworks, no external dependencies beyond the WisGate Python client. Just straightforward code that demonstrates how to wire up terminal input, API calls, and streamed output.

What makes this build special is that the same script works with different AI models. You can run it with Claude, GPT, or any other model available through WisGate, simply by passing a --model flag. All your usage gets billed in one place, and you benefit from WisGate's pricing, which is typically 20–50% lower than official model pricing.

Why Use WisGate for a CLI Chatbot

Building a CLI chatbot is straightforward, but managing which model you're calling and how you're billed can get messy fast. WisGate solves this by offering one unified API that routes to multiple top-tier models and consolidates all billing into a single dashboard.

When you build with WisGate, you're not locked into one model provider. You can test your chatbot against claude-opus-4-6 one moment and gpt-5 the next, all without rewriting your application logic. This flexibility is invaluable during development—you can compare model outputs, find the best fit for your use case, and optimize cost without architectural changes.

Because WisGate handles routing and billing centrally, you don't need separate API keys for each provider or separate invoices to reconcile. Everything flows through one account. This simplicity scales whether you're running a personal project or managing multiple chatbots across a team.

Multi-model access with one codebase

The power of WisGate becomes clear when you realize that your chatbot code doesn't need to know which model it's calling. You pass the model name as a parameter—either hardcoded or via a command-line flag—and the same script works everywhere.

For example, you might start with claude-opus-4-6 for its reasoning capabilities, then switch to gpt-5 to compare speed or cost. Your Python code stays identical. The only thing that changes is the model string you send to the API. This means you can A/B test models, migrate between providers, or support multiple models simultaneously without duplicating code or maintaining separate branches.

Switching providers can mean swapping SDKs and rewriting request and response handling. With WisGate, it's a one-line change in your configuration or a flag at runtime. That's the practical advantage of a unified API layer.

Cost-conscious routing and billing

Model pricing varies widely, and keeping track of costs across multiple providers is tedious. WisGate consolidates this by offering transparent pricing on the WisGate Models page, where rates are typically 20–50% lower than official pricing. When you run your CLI chatbot through WisGate, every token you consume is billed at these optimized rates.

A typical CLI chatbot session, with a few exchanges and modest context, bills a few thousand tokens, which comes to a small fraction of a dollar. That's negligible for personal projects and scales affordably for production use. Because all usage is billed in one place, you can track spending easily and set budgets without juggling multiple vendor accounts.

Prerequisites

Before you start coding, make sure you have the basics in place. This tutorial assumes you're comfortable with Python and have a terminal open. You'll need Python installed locally and API access through WisGate.

Python environment

You'll need Python 3.8 or later installed on your machine. Check your version by running:

python --version

On macOS and many Linux distributions, the command may be python3 instead.

If you don't have Python installed, download it from python.org. Then install the WisGate Python client:

pip install wisgate

This installs the official WisGate Python library, which handles API communication, streaming, and error handling for you.

WisGate API access

You'll need a WisGate API key to authenticate your requests. Sign up for a free trial at https://wisgate.ai/ to get started. Once you have an account, generate an API key from your dashboard and store it as an environment variable:

export WISGATE_API_KEY="your-api-key-here"

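On Windows Command Prompt, use:

set WISGATE_API_KEY=your-api-key-here

In PowerShell, use:

$env:WISGATE_API_KEY="your-api-key-here"

Each of these commands sets the variable for the current terminal session only. The script reads it at startup with os.getenv, so you don't need to hardcode credentials in your code.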
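If you'd like a missing key to produce a clear message at startup rather than a less obvious failure later, a small helper can check for it. This is a minimal sketch; `require_api_key` is a hypothetical name and not part of the WisGate client.

```python
import os
import sys


def require_api_key(var_name: str = "WISGATE_API_KEY") -> str:
    """Return the API key from the environment, or exit with a clear message."""
    key = os.getenv(var_name, "").strip()
    if not key:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"Missing {var_name}; set it before running the chatbot.")
    return key
```

Call it once at the top of the script (for example, api_key = require_api_key()) before creating the client.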

The 20-Line Python CLI Chatbot

Here's the complete script. It's minimal, readable, and fully functional:

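```python
import os
import sys
from wisgate import WisGate  # WisGate Python client (pip install wisgate)

api_key = os.getenv("WISGATE_API_KEY")
if not api_key:
    sys.exit("Set the WISGATE_API_KEY environment variable first.")
# Use the model after --model if it is the first argument, else the default.
model = sys.argv[2] if len(sys.argv) > 2 and sys.argv[1] == "--model" else "claude-opus-4-6"
client = WisGate(api_key=api_key)
messages = []  # full conversation history, resent on every request

print(f"Chat with {model}. Type 'exit' to quit.\n")

while True:
    try:
        user_input = input("You: ").strip()
    except (EOFError, KeyboardInterrupt):  # Ctrl-D / Ctrl-C ends the chat cleanly
        break
    if user_input.lower() == "exit":
        break
    if not user_input:  # don't send empty prompts
        continue

    messages.append({"role": "user", "content": user_input})

    response_text = ""
    # max_tokens caps reply length; assumed required, as in Anthropic's Messages API.
    for chunk in client.messages.create(model=model, max_tokens=1024, messages=messages, stream=True):
        # Print only text deltas; other stream events carry no printable text.
        if chunk.type == "content_block_delta" and hasattr(chunk.delta, "text"):
            print(chunk.delta.text, end="", flush=True)
            response_text += chunk.delta.text

    print()
    messages.append({"role": "assistant", "content": response_text})
```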

That's it: roughly 20 lines of core logic that give you a multi-turn CLI chatbot with streaming output and model switching.

Reading user input from the terminal

The script starts by importing the necessary modules, reading the API key, and setting up the WisGate client. Inside the loop, input() reads a line from your terminal, and .strip() removes leading and trailing whitespace. This is your entry point for every user message.

When you run the script and type a prompt, that text becomes the user's message. The script captures it, adds it to the conversation history, and sends it to the API. This happens in a loop, so you can keep typing prompts one after another without restarting the script.
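The input handling can be factored into a helper that also treats Ctrl-D or Ctrl-C as a clean exit and skips blank lines instead of sending them. This is a minimal sketch; `read_prompt` is a hypothetical name, and passing in the `read` function is only a way to test it without a real terminal.

```python
from typing import Callable, Optional


def read_prompt(read: Callable[[str], str] = input) -> Optional[str]:
    """Read one prompt; return None when the user wants to quit."""
    while True:
        try:
            text = read("You: ").strip()  # strip leading/trailing whitespace
        except (EOFError, KeyboardInterrupt):
            return None  # Ctrl-D (Ctrl-Z then Enter on Windows) or Ctrl-C
        if text.lower() == "exit":
            return None
        if text:  # blank lines fall through and prompt again
            return text
```

In the chat loop, prompt = read_prompt() followed by "if prompt is None: break" replaces the raw input() call.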

Maintaining conversation history

Conversation history is stored in the messages list. Each message is a dictionary with a role (either "user" or "assistant") and content (the actual text). Before you send a new prompt to the API, you append it to this list. After the assistant responds, you append that response too.

This means every API call includes the full conversation so far. The model sees your previous questions and its own previous answers, which allows it to maintain context and provide coherent, relevant responses. Without this history, each exchange would be isolated, and the chatbot would have no memory of what came before.
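To make the structure concrete, here is what the history looks like after two exchanges, and what the next request would carry. The message contents are illustrative, and `roles_alternate` is a hypothetical sanity check for the user/assistant pattern the chat loop produces.

```python
# The history after two exchanges (contents are illustrative).
messages = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a general-purpose programming language."},
    {"role": "user", "content": "How do I install it?"},
    {"role": "assistant", "content": "Download an installer from python.org."},
]


def roles_alternate(history: list) -> bool:
    """True if the history starts with a user turn and alternates user/assistant."""
    expected = ["user", "assistant"]
    return all(m["role"] == expected[i % 2] for i, m in enumerate(history))


# The next request sends all four earlier messages plus the new user prompt.
next_request = messages + [{"role": "user", "content": "Which version should I pick?"}]
print(len(next_request), roles_alternate(next_request))  # 5 True
```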

Streaming assistant responses to stdout

Streaming is where the real-time feel comes from. Instead of waiting for the entire response to arrive before printing it, the script prints each token as it arrives. The stream=True parameter tells the API to send tokens incrementally.

The loop iterates over each chunk, checks if it's a content block delta (the actual text), extracts the token, and prints it immediately with flush=True to ensure it appears on screen right away. This creates the illusion of the assistant "thinking" in real time, which feels more natural than a long pause followed by a wall of text.
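To try the streaming logic without calling the API, you can move it into a function and feed it stand-in events. This is a minimal sketch: `print_stream` is a hypothetical helper, and the fake events built with SimpleNamespace only mimic the shape the script checks (`type` and `delta.text`); real stream objects come from the client.

```python
import sys
from types import SimpleNamespace
from typing import Iterable, TextIO


def print_stream(events: Iterable, out: TextIO = sys.stdout) -> str:
    """Write text deltas as they arrive and return the assembled reply."""
    parts = []
    for event in events:
        # Only content_block_delta events with a text payload are printed.
        if event.type == "content_block_delta" and hasattr(event.delta, "text"):
            out.write(event.delta.text)
            out.flush()  # same effect as print(..., flush=True)
            parts.append(event.delta.text)
    out.write("\n")
    return "".join(parts)


# Hypothetical stand-in for a streamed response.
fake_events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(text="Hello")),
    SimpleNamespace(type="content_block_delta", delta=SimpleNamespace(text=", world")),
    SimpleNamespace(type="message_stop"),
]
reply = print_stream(fake_events)
```

Collecting the pieces in a list and joining once avoids repeated string concatenation; for short replies, the += in the script works just as well.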

Switching models with --model

Model selection happens via a command-line flag. When you run the script, you can pass --model claude-opus-4-6 or --model gpt-5 to choose which model to use. The script checks whether the first argument in sys.argv is --model and, if so, uses the argument after it as the model name; the flag is only recognized in that position.

If no flag is provided, it defaults to claude-opus-4-6. This means you can run the script as-is for a quick test, or specify a different model when you want to compare outputs or optimize for cost.
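If you want the flag to work in any position, accept the --model=gpt-5 form, and get --help for free, the standard library's argparse module handles it. A minimal sketch of that swap; `parse_args` and `DEFAULT_MODEL` are names introduced here for illustration.

```python
import argparse
from typing import List, Optional

DEFAULT_MODEL = "claude-opus-4-6"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse --model from the command line, falling back to the default model."""
    parser = argparse.ArgumentParser(description="Multi-turn CLI chatbot")
    parser.add_argument(
        "--model",
        default=DEFAULT_MODEL,
        help=f"model name to send to the API (default: {DEFAULT_MODEL})",
    )
    return parser.parse_args(argv)  # argv=None means read sys.argv[1:]
```

In the script, model = parse_args().model would replace the sys.argv expression.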

How the Chat Loop Works

Understanding the flow of a single exchange helps you see why the script is so compact yet effective. Let's trace one complete interaction from start to finish.

User prompt in, assistant response out

You type a question and press enter. The input() function captures your text, and it's added to the messages list with the role "user". The script then calls client.messages.create() with the full message history and the selected model.

The API processes your prompt in context of the conversation so far and begins streaming tokens back. Each token is printed to stdout as it arrives, building up the response piece by piece. Once the stream ends, the complete response is appended to messages with the role "assistant".

Now the loop repeats. You type another prompt, and the API sees both your first question and the assistant's first answer, plus your new question. This is how context flows through the conversation.
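The order of operations in one exchange (append the prompt, call the model with the whole list, append the reply) can be traced offline with a stand-in model. This is a minimal sketch; `chat_turn` and `fake_model` are hypothetical, and `fake_model` simply reports how many messages it received where the real script would make the streamed API call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(messages: List[Message], prompt: str,
              reply_fn: Callable[[List[Message]], str]) -> str:
    """Run one exchange: record the prompt, reply to the full history, record the reply."""
    messages.append({"role": "user", "content": prompt})
    reply = reply_fn(messages)  # in the real script: the streamed API call
    messages.append({"role": "assistant", "content": reply})
    return reply


def fake_model(history: List[Message]) -> str:
    """Hypothetical stand-in for the model: reports how much history it was sent."""
    return f"I can see {len(history)} message(s)."


history: List[Message] = []
print(chat_turn(history, "What is Python?", fake_model))       # I can see 1 message(s).
print(chat_turn(history, "How do I install it?", fake_model))  # I can see 3 message(s).
```

On the second turn the model receives three messages: your first question, its first answer, and your new question.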

Why history matters in multi-turn chat

Without history, a chatbot is just a one-shot question-answering tool. With history, it becomes a conversation partner. If you ask "What is Python?" and then follow up with "How do I install it?", the chatbot understands that "it" refers to Python because it has seen the previous exchange.

This is especially important for complex topics where you're building on previous answers. You might ask for an explanation, then ask for an example, then ask how to optimize it. Each question makes sense only in the context of what came before. By keeping the full message history and sending it with every API call, the chatbot maintains this continuity.

Running the Script with Different Models

Once you've saved the script, you can run it in different ways depending on which model you want to use. Here are concrete examples.

Using claude-opus-4-6

To run the chatbot with Claude Opus, use:

python chatbot.py --model claude-opus-4-6

The script will print a confirmation message showing which model is active, then drop you into the chat loop. Type your prompts and watch the responses stream in real time. Claude Opus is known for strong reasoning and detailed explanations, making it a good choice for technical questions or complex problem-solving.

Using gpt-5

To switch to GPT-5, run:

python chatbot.py --model gpt-5

Same script, same interface, different model. GPT-5 may have different strengths or pricing characteristics. By running the same chatbot against both models, you can compare their outputs and decide which works best for your use case.

You can also run without specifying a model:

python chatbot.py

This defaults to claude-opus-4-6, so you get a working chatbot immediately without any flags.

Cost Expectations for a Session

One of the biggest questions developers ask is: how much will this cost? For a CLI chatbot, the answer is reassuringly small.

What affects the total cost

API cost depends on two factors: the number of tokens you send and receive, and the pricing of the model you're using. A typical CLI chat session might involve a few exchanges—maybe five to ten prompts and responses—with modest context length.

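For example, suppose a conversation has five user prompts averaging 50 tokens each and five assistant responses averaging 100 tokens each. That's about 750 tokens of new text, but the script resends the full history on every turn, so billed input grows with each exchange: roughly 1,750 input tokens and 500 output tokens for the session. Multiply those counts by the input and output rates for claude-opus-4-6 on the WisGate Models page, which are typically 20–50% lower than official pricing, to get the session cost; for a session this size it is a small fraction of a dollar.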
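Because every request carries the whole history, billed input tokens grow faster than the conversation itself. The arithmetic below makes that concrete under simple assumptions: fixed average token counts per prompt and reply, and no system prompt or per-message overhead. The function names are hypothetical, and the rates are inputs you take from the WisGate Models page rather than values assumed here.

```python
from typing import Tuple


def session_tokens(turns: int, user_tokens: int, reply_tokens: int) -> Tuple[int, int]:
    """Billed (input, output) tokens when the full history is resent every turn."""
    input_total = 0
    for k in range(1, turns + 1):
        # Turn k sends k user prompts plus the k-1 earlier assistant replies.
        input_total += k * user_tokens + (k - 1) * reply_tokens
    return input_total, turns * reply_tokens


def session_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in dollars, given per-million-token rates from the Models page."""
    return (input_tokens * input_rate_per_m + output_tokens * output_rate_per_m) / 1_000_000


inp, out = session_tokens(turns=5, user_tokens=50, reply_tokens=100)
print(inp, out)  # 1750 500
```

Pass those counts to session_cost along with the current per-million-token input and output rates to get the dollar figure for your model.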

If you run the chatbot for an hour with dozens of exchanges, costs will accumulate, but they remain predictable and low. Because all usage is billed in one place through WisGate, you can track spending easily and set alerts if needed.

Where to Check Model Pricing

For the most current pricing and a complete list of available models, visit the WisGate Models page at https://wisgate.ai/models. This page shows real-time rates for every model WisGate supports, including claude-opus-4-6, gpt-5, and many others.

You'll see that WisGate's pricing is typically 20–50% lower than official model pricing, which adds up quickly if you're running multiple chatbots or high-volume applications. The Models page also lets you filter by capability, cost, or provider, so you can find the best model for your specific needs.

Wrap-Up and Next Steps

You now have a working multi-turn CLI chatbot in 20 lines of Python. It reads from your terminal, maintains conversation history, streams responses in real time, and lets you switch models with a simple flag. The code is minimal, readable, and ready to extend.

From here, you could add features like saving conversations to a file, implementing custom system prompts, or building a web interface on top of the same API calls. Before relying on it in production, you'd also want retries and error handling around the API call, but the core loop is in place.
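As an example of the first extension, saving and reloading the messages list as JSON takes only a few lines. This is a minimal sketch; the function names and file path are hypothetical, and where you call them (for example, load before the loop and save after each reply) is up to you.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_conversation(messages: List[Dict[str, str]], path: str) -> None:
    """Write the message history to a UTF-8 JSON file."""
    Path(path).write_text(json.dumps(messages, ensure_ascii=False, indent=2), encoding="utf-8")


def load_conversation(path: str) -> List[Dict[str, str]]:
    """Read a saved history, or start fresh if the file does not exist yet."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text(encoding="utf-8"))
```

Starting the script with messages = load_conversation("chat.json") lets a new session pick up where the last one left off.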

Try the WisGate free trial at https://wisgate.ai/ and test the same chatbot with different models from the WisGate Models page: https://wisgate.ai/models. You'll see firsthand how easy it is to build, deploy, and scale AI-powered applications when you have one unified API and transparent pricing.

Tags: Python, API, CLI