
Why I Built the Auto-Regressive Playground

Most people use AI every day but have no idea how it actually works. This creates a dangerous gap between using a tool and understanding its limitations. Here's why I built an interactive way to see what's really happening inside.

I teach AI strategy to business leaders at IIM Kozhikode. In every session, I ask the same question: "Can someone explain why ChatGPT sometimes makes things up?"

The room goes quiet. These are smart people—CEOs, VPs, functional heads—who use AI daily to draft emails, analyze data, and make decisions. But when I ask them to explain the mechanics, I get vague answers about "algorithms" and "training data."

That's a problem. You can't make sound judgments about when to trust AI, how to prompt it effectively, or where it's likely to fail if you don't understand the basic mechanism of how it generates text.

So I built the Auto-Regressive Playground—a visual tool that lets you watch an AI model think, one word at a time.

The Problem: Black Box Thinking

When you type a question into ChatGPT or Gemini, you see a response appear. It looks intelligent. It sounds confident. But you have no idea what's happening underneath.

Most people assume the AI "understands" their question, "thinks" about the answer, and then writes it out. That mental model is completely wrong—and it leads to bad decisions.

I've seen companies:

  • Trust AI to write legal contracts without checking for hallucinated clauses
  • Use AI-generated financial projections in board presentations without understanding when the model was guessing
  • Deploy customer service chatbots that confidently give wrong information because no one knew how to detect low-confidence responses

The root cause? People treat AI like magic instead of understanding it as a probability engine.

What the Playground Shows You

The Auto-Regressive Playground does one simple thing: it slows down AI text generation so you can see what's actually happening.

Instead of seeing a full response appear instantly, you watch the model generate one word (technically, one "token," which may be a whole word or just a fragment of one) at a time. For each word, you see:

  • What the model is choosing between – The top 5 most likely next words, with their probability scores
  • How confident it is – A bar chart showing which options are strong candidates versus weak guesses
  • What it actually picks – The chosen word flies from the chart back into the text
  • When it's guessing – A warning appears if the model picks a low-probability word (often a sign of hallucination)

You're not learning technical jargon. You're seeing the decision process.
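Under the hood, that decision process boils down to turning the model's raw scores into probabilities and ranking them. Here is a minimal sketch of the idea: the `logits` values below are made up for illustration, not output from any real model.

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def top_candidates(logits, k=5):
    """Return the k most likely next tokens with their probabilities,
    like the bar chart the Playground shows at each step."""
    probs = softmax(logits)
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical scores for the prompt "The capital of France is"
logits = {"Paris": 9.0, "located": 4.5, "a": 4.0, "the": 3.5, "known": 3.0}
for token, p in top_candidates(logits):
    print(f"{token:8s} {p:.1%}")
```

Run it and "Paris" dominates the distribution, which is exactly the picture you see in the Playground for a prompt the model knows well.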

A Real Example

Type in: "The capital of France is"

Watch what happens. The model will show high confidence (maybe 95%) for the word "Paris." That's a solid prediction backed by millions of training examples.

Now try: "The first president of Mars was"

Watch the probabilities. None of the top candidates have high scores. The model is forced to guess because this information doesn't exist. You'll see it pick something that sounds plausible—but the low confidence scores warn you it's making things up.

This is the difference between knowledge and hallucination, made visible.

How to Use the Playground

The tool is built for experimentation. Here's how to get the most out of it:

Start with Atypical Prompts

Don't start with boring questions like "Write an email about quarterly results." That won't teach you anything.

Instead, try prompts that force the model into interesting territory:

  • "The most surprising thing about Indian business culture is" – Watch how the model navigates a topic with multiple valid completions
  • "In 1850, the iPhone" – See it struggle with an impossible scenario
  • "According to the research paper published yesterday" – Watch it try to reference something it can't possibly know

These edge cases reveal the model's limitations more clearly than standard queries.

Experiment with Temperature

Temperature controls how "creative" or "conservative" the model is. But those terms are vague. Here's what's actually happening:

  • Temperature 0: The model always picks the highest-probability word. Responses are predictable and consistent.
  • Temperature 1: The model samples from the probability distribution. Lower-probability words get a chance. Responses vary.

Try generating the same prompt at both settings. Watch how the probability bars change and notice which words get selected.

This teaches you when to use each setting in real work—temperature 0 for extracting data from documents, temperature 1 for brainstorming product names.
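In code, temperature is just a divisor applied to the scores before they become probabilities. A small sketch of the two settings described above, again with invented scores for illustration:

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick the next token. Temperature 0 always takes the top choice;
    temperature 1 samples from the probability distribution, so
    lower-probability words get a chance."""
    if temperature == 0:
        return max(logits, key=logits.get)  # deterministic: highest score wins
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    tokens = list(exps)
    weights = [exps[t] / total for t in tokens]
    return random.choices(tokens, weights=weights)[0]

# Hypothetical scores for "The biggest risk facing Indian startups is"
logits = {"funding": 2.0, "talent": 1.2, "regulation": 1.0, "competition": 0.5}
print(sample_with_temperature(logits, 0))    # always "funding"
print(sample_with_temperature(logits, 1.0))  # varies run to run
```

Dividing by a temperature above 1 flattens the distribution (more surprising picks); dividing by a value below 1 sharpens it toward the top choice.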

Look for Low-Confidence Warnings

When the model selects a word with less than 40% probability, you'll see a warning: "Low Probability Detected. High Hallucination Risk."

This is the model saying: "I'm not sure what comes next, but I have to pick something."

Pay attention to when these warnings appear. They often cluster around:

  • Specific dates, numbers, or names the model doesn't know
  • Technical jargon from niche domains
  • Requests for recent information beyond the model's training data
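The warning itself is a simple threshold check over the per-token probabilities. Here is a sketch of that 40% rule applied to a generated sequence; the token probabilities below are invented to mimic what you might see for a made-up fact:

```python
def flag_low_confidence(steps, threshold=0.40):
    """Given (token, probability) pairs from one generation, return the
    positions where the model picked a token below the threshold,
    i.e. where the Playground would show a hallucination warning."""
    return [i for i, (_, p) in enumerate(steps) if p < threshold]

# Hypothetical per-token picks for "The first president of Mars was John Carter"
steps = [("The", 0.92), ("first", 0.88), ("president", 0.71),
         ("of", 0.95), ("Mars", 0.60), ("was", 0.83),
         ("John", 0.18), ("Carter", 0.22)]

for i in flag_low_confidence(steps):
    token, p = steps[i]
    print(f"WARNING: '{token}' picked at {p:.0%}, hallucination risk")
```

Notice where the warnings land: on the invented name, not on the grammatical glue words. That clustering is exactly the pattern described above.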

Understanding this pattern changes how you prompt. Instead of asking, "What did the CEO say in yesterday's earnings call?" you learn to provide context: "Here's a transcript of yesterday's earnings call. Summarize the CEO's comments on Q3 guidance."

Why This Matters for Your Work

I didn't build this tool for AI researchers. I built it for people who use AI in their work and need to make smart decisions about when to trust it.

After spending 20 minutes with the Playground, you'll develop an intuition that most users don't have:

  • You'll spot hallucinations before they become problems
  • You'll write better prompts because you understand what increases model confidence
  • You'll know when to use deterministic settings versus creative settings
  • You'll stop treating AI outputs as facts and start treating them as probability distributions

That shift in mindset is the difference between blindly trusting AI and strategically deploying it.

Three Experiments to Try Right Now

Experiment 1: The Impossible Fact

Prompt: "The 2025 Nobel Prize in Economics was awarded to"

Watch the model generate a name. Then check the confidence scores. They'll be low across the board. The model has no idea who won (the announcement hasn't happened yet or wasn't in its training data), but it will confidently complete the sentence anyway.

Lesson: AI doesn't say "I don't know." It guesses.

Experiment 2: The Confident Error

Prompt: "Mumbai is the capital of"

Watch what happens. The model might pick "India" with relatively high confidence—because that's a common phrase pattern—even though the correct answer is "Maharashtra."

Lesson: High confidence doesn't equal correctness. The model learns patterns, not facts.

Experiment 3: The Temperature Swing

Prompt: "The biggest risk facing Indian startups is"

Generate this at temperature 0, then reset and try it at temperature 1. Compare the responses.

At temperature 0, you'll get the most statistically common answer—probably something about funding or regulations.

At temperature 1, you might get something unexpected—maybe talent retention, global competition, or infrastructure challenges.

Lesson: The "best" answer depends on what you're optimizing for—consensus or novelty.

Ready to see how AI really works?

Open the Playground →

What You'll Walk Away With

Using the Playground won't make you an AI engineer. That's not the goal.

The goal is to give you operational intuition—the ability to look at an AI-generated response and immediately sense whether it's drawing from strong patterns or making things up.

You'll learn to ask better questions:

  • Not "Is this AI answer correct?" but "How confident was the model when it generated this?"
  • Not "Why did I get a different answer this time?" but "What temperature setting am I using, and is that appropriate for this task?"
  • Not "Can AI do this job?" but "Under what conditions will this AI system fail, and can I design around those failure modes?"

That's the skillset that separates AI users from AI strategists.

And the best part? It takes 20 minutes to build that intuition. Just open the Playground, try a few weird prompts, and watch what happens.

The gap between using AI and understanding AI is smaller than you think. You just need to see what's actually happening under the hood.

Written by Aditya V Jain · Product Lead @ Google · Visiting Faculty @ IIM Kozhikode