Agentic Architectures
June 20, 2025 · Jerome Gill
The Misconception: ChatGPT Is “Just an LLM”
A common misconception is that ChatGPT is an LLM — that when you chat with it, you’re interacting directly with a raw large language model like GPT-4 or GPT-5.
In reality, ChatGPT is not the model itself, but a chatbot interface built around one. It wraps a large language model with layers of preprocessing, orchestration, and safety systems that transform a raw model into a reliable conversational product.
Understanding the distinction between an LLM and a chatbot is crucial to understanding why ChatGPT behaves the way it does.
What an LLM Actually Is
A large language model (LLM) is a statistical engine trained to predict text.
Given a sequence of tokens (words, subwords, or symbols), it predicts the next one based on patterns learned from enormous datasets.
If you run a base model like GPT-J, Llama, or Mistral locally, you’ll see this in action:
you type a prompt, and the model generates text, token by token, according to its learned probabilities.
There’s no understanding of who you are, no memory, no safety filters, and no structured reasoning pipeline.
An unwrapped LLM is a text completion engine: powerful, but best thought of as a highly compressed form of its training data. Much of the illusion of “intelligence” is produced by pre- and post-processing applied to the prompt and to the LLM’s output.
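To make “predicting the next token from learned probabilities” concrete, here is a toy sketch: a bigram model counted from a tiny corpus stands in for a real LLM’s neural network. Everything here (the corpus, the function names) is invented for illustration; a real model does the same thing at vastly larger scale.

```python
import random
from collections import Counter, defaultdict

# Toy stand-in for an LLM: a bigram model that "predicts the next token"
# from counts in a tiny training corpus.
corpus = "the robot walked on mars the robot saw red dust the robot slept".split()

# Count which token follows which.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def complete(prompt_token, n_tokens=5, seed=0):
    """Generate text token by token, like a raw completion engine."""
    rng = random.Random(seed)
    out = [prompt_token]
    for _ in range(n_tokens):
        counts = following[out[-1]]
        if not counts:  # dead end: this token never had a successor
            break
        tokens, weights = zip(*counts.items())
        # Sample the next token according to learned probabilities.
        out.append(rng.choices(tokens, weights=weights)[0])
    return " ".join(out)

print(complete("the"))
```

Like a raw LLM, this generator has no memory, no safety filters, and no idea who is asking; it only continues the text according to its learned statistics.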
What ChatGPT Actually Does
ChatGPT, by contrast, is a chatbot product built on top of a language model.
It presents a conversational interface, but behind the scenes, several layers of preprocessing, reasoning, and agentic control operate around each user input.
When you send a message to ChatGPT:
1. Input Preprocessing
Your text is normalized, context windows are managed, and hidden system instructions (the “system prompt”) are sometimes prepended to guide behavior: tone, format, or safety constraints. Much of this stage is opaque; the prompt may even be translated to and from English.

2. Context Assembly
The system dynamically retrieves relevant history, tools, or resources (such as code interpreters, search, or plug-ins) to form a complete “context package” before sending it to the core model.

3. Model Inference (LLM Step)
The actual GPT model (e.g., GPT-5) generates an output given this curated prompt and context.

4. Post-Processing with Agentic Logic
The response may be filtered, formatted, or augmented, for example by adding citations, trimming unsafe outputs, or inserting markdown. Facts and dates may be parsed and updated via API calls.
What you see — a clean, conversational reply — is the end result of a pipeline of steps, not the raw output of an unconstrained model.
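The four steps above can be sketched as a simple orchestration pipeline. This is purely illustrative: the function names, the system prompt, and the stubbed model call are assumptions, not OpenAI’s actual implementation.

```python
# Illustrative chatbot orchestration pipeline (NOT OpenAI's actual code).
# The model call is stubbed out; a real system would call an LLM here.

SYSTEM_PROMPT = "You are a helpful assistant. Be concise."  # hypothetical

def preprocess(user_text: str) -> str:
    """Step 1: normalize input (whitespace cleanup stands in for richer preprocessing)."""
    return " ".join(user_text.split())

def assemble_context(user_text: str, history: list[dict]) -> list[dict]:
    """Step 2: build the full 'context package' sent to the model."""
    return [{"role": "system", "content": SYSTEM_PROMPT},
            *history,
            {"role": "user", "content": user_text}]

def call_model(messages: list[dict]) -> str:
    """Step 3: model inference. Stubbed; a real system calls the LLM here."""
    return f"[model reply to: {messages[-1]['content']}]"

def postprocess(raw: str) -> str:
    """Step 4: filter/format the raw completion before showing it to the user."""
    return raw.strip()

def chat_turn(user_text: str, history: list[dict]) -> str:
    cleaned = preprocess(user_text)
    messages = assemble_context(cleaned, history)
    return postprocess(call_model(messages))

print(chat_turn("  Write a story   about a robot on Mars ", []))
```

The user only ever sees the return value of `chat_turn`; steps 1, 2, and 4 are invisible, which is exactly why ChatGPT feels like “just talking to the model.”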
The Local Model Analogy
To illustrate the difference, imagine two setups:
Scenario A: Running an LLM Locally
You load a model like llama-3-8b on your computer, open a terminal, and type: “Write a story about a robot on Mars.”
The model immediately begins generating tokens. It may ramble, ignore formatting, or produce unpredictable results. It does exactly what the model’s probabilities dictate, nothing more.
    
Scenario B: ChatGPT
You type the same request into ChatGPT. Before the model even runs, your message is structured, your conversation history is retrieved, and system instructions shape the behavior (“be concise,” “avoid sensitive content,” etc.).
The system may even access extra tools — for instance, to retrieve data or execute code.
The model is one part of a much larger agentic and orchestrated system. 
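The contrast between the two scenarios comes down to what text the model actually receives. A minimal sketch, assuming a generic chat-template format (the `<|...|>` markers are illustrative; real templates vary by model family):

```python
request = "Write a story about a robot on Mars."

# Scenario A: a raw local LLM sees exactly the prompt text, nothing else.
raw_prompt = request

# Scenario B: a chatbot wraps the same request in system instructions,
# history, and structure before the model ever runs. The marker syntax
# below is made up for illustration; each model family has its own template.
wrapped_prompt = (
    "<|system|>Be concise. Avoid sensitive content.\n"
    "<|history|>(previous turns, if any)\n"
    f"<|user|>{request}\n"
    "<|assistant|>"
)

print(len(raw_prompt), len(wrapped_prompt))
```

Same user request, two very different inputs to the model, which is why the outputs differ so much in tone and reliability.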
Why ChatGPT Uses a Chatbot Interface
The chat interface is familiar and intuitive. It hides the underlying complexity of orchestration — the preprocessing, tool calls, and safety layers — making the interaction feel conversational and natural.
Underneath that simplicity, ChatGPT operates more like a semi-autonomous agent:
it reasons, decides when to act, formats outputs, and maintains structured memory (within a session, or across sessions if persistent memory is enabled).
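The “decides when to act” part can be sketched as a minimal dispatch loop. The router, tools, and trigger conditions below are all invented for illustration; in real systems the model itself typically chooses when to call a tool, rather than a keyword check.

```python
import datetime

# Minimal agent-style dispatch (illustrative only). A real chatbot lets the
# model decide when to call a tool; this sketch uses a crude keyword router.

def tool_current_date(_query: str) -> str:
    """A 'tool' the agent can call instead of guessing a date."""
    return datetime.date.today().isoformat()

def tool_calculator(query: str) -> str:
    """Toy tool: evaluate a simple 'a + b' expression in the query."""
    a, _, b = query.partition("+")
    return str(int(a.strip()) + int(b.strip()))

def answer(query: str) -> str:
    """Decide whether to act (call a tool) or respond with the model alone."""
    if "date" in query.lower():
        return tool_current_date(query)
    if "+" in query:
        return tool_calculator(query)
    return f"[LLM answers directly: {query}]"  # fall back to plain inference

print(answer("12 + 30"))
```

Even this crude version shows why wrapped systems get dates and arithmetic right more often than raw models: the answer comes from a tool, not from token probabilities.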
That’s why calling ChatGPT “an LLM” misses the point.
It’s powered by an LLM, but what you experience is an engineered system: a composite of the model, prompt engineering, parsing of the user’s prompt, and post-processing.