How Large Language Models Work

Most people use ChatGPT, Claude, or Gemini daily without understanding what is actually happening when they type a message. That gap matters, because the way LLMs work directly explains why certain prompts succeed and others fail.

You do not need to understand the mathematics. But understanding the mechanics will make you a significantly better prompt engineer.

What an LLM is actually doing

At its core, an LLM is a very sophisticated autocomplete system. Given a sequence of words, it predicts what word (more precisely, what token: a word or a fragment of one) should come next, based on patterns it has seen across enormous amounts of text.

Unlike the autocomplete on your phone, which suggests the next word in a message, an LLM can do this across thousands of words in a row, maintaining context and coherence throughout. That extended prediction capability is what makes it appear to reason, explain, translate, and write.

It is worth stating clearly: the model is not understanding your text the way a human does. It is recognizing patterns and predicting statistically likely continuations. That distinction explains a lot about when LLMs are reliable and when they are not.
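
The prediction loop described above can be illustrated with a toy sketch. This is not how a real LLM is built: it is a simple word-pair counter, and the corpus and the function names (train_bigram_model, predict_next, generate) are made up for illustration. It only shows the core idea of picking a statistically likely next word and repeating.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(followers, start_word, max_words=5):
    """Repeatedly append the most likely next word, one word at a time."""
    output = [start_word]
    for _ in range(max_words):
        next_word = predict_next(followers, output[-1])
        if next_word is None:
            break
        output.append(next_word)
    return " ".join(output)


# Hypothetical miniature "training corpus"
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat on the sofa ."
model = train_bigram_model(corpus)

print(predict_next(model, "cat"))           # sat
print(generate(model, "cat", max_words=3))  # cat sat on the
```

The toy model only looks at the single previous word. An LLM conditions on everything in its context, which is why it can stay coherent over thousands of words, but the output is still a prediction drawn from patterns in the training text.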

A useful analogy: chefs and kitchens

Think of a standard computer program as a traditional chef working from a fixed recipe. You give the chef a recipe and ingredients, and they follow it exactly every time. Same recipe, same dish. The output is perfectly predictable but limited to what the recipe specifies.

A machine learning model is more like an apprentice chef who was never given recipes, but instead tasted thousands of finished dishes and worked backward to figure out how they were made. The apprentice develops their own intuitions about what goes with what, which techniques produce which results, and how to adapt when ingredients are different.

An LLM extends this further. Imagine that apprentice chef had access to the complete written culinary knowledge of the entire world: every cookbook, every food blog, every restaurant review, every recipe in every language. The scale of that exposure is what gives LLMs their breadth. The limitation is that it is still pattern recognition, not understanding. Ask the chef to explain why salt enhances flavor and they will give you a plausible-sounding answer, but it is drawn from texts they have read, not from chemical knowledge.

Pre-training and fine-tuning

LLMs are built in two stages, and understanding them explains the difference between a general-purpose model and a specialized one.

Pre-training: learning language at scale

During pre-training, the model processes a massive corpus of text, ranging from books and news articles to websites, scientific papers, and forums. It learns by predicting the next word in each sequence, adjusting its internal parameters each time it gets the prediction wrong. After billions of these adjustments, the model has internalized the structure of language: grammar, syntax, factual associations, and a rough model of how concepts relate to each other.
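
To make "adjusting its internal parameters" concrete, here is a heavily simplified sketch. It assumes a three-word vocabulary and treats three raw scores as the model's only parameters, whereas a real model adjusts billions of parameters inside a neural network. The update rule shown (gradient descent on cross-entropy loss) is a standard training technique, used here as an illustration rather than a description of any specific model's training setup.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def training_step(scores, correct_index, learning_rate=0.5):
    """One adjustment: nudge the scores so the correct next word becomes more likely.

    Returns the updated scores and the loss measured before the update.
    """
    probs = softmax(scores)
    loss = -math.log(probs[correct_index])
    # The gradient of cross-entropy loss with respect to each score is
    # (probability - 1) for the correct word and (probability) for the others.
    updated = [
        s - learning_rate * (p - (1.0 if i == correct_index else 0.0))
        for i, (s, p) in enumerate(zip(scores, probs))
    ]
    return updated, loss


# Hypothetical setup: the training text is "the cat sat on the mat",
# and the model must predict the word after "the cat sat on the".
vocabulary = ["mat", "moon", "banana"]
scores = [0.0, 0.0, 0.0]  # the toy model starts with no preference
correct = vocabulary.index("mat")

losses = []
for _ in range(20):
    scores, loss = training_step(scores, correct)
    losses.append(loss)

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
print(dict(zip(vocabulary, [round(p, 3) for p in softmax(scores)])))
```

Each step lowers the loss a little, and the probability assigned to "mat" rises. Pre-training repeats this kind of adjustment across an enormous number of text sequences.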

Fine-tuning: specializing for specific uses

After pre-training, the model can be fine-tuned on a smaller, targeted dataset. A legal research assistant might be fine-tuned on legal documents. A medical inquiry tool might be fine-tuned on clinical journals. Fine-tuning adjusts the model's outputs to better fit a specific domain without starting from scratch. ChatGPT, for example, was also fine-tuned using human feedback to make it behave more helpfully and avoid harmful outputs, a process called RLHF (Reinforcement Learning from Human Feedback).

What LLMs are good at, and where they fail

Because LLMs work by pattern matching across text, they are genuinely strong at tasks that are primarily linguistic:

- Summarizing, rewriting, and translating existing content
- Generating structured text: reports, emails, code, outlines
- Answering questions that have well-established written answers
- Explaining concepts by drawing on explanations in their training data
- Adapting tone and format to a specified audience

They are less reliable when the task requires precise arithmetic, real-time information they were not trained on, logical deduction chains longer than a few steps, or knowledge of facts that are rare or underrepresented in their training data.

Crucially, LLMs can sound confident even when they are wrong. The model does not know what it does not know. It generates the most statistically plausible continuation of your prompt, which can be a fluent and authoritative-sounding answer that is factually incorrect. For any output where accuracy matters, verify against a primary source.

Why this changes how you should prompt

Knowing that LLMs predict text based on patterns explains several things that confuse new users:

Vague prompts produce vague outputs.

If your prompt could plausibly continue in ten directions, the model picks a statistically common one, which is rarely the specific one you wanted.

Context shifts the prediction.

Adding context to your prompt changes what patterns the model draws on. "Write a summary" and "Write a 150-word summary for a CEO who has 30 seconds to read it" produce very different outputs because they activate different patterns.

Examples are more reliable than instructions.

Showing the model what you want with a concrete example is often more effective than describing it in the abstract, because the model can pattern-match to the example directly.
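
The "show, don't describe" idea can be sketched as a small helper that places worked examples ahead of the real input. The function name build_few_shot_prompt, the "Input:"/"Output:" labels, and the sample reviews are all assumptions chosen for illustration. No particular format is required by any model.

```python
def build_few_shot_prompt(task, examples, new_input):
    """Assemble a prompt that shows worked examples before the real input."""
    lines = [task.strip(), ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # End with an open "Output:" so the natural continuation is the answer.
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical examples for a sentiment-labeling task
prompt = build_few_shot_prompt(
    task="Label each customer review as Positive or Negative.",
    examples=[
        ("The delivery was fast and the packaging was perfect.", "Positive"),
        ("The app crashes every time I open it.", "Negative"),
    ],
    new_input="Support answered within minutes and fixed my issue.",
)
print(prompt)
```

Ending the prompt on an open "Output:" line narrows the range of plausible continuations to one thing: a label in the same style as the examples.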

The context window: what the model can see

Every LLM has a context window: the maximum amount of text it can process in a single interaction. Everything inside that window, your prompt, any text you paste in, and the conversation history, is what the model has access to. Everything outside it does not exist from the model's perspective.

Context windows are measured in tokens, which average roughly three-quarters of an English word each. A model with a 128,000-token context window can therefore process roughly 96,000 words (128,000 × 0.75) in one session. That is enough to hold a book-length manuscript, a small codebase, or many hours of conversation.
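
The arithmetic above can be turned into a rough planning helper. This sketch assumes the three-quarters-of-a-word rule of thumb from the text. Real tokenizers vary by model and language, so treat the numbers as estimates. The function names and the reserved_for_reply parameter are hypothetical.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb only; actual ratios depend on the tokenizer


def estimate_tokens(text):
    """Rough token estimate based on a whitespace word count."""
    word_count = len(text.split())
    return math.ceil(word_count / WORDS_PER_TOKEN)


def words_that_fit(context_window_tokens):
    """Approximate number of words a context window can hold."""
    return int(context_window_tokens * WORDS_PER_TOKEN)


def fits_in_context(text, context_window_tokens, reserved_for_reply=1000):
    """Check whether the text fits while leaving room for the model's answer."""
    return estimate_tokens(text) + reserved_for_reply <= context_window_tokens


print(words_that_fit(128_000))           # 96000
print(estimate_tokens("one two three"))  # 4
```

Reserving part of the window for the reply matters because the model's output counts against the same limit as your input.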

In practice, the context window shapes several things you need to know about:

Long conversations lose track of early messages.

As a conversation grows, earlier messages get less attention from the model. If you are doing a complex multi-turn task, restating key instructions partway through the session produces more consistent results than assuming the model is tracking everything from message one.
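
If you send messages through a script rather than a chat window, restating instructions can be automated. The sketch below is one possible approach under assumed names: with_periodic_reminder and the every_n_turns interval are illustrative, and the right interval depends on the task and the model.

```python
def with_periodic_reminder(user_message, turn_number, key_instructions, every_n_turns=5):
    """Prepend the key instructions to every Nth user message.

    `turn_number` counts user messages, starting at 1.
    """
    if turn_number > 0 and turn_number % every_n_turns == 0:
        return (
            "Reminder of the key instructions:\n"
            f"{key_instructions}\n\n"
            f"{user_message}"
        )
    return user_message


# Hypothetical instructions for a long editing session
instructions = "Use British spelling. Keep every answer under 100 words."

print(with_periodic_reminder("Now rewrite section 3.", 4, instructions))
print("---")
print(with_periodic_reminder("Now rewrite section 4.", 5, instructions))
```

In a manual chat the equivalent is simply pasting the key instructions again every few turns.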

Pasting large documents requires care.

If you paste a 30-page document and ask a question about it, the model will attempt to answer from the full text. But performance can degrade for questions whose answers are buried deep in a very long document, particularly in the middle, because the model's attention is not perfectly uniform across the context. For critical extractions, it is worth asking the model to quote the relevant passage alongside its answer.
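
Asking for a quote is most useful when you actually check it. The sketch below assumes you have the source document and the model's quoted passage as plain strings. The function names and the sample contract text are hypothetical. It checks for a verbatim match after normalizing whitespace and letter case, so a paraphrased "quote" fails the check.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so line breaks do not cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in(document, quoted_passage):
    """Return True if the quoted passage occurs verbatim in the document."""
    passage = normalize(quoted_passage)
    return bool(passage) and passage in normalize(document)


# Hypothetical source text and model-supplied quotes
document = (
    "The contract term is\n24 months. "
    "Either party may terminate with 60 days notice."
)

print(quote_appears_in(document, "The contract term is 24 months."))  # True
print(quote_appears_in(document, "The contract term is 36 months."))  # False
```

A failed check does not prove the answer is wrong, but it is a strong signal to reread the relevant part of the document yourself.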

Each new chat session starts empty.

There is no memory between separate conversations by default. The model has no knowledge of what you discussed yesterday. If continuity matters, you need to include the relevant context in the new session's first message, or use a tool that provides persistent memory.

Common misconceptions about LLMs

Myth: The model understands me.

Reality: It predicts statistically likely continuations of your input. When the output seems to show understanding, it is because your input matched patterns from text written by humans who understood similar situations. The process is prediction, not comprehension.

Myth: More detail always helps.

Reality: Detail helps when it narrows the model's range of possible responses. Irrelevant detail adds noise. A 500-word prompt full of background the model does not need will not produce a better output than a 50-word prompt with only the necessary context.

Myth: If the model is confident, it is correct.

Reality: Confidence in generated text is a stylistic property, not an epistemic one. The model produces authoritative-sounding text because that is what authoritative text looks like in its training data. It does not reliably flag uncertainty about factual claims unless you explicitly prompt it to do so.

Myth: The model has opinions.

Reality: Responses that look like opinions are pattern matches to opinionated text in the training data. When you ask "What do you think?", the model generates text that plausibly follows from what a thinking entity would say in that context. This is not the same as having a viewpoint.

What to do now

The next time ChatGPT gives you a response that is off-target, ask yourself: given what the model is actually doing, which direction did my prompt probably point it? Then rewrite the prompt to narrow the range of plausible continuations. You will often spot the problem quickly.

Practical Prompt Engineering by Vajo Lukic covers LLM capabilities and limitations in detail, with practical techniques for working with the model's pattern-matching nature rather than against it, including zero-shot, few-shot, and chain-of-thought prompting, plus 250+ ready-to-use templates. Get the book here or read a free sample.


Related guides

What Is Prompt Engineering? - The core skill that builds on this foundation
ChatGPT for Beginners: Getting Better Results - Practical starting point for new users
Zero-Shot vs Few-Shot Prompting Explained - The two foundational prompt strategies