What is a large language model (LLM)?

A large language model is a neural network trained on a massive corpus of text to predict the next token given the previous ones. Modern LLMs have billions to trillions of parameters and can generate coherent prose, answer questions, write code, and perform reasoning tasks well beyond simple text completion.

Also known as: LLM, language model, foundation model

How LLMs work in one paragraph

An LLM takes a sequence of tokens as input and outputs a probability distribution over the next token. Generation works by repeatedly sampling the next token and feeding it back as input. The architecture is almost universally a transformer: a stack of attention and feed-forward layers that lets the model weigh relationships between the tokens in its input. Training optimizes the model to predict the next token across massive datasets (typically trillions of tokens).
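The sample-and-feed-back loop described above can be sketched in a few lines. This is a toy illustration, not a real model: the four-word vocabulary, the `TOY_LOGITS` table, and the function names are all made up for the example, and the stand-in "model" looks only at the last token, whereas a real LLM conditions on the whole sequence.

```python
import math
import random

# Hypothetical toy vocabulary and scores (logits), invented for this sketch.
VOCAB = ["the", "cat", "sat", "<eos>"]
TOY_LOGITS = {
    "the": [0.0, 3.0, 0.5, 0.0],
    "cat": [0.0, 0.0, 3.0, 0.5],
    "sat": [0.5, 0.0, 0.0, 3.0],
    "<eos>": [0.0, 0.0, 0.0, 5.0],
}


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution that sums to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(tokens):
    """Stand-in for the model: probabilities over VOCAB for the next token."""
    return softmax(TOY_LOGITS[tokens[-1]])


def generate(prompt, max_new_tokens=10, rng=None):
    """Repeatedly sample a next token and append it to the input."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_distribution(tokens)
        token = rng.choices(VOCAB, weights=probs, k=1)[0]
        tokens.append(token)  # the sampled token becomes part of the input
        if token == "<eos>":
            break
    return tokens
```

Calling `generate(["the"])` returns the prompt followed by the sampled tokens. Lowering `temperature` in `softmax` would make the most likely token win more often; raising it flattens the distribution.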

Training stages

(1) Pre-training: the model learns from raw text drawn from the open internet, books, code, and curated datasets. This produces a base model that can complete text but isn't aligned to be helpful.

(2) Supervised fine-tuning (SFT): the model is trained on demonstrations of good assistant behavior.

(3) Reinforcement learning from human feedback (RLHF) or similar: the model is trained on human or AI preferences to shape its outputs toward useful, safe, and honest responses.

Modern training pipelines blend many additional stages.
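To make the training signals a little more concrete, here is a rough sketch of two losses commonly associated with these stages: the next-token cross-entropy used in pre-training and SFT, and a pairwise preference loss of the kind often used to train a reward model for RLHF. The function names are invented for this example, and real pipelines compute these over large batches of tensors rather than Python lists.

```python
import math


def next_token_loss(predicted_probs, target_ids):
    """Average cross-entropy: the mean of -log p(correct next token).

    predicted_probs: one probability list per position.
    target_ids: index of the token that actually came next at each position.
    """
    losses = [-math.log(probs[t]) for probs, t in zip(predicted_probs, target_ids)]
    return sum(losses) / len(losses)


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log sigmoid(reward_chosen - reward_rejected).

    It is small when the preferred response scores higher than the
    rejected one, and large when the ordering is reversed.
    """
    margin = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-margin))
```

Both losses shrink as the model assigns more probability (or more reward) to what the data says is correct or preferred, which is what gradient descent pushes toward during training.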

What LLMs are and aren't

They are next-token predictors that learned to model a huge slice of human language and reasoning. They are NOT databases, knowledge bases, or symbolic reasoners. They sometimes appear to know facts because those facts appear often in training data; when the training data is thin, they will confidently generate plausible-sounding wrong answers (hallucinations). For factual accuracy, pair LLMs with retrieval (RAG, web search) and verification.
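As a minimal sketch of the retrieval idea, the snippet below picks the documents that share the most words with a question and places them in the prompt ahead of it. Everything here is illustrative: `retrieve` and `build_prompt` are hypothetical helpers, and word overlap stands in for the embedding-based or web search that real RAG systems typically use.

```python
import re


def words(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, k=2):
    """Return up to k documents sharing the most words with the query."""
    query_words = words(query)
    scored = [(len(query_words & words(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query, documents, k=2):
    """Put retrieved passages in front of the question for the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The resulting string would then be sent to whichever model is in use. The point of the design is that the answer can be checked against the retrieved passages instead of relying on what the model happens to recall.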

Frontier vs commodity LLMs

Frontier-tier models from the major labs push the state of the art and cost the most. Commodity models (small open-weights, distilled variants) are 10-100× cheaper and often good enough for narrow tasks. Production AI engineering in 2026 increasingly means picking the cheapest model that meets the quality bar, not always reaching for the frontier.
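The "cheapest model that meets the quality bar" rule can be written down directly. The sketch below assumes you already have an evaluation score per model on your own task; the model names, prices, and scores are made-up placeholders, not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                       # hypothetical model label
    cost_per_million_tokens: float  # made-up price for illustration
    eval_score: float               # fraction of your eval set passed (0 to 1)


def cheapest_passing(candidates, quality_bar):
    """Return the lowest-cost candidate whose score meets the bar, or None."""
    passing = [c for c in candidates if c.eval_score >= quality_bar]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_million_tokens)


# Placeholder numbers only; substitute results from your own evaluation.
CANDIDATES = [
    Candidate("small-open-weights", 0.2, 0.78),
    Candidate("mid-tier", 1.0, 0.91),
    Candidate("frontier", 10.0, 0.96),
]
```

With these placeholder numbers, a quality bar of 0.9 selects `mid-tier` rather than `frontier`, and a bar that no candidate reaches returns `None`, which signals that the task or the bar needs rethinking.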

