What is an Embedding?

An embedding is a fixed-length vector of floating-point numbers that represents the semantic meaning of a piece of text (or of an image, an audio clip, or another input). Two embeddings are close (small cosine distance) when their sources are semantically similar, even if the words differ.
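To make "small cosine distance" concrete, here is a minimal sketch in plain Python. The three 4-dimensional vectors are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Cosine distance: 0.0 for identical directions, larger for less similar vectors."""
    return 1.0 - cosine_similarity(a, b)

# Hypothetical toy vectors, NOT produced by any real embedding model.
order_status = [0.9, 0.1, 0.0, 0.3]
where_is_my_package = [0.8, 0.2, 0.1, 0.4]
refund_policy = [0.0, 0.9, 0.7, 0.1]

near = cosine_distance(order_status, where_is_my_package)
far = cosine_distance(order_status, refund_policy)
print(f"related pair distance:   {near:.3f}")
print(f"unrelated pair distance: {far:.3f}")
```

With these toy values the related pair ends up much closer than the unrelated pair, which is the property a search system relies on.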

Also known as: vector embedding, text embedding, semantic embedding

Why embeddings matter

Embeddings let you compare text by meaning rather than by exact word match. "Order status" and "Where is my package?" share no words, but their embeddings are close, so a search system built on embeddings can find the right FAQ entry even when the user phrases the question differently from the entry's title.

How they're produced

A dedicated embedding model (separate from the chat LLM) reads the input text and emits a vector, typically with 768, 1536, or 3072 dimensions. The model is usually trained with a contrastive objective, so that semantically related text pairs produce similar vectors and unrelated pairs do not. Embedding models are generally cheaper and faster to run than chat models.
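As a rough illustration of what a contrastive objective rewards, the sketch below computes an InfoNCE-style loss over a small batch in plain Python. InfoNCE is one common contrastive loss and is used here as an assumption; the source does not say which objective any particular model uses. The function name, the temperature value, and the toy vectors are all hypothetical.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the text paired with anchors[i]; every other
    positive in the batch acts as a negative for that anchor.
    """
    anchors = [_normalize(v) for v in anchors]
    positives = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled cosine similarity of this anchor to every candidate.
        logits = [_dot(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the true pair as the correct class.
        total += log_denominator - logits[i]
    return total / len(anchors)

# Toy 2-dimensional "embeddings" for two text pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]      # each anchor matches its own positive
mismatched = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive

print(info_nce_loss(anchors, aligned))      # close to zero
print(info_nce_loss(anchors, mismatched))   # much larger
```

Training pushes the loss down, which pulls paired vectors together and pushes the other vectors in the batch apart.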

Where they're used

(1) Semantic search — index documents as vectors, query with a vector, return nearest neighbors. (2) RAG — the retrieval half of retrieval-augmented generation. (3) Clustering and recommendation — group similar items. (4) Deduplication — find near-duplicate content. (5) Classification — train a small classifier on top of embeddings for fast routing decisions.
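The semantic search case can be sketched as a brute-force nearest-neighbor lookup. This is a minimal illustration in plain Python: the document IDs and vectors are invented, and production systems typically use a vector database or an approximate nearest-neighbor index rather than scanning every vector.

```python
import math

def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first.

    index is a list of (doc_id, vector) pairs; score is cosine similarity.
    """
    q = _normalize(query_vec)
    scored = []
    for doc_id, vec in index:
        d = _normalize(vec)
        scored.append((doc_id, sum(x * y for x, y in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical index: in practice each vector comes from an embedding model.
index = [
    ("faq-order-status", [0.9, 0.1, 0.0, 0.3]),
    ("faq-refunds", [0.0, 0.9, 0.7, 0.1]),
    ("faq-shipping-times", [0.6, 0.1, 0.2, 0.7]),
]

# Pretend this is the embedding of "Where is my package?".
query = [0.8, 0.2, 0.1, 0.4]

for doc_id, score in top_k(query, index, k=2):
    print(f"{doc_id}: {score:.3f}")
```

The same similarity scores can drive the other uses listed above: a high score between two stored documents suggests a near-duplicate, and the vectors themselves can serve as input features for a small classifier.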

Choosing an embedding model

Trade-offs: embeddings with more dimensions tend to be more accurate but cost more to store and search. Multilingual models work across languages but are usually larger. Domain-specific embeddings (code, legal, medical) can beat general-purpose embeddings on specialized corpora. Benchmark on your actual data: MTEB rankings are a starting point, not a verdict.
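The storage side of the dimension trade-off is simple arithmetic. The sketch below assumes each value is stored as a 32-bit float (4 bytes) and counts raw vector data only; index structures and metadata add overhead, and quantization can shrink the total. The corpus size of one million vectors is an arbitrary example.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming fixed-width values (float32 by default)."""
    return num_vectors * dimensions * bytes_per_value

NUM_VECTORS = 1_000_000  # hypothetical corpus size

for dims in (768, 1536, 3072):
    gib = index_size_bytes(NUM_VECTORS, dims) / 2**30
    print(f"{dims} dims: {gib:.2f} GiB")
# 768 dims: 2.86 GiB
# 1536 dims: 5.72 GiB
# 3072 dims: 11.44 GiB
```

Storage grows linearly with dimension count, and so does the work of a brute-force similarity scan, so doubling the dimensions roughly doubles both.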

Last updated 2026-05-18 · First published 2026-05-18

Related terms

Retrieval-Augmented Generation (RAG)

Large language model (LLM)

Context window

Fine-tuning
