vMira AI Glossary

Plain-English definitions for the terms used inside vMira and across the broader AI ecosystem. Each entry opens with a one- sentence answer for quick reference, then goes deeper with context, trade-offs, and how the concept actually shows up in production.

Core concepts

AI concepts

Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation (RAG) is a technique where an LLM, instead of relying solely on its training data, retrieves relevant text from an external source (documents, a vector database, the web) at query time and includes those retrieved passages in its prompt before generating an answer.

AI agent

An AI agent is an LLM-driven system that can take multiple actions toward a goal — invoking tools, reading results, deciding next steps, and iterating — rather than producing a single chat response. The defining feature is autonomy: the agent decides what to do next based on what it has seen so far.

Multimodal AI

Multimodal AI is an LLM that natively processes more than one input or output modality — typically text plus images, but increasingly also audio, video, and structured data. The model can answer questions about a photo, transcribe speech, or generate an image from a description without separate specialized systems.

Prompt engineering

Prompt engineering is the practice of designing the text input given to an LLM to maximize the quality, reliability, and specificity of the output. It covers everything from a one-line instruction to a multi-thousand-token system prompt with examples, constraints, and structured output schemas.

Large language model (LLM)

A large language model is a neural network trained on a massive corpus of text to predict the next token given the previous ones. Modern LLMs have billions to trillions of parameters and can generate coherent prose, answer questions, write code, and perform reasoning tasks well beyond simple text completion.

Models

For developers

OpenAI-compatible API

An OpenAI-compatible API exposes endpoints and request/response shapes that match OpenAI's REST API (chat/completions, embeddings, models). Pointing the OpenAI SDK at the alternative provider's base URL with a new API key works without code changes.

Tool use

Tool use is the LLM capability where, instead of producing a normal text response, the model emits a structured request to invoke an external function or API. The host application executes the function, returns the result to the model, and the model continues the conversation using that result.

MCP server

An MCP server is a process that exposes tools, resources, and prompts to LLM clients over the Model Context Protocol — an open standard (introduced by Anthropic, widely adopted across the industry by 2026) for connecting AI assistants to external systems without bespoke per-vendor integrations.

Embedding

An embedding is a fixed-length vector of floating-point numbers that represents the semantic meaning of a piece of text (or image, audio, etc.). Two embeddings are similar (small cosine distance) when their source texts are semantically similar — even if the words differ.

Fine-tuning

Fine-tuning is the process of further training a pre-trained LLM on a smaller, task-specific dataset to specialize its behavior — for a domain (legal, medical), a style (brand voice), a task (classification, structured extraction), or a language.

LLM evaluation

LLM evaluation is the process of measuring how well a model — or a prompt, RAG pipeline, or agentic system — performs on a defined task. Good evaluations combine automated metrics, model-graded scoring ("LLM-as-judge"), and human review, and are run regularly to detect regressions.