vMira AI Glossary
Plain-English definitions for the terms used inside vMira and across the broader AI ecosystem. Each entry opens with a one- sentence answer for quick reference, then goes deeper with context, trade-offs, and how the concept actually shows up in production.
Core concepts
AI workspace
An AI workspace is a single application where chat, code generation, document analysis, design, image creation, and other AI tools share one conversation, one account, and one billing relationship — replacing the need to stitch separate AI products together.
Web search AI
Web search AI is an LLM that retrieves live information from the public web during a conversation and cites the source URLs in its answer. It replaces the model's frozen training-data knowledge with current information from the moment of the query.
AI concepts
Retrieval-Augmented Generation (RAG)
Retrieval-augmented generation (RAG) is a technique where an LLM, instead of relying solely on its training data, retrieves relevant text from an external source (documents, a vector database, the web) at query time and includes those retrieved passages in its prompt before generating an answer.
AI agent
An AI agent is an LLM-driven system that can take multiple actions toward a goal — invoking tools, reading results, deciding next steps, and iterating — rather than producing a single chat response. The defining feature is autonomy: the agent decides what to do next based on what it has seen so far.
Multimodal AI
Multimodal AI is an LLM that natively processes more than one input or output modality — typically text plus images, but increasingly also audio, video, and structured data. The model can answer questions about a photo, transcribe speech, or generate an image from a description without separate specialized systems.
Prompt engineering
Prompt engineering is the practice of designing the text input given to an LLM to maximize the quality, reliability, and specificity of the output. It covers everything from a one-line instruction to a multi-thousand-token system prompt with examples, constraints, and structured output schemas.
Large language model (LLM)
A large language model is a neural network trained on a massive corpus of text to predict the next token given the previous ones. Modern LLMs have billions to trillions of parameters and can generate coherent prose, answer questions, write code, and perform reasoning tasks well beyond simple text completion.
Models
Thinking mode
Thinking mode is an LLM operating mode where the model spends extra compute on internal reasoning steps before producing a final answer. It significantly improves accuracy on math, science, logic, and analytical tasks at the cost of higher latency and price per response.
Context window
The context window is the maximum number of tokens an LLM can consider in a single request — including the system prompt, conversation history, retrieved documents, and the user's current question. Anything beyond the limit must be truncated, summarized, or otherwise handled before being sent to the model.
For developers
OpenAI-compatible API
An OpenAI-compatible API exposes endpoints and request/response shapes that match OpenAI's REST API (chat/completions, embeddings, models). Pointing the OpenAI SDK at the alternative provider's base URL with a new API key works without code changes.
Tool use
Tool use is the LLM capability where, instead of producing a normal text response, the model emits a structured request to invoke an external function or API. The host application executes the function, returns the result to the model, and the model continues the conversation using that result.
MCP server
An MCP server is a process that exposes tools, resources, and prompts to LLM clients over the Model Context Protocol — an open standard (introduced by Anthropic, widely adopted across the industry by 2026) for connecting AI assistants to external systems without bespoke per-vendor integrations.
Embedding
An embedding is a fixed-length vector of floating-point numbers that represents the semantic meaning of a piece of text (or image, audio, etc.). Two embeddings are similar (small cosine distance) when their source texts are semantically similar — even if the words differ.
Fine-tuning
Fine-tuning is the process of further training a pre-trained LLM on a smaller, task-specific dataset to specialize its behavior — for a domain (legal, medical), a style (brand voice), a task (classification, structured extraction), or a language.
LLM evaluation
LLM evaluation is the process of measuring how well a model — or a prompt, RAG pipeline, or agentic system — performs on a defined task. Good evaluations combine automated metrics, model-graded scoring ("LLM-as-judge"), and human review, and are run regularly to detect regressions.