    Artificial Intelligence

    AI Terms A-Z

    Discover the most important terms in Artificial Intelligence – from Machine Learning to Deep Learning to Large Language Models. Each term is explained clearly with practical marketing examples.

    646 terms in Artificial Intelligence

    A

    A* Search

    A* (pronounced "A-star") is a classical search algorithm that finds the shortest path between a start node and a goal node in a graph by always expanding the node with the lowest total cost f(n) = g(n) + h(n): the sum of the actual path cost so far, g(n), and a heuristic estimate of the remaining cost to the goal, h(n).
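    As a sketch, the cost function can be made concrete on a toy graph (the graph and heuristic values below are invented for illustration):

```python
import heapq

def a_star(graph, h, start, goal):
    """graph: {node: [(neighbor, edge_cost), ...]}, h: heuristic per node."""
    # Priority queue ordered by f(n) = g(n) + h(n)
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nbr, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nbr, float("inf")):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None, float("inf")

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 2)]}
h = {"A": 3, "B": 2, "C": 2, "D": 0}  # admissible: never overestimates
path, cost = a_star(graph, h, "A", "D")
print(path, cost)  # ['A', 'B', 'C', 'D'] 4
```

    Because h never overestimates here, A* is guaranteed to return the optimal path (see Admissible Heuristic below).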

    Abductive Logic Programming (ALP)

    A framework in logic programming that allows certain premises to be left unspecified and then infers plausible explanations for observations.

    Abductive Reasoning

    A form of logical inference that starts from an observation and seeks the simplest and most likely explanation for it.

    Ablation

    In AI research, an ablation refers to the removal or disabling of a component of a system to assess that component's impact on the overall performance.

    Action Language

    A formal language used to describe state changes in a system – how actions affect the state of the world over time.

    Action Model Learning

    A machine learning approach focused on enabling an AI agent to learn the outcomes and requirements of its actions within an environment.

    Action Selection

    The process by which an intelligent agent decides "what to do next," choosing the next action from a set of possible actions.

    Activation Function

    A mathematical function used in artificial neural networks to determine the output of a node (neuron) given an input or set of inputs.
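    Two of the most common activation functions, sketched in plain Python for illustration:

```python
import math

def relu(x):
    # ReLU: passes positive inputs through, zeroes out negatives
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(round(sigmoid(0.0), 2))  # 0.5
```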

    Active Learning

    ML strategy where the model selects the most informative samples for labeling.

    Adam Optimizer

    Adaptive optimization algorithm with momentum and adaptive learning rates.
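    A minimal single-parameter sketch of the Adam update rule (the constants are the commonly cited defaults; the toy objective is ours):

```python
def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Exponential moving averages: first moment (mean) and second moment (scale)
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Toy objective f(theta) = theta^2 with gradient 2 * theta; minimum at 0
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
# theta has moved from 1.0 toward the minimum at 0
```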

    Adaptive Algorithm

    An algorithm that changes its behavior or parameters in response to the problem instance or environment as it runs, aiming to improve performance on the fly.

    Adaptive Learning

    An educational methodology (often implemented with AI) that customizes learning content and pace to the individual needs and performance of each learner.

    Adaptive Neuro-Fuzzy Inference System

    A hybrid system that combines neural networks and fuzzy logic principles to create a model capable of learning from data while employing human-like reasoning.

    Admissible Heuristic

    A heuristic h(n) is called admissible if it never overestimates the true remaining cost from node n to the goal — i.e. it always provides an optimistic lower bound. This property guarantees that search algorithms like A* find an optimal path.

    Agent Architecture

    The underlying structure and components of an intelligent agent system, describing how the agent is organized internally to sense, think, and act.

    AgentBench

    A benchmark for evaluating LLM agents in 8 different interactive environments like websites, databases, games, and operating systems.

    Agentic AI

    AI systems that can autonomously pursue goals, make decisions, and take actions in the real world.

    Agentic RAG

    Agentic RAG is an evolution of retrieval-augmented generation in which an AI agent dynamically decides when to retrieve, which sources to query, and how many, instead of following a rigid retrieval pipeline with a fixed top-k vector search.

    AI Agent

    An autonomous software system that uses AI to independently plan and execute tasks.

    AI Ethics

    The field and practice concerned with ensuring that artificial intelligence systems are developed and used in a morally responsible, fair, and safe manner.

    AI Slop

    Pejorative term for low-quality, mass-produced AI-generated content flooding the internet that provides no real value.

    AI Watermarking

    Techniques for embedding invisible markers in AI-generated content to prove its origin and enable detection of deepfakes.

    AI-Complete

    A problem is termed AI-complete if solving it by machine would essentially require general human-level intelligence.

    Aider Polyglot Benchmark

    Coding benchmark testing LLMs on real-world multi-file edits across multiple programming languages.

    Algorithmic Discrimination

    Algorithmic discrimination refers to the systematic disadvantage of certain groups by algorithmic decision systems – often as a result of biased training data or unbalanced model architectures.

    Algorithmic Efficiency

    Algorithmic efficiency measures how economically an algorithm uses computation time, memory, and energy – typically expressed in Big-O notation for scaling behavior.

    Algorithmic Probability

    A theoretical measure that assigns a probability to an observation by considering all possible algorithms that could produce it, weighted by their simplicity.

    Alpha-Beta Pruning

    An optimization technique for the minimax algorithm that prunes parts of the game tree without affecting the result.
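    A sketch over a hand-built two-ply game tree (node names and leaf values are invented for illustration):

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta cutoffs over an explicit game tree."""
    kids = children.get(node, [])
    if depth == 0 or not kids:
        return value[node]
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False,
                                       children, value))
            alpha = max(alpha, best)
            if beta <= alpha:  # prune: the opponent will never allow this branch
                break
        return best
    best = float("inf")
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True,
                                   children, value))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

children = {"root": ["L", "R"], "L": ["L1", "L2"], "R": ["R1", "R2"]}
value = {"L1": 3, "L2": 5, "R1": 2, "R2": 9}
# Max picks L: min(L) = 3, min(R) <= 2, so leaf R2 is never evaluated (pruned)
result = alphabeta("root", 2, float("-inf"), float("inf"), True, children, value)
print(result)  # 3
```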

    Ant Colony Optimization

    A probabilistic optimization technique inspired by the behavior of ants foraging for food and their use of pheromone trails.

    Anytime Algorithm

    An anytime algorithm is an algorithm that can return a valid — though not yet optimal — solution at any intermediate stage and monotonically improves solution quality with additional compute time.

    Approximation Error

    The difference between an exact, true value and an approximate value that is used or obtained by an algorithm or model.

    ARC (AI2 Reasoning Challenge)

    A multiple-choice benchmark with natural science questions at elementary and middle-school level in Easy and Challenge sets.

    ARC-AGI-2

    Benchmark by the ARC Prize Foundation that measures general reasoning ability of AI systems via abstract pattern tasks.

    Artificial General Intelligence (AGI)

    A hypothetical form of AI that possesses human-like cognitive abilities across all domains and can learn and adapt autonomously.

    Artificial Neural Network (ANN)

    An Artificial Neural Network (ANN) is a computational model inspired by the biological brain, consisting of layers of connected neurons that can learn to extract complex patterns from data by adjusting weights.

    Assessment

    Assessment is the measurement of knowledge, skill, or performance—used to diagnose current ability, provide feedback, and certify learning outcomes.

    Attention Mechanism

    Neural network module that weights relevant parts of the input.

    Attributional Calculus

    A logical framework combining predicate logic with multi-valued (fuzzy) logic to represent attributes of entities in an intuitive, human-readable way.

    Audio Generation

    The creation of audio content through AI models – from music to sound effects to speech and ambient sounds.

    Autoencoder

    A type of neural network designed to learn a compressed representation (encoding) of input data and then reconstruct the original data from this encoding.

    Automated Machine Learning

    The process of automating the end-to-end process of applying machine learning to real-world problems, including data preprocessing, model selection, and hyperparameter tuning.

    Automated Planning

    Automated planning is the AI subfield concerned with algorithms that, given an initial state, a goal state, and a set of possible actions, automatically find a sequence of actions (a plan) that achieves the goal.

    AutoML (Automated Machine Learning)

    AutoML automates parts of the machine learning lifecycle such as model selection, feature preprocessing, hyperparameter tuning, and validation.

    B

    Backpropagation

    An algorithm for computing gradients in neural networks that propagates errors backwards through the network to adjust weights.

    Backtracking

    An algorithmic technique that systematically explores all possible solutions and returns to the last decision point when hitting dead ends.
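    The classic illustration is the N-queens puzzle, sketched here: place a queen per row, and undo the choice when a row has no safe column left.

```python
def n_queens(n):
    """Count placements of n non-attacking queens via backtracking."""
    solutions = []

    def place(row, cols, diag1, diag2, board):
        if row == n:
            solutions.append(board[:])
            return
        for col in range(n):
            if col in cols or row - col in diag1 or row + col in diag2:
                continue  # square is attacked: dead end, try the next column
            board.append(col)
            place(row + 1, cols | {col}, diag1 | {row - col},
                  diag2 | {row + col}, board)
            board.pop()  # backtrack: undo the choice and explore alternatives

    place(0, set(), set(), set(), [])
    return solutions

print(len(n_queens(4)))  # 2 solutions on the 4x4 board
```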

    Backward Chaining

    An inference strategy that starts from the goal and works backward to find the facts and rules that would prove the goal.

    Bagging

    An ensemble learning method that trains multiple models on bootstrap samples and aggregates their predictions.

    Batch Normalization

    A technique for normalizing the inputs of each layer of a neural network across a mini-batch to stabilize training.

    Batch Size

    Number of training examples per gradient update.

    Bayesian Optimization

    Bayesian optimization is an approach to optimizing expensive black-box functions (e.g., model hyperparameters) using a probabilistic surrogate model and an acquisition function.

    Beam Search

    Beam search is a heuristic search algorithm that, at every search step, keeps only the k best partial solutions ("beam width") — a compromise between exhaustive breadth-first search (high quality, high cost) and greedy search (low quality, low cost).
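    A toy sketch with an invented, fixed next-token distribution (a real decoder would query a language model at each step):

```python
from math import log

def beam_search(next_logprobs, steps, k):
    """Keep only the k highest-scoring partial sequences at every step.
    next_logprobs(seq) -> {token: logprob} for the next position."""
    beams = [([], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_logprobs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # Prune everything outside the beam width
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

# Toy "model": the same next-token distribution at every position
dist = {"a": log(0.6), "b": log(0.3), "c": log(0.1)}
beams = beam_search(lambda seq: dist, steps=2, k=2)
print(beams[0][0])  # ['a', 'a'] -- the highest-probability sequence survives
```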

    BERT

    BERT (Bidirectional Encoder Representations from Transformers) is a language model developed by Google that processes text bidirectionally, enabling deep contextual understanding.

    BERT (Google)

    Google's Transformer model for bidirectional language understanding.

    BERTScore

    A semantic evaluation metric that uses BERT embeddings to measure similarity between generated and reference text.

    BGE Embedding

    BGE (BAAI General Embedding) is a family of open-source embedding models from Beijing Academy of AI that achieve top results on MTEB.

    Bi-Encoder

    An encoder architecture that transforms query and document independently into embeddings – enabling fast similarity search over pre-computed vectors.

    Bias (AI Bias)

    Systematic distortions in AI systems that can lead to unfair or inaccurate outcomes for certain groups.

    Bias-Variance Tradeoff

    Fundamental ML dilemma between underfitting (high bias) and overfitting (high variance).

    BIG-Bench

    A collaborative benchmark with 200+ tasks created by 400+ researchers to test LLM capabilities beyond existing benchmarks.

    BLEU Score

    Metric for automatic evaluation of translation quality.

    Boosting

    An ensemble learning method that sequentially combines weak learners to create a strong classifier.

    Breadth-First Search (BFS)

    Breadth-First Search (BFS) traverses a graph level by level, exploring all neighbors of a node before moving deeper.
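    A minimal sketch on a small hand-made graph; because BFS explores level by level, the first path it finds to the goal has the fewest edges:

```python
from collections import deque

def bfs(graph, start, goal):
    """Level-by-level search; returns a path with the fewest edges."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nbr in graph.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                queue.append(path + [nbr])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(graph, "A", "D"))  # ['A', 'B', 'D']
```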

    C

    Calibration

    The process of adjusting a model's predicted probabilities so they reflect actual event probabilities.

    Chatbot

    A software program that simulates conversations with humans, typically through text or voice interfaces.

    Chatbot Arena

    A public Elo-based leaderboard where users blindly choose between two LLMs – the most important benchmark for LLM ranking.

    ChatGPT

    A conversational AI system built on large language models that generates human-like responses to user prompts.

    ChatGPT Agent

    Autonomous mode of ChatGPT that independently executes multi-step tasks in browsers, apps, and files.

    Chinchilla Optimal

    The finding that for compute-optimal LLM training, the number of training tokens should scale proportionally to parameter count.

    Classification

    A supervised ML algorithm that assigns data to predefined categories or classes.

    Claude Computer Use

    Claude's capability to operate a desktop computer: mouse, keyboard, screenshots, and applications like a human user.

    Claude Cowork

    Collaborative multi-user mode of Claude for joint project work with shared context and role distribution.

    Claude Design

    Visual design mode of Claude for UI mockups, brand asset generation, and layout iteration via natural language.

    Claude Opus 4.6

    Anthropic's 2026 flagship LLM with extended reasoning, 1M-token context, and native computer-use capabilities.

    Claude Skills

    Modular system by Anthropic that bundles reusable capabilities (prompt + tools + data) for Claude.

    CLIP (Contrastive Language–Image Pretraining)

    A multimodal model approach that learns aligned representations of images and text by training them to match corresponding image–caption pairs.

    Clustering

    An unsupervised learning technique that groups data points into clusters such that items in the same cluster are more similar to each other.

    Codex 5.3

    OpenAI's specialized 2026 coding model for agentic software development and long-running tasks in repositories.

    Cohere Embed

    Cohere's commercial embedding API with special optimization for retrieval and distinction between query and document embeddings.

    ColBERT

    ColBERT is a late-interaction retrieval architecture that creates token-level embeddings for query and document, aggregating them via MaxSim during search.

    Cold Start Problem

    The problem when a system has insufficient data about a new user, item, or context to make accurate predictions or recommendations.

    Collaborative Filtering

    A recommendation approach that predicts a user's preferences based on the behavior of similar users or similarities between items.

    Computer Vision

    The AI subfield that enables computers to understand and interpret visual information.

    Content-Based Filtering

    Recommendations based on properties of items a user liked.

    Context Engineering

    The practice of designing, selecting, and structuring the information an LLM receives so it produces more reliable and relevant outputs.

    Context Window

    The portion of text (tokens) an LLM uses to generate its next output—typically consisting of instructions, conversation history, and retrieved content.

    Contextual Bandit

    A decision-making algorithm that chooses among actions using current context features, while learning from feedback to balance exploration and exploitation.

    Contrastive Learning

    A representation learning approach that trains models to pull similar pairs closer and push dissimilar pairs apart in embedding space.

    Convolutional Neural Network (CNN)

    A neural network architecture that uses convolution operations to learn hierarchical feature representations from grid-like data such as images.

    Cross-Encoder

    An encoder architecture that processes query and document together and outputs a relevance score – more precise than bi-encoders but slower.

    Cross-Entropy Loss

    Loss function for classification tasks based on information theory.
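    For a single example, cross-entropy reduces to the negative log-probability the model assigned to the true class, as this minimal sketch shows:

```python
import math

def cross_entropy(true_idx, probs):
    # Negative log-probability of the correct class
    return -math.log(probs[true_idx])

# Confident and correct -> low loss; uncertain -> higher loss
print(round(cross_entropy(0, [0.9, 0.05, 0.05]), 3))  # 0.105
print(round(cross_entropy(0, [0.4, 0.3, 0.3]), 3))    # 0.916
```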

    Cross-Validation

    A technique for evaluating model performance by training and testing on different data subsets.

    Curriculum Learning

    Training strategy where samples are presented in a meaningful order – from easy to hard, similar to a curriculum.

    Custom GPT

    GPT tailored to a specific use case with its own prompt, knowledge base, and tool set, hosted by OpenAI.

    D

    Data Augmentation

    Techniques for artificially expanding training data through transformations.

    Decision Making

    Decision making is the process of selecting an action (or non-action) among alternatives based on goals, evidence, constraints, and uncertainty.

    Decision Theory

    Decision theory studies how agents should make choices under uncertainty, often by maximizing expected utility subject to constraints.

    Decision Tree

    An ML model that represents decisions as a tree structure with branches based on feature values.

    Decoder

    The part of a model that transforms a compressed representation back to the original format.

    Decoding

    The process of converting encoded data or signals back to their original or usable form, in ML specifically the token-by-token generation of outputs.

    Decoding Strategy

    A decoding strategy is the method used to convert a model's token probability distributions into an actual output sequence.

    Deductive Reasoning

    A form of logical inference where specific conclusions are drawn from general premises—if the premises are true, the conclusion is guaranteed to be true.

    Deep Learning

    A subfield of machine learning that uses deep neural networks with many layers to learn complex patterns from data.

    Deep Reinforcement Learning

    Reinforcement learning that uses deep neural networks to learn policies that choose actions to maximize long-term reward.

    DeepSeek

    Chinese AI startup developing powerful open-source language models, competing with Western providers at significantly lower costs.

    DeepSeek R1

    An open-source reasoning model from DeepSeek that competes with GPT-4 and Claude on complex thinking and coding tasks.

    DeepSeek V4

    Open-weight flagship by DeepSeek that reaches comparable benchmarks at 1/10 the training cost of Western models.

    DeepWalk

    A graph embedding algorithm that combines random walks on graphs with Word2Vec to learn node representations.

    Default Reasoning

    Default reasoning draws conclusions using 'defaults' that hold in typical cases, while allowing exceptions when new information arrives.

    Denoising

    Denoising is the process of removing noise from a signal; in diffusion models, it's the iterative transformation from noisy latents to a clean sample.

    Dense Passage Retrieval

    A retrieval approach using bi-encoder embeddings for query and passages – the foundation of modern semantic search.

    Dense Retrieval

    Retrieval method that uses dense vector representations (embeddings) to find semantically similar documents.

    Depth-First Search (DFS)

    Depth-First Search (DFS) traverses a graph by going as deep as possible along one path before backtracking.

    Dialogue Management

    Component of a conversational AI system that controls the conversation flow.

    Diffusion Model

    Diffusion models are generative AI models that learn to gradually remove noise from data to produce high-quality samples (images, audio, video).

    Disclosure UX

    Disclosure UX is the set of interface patterns that transparently communicate important system facts to users (e.g., AI involvement, limitations, data use, confidence, and provenance).

    DPO (Direct Preference Optimization)

    A simplified alternative to RLHF that optimizes models directly on preference data, without separate reward model or RL training.

    DROP (Discrete Reasoning Over Paragraphs)

    A reading comprehension benchmark that requires numerical reasoning over text passages (counting, sorting, arithmetic).

    Dropout

    A regularization technique that randomly deactivates neurons during training.

    E

    E5 Embedding

    E5 is a family of embedding models from Microsoft Research created through text-to-text contrastive training.

    Early Stopping

    Regularization technique that stops training when validation loss increases.

    Elo Rating

    A rating system for measuring relative abilities, originally from chess – now standard for LLM leaderboards.

    Embedding

    An embedding is a dense vector representation of discrete data (words, images, users, products) where semantically similar objects lie close together in vector space.
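    Closeness in vector space is typically measured with cosine similarity; the three-dimensional vectors below are invented toy values (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

king, queen, car = [0.9, 0.8, 0.1], [0.85, 0.82, 0.12], [0.1, 0.2, 0.9]
print(cosine_similarity(king, queen))  # close to 1.0
print(cosine_similarity(king, car))    # noticeably lower
```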

    Embeddings

    Vector representations of data (words, sentences, images) in a lower-dimensional space that capture semantic similarity.

    Emergent Abilities

    Capabilities that suddenly appear in LLMs only above a certain model size, without being observable in smaller models.

    Encoder

    The part of a model that transforms input data into a compressed representation.

    Encoder-Decoder

    Architecture that encodes input into a representation and decodes output from it.

    Ensemble Learning

    Combining multiple models to achieve better predictions than any single model alone.

    Entity Extraction

    The automatic identification and classification of named entities in text.

    Epoch (Machine Learning)

    In machine learning, an epoch is one complete pass of a learning algorithm through the entire training dataset, i.e., the point at which every training example has been used exactly once to update the model weights.

    Error Analysis

    Systematic examination of model errors to identify patterns and improvement opportunities.

    Evaluation Harness

    A framework for systematically evaluating model performance across various metrics and test cases.

    Explainability

    The ability to make an AI model's decisions or predictions understandable to humans.

    Explainability UX Patterns

    Explainability UX patterns are interface patterns that help users understand why an AI system produced an output, what evidence it used, and what actions it took (or refused).

    Explainable AI (XAI)

    Explainable AI (XAI) comprises methods and product practices that make AI decisions understandable, traceable, and auditable for humans.

    G

    G-Eval

    An LLM evaluation framework that uses chain-of-thought reasoning and weighted probabilities for more nuanced scoring.

    Gaussian Mixture Model (GMM)

    A probabilistic model representing data as a mixture of Gaussian distributions.

    Gemini 3.1 Pro

    Google's 2026 flagship LLM with natively multimodal architecture and 2M-token context.

    Gemma 4

    Open-weight model family by Google for on-device and edge inference, ranging from 2B to 27B parameters.

    Generalization

    A model's ability to perform well on new, unseen data.

    Generative Adversarial Network (GAN)

    Architecture with two competing networks for generating realistic data.

    Generative AI

    AI models that create new content – text, images, audio, code, or structured data.

    GloVe

    GloVe (Global Vectors for Word Representation) is a word embedding method that uses global co-occurrence statistics of a text corpus to generate semantic word vectors.

    Governance

    Governance is the set of roles, rules, processes, and controls that ensure a system is used responsibly and predictably—aligned with risk, compliance, and business objectives.

    GPQA (Graduate-Level Google-Proof Q&A)

    A benchmark with 448 expert-level questions from physics, biology, and chemistry, so difficult that even PhD holders outside the relevant field answer only about 30% correctly.

    GPQA Diamond

    High-difficulty science benchmark with PhD-level questions in biology, physics, and chemistry.

    GPT (Generative Pre-trained Transformer)

    A family of large language models by OpenAI based on the Transformer architecture.

    GPT-5.4

    OpenAI's 2026 flagship LLM with thinking mode, multimodal processing, and agent-native architecture.

    Gradient Descent

    An optimization algorithm that iteratively adjusts parameters in the direction of steepest descent of the loss function.
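    A minimal sketch minimizing a one-dimensional quadratic:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Repeatedly step against the gradient of the loss
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```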

    Graph Attention Network

    A GNN architecture that uses attention mechanisms to adaptively weight the importance of neighboring nodes.

    Graph Classification

    The task of assigning an entire graph to a class based on its structure and node properties.

    Graph Convolutional Network

    A GNN variant that generalizes convolution operations to graphs to learn node representations.

    Graph Isomorphism Network

    A GNN with maximum discriminative power among message-passing architectures, theoretically grounded by the Weisfeiler-Leman test.

    Graph Neural Network

    A class of neural networks that operate directly on graph structures, learning node, edge, and graph-level properties.

    Graph Search

    Graph search is the process of exploring a graph to find a target node, a path, or an optimal solution under a defined objective (e.g., shortest path, lowest cost).

    Graph Transformer

    An architecture that applies Transformer attention to graph structures, enabling global node interactions.

    GraphSAGE

    An inductive GNN framework that learns scalable node representations by sampling and aggregating neighborhoods.

    Greedy Algorithm

    An algorithm that makes the locally optimal choice at each step.

    Greedy Best-First Search

    Greedy Best-First Search expands the node that appears closest to the goal using only a heuristic score h(n), ignoring the cost accumulated so far.

    Greedy Decoding

    A decoding strategy that always selects the token with the highest probability – deterministic, but often repetitive.
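    A sketch with an invented toy model, showing both the determinism and the tendency to repeat:

```python
def greedy_decode(next_token_probs, max_len, eos="<eos>"):
    """Always pick the argmax token -- deterministic, no sampling."""
    seq = []
    for _ in range(max_len):
        probs = next_token_probs(seq)
        token = max(probs, key=probs.get)  # highest-probability token
        if token == eos:
            break
        seq.append(token)
    return seq

# Toy model: favors "the" for two steps, then end-of-sequence
def toy_model(seq):
    if len(seq) < 2:
        return {"the": 0.5, "cat": 0.3, "<eos>": 0.2}
    return {"<eos>": 0.9, "the": 0.1}

print(greedy_decode(toy_model, max_len=10))  # ['the', 'the']
```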

    Ground Truth

    The actual, correct data or labels used as reference for model training and evaluation.

    Grounding

    Connecting AI outputs to real, verifiable facts and sources.

    GRU (Gated Recurrent Unit)

    A simplified RNN architecture with gates to control information flow.

    GSM8K

    A benchmark with 8,500 grade-school math problems that require multi-step reasoning.

    Guardrails

    Safety mechanisms that prevent AI systems from producing harmful, inappropriate, or erroneous outputs.

    Guidance Scale

    Guidance scale is a parameter (commonly in classifier-free guidance) that controls how strongly a diffusion model follows the text prompt versus generating more diverse outputs.

    H

    Hallucination

    When an LLM generates information not supported by input context or reliable sources.

    Hallucination Rate

    The percentage of AI-generated outputs containing information not supported by facts or sources.

    HellaSwag

    A benchmark for common-sense reasoning where LLMs must choose the most plausible continuation of a scenario.

    HELM (Holistic Evaluation of Language Models)

    A comprehensive evaluation framework from Stanford that assesses LLMs on dozens of dimensions like accuracy, fairness, robustness, and efficiency simultaneously.

    Heterogeneous Graph

    A graph with different types of nodes and/or edges, modeling various entity types and relationships.

    Heuristic

    A heuristic is a practical scoring rule or estimate that guides search or decision-making toward promising options without guaranteeing optimality.

    Heuristic Search

    Heuristic search is a family of search algorithms that use a heuristic (a guiding estimate) to explore a problem space more efficiently than uninformed search.

    High-Level Representation

    A high-level representation abstracts raw data into more meaningful structures (symbols, concepts, latent variables, or summaries).

    HNSW

    Hierarchical Navigable Small World – a graph-based algorithm for efficient approximate nearest neighbor search.

    Human Evaluation

    The evaluation of AI outputs by human annotators – the gold standard for quality measurement, but expensive and slow.

    Human-in-the-Loop

    An AI design approach where humans are actively involved in the decision-making or training process of an AI system.

    HumanEval

    A benchmark for code generation with 164 Python programming tasks, evaluated by Pass@k (code must pass tests).

    Hybrid AI System

    A hybrid AI system combines multiple AI paradigms—typically symbolic/rule-based methods with statistical/ML models (including LLMs).

    Hybrid Search

    Combining keyword retrieval (BM25) with vector retrieval (embeddings) for better search results.

    Hyperparameter

    Configuration settings chosen before training that influence how a model learns.

    Hyperparameter Optimization

    The systematic process of finding the best hyperparameter settings for an ML model.

    Hypothesis Generation

    Hypothesis generation is producing candidate explanations (or candidate solutions) that could plausibly account for observed evidence.

    I

    Identity-Preference Optimization

    An alignment method that extends DPO for more stable training.

    IFEval (Instruction Following Evaluation)

    A benchmark that tests how well LLMs follow explicit format instructions (e.g., "Answer in exactly 3 paragraphs", "Start each sentence with a capital letter").

    Image Captioning

    Automatic generation of text descriptions for images.

    Image Generation

    Image generation is the automatic creation of images by AI models based on text prompts, other images, or other inputs.

    Image Segmentation

    Dividing an image into meaningful regions or objects at the pixel level.

    Image-to-Image

    Models that transform an input image into a modified or transformed output image.

    ImageBind

    Meta's multimodal embedding model that unifies six modalities (image, text, audio, video, depth, thermal) in a shared vector space.

    Imitation Learning

    An ML approach where an agent learns by observing and imitating expert behavior.

    In-Context Learning

    The ability of LLMs to learn from examples in the prompt without changing the model weights.

    Inductive Reasoning

    A form of logical inference where general rules or patterns are derived from specific observations—the conclusion is probable but not guaranteed.

    Inference

    Using a trained model to make predictions on new, unseen data.

    Inference Engine

    The core component of an expert system that applies logical rules to a knowledge base to derive new facts or make decisions.

    Information Retrieval

    Finding relevant documents or information from a large collection.

    Inpainting

    Filling in missing or masked regions of an image with plausible content.

    Instruction Tuning

    Fine-tuning an LLM on instruction-response pairs to better follow instructions.

    Instructor Embedding

    An embedding model that uses task-specific instructions in the prompt to optimize embeddings for different tasks.

    Intelligent Tutoring System

    An Intelligent Tutoring System (ITS) is an AI-driven learning system that personalizes instruction, feedback, and practice to a learner's needs.

    Intent Classification

    Determining the intention or goal behind a user query.

    Intent Recognition

    AI capability to recognize the intent behind a user utterance.

    Interpretability

    The degree to which humans can understand how a model arrives at its decisions.

    Inverse Reinforcement Learning

    Inferring the reward function from observed expert behavior.

    Iterative Deepening

    Iterative deepening is a search strategy that repeatedly runs depth-limited search with increasing depth limits until it finds a solution or exhausts a budget.

    Iterative Prompting

    A prompting approach that refines results through multiple successive prompts.

    K

    K-Armed Bandit

    The k-armed bandit problem models choosing among k options to maximize reward while balancing exploration vs exploitation.

    K-Fold Cross-Validation

    K-fold cross-validation is an evaluation method where data is split into k parts; the model trains on k−1 folds and is tested on the remaining fold.
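    A dependency-free sketch of the index bookkeeping (libraries such as scikit-learn provide this via KFold, but the mechanics look roughly like this):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def k_fold_splits(n, k):
    folds = k_fold_indices(n, k)
    for i, test in enumerate(folds):
        # Train on the k-1 other folds, evaluate on the held-out fold
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test

for train, test in k_fold_splits(n=6, k=3):
    print(test)  # [0, 1], then [2, 3], then [4, 5]
```

    In practice the data is shuffled before splitting; contiguous folds are used here only to keep the example readable.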

    K-Means Clustering

    K-means is an unsupervised algorithm that partitions data into k clusters by minimizing within-cluster distance to cluster centroids.

    K-Means++

    K-means++ is an initialization method for k-means that chooses starting centroids to improve convergence and cluster quality.

    K-Shot Prompting

    K-shot prompting provides k examples in the prompt to guide the model's behavior (format, reasoning pattern, tone).

    Kernel (ML)

    In ML, a kernel is a function that measures similarity between data points, enabling algorithms to operate in implicit high-dimensional feature spaces.

    Kernel Trick

    The kernel trick allows algorithms to compute dot products in an implicit higher-dimensional space without explicitly transforming the data.

    KNN (k-Nearest Neighbors)

    KNN is a method that predicts outcomes based on the k most similar examples in a dataset.
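    For classification, the prediction is simply a majority vote among the k nearest training points. An illustrative sketch (squared Euclidean distance, pure Python):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    `train` is a list of (point, label) pairs.
    """
    neighbors = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```
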

    KNN Search

    KNN search retrieves the k closest vectors to a query vector under a distance metric.

    Knowledge Base (KB)

    A knowledge base is a curated repository of information (articles, FAQs, policies) designed for retrieval and reuse.

    Knowledge Cutoff

    Knowledge cutoff is the point in time after which a model's training data does not include new information.

    Knowledge Distillation

    Transferring knowledge from a large model (teacher) to a smaller model (student).

    Knowledge Graph Embedding

    Methods that embed entities and relations of a knowledge graph into a low-dimensional vector space.

    Knowledge Tracing

    Knowledge tracing models a learner's evolving mastery of skills over time using their interactions (answers, attempts, time, hints).

    KTO (Kahneman-Tversky Optimization)

    An alignment method that only needs binary feedback (good/bad) instead of pairwise preferences, inspired by Prospect Theory.

    KV Cache (Key-Value Cache)

    KV cache stores attention key/value tensors from previous tokens during transformer inference so the model doesn't recompute them each step.

    L

    L1 Regularization (Lasso)

    L1 regularization adds a penalty proportional to the absolute value of model weights, encouraging sparsity (many weights become exactly zero).

    L2 Regularization (Ridge)

    L2 regularization adds a penalty proportional to the square of model weights, encouraging smaller weights without forcing exact zeros.
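    The two penalties above differ only in the term added to the training loss. A small illustrative sketch (the function names are just for this example):

```python
def l1_penalty(weights, lam):
    """Lasso term: lam * sum(|w|) -- pushes many weights to exactly 0."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Ridge term: lam * sum(w^2) -- shrinks weights smoothly toward 0."""
    return lam * sum(w * w for w in weights)

def regularized_loss(base_loss, weights, lam, kind="l2"):
    """Total training objective = data loss + regularization penalty."""
    penalty = l1_penalty if kind == "l1" else l2_penalty
    return base_loss + penalty(weights, lam)
```
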

    Label Leakage

    Label leakage describes the situation in which a machine-learning model's training dataset contains features that carry direct or indirect information about the target variable (the label) — information that simply would not be available at inference time in production.

    Label Smoothing

    Label smoothing is a training technique that replaces hard labels (0 or 1) with slightly softened targets (e.g., 0.9 and 0.1).
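    With smoothing factor ε and K classes, each target becomes (1−ε)·y + ε/K, so the probability mass sums to 1. A minimal sketch:

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Replace hard 0/1 targets with softened ones.

    The true class keeps most of the mass; the rest is spread
    uniformly across all classes.
    """
    n = len(one_hot)
    return [(1 - epsilon) * y + epsilon / n for y in one_hot]
```
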

    Language Model (LM)

    A language model is a model that estimates the probability of sequences of tokens, enabling tasks like prediction, generation, and scoring.

    Large Language Model (LLM)

    A large neural network trained on vast amounts of text to understand and generate human-like text.

    Late Interaction

    A retrieval paradigm where query and document tokens are encoded independently but interact via token-level similarity only at search time.

    Latent Space

    A compressed, lower-dimensional space where a model stores internal representations of data.

    Latent Variable

    A latent variable is an unobserved variable inferred from observed data, used to explain hidden structure.

    Layer Normalization

    Layer normalization is a technique that normalizes activations within a layer to stabilize and speed up training in deep networks.

    Learning Objectives

    Learning objectives are clear, measurable statements of what a learner should be able to do after instruction.

    Learning Rate

    A hyperparameter that determines how much to adjust model weights at each training step.

    Learning Rate Schedule

    A learning rate schedule changes the learning rate over training (warmup, decay, cosine, step, exponential).
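    A common combination is linear warmup followed by cosine decay. An illustrative sketch (parameter names are just for this example):

```python
import math

def lr_schedule(step, max_lr, warmup_steps, total_steps):
    """Linear warmup to max_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max_lr * 0.5 * (1 + math.cos(math.pi * progress))
```
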

    Learning-to-Rank (LTR)

    Learning-to-rank trains models to order results (documents, products, answers) by relevance for a given query.

    Length Penalty

    Length penalty is a decoding adjustment that prevents generation algorithms (especially beam search) from unfairly preferring overly short sequences.

    LIME (Local Interpretable Model-agnostic Explanations)

    LIME (Local Interpretable Model-agnostic Explanations) explains an individual model prediction by fitting a simple, interpretable surrogate model around that specific input.

    Link Prediction

    The task of predicting missing or future edges in a graph.

    LiveCodeBench

    Contamination-free coding benchmark that continuously adds new programming tasks from competitions.

    LLM-as-a-Judge

    LLM-as-a-judge uses a model to evaluate other model outputs against rubrics like correctness, groundedness, style, and safety.

    LLMOps

    The operational discipline of building, deploying, and governing LLM-based systems end-to-end—covering prompts, retrieval, tools, evaluation, safety, and cost.

    LMSYS

    LMSYS (Large Model Systems Organization) is a research organization that operates the famous Chatbot Arena benchmark and enables LLM performance comparisons through human evaluations.

    Log-Likelihood

    Log-likelihood is the logarithm of the likelihood that a probabilistic model assigns to observed data.

    Log-Sum-Exp

    Log-sum-exp is a numerical trick for computing log(Σᵢ exp(xᵢ)) stably, without overflow or underflow.
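    The trick is to subtract the maximum before exponentiating, so every exponent is ≤ 0, and add it back outside the log:

```python
import math

def log_sum_exp(xs):
    """Compute log(sum(exp(x) for x in xs)) without overflow.

    Shifting by the max keeps every exponent <= 0, so math.exp never
    overflows; the shift is added back outside the log.
    """
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```

    A naive `math.log(sum(math.exp(x) for x in xs))` raises an overflow error for inputs like `[1000.0, 1000.0]`; the shifted version handles them exactly.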

    Logit

    A logit is the raw, unnormalized score a model outputs before converting to probabilities (e.g., via softmax).
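    The standard conversion from logits to probabilities is softmax; a minimal sketch (with the usual max-subtraction for numerical stability):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```
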

    Logit Bias

    Logit bias is a technique to increase or decrease the likelihood of specific tokens during generation by adjusting their logits.

    Long Context

    Long context refers to an LLM's ability to accept and use a large number of input tokens in a single request.

    LoRA vs Full Fine-Tuning

    A comparison between adapting a model via LoRA adapters versus updating all parameters (full fine-tuning).

    Loss Function

    A mathematical function that measures how good or bad a model's predictions are.

    M

    Machine Learning

    A subfield of AI where systems learn from data to make predictions or decisions without being explicitly programmed.

    Mamba

    Mamba is a neural network architecture built on selective state space models (SSMs) designed to model long sequences efficiently with linear scaling in sequence length.

    Manus AI

    An autonomous general-purpose AI agent capable of independently executing complex tasks like research, coding, and data analysis.

    Masked Language Modeling (MLM)

    MLM is a training objective where a model predicts masked-out tokens in a text sequence (e.g., replacing words with a special [MASK] token).

    Mastery Learning

    Mastery learning is an instructional approach where learners progress only after demonstrating mastery of a skill or objective, with targeted remediation as needed.

    MATH Benchmark

    A benchmark with 12,500 competition mathematics problems (from algebra to number theory) that tests advanced mathematical reasoning.

    Matrix Factorization

    A technique for decomposing a matrix into the product of smaller matrices.

    Matryoshka Embedding

    An embedding training approach where the first N dimensions of a vector are already usable on their own – enabling flexible compression with minimal quality loss.

    Matryoshka Representation Learning (MRL)

    Matryoshka Representation Learning (MRL) is an embedding approach that encodes information at multiple granularities so a single embedding can be truncated to smaller dimensions while remaining useful for downstream tasks.

    Max Tokens

    An API parameter that limits the maximum number of tokens an LLM can generate in a response.

    MBPP (Mostly Basic Python Problems)

    A benchmark with 974 simple Python programming tasks that test basic programming abilities of LLMs.

    Mechanistic Interpretability

    Mechanistic interpretability is the effort to reverse engineer neural networks by identifying internal mechanisms (features, circuits, algorithms) that produce outputs.

    Message Passing Neural Network

    A unifying framework for GNNs where nodes receive messages from neighbors, aggregate them, and update their representations.

    Meta-Learning

    Meta-learning ("learning to learn") aims to train models or systems that adapt quickly to new tasks with limited data or few examples.

    Metaprompt

    A metaprompt is a higher-level prompt that defines the rules, structure, and constraints for generating other prompts or for a whole class of outputs.

    METEOR

    An evaluation metric for machine translation that combines unigram matching with stemming, synonyms, and word order.

    Metric Learning

    Metric learning trains models to learn a distance function (embedding space) where "similar items are close" and "dissimilar items are far apart."

    Minimum Description Length

    Minimum Description Length (MDL) is a principle for model selection that prefers the model that yields the shortest total description of the model plus the data encoded under it.

    Mixed Precision Training

    Mixed precision training uses a mix of lower-precision (e.g., FP16/BF16) and single-precision (FP32) representations to speed up training while preserving accuracy.

    Mixture of Experts (MoE)

    MoE is a model architecture where different "expert" sub-networks specialize, and a router selects which experts handle each token/input.

    MLCommons

    Industry consortium developing open benchmarks (MLPerf), datasets, and best practices for ML performance.

    MMLU (Massive Multitask Language Understanding)

    A multiple-choice benchmark with 57 subject areas (STEM, humanities, social sciences) for measuring LLM world knowledge.

    MMLU-Pro

    Extended MMLU benchmark with more challenging multiple-choice questions and reduced guessing advantage.

    MMR (Maximal Marginal Relevance)

    MMR is a retrieval diversification method that selects items that are both relevant to the query and non-redundant with each other.
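    The greedy selection rule scores each candidate as λ·relevance − (1−λ)·redundancy with items already picked. An illustrative sketch over precomputed similarity scores (the input format here is an assumption for the example):

```python
def mmr_select(query_sim, doc_sim, k, lam=0.7):
    """Greedy MMR: pick k items balancing relevance and novelty.

    query_sim[i]  -- similarity of item i to the query
    doc_sim[i][j] -- similarity between items i and j
    lam           -- trade-off: 1.0 = pure relevance, 0.0 = pure diversity
    """
    selected, remaining = [], list(range(len(query_sim)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```
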

    Model Card

    A model card is a standardized documentation artifact describing a model's intended use, limitations, training data context, evaluation results, and ethical/safety considerations.

    Model Collapse

    Model collapse is a degradation phenomenon where training on synthetic/model-generated data (especially repeatedly) can reduce diversity and quality, causing the model to "collapse" toward narrower outputs.

    Model Compression

    Techniques for reducing the size of ML models while maintaining performance.

    Model Drift

    Model drift is performance degradation over time due to changes in data distributions, user behavior, environment, or upstream systems.

    Model Monitoring

    Continuous monitoring of ML model performance and behavior in production.

    Model Simplification

    Model simplification reduces complexity to improve interpretability, efficiency, robustness, or deployment feasibility.

    Model Spec

    A model spec is a written specification describing how a model should behave—including intended behavior, constraints, and principles—often used to guide training, alignment, and deployment policy.

    Model-Based Learning

    Model-based learning learns a model of the environment (dynamics) and uses it for planning, prediction, or control.

    Monte Carlo Dropout (MC Dropout)

    Monte Carlo Dropout estimates model uncertainty by keeping dropout active at inference time and performing multiple stochastic forward passes, then aggregating results.

    MT-Bench

    A multi-turn conversation benchmark for LLMs with 80 questions across 8 categories, evaluated by GPT-4-as-Judge.

    MTEB

    The Massive Text Embedding Benchmark – a comprehensive benchmark for text embedding models across 56+ datasets in 8 tasks.

    Multi-Armed Bandit

    An algorithm for sequential decision-making that balances exploration and exploitation.

    Multi-Objective Optimization

    Multi-objective optimization (Pareto optimization) is optimization with multiple objectives that often conflict, where you typically seek Pareto-optimal solutions rather than one single optimum.

    Multi-Turn Conversation

    A multi-turn conversation is an interaction where context and intent evolve across multiple exchanges rather than a single query-response.

    Multimodal

    AI systems that can process and understand multiple data types (text, image, audio, video) simultaneously.

    Multimodal AI

    AI systems that jointly process text, image, audio, and video and can respond in any modality.

    Multimodal Model

    A multimodal model can process and/or generate across multiple data types (e.g., text, images, audio, video).

    N

    N-gram Blocking

    N-gram blocking is a decoding constraint that prevents a model from generating an n-gram (sequence of n tokens) that has already appeared in the generated text.

    N-Shot Prompting

    N-shot prompting provides N examples in the prompt to teach the model the desired pattern (0-shot = instructions only; few-shot = small N).

    N+1 Tool Call Problem

    The N+1 tool call problem happens when an AI workflow makes one initial tool call and then makes N additional tool calls (often one per retrieved item), causing unnecessary latency and cost.

    Named Entity Canonicalization

    Entity canonicalization is standardizing different surface forms of the same entity into one canonical representation (e.g., "OpenAI Inc.", "OpenAI", "Open AI").

    Named Entity Linking (NEL)

    Named Entity Linking connects an entity mention in text (e.g., "OpenAI", "Apple", "Paris") to a specific canonical entity ID in a knowledge base (internal or external).

    Named Entity Recognition (NER)

    Identifying and classifying named entities in text (people, places, organizations).

    Nano Banana

    Codename for Google's image editing model (Gemini 2.5 Flash Image) enabling pixel-precise edits via prompt.

    Nano Banana 2

    Google's second-generation AI image generation model, based on Gemini 3.1 Flash Image, combining Pro quality with Flash speed.

    Narrow AI / Weak AI

    Narrow AI (also "weak AI") is AI designed to perform a specific task or a limited set of tasks, rather than general-purpose reasoning across domains.

    Natural Gradient

    Natural gradient is an optimization approach that accounts for the geometry of parameter space, often leading to more efficient steps than standard gradient descent in some probabilistic models.

    Natural Language Generation

    Natural Language Generation (NLG) is the process of producing human-readable text from data, intent, or internal representations (rules, templates, or neural models).

    Natural Language Processing (NLP)

    The field of AI concerned with the interaction between computers and human language.

    Natural Questions (NQ)

    A question answering benchmark from Google with real search queries and Wikipedia articles as answer sources.

    Negative Cycle

    A negative cycle is a cycle in a weighted graph whose total weight is negative, allowing path cost to be reduced indefinitely by looping.

    Negative Prompting

    Negative prompting is explicitly telling a generative model what to avoid (content, style, formatting, claims) during generation.

    Negative Transfer

    Negative transfer occurs when transferring knowledge from a pretrained model or source task hurts performance on the target task.

    Negative Weights

    Negative weights are negative edge costs in a weighted graph (i.e., an action/transition reduces total cost).

    NeRF (Neural Radiance Fields)

    NeRFs are neural methods for representing 3D scenes by learning a function that maps spatial coordinates and viewing direction to color and density, enabling novel view synthesis.

    Neural Architecture Search

    Automatic search for optimal neural network architectures.

    Neural Code Search

    Neural code search retrieves relevant code snippets or files using embeddings and semantic matching rather than exact keyword search.

    Neural Collapse

    Neural collapse is a phenomenon observed in deep classifiers near the end of training where learned representations and classifier weights exhibit a highly structured geometry (classes become tightly clustered and symmetrically arranged).

    Neural Embeddings

    Neural embeddings are learned vector representations of items (text, users, products, documents) such that distance in vector space reflects similarity.

    Neural Index Rebuild

    A neural index rebuild is re-generating embeddings and rebuilding vector (or hybrid) indexes after changes to content, chunking, or the embedding model.

    Neural Indexing

    Neural indexing is using learned representations and neural methods to build or optimize an index for retrieval (often in vector search or learned sparse retrieval).

    Neural IR (Neural Information Retrieval)

    Neural IR is the use of neural models (embeddings, cross-encoders, rerankers) to retrieve and rank documents based on semantic relevance.

    Neural Network

    A computational model inspired by the structure of biological neurons, consisting of interconnected nodes (neurons) in layers.

    Neural Ordinary Differential Equation (Neural ODE)

    Neural ODEs model transformations as continuous-time dynamics defined by a neural network, enabling constant-memory backpropagation via the adjoint method and natural handling of irregularly sampled data.

    Neural Pruning

    Neural pruning removes weights, neurons, attention heads, or entire structures from a model to reduce compute/memory while trying to preserve performance.

    Neural Reranking

    Neural reranking uses a model (often a cross-encoder) to re-score and reorder an initial set of retrieved candidates based on deeper query–candidate understanding.

    Neural Retrieval

    Neural retrieval is retrieving relevant items using learned representations (dense embeddings and similarity search) instead of relying purely on keyword matching.

    Neural Scaling Laws

    Scaling laws describe empirical relationships showing how model performance tends to improve predictably as you increase compute, data, and/or model parameters—often following power-law-like trends.

    Neural Style Transfer (NST)

    Neural style transfer is a technique that applies the "style" of one image (textures, patterns) to the "content" of another, using neural representations.

    Neural Topic Routing

    Neural topic routing is using ML/embeddings to classify or route an input (query, pageview, conversation) into a topic, workflow, or handler based on semantic meaning.

    Neuro-Symbolic "Verification Layer"

    A neuro-symbolic verification layer is a system component that checks neural outputs against symbolic constraints (rules, schemas, policies) before acting or publishing.

    Neuro-Symbolic AI

    Neuro-symbolic AI combines neural methods (LLMs, embeddings) with symbolic methods (rules, logic, knowledge graphs) to improve reliability, interpretability, and constraint satisfaction.

    Next Best Question (NBQ)

    Next Best Question is a conversational design and decisioning pattern where a system asks the single most valuable clarifying question to progress toward a correct outcome.

    Next Sentence Prediction (NSP)

    Next Sentence Prediction is a training objective where a model predicts whether one sentence likely follows another in the original text.

    NL2SQL (Natural Language to SQL)

    NL2SQL converts natural language questions into SQL queries that can be executed against a database.

    NLP (Natural Language Processing)

    Natural Language Processing (NLP) is the subfield of AI concerned with the machine processing, interpretation, and generation of natural language.

    No Free Lunch Theorem

    The No Free Lunch theorem (in optimization/learning) states that averaged over all possible problems, no one algorithm performs better than all others—performance depends on the problem distribution.

    Node2Vec

    An algorithm that learns dense vector representations for graph nodes through biased random walks.

    Noise Injection

    Noise injection is deliberately adding noise during training or processing to improve robustness, generalization, or privacy.

    Noise Schedule

    A noise schedule defines how much noise is added (and later removed) at each step in a diffusion model's forward and reverse processes.

    Noisy Student Training

    Noisy Student Training is a semi-supervised learning approach where a "teacher" model labels unlabeled data, and a "student" model is trained on a mix of labeled + pseudo-labeled data with noise/augmentation.

    Nomic Embed

    Open-source embedding models from Nomic AI with full reproducibility – all training data and code are public.

    Non-Maximum Suppression (NMS)

    Non-maximum suppression is a post-processing step in object detection that removes redundant overlapping bounding boxes, keeping only the most confident ones.
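    A minimal sketch of the greedy procedure: keep the highest-scoring box, discard any box overlapping it above an IoU threshold, and repeat (boxes here are `(x1, y1, x2, y2)` tuples, an assumption for the example):

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept after non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```
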

    Non-Monotonic Logic

    A logical system where conclusions can be retracted when new information arrives that contradicts previous assumptions.

    Nonlinear Activation Function

    A nonlinear activation function introduces nonlinearity into neural networks (e.g., ReLU, GELU, tanh), enabling them to model complex relationships beyond linear transformations.

    Normalization

    Normalization is the transformation of numerical data to a unified value range (often 0–1 or mean 0 / standard deviation 1) to improve the training stability of machine learning models.
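    The two variants mentioned above, as a small illustrative sketch:

```python
def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Shift to mean 0 and scale to standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]
```
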

    Normalization Layer

    A normalization layer is a neural network component that normalizes activations to improve training stability and convergence (e.g., LayerNorm, RMSNorm).

    Normalizing Flow

    A normalizing flow is a generative modeling approach that transforms a simple distribution (e.g., Gaussian) into a complex one via a sequence of invertible transformations with tractable likelihoods.

    Novel Class Discovery (NCD)

    Novel class discovery finds previously unknown categories in unlabeled data while leveraging knowledge from known classes.

    NT-Xent Loss (Normalized Temperature-Scaled Cross-Entropy)

    NT-Xent is a contrastive learning loss used to train embeddings by pulling positive pairs together and pushing negatives apart, with a temperature term controlling distribution sharpness.

    O

    Object Detection

    Identification and localization of objects in images or videos.

    Observability for LLM Apps

    LLM observability extends classic observability with AI-specific signals: prompt/version tracking, retrieval evidence, tool traces, token usage, and quality/safety metrics.

    Off-Policy Evaluation (OPE)

    Estimates how a new decision policy would perform using data collected from a different (existing) policy—without deploying the new policy.

    Offline Evaluation

    Measures model/system performance using predefined datasets and metrics before production rollout.

    On-Device Inference

    Runs a model locally on a user's device (phone, laptop, edge hardware) instead of calling a cloud API.

    One-Shot Learning

    Ability to learn and generalize from a single example.

    One-Shot Prompting

    Provides a single example in the prompt to demonstrate the desired output pattern.

    Online Evaluation

    Measures performance on real user traffic (A/B tests, canaries, interleaving, holdouts) after deployment.

    Online Learning

    Updates a model incrementally as new data arrives, rather than retraining from scratch in large batches.

    Ontology

    A formal representation of concepts and relationships in a domain (entities, classes, properties, constraints).

    Open-Weight Model

    A model whose trained weights are publicly available, enabling self-hosting and deeper customization.

    OpenAI Embeddings

    OpenAI's commercial embedding API with text-embedding-3-small and text-embedding-3-large – the easiest path to high-quality embeddings.

    OpenAI o1

    OpenAI's first o-series model that uses explicit reasoning with chain-of-thought for complex problem-solving.

    OpenAI o3

    Advanced reasoning model from OpenAI with improved performance in mathematics, coding, and scientific reasoning.

    OpenLLM Leaderboard

    A public leaderboard by Hugging Face that compares open-source LLMs on standardized benchmarks (MMLU, HellaSwag, etc.).

    Operationalization

    Turning a concept, model, or prototype into a repeatable, reliable, governed production capability with clear ownership, monitoring, and change control.

    Optimization

    The process of finding parameter values that minimize a loss function or maximize an objective under constraints.

    Optimizer

    The algorithm that updates model parameters during training (e.g., SGD, Adam), based on gradients and configuration.

    Orchestration

    Coordinates multiple steps, services, and tools into a reliable workflow—often with state, retries, and observability.

    Orchestrator

    The system component that implements orchestration logic—deciding the next step, calling tools, managing state, and enforcing budgets/guardrails.

    ORPO (Odds Ratio Preference Optimization)

    An evolution of DPO that combines SFT and preference alignment in a single training step.

    Out-of-Distribution (OOD) Detection

    Identifies inputs that differ significantly from what a model was trained on, signaling increased uncertainty and risk.

    Output Guardrails

    Controls applied to model outputs to enforce safety, policy, formatting, and correctness constraints before displaying or acting.

    Output Length Control

    The set of techniques used to shape response length and structure (token limits, section caps, templates, validators).

    Output Parsing

    Extracting structured fields from model output (JSON, YAML, XML, or patterns) so downstream systems can reliably use it.

    Output Token

    A token generated by a language model as part of its response.

    Over-Generation

    Producing more output than needed (too long, too verbose, too many steps), increasing cost and reducing user clarity.

    Over-Retrieval

    Retrieving too many documents/chunks for a query, increasing cost and often reducing answer quality due to noise and context dilution.

    Overfitting

    When a model learns training data too well and generalizes poorly to new data.

    Overlapping Chunks

    A chunking strategy where consecutive text chunks share some repeated content (overlap) to preserve context across chunk boundaries.

    P

    Paged Attention

    An inference optimization that manages the KV cache in "pages" (blocks) to reduce memory fragmentation and improve throughput for serving LLMs.

    Parallel Tool Calls

    Executing multiple tool/API calls concurrently rather than sequentially, reducing end-to-end latency.

    Parameter Count

    The number of learned weights in a model, often used as a rough proxy for capacity and compute needs.

    Parameter Sharing

    A modeling technique where multiple parts of a neural network reuse the same weights instead of having separate parameters.

    Parameter-Efficient Fine-Tuning (PEFT)

    A set of techniques that adapt a pretrained model to a task by training only a small subset of parameters (or additional small modules).

    Passage Reranking

    Reorders retrieved passages using a stronger relevance model (often a cross-encoder) to improve precision before generation.

    Passage Retrieval

    Finds relevant passages (chunks) of text rather than whole documents, improving precision for question answering and RAG.

    Pathfinding

    Pathfinding is the process of finding a route between nodes in a graph that optimizes an objective (shortest, cheapest, safest, fastest).

    PDDL (Planning Domain Definition Language)

    A standardized language for describing planning problems in AI that formally defines states, actions, and goals.

    Perceptron

    The Perceptron is the simplest form of an artificial neuron and the foundation of modern neural networks – a linear classifier that computes a weighted sum of its inputs and passes it through an activation function.

    Perplexity

    A language model metric derived from the average negative log-likelihood; measures how "surprised" a model is by text.
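    Given the model's probability for each observed token, perplexity is the exponential of the mean negative log-probability. A minimal sketch:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability per token.

    `token_probs` holds the model's probability for each token
    actually observed in the text. Lower perplexity = less "surprised".
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

    A model that assigns every token probability 1/4 has perplexity 4 – intuitively, it is as uncertain as a uniform choice among four options at each step.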

    Planning

    An AI field concerned with the automatic generation of action sequences to get from an initial state to a goal state.

    Poisoning Attack

    An attack in which an adversary manipulates training data, retrieval corpora, or feedback signals to degrade model behavior.

    Policy

    A policy is a rule or strategy that determines what actions are taken under which conditions.

    Policy Engine

    A component that enforces rules and constraints (who can do what, which tools are allowed, what outputs are permitted) at runtime.

    Policy Gradient

    Methods that optimize a policy directly by adjusting parameters in the direction that improves expected reward.

    Positional Encoding

    How a transformer model represents token order (position) so it can distinguish "A then B" from "B then A."

    Positional Interpolation

    A technique to extend a model's usable context length by rescaling how positions are represented.

    Post-Training

    Any training stage applied after pretraining to shape a model for desired behaviors—helpfulness, safety, instruction-following.

    Post-Training Quantization (PTQ)

    Reduces model precision (e.g., FP16 → INT8/INT4) after training to lower memory use and speed up inference.

    Preference Optimization

    Training or adjusting models using preference signals (A preferred to B) to improve alignment with desired outputs.

    Prefill

    The inference stage where the model processes the prompt to build the initial internal state before generating output tokens.

    Prefill Latency

    The time spent processing the input prompt before the model can start generating tokens.

    Prefix Cache

    Reuses computed model state (often KV cache) for repeated prompt prefixes, avoiding repeated prefill computation.

    Prefix Tuning

    A parameter-efficient adaptation technique where you learn small "prefix" vectors that steer attention layers, instead of fine-tuning all model weights.

    Pretraining

    Training a model on large-scale data (often self-supervised) to learn general representations before task-specific adaptation.

    Privacy-Preserving Machine Learning

    A set of techniques that reduce privacy risk when training or serving models.

    Product Quantization (PQ)

    A vector compression technique that approximates high-dimensional vectors using compact codes, enabling faster approximate nearest neighbor search.

    Prompt

    The input (instructions + context + examples + constraints) provided to a language model to elicit a desired output.

    Prompt A/B Testing

    Comparing two prompt versions on real traffic to measure differences in outcomes and guardrails.

    Prompt Budget

    An explicit allocation of tokens for instructions, context, retrieved evidence, and examples.

    Prompt Caching

    Stores reusable prompt components or responses to reduce repeated compute, cost, and latency.

    Prompt Chaining

    A pattern where multiple prompts are run sequentially, where output from one step becomes input to the next.

    Prompt Compression

    Reduces prompt length while preserving essential constraints and context.

    Prompt Engineering

    The art and science of designing input prompts to obtain desired outputs from LLMs.

    Prompt Hardening

    Strengthening prompts and surrounding controls to resist misuse, injection, and unsafe outputs.

    Prompt Injection

    An attack where malicious or untrusted text attempts to override instructions or manipulate an LLM system.

    Prompt Leakage

    Unintended exposure of system prompts, hidden instructions, or sensitive context—through model outputs, logs, or UI/debug tools.

    Prompt Linting

    Automated static analysis of prompts to detect issues before deployment (conflicts, missing constraints, unsafe phrasing).

    Prompt Registry

    A system for storing, versioning, testing, and governing prompts as production artifacts.

    Prompt Regression Testing

    Running a stable evaluation suite against prompt changes to detect quality, safety, format, and cost regressions.

    Prompt Router

    Selects the best prompt template (or workflow) for a request based on intent, difficulty, risk, and context.

    Prompt Sandbox

    A safe environment to test prompts with controlled data, tools, and logs before production.

    Prompt Template

    A reusable prompt structure with variables (placeholders) that can be filled dynamically.
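
    A minimal sketch using Python's standard library; the template text and the variables (`$role`, `$n`, `$text`) are illustrative:

```python
from string import Template

# Illustrative template with placeholder variables.
prompt_template = Template(
    "You are a $role. Summarize the following text in $n bullet points:\n$text"
)

# Fill the placeholders dynamically at request time.
prompt = prompt_template.substitute(role="support agent", n=3, text="Customer reported ...")
print(prompt)
```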

    Prompt Tokens

    The tokens consumed by the model's input (system instructions, user message, retrieved context, tool schemas, examples).

    Prompt Tuning

    Parameter-efficient method where only learnable token embeddings at the input are trained while the entire model stays frozen.

    Proximal Policy Optimization (PPO)

    A reinforcement learning algorithm that updates policies in a constrained way to avoid overly large, unstable changes.

    Pruning

    The removal of unnecessary or unimportant components from a model or search tree to increase efficiency or reduce overfitting.

    Q

    Q-Former

    A Q-Former is a query-based transformer module used in some multimodal systems to extract and compress information from one modality.

    Q-Function

    The Q-function (action-value function) maps a state-action pair to expected return: Q(s, a).

    Q-Learning

    Q-learning is a reinforcement learning method that learns a value function Q(s, a) estimating the expected return of taking action a in state s.
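
    A toy illustration of one tabular Q-learning update (the 2-state, 2-action sizes and the reward are made up):

```python
import numpy as np

# One tabular Q-learning update:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

Q = np.zeros((2, 2))  # toy problem: 2 states x 2 actions
q_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1])  # 0.5 * (1.0 + 0.9 * 0.0 - 0.0) = 0.5
```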

    QAT (Quantization-Aware Training)

    Quantization-aware training trains a model while simulating quantization effects, improving accuracy after quantization compared to PTQ.

    QKV (Query–Key–Value)

    QKV refers to the Query (Q), Key (K), and Value (V) matrices used in transformer attention mechanisms.
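
    A minimal sketch of scaled dot-product attention over toy 2-dimensional Q/K/V matrices (real models use learned projections and many heads):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # row-wise softmax
    return w @ V

Q = np.array([[1.0, 0.0]])                # one query
K = np.array([[1.0, 0.0], [0.0, 1.0]])    # two keys
V = np.array([[10.0, 0.0], [0.0, 10.0]])  # two values
out = scaled_dot_product_attention(Q, K, V)
print(out)  # weighted toward the first value row, since the query matches the first key
```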

    QLoRA (Quantized Low-Rank Adaptation)

    QLoRA is a fine-tuning approach that combines quantization with LoRA to adapt large models with lower memory usage.

    Quadratic Attention Cost

    Quadratic attention cost refers to the classic computational scaling of full self-attention, which grows roughly with the square of sequence length (O(n²)).

    Quality-of-Answer Score

    A quality-of-answer score is a composite metric that estimates how good an AI answer is (usefulness, correctness, clarity, groundedness, safety).

    Quantization

    Reducing numerical precision of model weights to decrease memory and compute requirements.

    Quantum Machine Learning (QML)

    Quantum machine learning explores using quantum computing concepts (qubits, superposition, entanglement) to accelerate or enhance certain ML computations.

    Quarantine

    Quarantine is isolating content, inputs, or events that are suspicious, unsafe, or low-trust so they cannot affect production outputs.

    Query Embeddings

    Query embeddings are vector representations of search queries used for semantic similarity matching against embedded documents/passages.

    Query Expansion

    Query expansion augments a query with additional terms or semantic signals to improve retrieval recall.

    Query Fan-Out

    Query fan-out is when one request triggers many downstream queries/tool calls to gather context or results.

    Query Federation

    Query federation executes a query across multiple systems/sources (databases, services, indexes) and combines results.

    Query Likelihood Model

    A query likelihood model is an information retrieval approach where documents are ranked by the probability that the document's language model would generate the query.

    Query Reranking

    Query reranking reorders search/retrieval results using a stronger scoring function (often a cross-encoder or LLM-based scorer) to improve relevance at the top.

    Query Rewriting

    Modifying a user query to improve retrieval quality (recall/precision), often by clarifying intent, expanding terms, or normalizing vocabulary.

    Query Routing

    Query routing sends a query to the most appropriate engine, model, index, or workflow based on intent, confidence, and constraints.

    Query Understanding Evaluation

    Query understanding evaluation measures how well your system interprets user intent, entities, constraints, and risk level from queries.

    Query-Time Filtering

    Query-time filtering applies constraints during retrieval—such as permissions, tenant boundaries, recency windows, language, or document type.

    Question Answering (QA)

    Question Answering is a task where a system answers questions based on a corpus, knowledge base, or model knowledge.

    Question Decomposition

    Question decomposition breaks a complex question into smaller sub-questions that can be answered more reliably.

    Quota-Aware Routing

    Quota-aware routing chooses models/workflows based on remaining quota and cost budgets (e.g., route simple queries to cheaper modes when budget is low).

    R

    RAG (Retrieval-Augmented Generation)

    Retrieval-Augmented Generation (RAG) is an architecture where an LLM generates an answer using retrieved external information (documents/chunks) as evidence, rather than relying only on its internal parameters.

    RAG Chunking Strategy

    A RAG chunking strategy defines how source documents are split into retrievable units (chunk size, overlap, structure preservation, metadata).

    RAG Evaluation

    The systematic evaluation of RAG systems across retrieval quality, answer relevancy, groundedness, and faithfulness.

    RAG Poisoning

    RAG poisoning is an attack or failure mode where the retrieval corpus is manipulated so that malicious or misleading content is retrieved as "evidence," degrading outputs or steering the system.

    Ragas

    Ragas is a popular evaluation approach/library for RAG systems that provides practical metrics and workflows to assess retrieval + generation quality.

    Re-Embedding

    Re-embedding is regenerating embeddings for a corpus (documents/chunks) using the same or a new embedding model, then updating the vector index accordingly.

    ReAct (Reason + Act)

    ReAct is an agentic pattern where a model alternates between reasoning and taking actions (tool calls), incorporating observations before continuing.

    Recall@k

    Recall@k measures how often the needed relevant item(s) appear within the top-k retrieved results.
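
    For a single query, this reduces to a small set computation; the document ids below are illustrative:

```python
def recall_at_k(retrieved, relevant, k):
    # Fraction of relevant items that appear in the top-k retrieved results
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

score = recall_at_k(["d2", "d7", "d1", "d9"], relevant=["d1", "d3"], k=3)
print(score)  # only d1 is found in the top 3 -> 1/2 = 0.5
```

    In practice the metric is averaged over an evaluation set of queries.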

    Recency Bias

    Recency bias is a tendency to overweight more recent information—either in human judgment or in system behavior (ranking, context usage).

    Reciprocal Rank Fusion (RRF)

    RRF combines multiple ranked result lists into one by summing reciprocal ranks, improving robustness when different retrieval methods excel on different queries.
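
    A minimal sketch with two made-up ranked lists; `k = 60` is the commonly used smoothing constant:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each list ranks doc ids best-first; score(doc) = sum over lists of 1/(k + rank)
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["d1", "d2", "d3"]   # e.g. from BM25
vector_results = ["d3", "d1", "d4"]    # e.g. from embedding search
fused = reciprocal_rank_fusion([keyword_results, vector_results])
print(fused)  # d1 ranks high in both lists, so it fuses to the top
```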

    Recommendation Engine

    System that generates personalized recommendations based on user behavior.

    Red Teaming

    Red teaming is adversarial testing that intentionally tries to break a system to discover vulnerabilities and failure modes (security, safety, reliability).

    Regression

    ML method for predicting continuous numerical values.

    Regression Testing

    Regression testing ensures that changes (code, prompts, retrieval config, model versions) don't break existing behavior or quality.

    Regularization

    Techniques that prevent overfitting by constraining model complexity.

    Reinforcement Learning (RL)

    A learning paradigm where an agent learns to make decisions by interacting with an environment and maximizing cumulative reward.

    Reproducibility

    Reproducibility is the ability to recreate the same (or equivalent) outputs and behavior given the same inputs, versions, and configuration.

    Reranker

    A reranker is a model that re-scores and reorders retrieved candidates (documents/chunks) to improve relevance at the top.

    Reranking

    Reordering retrieval results with a more powerful model for better relevance.

    Response Generation

    AI process for generating natural language responses.

    Responsible AI

    Responsible AI is designing and operating AI systems to be safe, fair, privacy-aware, transparent, and accountable throughout their lifecycle.

    Retrieval Confidence

    Retrieval confidence is a signal estimating whether retrieved results contain sufficient, relevant evidence to answer the query reliably.

    Retrieval Drift

    Retrieval drift is a change in retrieval behavior/quality over time due to corpus updates, embedding model changes, indexing settings, query distribution shifts, or metadata changes.

    Retrieval-First Policy

    A retrieval-first policy forces the system to retrieve evidence before generating substantive answers, especially for factual or high-risk queries.

    Retriever

    A retriever is the component that selects candidate documents/chunks relevant to a query (keyword, vector, hybrid, or federated).

    Retriever-Reranker Cascade

    A retriever–reranker cascade is a two-stage retrieval approach: a fast retriever generates candidates, then a slower, more accurate reranker selects the best top-k.

    Reward Hacking

    Reward hacking occurs when a model/agent finds ways to maximize reward without actually achieving the intended real-world goal.

    Reward Model

    A reward model scores model outputs according to a preference objective (helpfulness, safety, format compliance), often used in alignment-style training or evaluation.

    RLAIF (Reinforcement Learning from AI Feedback)

    RLAIF uses AI-generated critiques or preferences (often from a judge model) as feedback signals to improve model behavior, reducing reliance on human labeling.

    RLHF (Reinforcement Learning from Human Feedback)

    RLHF is a post-training approach that uses human preference data to align model behavior toward desired outputs.

    RNN (Recurrent Neural Network)

    A Recurrent Neural Network (RNN) is a neural network architecture for sequential data where neurons use their own output as additional input for the next time step — preserving context across sequences.

    Robustness Testing

    Robustness testing evaluates how reliably a model or system performs under perturbations, edge cases, noise, or distribution shifts.

    RoPE (Rotary Positional Embeddings)

    RoPE is a positional encoding method that applies rotations to query/key vectors, enabling models to represent token positions in a way that supports relative position behavior.

    ROUGE Score

    Metrics for evaluating automatic text summarization.
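
    A simplified sketch of ROUGE-1 recall (unigram overlap with clipped counts); real implementations also report precision, F1, and n-gram/longest-common-subsequence variants:

```python
def rouge1_recall(reference, candidate):
    # Fraction of reference unigrams that also appear in the candidate (clipped counts)
    ref, cand = reference.split(), candidate.split()
    overlap = sum(min(ref.count(w), cand.count(w)) for w in set(ref))
    return overlap / len(ref)

score = rouge1_recall("the cat sat on the mat", "the cat sat")
print(score)  # 3 of 6 reference words are covered -> 0.5
```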

    Routing Policy

    A routing policy is the rule set that decides which model/workflow/tools to use for a request based on intent, risk, confidence, and budgets.

    S

    Safety

    Safety in AI systems is the set of measures that prevent harmful, insecure, or policy-violating outputs and actions—especially under adversarial or ambiguous inputs.

    Safety Alignment

    Safety alignment is shaping model/system behavior so it reliably follows safety constraints (refusals, safe defaults, policy adherence) across normal and adversarial inputs.

    Safety Case

    A safety case is a structured argument—supported by evidence—that a system is acceptably safe for a specific context and risk profile.

    Safety Classifier

    A safety classifier is a model/rule system that detects unsafe content or risky intent (e.g., self-harm, hate, data exfiltration attempts, policy violations).

    Safety Evaluation

    Safety evaluation is the systematic testing of an AI system for harmful, policy-violating, insecure, or privacy-risk behavior—across normal and adversarial inputs.

    Safety Filters

    Safety filters detect and block or transform unsafe outputs (or unsafe inputs) based on policy (e.g., sexual content, violence, hate, self-harm, illegal instructions).

    Safety Guardrails

    Safety guardrails are mechanisms that constrain an AI system's behavior to reduce harm (policies, validators, permission boundaries, rate limits, refusals).

    Safety Incident Taxonomy

    A safety incident taxonomy is a structured classification system for AI safety incidents (what happened, severity, impact, root cause, mitigation).

    Sampling Steps

    Sampling steps are the number of iterative denoising iterations used during diffusion inference to generate an output.

    Sampling Temperature

    Sampling temperature scales the model's output distribution: lower temperatures make outputs more deterministic; higher temperatures increase randomness.
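
    A minimal sketch showing the effect on a toy logit vector: dividing logits by the temperature before the softmax sharpens (T < 1) or flattens (T > 1) the distribution:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                                # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=0.5)  # top token dominates
flat = softmax_with_temperature(logits, temperature=2.0)   # closer to uniform
print(sharp)
print(flat)
```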

    Satisficing

    Satisficing is choosing a solution that is "good enough" to meet constraints, rather than optimizing for the absolute best.

    Scaling Laws

    Scaling laws are empirical relationships showing how model performance tends to improve predictably as you scale data, compute, and parameters.

    Schema Drift

    Schema drift is when the expected structure of data changes over time (fields added/removed/renamed, types change, enums expand), often breaking pipelines.

    Schema Validation

    Schema validation checks that structured data conforms to a defined schema (types, required fields, enums, constraints).
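
    A stdlib-only sketch of the idea (the schema format here is made up; production systems typically use libraries such as jsonschema or Pydantic):

```python
def validate(record, schema):
    # schema maps field name -> (expected type, required?)
    errors = []
    for field, (ftype, required) in schema.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], ftype):
            errors.append(f"wrong type for {field}: expected {ftype.__name__}")
    return errors

schema = {"id": (int, True), "email": (str, True), "age": (int, False)}
errors = validate({"id": "42", "email": "a@example.com"}, schema)
print(errors)  # id is a string, not an int
```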

    Seedance

    AI video generator by ByteDance with controversial training data origins and photorealistic results.

    Self-Attention

    Attention mechanism where input elements are related to each other.

    Self-Consistency

    Self-consistency is a technique where you sample multiple reasoning paths/answers and aggregate them (e.g., majority vote) to improve reliability.
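
    The aggregation step is often a simple majority vote over independently sampled answers, as in this sketch (the answer strings are illustrative):

```python
from collections import Counter

def majority_vote(sampled_answers):
    # Aggregate independently sampled answers; the most common one wins
    return Counter(sampled_answers).most_common(1)[0][0]

final = majority_vote(["42", "41", "42", "42", "17"])
print(final)  # "42"
```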

    Self-Supervised Learning

    Learning paradigm where the model generates labels from the data itself.

    Semantic Caching

    Semantic caching reuses past answers/results when a new query is semantically similar to a previous query, not necessarily identical.

    Semantic Chunking

    Semantic chunking splits documents into chunks based on meaning boundaries (topics/sections) rather than fixed token counts alone.

    Semantic Router

    A semantic router routes queries to the right workflow, toolset, or model using semantic signals (embeddings, intent classification, similarity to known categories).

    Semantic Search

    A search method that understands the meaning of queries and documents, rather than relying solely on keyword matching.

    Semantic Segmentation

    Pixel-level classification of image regions by object categories.

    Sentence Transformers

    A Python library and collection of models that produce semantically meaningful sentence embeddings – optimized for similarity search and clustering.

    SFT (Supervised Fine-Tuning)

    Supervised fine-tuning (SFT) adapts a pretrained model by training on curated input→output pairs to shape behavior (format, style, task performance).

    SHAP (Shapley Additive Explanations)

    SHAP is a model explainability method based on Shapley values from cooperative game theory that attributes a prediction to individual features.

    Siamese Network

    A Siamese network is a neural architecture with two (or more) identical subnetworks that learn to compare inputs by producing embeddings and measuring similarity.

    Signal-to-Noise Ratio

    Signal-to-noise ratio (SNR) is the proportion of meaningful information ("signal") relative to irrelevant or misleading information ("noise").

    SimCLR

    SimCLR (Simple Contrastive Learning of Visual Representations) is a framework for self-supervised learning that learns visual representations by comparing augmented image versions.

    Similarity Score Calibration

    Similarity score calibration maps raw similarity scores (from embeddings/rerankers) to more reliable confidence signals (e.g., probabilities or risk bands).

    Similarity Search

    Similarity search finds items most similar to a query under a similarity metric (cosine similarity, dot product, etc.), commonly used with embeddings.
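
    A minimal sketch using cosine similarity over toy 2-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and large corpora use approximate indexes rather than a linear scan):

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy "embeddings" for three documents
docs = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
query = [1.0, 0.1]
best = max(docs, key=lambda d: cosine_similarity(query, docs[d]))
print(best)  # "a" points in nearly the same direction as the query
```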

    Similarity Thresholding

    Similarity thresholding sets cutoff values on similarity scores (embedding similarity, reranker scores) to decide actions like "use cache," "retrieve more," or "ask a clarifying question."

    SimPO (Simple Preference Optimization)

    A simplified version of DPO that works without a reference model and uses length-normalized reward.

    Simulation

    The imitation of a real or hypothetical system or process in a controlled virtual environment.

    Sliding Window Attention

    Sliding window attention restricts attention to a moving window of nearby tokens rather than the full sequence, reducing compute and memory costs.

    Slot Filling

    Extraction of specific parameters from user utterances for conversational AI.

    Small Language Model

    A Small Language Model (SLM) is a comparatively smaller LLM designed for lower latency, lower cost, and easier deployment—often used for narrow tasks or as part of a routed system.

    Soft Prompt

    A soft prompt is a learned vector representation (rather than human-written text) used to steer a model's behavior—often trained as a small set of prompt embeddings.

    Softmax

    Function that converts logits into probability distribution.

    Solomonoff Induction

    Solomonoff induction is a theoretical framework for optimal prediction that combines Bayesian inference with algorithmic complexity, weighting hypotheses by how simply they describe the data.

    Sora 2

    The second generation of OpenAI's text-to-video model with improved quality, longer clips, and more realistic physics simulation.

    Source Attribution

    Source attribution is explicitly indicating where information came from (documents, URLs, internal systems), often via citations or links.

    Source Grounding

    Source grounding is constraining an AI system to base its answers on provided sources (retrieved documents, tools, or approved references) rather than unverified model knowledge.

    Sparse Attention

    Sparse attention reduces attention computation by allowing tokens to attend only to a subset of other tokens (patterned or learned sparsity).

    Sparse Autoencoder

    A Sparse Autoencoder (SAE) is an autoencoder trained with a sparsity constraint so that only a small subset of features activate for any given input.

    Sparse Retrieval

    Sparse retrieval uses sparse representations (often term-frequency based) such as BM25 to retrieve documents by lexical match.

    Speaker Diarization

    Speaker diarization identifies "who spoke when" in an audio recording by segmenting audio into speaker-labeled turns.

    Speculative Decoding

    Speculative decoding accelerates LLM generation by using a smaller "draft" model to propose tokens that a larger model then verifies/accepts in batches.

    Speech-to-Text (STT)

    Technology that converts spoken audio into written text using automatic speech recognition (ASR) models – the foundation for voice assistants and transcription.

    State Space Models (SSMs)

    State Space Models (SSMs) are sequence models that maintain a latent "state" that evolves over time to process sequential data efficiently.

    Statefulness

    Statefulness describes whether a system retains information across interactions (stateful) or treats each request independently (stateless).

    Steering Vector

    A steering vector is a direction in a model's internal representation space that, when added or applied to activations, can bias outputs toward or away from certain behaviors or attributes.

    Stochastic Parrot

    Stochastic parrot is a critique framing that highlights how LLMs can generate fluent text by pattern-matching from training data without true understanding—raising concerns about bias, misinformation, and misuse.

    Stop Sequence

    A stop sequence is a token/string pattern that tells a model to stop generating when encountered.

    Streaming ASR

    Streaming ASR transcribes speech in near real-time as audio arrives, rather than after the full recording is complete.

    STRIPS

    STRIPS is a classical planning formalism where actions are defined by preconditions and effects (add/delete lists) over symbolic state predicates.

    Structured Output

    Structured output is requiring the model to produce outputs in a predefined structure (JSON, YAML, sections with strict headings), often enforced with validation.

    Style Transfer

    Style transfer modifies an image (or text) to match a target style while preserving core content.

    Subject Consistency

    The ability of an AI image generator to consistently render characters and objects across multiple images.

    Summarization

    Summarization is generating a shorter representation of content while preserving key meaning—extractive (selecting parts) or abstractive (rewriting).

    Superposition

    Superposition in neural networks describes how multiple features can be represented in overlapping directions within a limited-dimensional space, rather than one feature per neuron.

    Supervised Learning

    ML paradigm where the model learns from labeled examples (input-output pairs).

    SWE-Bench (Software Engineering Benchmark)

    A benchmark that tests LLMs by having them solve real bug reports from GitHub repositories – one of the most realistic tests of AI coding ability.

    Sycophancy

    Sycophancy is an LLM behavior where the model overly agrees with the user's stated beliefs or incorrect premises instead of correcting them.

    Synthetic Data

    Synthetic data is artificially generated data used to train, test, or evaluate systems when real data is scarce, sensitive, or costly.

    System Prompt

    A system prompt is the highest-priority instruction layer that defines the model's role, boundaries, and policies for a session.

    T

    Technological Singularity

    A hypothetical point at which technological progress (especially AI) becomes so rapid and profound that it fundamentally and unpredictably transforms human civilization.

    Temperature

    A parameter that controls randomness in LLM output.

    Temporal Graph Network

    A graph neural network for dynamic graphs that models how nodes and edges evolve over time.

    Text Generation

    Text generation is the automatic creation of text by AI models, typically based on a prompt or context.

    Text-to-Image

    Text-to-image is generating images from text prompts using generative models (commonly diffusion-based or transformer-based approaches).

    Text-to-Speech

    Technology for converting written text into natural-sounding speech – today mostly using neural models.

    Tokenization

    Breaking text into smaller units (tokens) for processing by NLP models.

    Top-k Sampling

    A sampling parameter that restricts selection to the k most likely tokens, regardless of their absolute probabilities.

    Top-p (Nucleus Sampling)

    A sampling parameter that selects from the smallest set of most likely tokens whose cumulative probability reaches the threshold p.
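
    A sketch of the candidate-selection step on a toy probability vector (real decoders then renormalize and sample from the kept tokens):

```python
def nucleus_candidates(probs, p=0.9):
    # Smallest set of tokens (by descending probability) whose cumulative mass reaches p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return sorted(kept)

kept = nucleus_candidates([0.5, 0.3, 0.15, 0.05], p=0.9)
print(kept)  # 0.5 + 0.3 + 0.15 = 0.95 >= 0.9, so tokens 0, 1, 2 are kept
```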

    Transfer Learning

    Using knowledge learned from one task to improve performance on a related task.

    Transformer

    A neural network architecture that uses self-attention to model relationships between all positions in a sequence.

    Tree of Thoughts (ToT)

    Prompting strategy where the LLM explores multiple reasoning paths in parallel, evaluates them, and selects the best – like a decision tree for thought chains.

    Triplet Loss

    A loss function for metric learning that uses anchor, positive, and negative samples to train embeddings so similar items are closer and different ones further apart.
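
    The loss for a single triple can be written as max(0, d(a, p) − d(a, n) + margin); a minimal sketch with Euclidean distance on toy 2-d points:

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    # L = max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d
    dist = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

loss = triplet_loss([0, 0], [0, 1], [3, 4])
print(loss)  # d(a,p)=1, d(a,n)=5 -> max(0, 1 - 5 + 1) = 0.0
```

    A loss of zero means the negative is already at least `margin` farther from the anchor than the positive, so this triple provides no gradient.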

    Trust & Safety

    Trust & Safety is the practice of protecting users, platforms, and brands from harmful content, abuse, and unsafe outcomes—through policy, enforcement, and product design.

    TruthfulQA

    A benchmark that tests whether LLMs avoid popular misinformation and conspiracy theories.

    V

    VAE (Variational Autoencoder)

    VAE stands for Variational Autoencoder, a generative model that learns a probabilistic latent space for sampling and generation.

    Value Alignment

    Value alignment is ensuring an AI system's behavior reliably matches intended human/organizational values and constraints (safety, fairness, truth-seeking, privacy).

    Value of Information (VoI)

    Value of Information (VoI) quantifies how much benefit you gain by obtaining additional information before making a decision.

    Vanishing Gradient

    Vanishing gradient is a training problem where gradients become extremely small as they propagate backward through a network, slowing or preventing learning in early layers.

    Variational Autoencoder (VAE)

    A Variational Autoencoder (VAE) is a generative model that learns a probabilistic latent space, enabling sampling and generation of new data.

    Veo 3

    Google's third-generation video generation model with native audio, longer clips, and improved physics.

    Verification

    Checking whether LLM outputs are correct, factual, and source-supported.

    Verification Layer

    A verification layer is a system component that checks whether an AI output or action meets required correctness, safety, policy, and formatting constraints before it is delivered or executed.

    Verification-First Policy

    A verification-first policy requires AI outputs and high-impact actions to pass defined verification checks before being shown to users or executed.

    Video AI

    Video AI encompasses AI technologies for automatic analysis, generation, editing, and optimization of video content.

    Vision Transformer (ViT)

    A Vision Transformer (ViT) applies transformer architectures to images by representing them as sequences of patch embeddings.

    Vision-Language Model (VLM)

    A Vision-Language Model (VLM) processes both images and text to perform tasks like image understanding, captioning, document Q&A, and multimodal reasoning.

    VQ-VAE

    VQ-VAE is a variant of VAE that uses vector quantization to learn discrete latent representations via a learned codebook.

    W

    Warm Start

    A warm start initializes training or optimization from a previously learned state (weights, embeddings, or parameters) rather than starting from scratch.

    Watermarking

    Watermarking is adding a detectable signal to content (text, image, audio, video) to indicate origin, authenticity, or provenance—often used to mark AI-generated outputs.

    Weak Supervision

    Weak supervision uses imperfect, noisy, or indirect signals (heuristics, rules, distant labels) to create training labels instead of manual annotation.

    Weakly Supervised Learning

    Weakly supervised learning trains models using weak supervision signals (noisy labels, partial labels, aggregated labels) rather than fully reliable labels.

    Weavy

    AI video platform with node-based editor for complex generative video workflows and multi-model pipelines.

    Web Browsing Tool

    A web browsing tool is an AI tool integration that fetches live web pages or search results to answer questions with up-to-date information.

    Web Grounding

    The ability of an AI model to access web search results in real-time to generate current and factually accurate content.

    Weight Decay

    Weight decay is a regularization technique that discourages large weights during training, often implemented as L2 regularization or decoupled weight decay (e.g., in AdamW).
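
    A sketch of one decoupled-decay update on a single scalar weight (AdamW-style decay shown here with plain SGD for simplicity; the learning rate and decay values are illustrative):

```python
def sgd_step_with_decoupled_decay(w, grad, lr=0.1, weight_decay=0.01):
    # Decoupled weight decay: shrink the weight directly, then apply the gradient step
    return w * (1.0 - lr * weight_decay) - lr * grad

w_new = sgd_step_with_decoupled_decay(w=1.0, grad=0.5)
print(w_new)  # 1.0 * 0.999 - 0.05 -> 0.949
```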

    WER (Word Error Rate)

    Word Error Rate (WER) measures speech recognition accuracy as the proportion of substitutions, deletions, and insertions needed to transform a transcript into the ground truth.
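
    The metric is the word-level edit distance divided by the number of reference words; a minimal dynamic-programming sketch:

```python
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate("the cat sat", "the cat sat down")
print(wer)  # 1 insertion over 3 reference words -> ~0.333
```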

    Whisper

    An open-source speech recognition model from OpenAI trained on 680,000 hours of multilingual audio.

    Windowed Attention

    Windowed attention restricts attention to a local token window instead of the full sequence, reducing compute and enabling longer contexts.

    WinoGrande

    A benchmark for pronominal reference resolution where small word changes flip the correct answer.

    Word Embedding

    A dense vector representation of a word that encodes its semantic meaning.

    Word2Vec

    Word2Vec is a technique for generating word embeddings that represents words as dense vectors, where semantically similar words have similar vectors.

    World Model

    An internal representation of the environment in an AI system that enables predictions about future states and the effects of actions.
