    Artificial Intelligence

    AI Terms A-Z

    Discover the most important terms in Artificial Intelligence – from Machine Learning to Deep Learning to Large Language Models. Each term is explained clearly with practical marketing examples.

    1020 terms in Artificial Intelligence

    A

    A* Search

    A* (pronounced "A-star") is a classical search algorithm that finds the shortest path between a start and a goal node in a graph by always expanding the node with the lowest estimated total cost f(n) = g(n) + h(n) – the sum of the actual path cost so far, g(n), and an estimated remaining distance, h(n) (the heuristic).
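
    As a minimal sketch of the idea, the following Python function keeps a priority queue ordered by f(n) = g(n) + h(n). The graph, the heuristic table `h`, and the function name `a_star` are illustrative assumptions, not part of the glossary entry.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) for a cheapest path, or None if the goal is unreachable.

    graph: dict mapping node -> list of (neighbor, edge_cost)
    h:     dict mapping node -> heuristic estimate of the remaining cost
    """
    # Each queue entry is (f, g, node, path); heapq pops the lowest f first.
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found later
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + h[neighbor], new_g, neighbor, path + [neighbor]),
                )
    return None

# Hypothetical toy graph; h never overestimates the true remaining cost.
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 2)], "G": []}
h = {"S": 4, "A": 3, "B": 2, "G": 0}
result = a_star(graph, h, "S", "G")  # (5, ['S', 'A', 'B', 'G'])
```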

    Abductive Logic Programming (ALP)

    A framework in logic programming that allows certain premises to be left unspecified and then infers plausible explanations for observations.

    Abductive Reasoning

    A form of logical inference that starts from an observation and seeks the simplest and most likely explanation for it.

    Ablation

    In AI research, an ablation refers to the removal or disabling of a component of a system to assess that component's impact on the overall performance.

    Accountability

    The obligation to take responsibility for AI decisions and be able to explain their impacts.

    Action Language

    A formal language used to describe state changes in a system – how actions affect the state of the world over time.

    Action Model Learning

    A machine learning approach focused on enabling an AI agent to learn the outcomes and requirements of its actions within an environment.

    Action Selection

    The process by which an intelligent agent decides "what to do next," choosing the next action from a set of possible actions.

    Activation Function

    A mathematical function used in artificial neural networks to determine the output of a node (neuron) given an input or set of inputs.
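
    For illustration, two common activation functions (ReLU and sigmoid) applied to a single neuron might look like this in Python. The helper name `neuron` and the example weights are assumptions chosen for the sketch.

```python
import math

def relu(z):
    # Passes positive values through and clips negatives to zero.
    return max(0.0, z)

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs, followed by the activation function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

out_relu = neuron([1.0, 2.0], [0.5, -1.0], 0.0, relu)        # z = -1.5, so 0.0
out_sigmoid = neuron([1.0, 2.0], [0.5, 0.25], -1.0, sigmoid)  # z = 0.0, so 0.5
```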

    Active Learning

    ML strategy where the model selects the most informative samples for labeling.

    Actor-Critic

    RL architecture with two components: an actor (policy) selects actions, a critic (value function) evaluates them – combines strengths of policy gradient and value-based methods.

    Adafactor

    Memory-efficient optimizer that replaces Adam's second moment with a factorized approximation – saves up to 50% optimizer memory.

    AdaGrad

    Optimizer that adaptively adjusts the learning rate per parameter – frequently updated parameters get smaller rates, rare ones get larger.
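
    A rough sketch of one AdaGrad update in plain Python, assuming the common formulation in which each parameter's step is divided by the square root of its accumulated squared gradients. The function name and default values are illustrative.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """Apply one AdaGrad update and return (new_params, new_accum)."""
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g  # running sum of squared gradients for this parameter
        new_accum.append(a)
        # A larger accumulated sum means a smaller effective learning rate.
        new_params.append(p - lr * g / (math.sqrt(a) + eps))
    return new_params, new_accum

params, accum = [1.0, 1.0], [0.0, 0.0]
# Parameter 0 receives a gradient on every step, parameter 1 only on the second.
params, accum = adagrad_step(params, [1.0, 0.0], accum)
step_sizes_before = list(params)
params, accum = adagrad_step(params, [1.0, 1.0], accum)
```

    In the second step the frequently updated parameter moves by about 0.071, while the rarely updated one moves by about 0.1.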

    Adam Optimizer

    Adaptive optimization algorithm with momentum and adaptive learning rates.

    AdamW

    Corrected variant of the Adam optimizer that decouples weight decay from the gradient update – the de facto standard for LLM and transformer training.

    Adaptive Algorithm

    An algorithm that changes its behavior or parameters in response to the problem instance or environment as it runs, aiming to improve performance on the fly.

    Adaptive Learning

    An educational methodology (often implemented with AI) that customizes learning content and pace to the individual needs and performance of each learner.

    Adaptive Neuro-Fuzzy Inference System

    A hybrid system that combines neural networks and fuzzy logic principles to create a model capable of learning from data while employing human-like reasoning.

    Admissible Heuristic

    A heuristic h(n) is called admissible if it never overestimates the true remaining cost from node n to the goal — i.e. it always provides an optimistic lower bound. This property guarantees that search algorithms like A* find an optimal path.

    Adversarial Attacks

    Targeted input manipulations that cause AI systems to misclassify or behave incorrectly.

    Adversarial Robustness

    The ability of an ML model to maintain correct predictions even when inputs are deliberately manipulated.

    Agent Architecture

    The underlying structure and components of an intelligent agent system, describing how the agent is organized internally to sense, think, and act.

    Agent Handoff

    The process where an AI agent passes a task to another specialized agent or to a human.

    Agent Loop

    The iterative cycle of an AI agent: Observe → Think → Act → Evaluate result → Repeat until goal is reached.

    Agent Memory

    Systems for storing information that AI agents can use beyond the context window – from short-term scratchpads to persistent knowledge stores.

    Agent Swarms

    A system of multiple specialized AI agents that work together autonomously, distribute tasks among themselves, and achieve complex goals in a coordinated manner – inspired by swarm behavior in nature.

    AgentBench

    A benchmark for evaluating LLM agents in 8 different interactive environments like websites, databases, games, and operating systems.

    Agentic AI

    AI systems that can autonomously pursue goals, make decisions, use tools, and execute multi-step tasks without continuous human guidance.

    Agentic Coding

    A paradigm where AI agents autonomously write, test, debug, and iterate on code – with minimal human intervention.

    Agentic RAG

    Agentic RAG is an evolution of retrieval-augmented generation in which an AI agent dynamically decides when, which and how many sources to query — instead of following a rigid retrieval pipeline with fixed top-k vector search.

    AI Agent

    An autonomous software system that uses AI to independently plan and execute tasks.

    AI Agents

    Autonomous AI systems that independently pursue goals, create plans, use tools, and interact with their environment – beyond simple prompt-response.

    AI Agents for Search

    Autonomous AI systems that conduct complex research – searching multiple sources, synthesizing, drawing conclusions.

    AI Alignment

    The research field and practice of developing AI systems that understand and reliably pursue human values, intentions, and goals.

    AI Art

    Visual art created wholly or partially by AI systems – from prompt-based image generation to interactive installations.

    AI Audit

    The independent examination of AI systems for fairness, bias, security, compliance, and performance by external or internal auditors.

    AI Avatars

    Computer-generated, photorealistic digital humans animated by AI that can present any content.

    AI Code Review

    AI-powered automatic review of code changes for bugs, security vulnerabilities, best practices, and style.

    AI Coding Assistants

    AI-powered tools that assist developers with programming – from autocomplete to code generation to complete feature implementations.

    AI Copyright

    The legal question of who owns copyrights to AI-generated content and how training data usage should be legally classified.

    AI Debugging

    The use of AI to automatically identify, analyze, and fix software bugs.

    AI Discovery

    AI systems that proactively recommend relevant content, products, or information – without an explicit search query.

    AI Ethics

    The interdisciplinary field examining moral principles, values, and guidelines for the development and deployment of AI systems and their impact on society and individuals.

    AI Governance

    The framework of policies, processes, and responsibilities for the responsible development, deployment, and use of AI systems in organizations.

    AI Liability

    The legal responsibility for damages caused by AI systems, and the question of who is liable: developer, operator, or user.

    AI Music Generation

    AI music generation creates musical pieces from text prompts, melodies, or style specifications – from background music to complete songs.

    AI Regulation

    The entirety of legal regulations and guidelines governing the development, deployment, and impact of AI systems.

    AI Risk Management

    The systematic identification, assessment, and management of risks that can arise from AI systems.

    AI Safety

    The research field focused on making AI systems safe, controllable, and aligned with human values.

    AI Search

    Search engines that use LLMs to understand queries and deliver direct answers instead of link lists.

    AI Slop

    Pejorative term for low-quality, mass-produced AI-generated content flooding the internet that provides no real value.

    AI Transparency

    The disclosure of how AI systems work, were trained, and make decisions, as well as labeling AI-generated content.

    AI Watermarking

    Techniques for embedding invisible markers in AI-generated content to prove its origin and enable detection of deepfakes.

    AI-Complete

    A problem is termed AI-complete if solving it by machine would essentially require general human-level intelligence.

    Aider Polyglot Benchmark

    Coding benchmark testing LLMs on real-world multi-file edits across multiple programming languages.

    Algorithmic Discrimination

    Algorithmic discrimination refers to the systematic disadvantage of certain groups by algorithmic decision systems – often as a result of biased training data or unbalanced model architectures.

    Algorithmic Efficiency

    Algorithmic efficiency measures how economically an algorithm uses computation time, memory, and energy – typically expressed in Big-O notation for scaling behavior.

    Algorithmic Impact Assessment

    Systematic evaluation of the potential impacts of an algorithmic system on individuals, groups, and society before and during deployment.

    Algorithmic Probability

    A theoretical measure that assigns a probability to an observation by considering all possible algorithms that could produce it, weighted by their simplicity.

    ALiBi (Attention with Linear Biases)

    A method for position encoding that adds linear biases directly to attention scores instead of learning position embeddings.

    Alignment

    The problem of ensuring that AI systems pursue the intended goals and values of their developers and society.

    Alignment Tax

    The performance loss caused by alignment and safety training – a model becomes safer but potentially less capable.

    Alpha-Beta Pruning

    An optimization technique for the minimax algorithm that prunes parts of the game tree without affecting the result.

    Anchor Box

    Predefined bounding boxes of various sizes and aspect ratios that serve as starting points for object detection.

    Ant Colony Optimization

    A probabilistic optimization technique inspired by the behavior of ants foraging for food and their use of pheromone trails.

    Anthropic

    An AI safety company founded by former OpenAI researchers, known for Claude – one of the most advanced LLMs focused on safety and honesty.

    Anytime Algorithm

    An anytime algorithm is an algorithm that can return a valid — though not yet optimal — solution at any intermediate stage and monotonically improves solution quality with additional compute time.

    Approximation Error

    The difference between an exact, true value and an approximate value that is used or obtained by an algorithm or model.

    ARC (AI2 Reasoning Challenge)

    A multiple-choice benchmark with natural science questions at elementary and middle-school level in Easy and Challenge sets.

    ARC-AGI-2

    Benchmark by the ARC Prize Foundation that measures general reasoning ability of AI systems via abstract pattern tasks.

    Artificial General Intelligence (AGI)

    A hypothetical form of AI that possesses human-like cognitive abilities across all domains and can learn and adapt autonomously.

    Artificial Neural Network (ANN)

    An Artificial Neural Network (ANN) is a computational model inspired by the biological brain, consisting of layers of connected neurons that can learn to extract complex patterns from data by adjusting weights.

    Assessment

    Assessment is the measurement of knowledge, skill, or performance—used to diagnose current ability, provide feedback, and certify learning outcomes.

    Attention Mechanism

    A neural network mechanism that allows models to dynamically "focus" on relevant parts of the input – the key innovation behind modern LLMs.

    Attention Pooling

    Attention pooling aggregates a sequence of vectors into a single representation vector, using learned attention weights that give more importance to the most relevant elements.

    Attention Sink

    A phenomenon in LLMs where the first token (BOS) receives disproportionately high attention, even when semantically irrelevant.

    Attributional Calculus

    A logical framework combining predicate logic with multi-valued (fuzzy) logic to represent attributes of entities in an intuitive, human-readable way.

    Audio Deepfake

    AI-generated audio recordings that convincingly imitate a real person and can be used for fraud, misinformation, or manipulation.

    Audio Generation

    The creation of audio content through AI models – from music to sound effects to speech and ambient sounds.

    Audio Language Models

    AI models that can directly understand and generate audio – from speech recognition to music analysis to natural speech generation with emotions and intonation.

    Autoencoder

    A type of neural network designed to learn a compressed representation (encoding) of input data and then reconstruct the original data from this encoding.

    AutoGPT

    An experimental open-source project that lets GPT-4 autonomously pursue goals – pioneer of the agentic AI movement.

    Automated Machine Learning

    The automation of the end-to-end process of applying machine learning to real-world problems, including data preprocessing, model selection, and hyperparameter tuning.

    Automated Planning

    Automated planning is the AI subfield concerned with algorithms that, given an initial state, a goal state, and a set of possible actions, automatically find a sequence of actions (a plan) that achieves the goal.

    AutoML (Automated Machine Learning)

    AutoML automates parts of the machine learning lifecycle such as model selection, feature preprocessing, hyperparameter tuning, and validation.

    Autonomous Agent

    An AI agent that pursues goals, makes decisions, and executes actions without human intervention – the highest autonomy level.

    Autonomous Driving

    The use of AI systems for full or partial control of vehicles without human intervention, classified into SAE Levels 0-5.

    Autoregressive Model

    An autoregressive model generates sequences token by token, where each new token depends on all previous ones – the approach behind GPT, LLaMA, and most modern LLMs.

    B

    Backpropagation

    An algorithm for computing gradients in neural networks that propagates errors backwards through the network to adjust weights.

    Backtracking

    An algorithmic technique that systematically explores all possible solutions and returns to the last decision point when hitting dead ends.

    Backward Chaining

    An inference strategy that starts from the goal and works backward to find the facts and rules that would prove the goal.

    Bag of Words (BoW)

    Simplest text representation: it treats a text as an unordered collection of words with their frequencies, ignoring word order.
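
    A minimal Python sketch, assuming whitespace tokenization and lowercasing (real pipelines usually tokenize more carefully):

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase, split on whitespace, and count each word; order is discarded.
    return Counter(text.lower().split())

bow = bag_of_words("The cat saw the dog")
# Word order does not matter: a shuffled sentence gives the same bag.
same = bag_of_words("dog the saw cat The") == bow
```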

    Bagging

    An ensemble learning method that trains multiple models on bootstrap samples and aggregates their predictions.

    Bandit-Based Recommendation

    Recommendation systems using multi-armed bandits to balance exploration of new items with exploitation of known preferences.

    Batch Normalization

    A normalization technique that normalizes activations in neural networks across mini-batches – stabilizing training and enabling higher learning rates.

    Batch Size

    Number of training examples per gradient update.

    Bayesian Optimization

    Bayesian optimization is an approach to optimizing expensive black-box functions (e.g., model hyperparameters) using a probabilistic surrogate model and an acquisition function.

    Beam Search

    Beam search is a heuristic search algorithm that, at every search step, keeps only the k best partial solutions ("beam width") — a compromise between exhaustive breadth-first search (high quality, high cost) and greedy search (low quality, low cost).
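
    The following sketch shows the beam idea on a toy next-token model. The function `toy_model` and its probability table are hypothetical and only serve to show a case where a beam of width 2 finds a better sequence than greedy search (k = 1).

```python
import math

def beam_search(step_fn, start, k, steps):
    """Keep the k highest-scoring partial sequences at every step.

    step_fn(seq) must return a list of (token, probability) continuations.
    """
    beams = [(0.0, [start])]  # (log-probability, sequence)
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            for token, p in step_fn(seq):
                candidates.append((logp + math.log(p), seq + [token]))
        # Prune: only the k best candidates survive to the next step.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return beams

def toy_model(seq):
    # Hypothetical next-token distribution that depends on the last token only.
    table = {
        "<s>": [("a", 0.6), ("b", 0.4)],
        "a": [("x", 0.5), ("y", 0.5)],
        "b": [("x", 0.9), ("y", 0.1)],
    }
    return table[seq[-1]]

wide = beam_search(toy_model, "<s>", k=2, steps=2)    # best: <s> b x (p = 0.36)
greedy = beam_search(toy_model, "<s>", k=1, steps=2)  # commits to "a" (p = 0.30)
```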

    Behavioral AI

    AI systems that analyze user behavior, recognize patterns, and predict future actions.

    BERT

    BERT (Bidirectional Encoder Representations from Transformers) is a language model developed by Google that processes text bidirectionally, enabling deep contextual understanding.

    BERT (Google)

    Google's Transformer model for bidirectional language understanding.

    BERTScore

    A semantic evaluation metric that uses BERT embeddings to measure similarity between generated and reference text.

    BGE Embedding

    BGE (BAAI General Embedding) is a family of open-source embedding models from Beijing Academy of AI that achieve top results on MTEB.

    Bi-Encoder

    An encoder architecture that transforms query and document independently into embeddings – enabling fast similarity search over pre-computed vectors.

    Bias (AI)

    Systematic distortions in AI systems leading to unfair or discriminatory outcomes for certain groups of people, often caused by imbalanced training data or flawed assumptions.

    Bias-Variance Tradeoff

    Fundamental tradeoff: simple models have high bias (underfitting), complex ones high variance (overfitting).

    BIG-Bench

    A collaborative benchmark with 200+ tasks created by 400+ researchers to test LLM capabilities beyond existing benchmarks.

    Bing Copilot

    Microsoft's AI-powered search engine combining GPT-4 with Bing search – integrated into Windows, Edge, Office.

    BLEU Score

    Metric for automatic evaluation of translation quality.

    Boosting

    An ensemble learning method that sequentially combines weak learners to create a strong classifier.

    Bootstrapping

    Statistical resampling method that repeatedly draws samples with replacement from the dataset.

    BPE (Byte Pair Encoding)

    Subword tokenization algorithm that iteratively merges the most frequent adjacent symbol pairs to build a compact subword vocabulary.
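
    One merge step can be sketched as follows. The tiny word-frequency table is made up for illustration, and production tokenizers add details (byte-level handling, end-of-word markers) that are left out here.

```python
from collections import Counter

def most_frequent_pair(words):
    # words maps a tuple of symbols to the frequency of that word.
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    # Replace every occurrence of the pair with a single merged symbol.
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Hypothetical corpus statistics: "low" x5, "lot" x3, "new" x6.
words = {("l", "o", "w"): 5, ("l", "o", "t"): 3, ("n", "e", "w"): 6}
pair = most_frequent_pair(words)   # ('l', 'o'), seen 8 times
words_after = merge_pair(words, pair)
```

    Repeating these two calls until the vocabulary reaches a target size gives the full training loop.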

    Breadth-First Search (BFS)

    Breadth-First Search (BFS) traverses a graph level by level, exploring all neighbors of a node before moving deeper.
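
    A short Python sketch of the level-by-level traversal, using a queue. The adjacency-list graph is a made-up example.

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in the order BFS first discovers them."""
    order, seen, queue = [start], {start}, deque([start])
    while queue:
        node = queue.popleft()  # FIFO: the oldest discovered node is expanded first
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
visited = bfs_order(graph, "A")  # ['A', 'B', 'C', 'D']
```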

    C

    Calibration

    The process of adjusting a model's predicted probabilities so they reflect actual event probabilities.

    Causal Masking

    Causal masking prevents tokens from attending to future positions – the technique enabling autoregressive generation in decoders like GPT.
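
    As a sketch in plain Python (real implementations work on tensors and typically add a large negative number to masked scores before the softmax), a causal mask and the resulting attention weights could be computed like this:

```python
import math

def causal_mask(n):
    # Position i may attend to position j only if j <= i (no looking ahead).
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores):
    """Row-wise softmax over attention scores with future positions zeroed out."""
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        exps = [math.exp(scores[i][j]) if mask[i][j] else 0.0 for j in range(n)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# With equal scores, each token spreads its attention evenly over itself and the past.
weights = masked_softmax([[0.0, 0.0, 0.0] for _ in range(3)])
```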

    CER (Character Error Rate)

    Metric for speech recognition and OCR at character level.
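
    CER is commonly computed as the character-level edit distance (substitutions, insertions, deletions) divided by the length of the reference text. A minimal sketch under that assumption:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    # Assumes a non-empty reference string.
    return edit_distance(reference, hypothesis) / len(reference)

score = cer("kitten", "sitten")  # one substitution in six characters
```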

    Certified Defense

    Defense methods against adversarial attacks that provide mathematically provable robustness guarantees.

    Chain of Thought

    Prompting technique and model capability where the model explicitly articulates its thinking process in intermediate steps before arriving at the final answer.

    Chain-of-Thought Prompting

    A prompting technique that gets LLMs to lay out their thoughts step by step before giving a final answer – leading to significantly better results on complex tasks.

    Chatbot

    A software program that simulates conversations with humans, typically through text or voice interfaces.

    Chatbot Arena

    A public Elo-based leaderboard where users blindly choose between two LLMs – the most important benchmark for LLM ranking.

    ChatGPT

    A conversational AI system built on large language models that generates human-like responses to user prompts.

    ChatGPT Agent

    Autonomous mode of ChatGPT that independently executes multi-step tasks in browsers, apps, and files.

    Chinchilla Optimal

    The finding that for compute-optimal LLM training, the number of training tokens should scale proportionally to parameter count.

    Chunking

    The process of dividing large documents into smaller, semantically coherent text segments for efficient embedding and retrieval in RAG systems.

    CIDEr

    A metric for image captioning that measures TF-IDF-weighted n-gram similarity.

    Class Imbalance

    Situation where one class in the training dataset occurs significantly more frequently than others.

    Classification

    A supervised ML task in which a model assigns data to predefined categories or classes.

    Classifier-Free Guidance (CFG)

    Classifier-Free Guidance controls how strongly a diffusion model follows the text prompt – higher values produce more prompt-faithful but potentially over-saturated images.

    Claude

    Anthropic's family of LLMs, known for long context windows, nuanced responses, and a focus on safety and honesty.

    Claude Computer Use

    Claude's capability to operate a desktop computer: mouse, keyboard, screenshots, and applications like a human user.

    Claude Cowork

    Collaborative multi-user mode of Claude for joint project work with shared context and role distribution.

    Claude Design

    Visual design mode of Claude for UI mockups, brand asset generation, and layout iteration via natural language.

    Claude Haiku

    Anthropic's fastest and most cost-effective AI model, optimized for speed and volume in tasks like classification, chatbots, and real-time processing.

    Claude Opus

    Anthropic's most powerful and expensive AI model, designed for complex analysis, strategic planning, and tasks requiring the highest cognitive depth.

    Claude Opus 4.6

    Anthropic's 2026 flagship LLM with extended reasoning, 1M-token context, and native computer-use capabilities.

    Claude Skills

    Modular system by Anthropic that bundles reusable capabilities (prompt + tools + data) for Claude.

    Claude Sonnet

    Anthropic's mid-tier AI model, offering a balance between quality, speed, and cost – the all-rounder of the Claude family.

    CLIP (Contrastive Language–Image Pretraining)

    A multimodal model approach that learns aligned representations of images and text by training them to match corresponding image–caption pairs.

    Clustering

    An unsupervised learning technique that groups data points into clusters such that items in the same cluster are more similar to each other.

    Code Generation

    The automatic creation of program code by AI models based on natural language descriptions, examples, or partial code snippets.

    Codex 5.3

    OpenAI's specialized 2026 coding model for agentic software development and long-running tasks in repositories.

    Cohere

    An enterprise-focused AI company specializing in RAG, embeddings, and multilingual LLMs.

    Cohere Embed

    Cohere's commercial embedding API with special optimization for retrieval and distinction between query and document embeddings.

    ColBERT

    ColBERT is a late-interaction retrieval architecture that creates token-level embeddings for query and document, aggregating them via MaxSim during search.

    Cold Start Problem

    The problem when a system has insufficient data about a new user, item, or context to make accurate predictions or recommendations.

    Collaborative Filtering

    A recommendation approach that predicts a user's preferences based on the behavior of similar users or similarities between items.

    ComfyUI

    ComfyUI is a visual, node-based workflow editor for Stable Diffusion and other diffusion models – the professional standard for complex image generation pipelines.

    Command R

    Cohere's RAG-optimized language model, specifically developed for enterprise retrieval, multilingual applications, and tool use.

    Computer Vision

    The AI subfield that enables computers to understand and interpret visual information.

    Conditional Generation

    Conditional generation produces outputs based on conditions like text, class, image, or other control signals.

    Conformal Prediction

    A model-agnostic method that wraps any predictor and produces prediction sets or intervals with guaranteed coverage, assuming only that the data are exchangeable rather than following a specific distribution.

    Consistency Model

    Consistency models generate images in one or few steps by learning to jump from any point on the diffusion trajectory directly to the result.

    Constitutional AI

    An approach developed by Anthropic where AI systems are trained according to a set of ethical principles ("constitution") to self-correct and avoid harmful outputs.

    Content Filter

    Systems that check and block AI inputs and outputs for unwanted content.

    Content-Based Filtering

    Recommendations based on properties of items a user liked.

    Context Engineering

    The practice of designing, selecting, and structuring the information an LLM receives so it produces more reliable and relevant outputs.

    Context Window

    The maximum amount of text (measured in tokens) that an AI language model can process and "remember" at once – the larger it is, the more context can be considered.

    Contextual Bandit

    A decision-making algorithm that chooses among actions using current context features, while learning from feedback to balance exploration and exploitation.

    Continual Learning

    The ability of an ML model to continuously learn from new data without forgetting previously learned knowledge – the "lifelong learning" problem of AI.

    Continuous Batching

    A serving technique that inserts new requests into running batches as soon as other requests complete, instead of waiting for batch completion.

    Contrastive Learning

    A representation learning approach that trains models to pull similar pairs closer and push dissimilar pairs apart in embedding space.

    ControlNet

    ControlNet is a neural network architecture that adds additional conditions (edges, pose, depth) to diffusion models, enabling precise control over image generation.

    Convergence

    The point where a model stops improving significantly – the loss stabilizes and further epochs bring no progress.

    Conversational AI

    Conversational AI refers to AI systems that can conduct natural, human-like conversations via text or voice – from chatbots to voice agents.

    Conversational Search

    Conversational Search enables information retrieval through natural dialogs instead of rigid keywords – the future of search engines and enterprise search.

    Convolutional Neural Network (CNN)

    A neural network architecture that uses convolution operations to learn hierarchical feature representations from grid-like data such as images.

    Coreference Resolution

    Identifying all mentions in text that refer to the same entity (e.g., "Angela Merkel" → "she" → "the chancellor").

    Cosine Annealing

    A learning rate schedule strategy that gently reduces the learning rate from a maximum value to near zero following a cosine curve.
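
    The schedule is usually written as lr(t) = lr_min + 0.5 · (lr_max − lr_min) · (1 + cos(π · t / T)). A small Python sketch, with parameter names chosen for this example:

```python
import math

def cosine_annealing(step, total_steps, lr_max, lr_min=0.0):
    # Starts at lr_max (step 0) and follows half a cosine wave down to lr_min.
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

schedule = [cosine_annealing(t, 100, lr_max=0.1) for t in (0, 50, 100)]
# Roughly [0.1, 0.05, 0.0]: slow decay at the start and end, fastest in the middle.
```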

    Counterfactual Explanation

    Explanation method that shows what minimal input change would have led to a different model outcome.

    Cross-Attention

    Cross-attention computes attention between two different sequences – e.g., between text conditioning and image generation in diffusion models.

    Cross-Encoder

    An encoder architecture that processes query and document together and outputs a relevance score – more precise than bi-encoders but slower.

    Cross-Entropy Loss

    Loss function for classification tasks based on information theory.
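
    For a single example with a one-hot label, the loss reduces to the negative log of the probability the model assigned to the correct class. A minimal sketch:

```python
import math

def cross_entropy(probs, target_index):
    # probs: predicted class probabilities summing to 1; target_index: true class.
    return -math.log(probs[target_index])

confident_right = cross_entropy([0.1, 0.9], 1)  # small loss
confident_wrong = cross_entropy([0.9, 0.1], 1)  # large loss
coin_flip = cross_entropy([0.5, 0.5], 0)        # equals ln(2)
```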

    Cross-Validation

    A technique for evaluating model performance by training and testing on different data subsets.

    CTC (Connectionist Temporal Classification)

    CTC is a training algorithm for sequence-to-sequence problems where input and output have different lengths – the key to modern ASR.

    Curriculum Learning

    Training strategy where samples are presented in a meaningful order – from easy to hard, similar to a curriculum.

    Cursor

    An AI-native code editor (VS Code fork) that offers deep AI integration for code generation, refactoring, and natural language programming.

    Custom GPT

    GPT tailored to a specific use case with its own prompt, knowledge base, and tool set, hosted by OpenAI.

    CutMix

    Data augmentation technique that cuts out a rectangular region from one image and replaces it with a region from another image.

    Cyclical Learning Rate (CLR)

    Learning rate schedule that cyclically varies the LR between a minimum and maximum – prevents stagnation and helps overcome saddle points.

    D

    DALL-E 3

    OpenAI's text-to-image generation model, integrated into ChatGPT, known for precise prompt following and text rendering.

    Data Augmentation

    Techniques for artificially expanding training data through transformations.

    Data Leakage

    Situation where information from the test set or the future leaks into training, producing unrealistically good results.

    Data Parallelism

    The simplest form of distributed training: Each GPU holds a complete model copy and processes different data batches – gradients are synchronized.

    Data Poisoning

    An attack where manipulated data is injected into the training process to deliberately influence model behavior.

    Datasheets for Datasets

    Standardized documentation for ML datasets describing provenance, composition, collection methods, recommended use, and known limitations.

    DDIM (Denoising Diffusion Implicit Model)

    DDIM is an accelerated sampling algorithm for diffusion models enabling deterministic generation with significantly fewer steps.

    DDPM (Denoising Diffusion Probabilistic Model)

    DDPM is the foundational framework for diffusion models that generates images by progressively denoising from pure noise.

    Decision Making

    Decision making is the process of selecting an action (or non-action) among alternatives based on goals, evidence, constraints, and uncertainty.

    Decision Theory

    Decision theory studies how agents should make choices under uncertainty, often by maximizing expected utility subject to constraints.

    Decision Tree

    An ML model that represents decisions as a tree structure with branches based on feature values.

    Decoder

    The part of a model that transforms a compressed representation back to the original format.

    Decoding

    The process of converting encoded data or signals back to their original or usable form, in ML specifically the token-by-token generation of outputs.

    Decoding Strategy

    A decoding strategy is the method used to convert a model's token probability distribution into an actual output sequence.

    Deductive Reasoning

    A form of logical inference where specific conclusions are drawn from general premises—if the premises are true, the conclusion is guaranteed to be true.

    Deep Compression

    A three-stage compression pipeline (Pruning → Quantization → Huffman Coding) that can compress neural networks by 35-49x – the foundational work of model compression.

    Deep Learning

    A subfield of machine learning that uses deep neural networks with many layers to learn complex patterns from data.

    Deep Reinforcement Learning

    Reinforcement learning that uses deep neural networks to learn policies that choose actions to maximize long-term reward.

    Deepfake

    Deepfakes are AI-generated or -manipulated media (video, audio, images) showing people doing or saying things that never happened.

    Deepfake Detection

    Technologies and methods for identifying AI-generated or manipulated media content such as videos, audios, and images.

    DeepSeek

    Chinese AI startup developing powerful open-source language models, competing with Western providers at significantly lower costs.

    DeepSeek R1

    An open-source reasoning model from DeepSeek that competes with GPT-4 and Claude on complex thinking and coding tasks.

    DeepSeek V4

    Open-weight flagship by DeepSeek that reaches comparable benchmarks at 1/10 the training cost of Western models.

    DeepWalk

    A graph embedding algorithm that combines random walks on graphs with Word2Vec to learn node representations.

    Default Reasoning

    Default reasoning draws conclusions using 'defaults' that hold in typical cases, while allowing exceptions when new information arrives.

    Demographic Parity

    Fairness criterion: A model satisfies demographic parity when positive prediction rates (e.g., approval rate) are equal across all protected groups.
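
    As a rough illustration, the criterion can be checked by comparing the share of positive predictions per group. The function names and the toy data below are made up for this sketch; a gap of 0 would mean demographic parity holds exactly.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Return the share of positive predictions (1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive prediction rate between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical approval decisions (1 = approved) for two groups.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(positive_rates(preds, groups))
print(demographic_parity_gap(preds, groups))
```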

    Denoising

    Denoising is the process of removing noise from a signal; in diffusion models, it's the iterative transformation from noisy latents to a clean sample.

    Dense Passage Retrieval

    A retrieval approach using bi-encoder embeddings for query and passages – the foundation of modern semantic search.

    Dense Retrieval

    Retrieval method that uses dense vector representations (embeddings) to find semantically similar documents.

    Dependency Parsing

    Analyzing the grammatical structure of a sentence by identifying dependency relationships between words.

    Depth Estimation

    Predicting depth values (distances) for every pixel of a 2D image to generate a 3D depth map.

    Depth-First Search (DFS)

    Depth-First Search (DFS) traverses a graph by going as deep as possible along one path before backtracking.
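
    A minimal sketch of the traversal, assuming the graph is given as a dictionary that maps each node to a list of neighbors (the example graph is invented for illustration). An explicit stack plays the role of the backtracking step.

```python
def dfs(graph, start):
    """Return nodes in the order a depth-first traversal visits them."""
    visited = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Push neighbors in reverse so the first-listed neighbor is explored first.
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in seen:
                stack.append(neighbor)
    return visited


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C']
```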

    Depthwise Separable Convolution

    An efficient convolution variant that decomposes a standard convolution into two steps – depthwise (per channel) and pointwise (1x1 convolution) – for roughly 8-9x fewer computations with 3x3 kernels.
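
    The savings can be estimated by counting multiply-accumulate operations. The sketch below assumes stride 1 and an output feature map of the same size as the input; the layer sizes are arbitrary example values.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Multiply-accumulates for a standard k x k convolution."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Multiply-accumulates for depthwise (per channel) plus pointwise (1x1)."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise


# Example layer: 3x3 kernel, 128 -> 256 channels, 56x56 feature map.
standard = standard_conv_cost(3, 128, 256, 56, 56)
separable = depthwise_separable_cost(3, 128, 256, 56, 56)
ratio = standard / separable
print(f"{ratio:.2f}x fewer operations")  # about 8.69x
```

    The ratio simplifies to 1 / (1/c_out + 1/k²), so for 3x3 kernels it approaches 9 as the number of output channels grows.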

    Detokenization

    The process of converting tokens back into readable text – the reverse of tokenization.

    DETR (Detection Transformer)

    A transformer-based model for object detection that treats detection as a set prediction problem and predicts bounding boxes directly, without anchor boxes.

    Devin

    An autonomous coding agent from Cognition Labs, marketed as the first "AI software engineer," that can work on complex programming tasks autonomously over extended periods.

    Dialogue Management

    Component of a conversational AI system that controls the conversation flow.

    Diffusion Model

    Diffusion models are generative AI models that learn to gradually remove noise from data to produce high-quality samples (images, audio, video).

    Dilated Convolution (Atrous Convolution)

    Dilated Convolution expands the receptive field of a filter by inserting gaps between filter values – larger context without more parameters.

    Disclosure UX

    Disclosure UX is the set of interface patterns that transparently communicate important system facts to users (e.g., AI involvement, limitations, data use, confidence, and provenance).

    Disparate Impact

    A legal concept: A seemingly neutral rule or practice that disproportionately negatively affects a protected group.

    Distributed Training

    Distributed training distributes ML training across multiple GPUs or machines – necessary for models that don't fit on a single GPU.

    Distribution Shift

    A change in statistical distribution between training and production data that degrades model performance.

    Diversity in Recommendations

    Strategies for increasing variety in recommendation lists to avoid filter bubbles and improve user satisfaction.

    DP-SGD (Differentially Private SGD)

    A training algorithm integrating Differential Privacy into Stochastic Gradient Descent – through gradient clipping and calibrated noise.

    DPO (Direct Preference Optimization)

    A simplified alternative to RLHF that optimizes a model directly on human preference data, without training a separate reward model or running RL – simpler, more stable, and cheaper.

    DreamBooth

    A fine-tuning method that personalizes diffusion models with just a few images (3-5) of a subject to generate it in arbitrary contexts.

    DROP (Discrete Reasoning Over Paragraphs)

    A reading comprehension benchmark that requires numerical reasoning over text passages (counting, sorting, arithmetic).

    Dropout

    A regularization technique that randomly deactivates neurons during training.

    E

    E5 Embedding

    E5 is a family of embedding models from Microsoft Research created through text-to-text contrastive training.

    Early Stopping

    Regularization technique that stops training once validation performance stops improving (e.g., the validation loss has not decreased for a set number of epochs).

    ELBO (Evidence Lower Bound)

    ELBO is a lower bound on the log-likelihood of the data (the evidence) used in variational inference – the central objective function for VAEs and diffusion models.

    Elo Rating

    A rating system for measuring relative abilities, originally from chess – now standard for LLM leaderboards.
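
    A sketch of the classic Elo update for a single match. The K-factor of 32 and the starting ratings are assumptions chosen for illustration; leaderboards may use different constants or fit ratings statistically instead of updating them one match at a time.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (between 0 and 1)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, score_a, k=32):
    """Return updated ratings; score_a is 1 for a win, 0.5 draw, 0 loss."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - exp_a))
    return new_a, new_b


# Two equally rated models; model A wins the comparison.
print(elo_update(1500, 1500, 1.0))  # (1516.0, 1484.0)
```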

    ELU (Exponential Linear Unit)

    An activation function that passes positive values unchanged and lets negative values saturate exponentially toward a negative constant – smoother than ReLU, with mean activations closer to zero.

    Embedding

    An embedding is a dense vector representation of discrete data (words, images, users, products) where semantically similar objects lie close together in vector space.

    Embedding Models

    Specialized models that convert text, images, or other data into dense vectors that capture semantic meaning and enable similarity search.

    Embeddings

    Vector representations of data (words, sentences, images) in a lower-dimensional space that capture semantic similarity.

    Emergent Abilities

    Capabilities that suddenly appear in LLMs only above a certain model size, without being observable in smaller models.

    Emotion Recognition

    Emotion Recognition detects emotional states (joy, anger, sadness) from speech, facial expressions, or text – with focus on audio-based analysis.

    Encoder

    The part of a model that transforms input data into a compressed representation.

    Encoder-Decoder

    Architecture that encodes input into a representation and decodes output from it.

    Energy-Based Model (EBM)

    Energy-based models assign energy values to data points – low energy for likely data, high for unlikely – and generate by energy minimization.

    Ensemble Learning

    Combining multiple models to achieve better predictions than any single model alone.

    Entity Extraction

    The automatic identification and classification of named entities in text.

    Entity Linking

    Entity Linking is the process of mapping text mentions of entities to unique entries in a knowledge base (e.g., Wikidata).

    Epistemic vs. Aleatoric Uncertainty

    Epistemic uncertainty arises from lack of knowledge (reducible with more data); aleatoric uncertainty is inherent noise in data (irreducible).

    Epoch (Machine Learning)

    In machine learning, an epoch is one complete pass of a learning algorithm through the entire training dataset, i.e., the point at which every training example has been used once to update the model weights.

    Equalized Odds

    Fairness criterion: A model satisfies equalized odds when True Positive Rate and False Positive Rate are equal across all protected groups.

    Error Analysis

    Systematic examination of model errors to identify patterns and improvement opportunities.

    EU AI Act

    The world's first comprehensive legal regulation for Artificial Intelligence, adopted by the EU Parliament in 2024, establishing risk-based requirements for AI systems.

    Evaluation Harness

    A framework for systematically evaluating model performance across various metrics and test cases.

    Expected Calibration Error (ECE)

    The standard metric for measuring classifier calibration quality – the weighted average of the absolute difference between confidence and accuracy across confidence bins.
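
    A minimal sketch of the computation, assuming equal-width confidence bins and one confidence value plus a correct/incorrect flag per prediction. The number of bins and the example values are arbitrary choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |confidence - accuracy| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(k for _, k in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Four predictions at 90% confidence, three of them correct.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

    Here the model claims 90% confidence but is right 75% of the time, so the error is about 0.15.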

    Explainability

    The ability to make an AI model's decisions or predictions understandable to humans.

    Explainability UX Patterns

    Explainability UX patterns are interface patterns that help users understand why an AI system produced an output, what evidence it used, and what actions it took (or refused).

    Explainable AI (XAI)

    Explainable AI (XAI) comprises methods and product practices that make AI outputs understandable, traceable, and auditable.

    Exploration vs. Exploitation

    The fundamental RL dilemma: Should the agent exploit known good actions (exploitation) or explore new options (exploration)?

    Exponential Moving Average (EMA)

    Technique that maintains an exponentially weighted average of model weights over training – the EMA model often generalizes better than the final model.
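
    The update rule can be sketched on plain lists of numbers. The decay value and the toy weights are assumptions for illustration; real training code would apply the same rule to every parameter tensor after each optimizer step.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Blend the current weights into the running average."""
    return [decay * e + (1 - decay) * w for e, w in zip(ema_weights, weights)]


# Toy example: the raw weights jump around, the EMA copy moves slowly.
ema = [1.0, 1.0]
for step_weights in ([0.0, 2.0], [0.0, 2.0], [0.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.9)
print(ema)
```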

    F

    Fairness

    The goal that AI systems treat all groups equitably and don't cause systematic discrimination.

    Faithfulness

    How accurately an LLM output corresponds to the provided sources and instructions.

    FastText

    Facebook's open-source library for efficient text classification and word embeddings with sub-word information.

    Feature Extraction

    The process of automatically deriving relevant features from raw data.

    Federated Learning

    A decentralized training approach where models are trained locally on many devices, and only model updates (not raw data) are sent to a central server – training without data centralization.

    Feed-Forward Network (FFN)

    In the Transformer context: a two-layer MLP applied independently to each position after the attention layer.

    Feedback Loop

    A system where outputs are fed back to influence future inputs or decisions.

    Few-Shot Learning

    A technique where the model is given a few examples in the prompt to demonstrate the desired output format or task.

    Fine-Tuning

    Adapting a pre-trained model to a specific task by further training it on task-specific data.

    Flash Attention

    An optimized implementation of the attention mechanism that reduces memory access and maximizes GPU efficiency through tiling and kernel fusion.

    Flow Matching

    Flow matching is a generative modeling technique that learns straight transport paths between noise and data distributions – faster and more stable than classical diffusion.

    Flux

    An image generation model family from Black Forest Labs (founded by former Stability AI researchers), with open-weight variants, that competes in quality with Midjourney.

    Focal Loss

    Modified cross-entropy loss that down-weights easy, well-classified examples so that training focuses on hard-to-classify ones.
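
    A sketch of the per-example loss, FL = -(1 - p)^γ · log(p), where p is the probability the model assigns to the true class. The optional class-balancing factor is left out, and γ = 2 is only a commonly used default.

```python
import math


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one example; gamma=0 gives plain cross-entropy."""
    return -((1 - p_true) ** gamma) * math.log(p_true)


easy, hard = 0.9, 0.1  # probability assigned to the true class
print(focal_loss(easy, gamma=0.0), focal_loss(easy))  # cross-entropy vs. focal
print(focal_loss(hard, gamma=0.0), focal_loss(hard))
```

    With γ = 2, the easy example's loss is scaled by (0.1)² = 0.01, while the hard example's loss is scaled by (0.9)² = 0.81.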

    Forward Chaining

    An inference strategy that starts from known facts and applies rules to derive new facts until the goal is reached.

    Forward Pass

    Computing the model output by forward propagating through all layers.

    Foundation Model

    A large model pre-trained on broad data that can be adapted for many downstream tasks.

    FSDP (Fully Sharded Data Parallel)

    PyTorch's native implementation of parameter sharding – distributes model parameters, gradients, and optimizer states across GPUs for memory-efficient training.

    Function Calling

    The ability of LLMs to call external functions in a structured way – the model decides which function with which parameters, execution happens externally.

    Function Calling (LLM)

    Function Calling enables LLMs to generate structured function calls – the bridge between natural language and APIs, databases, or external tools.

    Fuzzy Inference System

    A fuzzy inference system uses fuzzy logic rules to map inputs to outputs when concepts are imprecise (e.g., "high risk," "medium demand").

    G

    G-Eval

    An LLM evaluation framework that uses chain-of-thought reasoning and weighted probabilities for more nuanced scoring.

    Gaussian Mixture Model (GMM)

    A probabilistic model representing data as a mixture of Gaussian distributions.

    GDPR AI

    The application of GDPR principles to AI systems, especially in automated decision-making and profiling.

    GELU (Gaussian Error Linear Unit)

    A smooth activation function that weights each input by the standard normal cumulative distribution function evaluated at that input – standard in BERT, GPT-2, and many Transformers.

    Gemini

    Google's multimodal AI model – built natively for text, image, audio, video, and code rather than assembled from separate single-modality components.

    Gemini 3.1 Pro

    Google's 2026 flagship LLM with natively multimodal architecture and 2M-token context.

    Gemma 4

    Open-weight model family by Google for on-device and edge inference, ranging from 2B to 27B parameters.

    Generalization

    A model's ability to perform well on new, unseen data.

    Generative Adversarial Network (GAN)

    Architecture with two competing networks for generating realistic data.

    Generative AI

    AI models that create new content – text, images, audio, code, or structured data.

    GitHub Copilot

    An AI coding assistant from GitHub/Microsoft that provides real-time code suggestions directly in the editor based on OpenAI models.

    GloVe

    GloVe (Global Vectors for Word Representation) is a word embedding method that uses global co-occurrence statistics of a text corpus to generate semantic word vectors.

    Google AI Overviews

    Google's AI-generated summaries at the top of search results – synthesized from multiple sources.

    Google DeepMind

    Google's merged AI research division, formed from DeepMind and Google Brain, responsible for Gemini and groundbreaking AI research.

    Governance

    Governance is the set of roles, rules, processes, and controls that ensure a system is used responsibly and predictably—aligned with risk, compliance, and business objectives.

    GPQA (Graduate-Level Google-Proof Q&A)

    A benchmark with 448 expert-level questions from physics, biology, and chemistry, so difficult that highly skilled non-experts with web access reach only about 34% accuracy.

    GPQA Diamond

    High-difficulty science benchmark with PhD-level questions in biology, physics, and chemistry.

    GPT (Generative Pre-trained Transformer)

    A family of large language models by OpenAI based on the Transformer architecture.

    GPT-4

    OpenAI's multimodal language model, released in 2023, that can process text, images, and code and long served as the benchmark for LLM performance.

    GPT-4V (Vision)

    OpenAI's GPT-4 extension with image understanding – the breakthrough that taught ChatGPT to "see".

    GPT-5

    OpenAI's most advanced language model (2026), combining multimodal processing, enhanced reasoning, and native tool use in one model.

    GPT-5.4

    OpenAI's 2026 flagship LLM with thinking mode, multimodal processing, and agent-native architecture.

    GQA (Grouped-Query Attention)

    An attention variant where groups of query heads share a single key-value head to reduce KV-Cache size and memory consumption.

    Grad-CAM (Gradient-weighted Class Activation Mapping)

    XAI method that generates heatmaps showing which image regions a CNN considers most important for its decision.

    Gradient Accumulation

    Gradient accumulation sums gradients over multiple mini-batches before an optimization step – simulates larger batch sizes without more GPU memory.

    Gradient Centralization (GC)

    Simple technique that subtracts the mean from each weight gradient, so gradients have zero mean before they are applied – improves generalization at negligible computational cost.

    Gradient Checkpointing

    Gradient checkpointing saves GPU memory by discarding intermediate activations and recomputing them during the backward pass – trades compute for memory.

    Gradient Clipping

    Gradient clipping limits the norm or value of gradients during training to prevent exploding gradients.
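
    Clipping by norm can be sketched on a flat list of gradient values. The function name and threshold are illustrative; deep learning frameworks provide their own utilities for this.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale grads so their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


print(clip_by_norm([3.0, 4.0], max_norm=1.0))  # norm 5 is scaled down to 1
print(clip_by_norm([0.3, 0.4], max_norm=1.0))  # already small, unchanged
```

    Scaling the whole vector keeps the gradient direction intact, whereas clipping each value separately would change it.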

    Gradient Descent

    An optimization algorithm that iteratively adjusts parameters in the direction of steepest descent of the loss function.

    Gradient Noise

    The natural noise in gradient estimates from mini-batch sampling – acts as implicit regularization and helps find better minima.

    Graph Attention Network (GAT)

    Graph Attention Networks use attention mechanisms during message passing to automatically learn which neighbor nodes are more important.

    Graph Classification

    The task of assigning an entire graph to a class based on its structure and node properties.

    Graph Convolutional Network

    A GNN variant that generalizes convolution operations to graphs to learn node representations.

    Graph Isomorphism Network

    A GNN with maximum discriminative power among message-passing architectures, theoretically grounded by the Weisfeiler-Leman test.

    Graph Neural Network

    A class of neural networks that operate directly on graph structures, learning node, edge, and graph-level properties.

    Graph Search

    Graph search is the process of exploring a graph to find a target node, a path, or an optimal solution under a defined objective (e.g., shortest path, lowest cost).

    Graph Transformer

    Graph Transformers combine Transformer architectures with graph structures, applying self-attention directly on graph nodes.

    GraphSAGE

    An inductive GNN framework that learns scalable node representations by sampling and aggregating neighborhoods.

    Greedy Algorithm

    An algorithm that makes the locally optimal choice at each step.

    Greedy Best-First Search

    Greedy Best-First Search expands the node that appears closest to the goal using only a heuristic score h(n), ignoring the cost accumulated so far.

    Greedy Decoding

    A decoding strategy that always selects the token with the highest probability – deterministic, but often repetitive.
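
    A toy sketch of the idea. The "model" here is a hypothetical lookup table that maps the previous token to next-token probabilities; a real language model would condition on the whole sequence generated so far.

```python
# Hypothetical next-token probabilities, keyed by the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "cat": {"sleeps": 0.7, "</s>": 0.3},
    "sleeps": {"</s>": 1.0},
}


def greedy_decode(model, start="<s>", end="</s>", max_len=10):
    """Pick the highest-probability next token until the end token appears."""
    tokens = []
    current = start
    for _ in range(max_len):
        probs = model[current]
        current = max(probs, key=probs.get)
        if current == end:
            break
        tokens.append(current)
    return tokens


print(greedy_decode(TOY_MODEL))  # ['the', 'cat', 'sleeps']
```

    Running it twice gives the same output, which illustrates why greedy decoding is deterministic.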

    Grid Search

    Hyperparameter tuning method that systematically tries all combinations of a predefined parameter space.

    Griffin (Google)

    Google's hybrid architecture combining linear recurrences (gated RNN) with local attention, productionized in RecurrentGemma.

    Grok

    xAI's LLM with real-time access to X (Twitter), known for humorous, uncensored style and current information.

    Ground Truth

    The actual, correct data or labels used as reference for model training and evaluation.

    Grounding

    Techniques for anchoring LLM outputs in verifiable sources – the model explicitly references documents, data, or facts rather than generating freely.

    Group Normalization

    Group Normalization divides channels into groups and normalizes within each group – works batch-independently and is ideal for small batch sizes.

    GRPO (Group Relative Policy Optimization)

    GRPO is an RL alignment method that works without a separate value (critic) model – instead, a group of sampled responses is scored and each response's advantage is computed relative to the group.

    GRU (Gated Recurrent Unit)

    GRU is a simplified RNN architecture that uses update and reset gates to control information flow – fewer parameters than LSTM with comparable performance.

    GSM8K

    A benchmark with 8,500 grade-school math problems that require multi-step reasoning.

    Guardrails

    Mechanisms and systems that monitor, filter, and correct AI outputs to ensure they stay within defined boundaries for safety, ethics, and brand guidelines.

    Guardrails (AI)

    Mechanisms for constraining and validating AI outputs – prevents toxic, incorrect, or off-brand content and uncontrolled agent actions.

    Guidance Scale

    Guidance scale is a parameter (commonly in classifier-free guidance) that controls how strongly a diffusion model follows the text prompt versus generating more diverse outputs.

    H

    Hallucination (AI)

    The phenomenon where AI models generate plausible-sounding but factually incorrect or fabricated information that is not supported by the training data or the provided sources.

    Hallucination Detection

    Methods and tools for detecting "hallucinations" – false or fabricated information that LLMs present as facts with high confidence.

    Hallucination Rate

    The percentage of AI-generated outputs containing information not supported by facts or sources.

    HellaSwag

    A benchmark for common-sense reasoning where LLMs must choose the most plausible continuation of a scenario.

    HELM (Holistic Evaluation of Language Models)

    A comprehensive evaluation framework from Stanford that assesses LLMs on dozens of dimensions like accuracy, fairness, robustness, and efficiency simultaneously.

    Heterogeneous Graph

    A graph with different types of nodes and/or edges, modeling various entity types and relationships.

    Heuristic

    A heuristic is a practical scoring rule or estimate that guides search or decision-making toward promising options without guaranteeing optimality.

    Heuristic Search

    Heuristic search is a family of search algorithms that use a heuristic (a guiding estimate) to explore a problem space more efficiently than uninformed search.

    High-Level Representation

    A high-level representation abstracts raw data into more meaningful structures (symbols, concepts, latent variables, or summaries).

    HNSW

    Hierarchical Navigable Small World – a graph-based algorithm for efficient approximate nearest neighbor search.

    Hold-Out Validation

    Simplest evaluation method: dataset is split once into training and test set (e.g., 80/20).

    HuBERT

    HuBERT (Hidden-Unit BERT) is a self-supervised speech model from Meta that learns high-quality speech representations by predicting discretized audio clusters.

    Hugging Face

    The leading open-source platform for machine learning, functioning as the "GitHub for AI" and hosting over 500,000 models.

    Human Evaluation

    The evaluation of AI outputs by human annotators – the gold standard for quality measurement, but expensive and slow.

    HumanEval

    A benchmark for code generation with 164 Python programming tasks, evaluated by Pass@k (code must pass tests).

    Hybrid AI System

    A hybrid AI system combines multiple AI paradigms—typically symbolic/rule-based methods with statistical/ML models (including LLMs).

    Hybrid Recommender System

    A recommendation system combining multiple approaches (collaborative filtering, content-based, knowledge-based) for better recommendation quality.

    Hybrid Search

    A search method that combines lexical search (BM25/keyword) with semantic search (embeddings) to leverage the strengths of both approaches.

    Hyena

    A subquadratic attention replacement based on long convolutions and data-controlled gates, scaling O(N log N) instead of O(N²).

    Hyperparameter

    Configuration settings chosen before training that influence how a model learns.

    Hyperparameter Optimization

    The systematic process of finding the best hyperparameter settings for an ML model.

    Hypothesis Generation

    Hypothesis generation is producing candidate explanations (or candidate solutions) that could plausibly account for observed evidence.

    I

    Identity Preference Optimization (IPO)

    An alignment method that modifies the DPO objective with a regularized loss to reduce overfitting to preference data and make training more stable.

    Ideogram

    A text-to-image model known for its strong text rendering in generated images.

    IFEval (Instruction Following Evaluation)

    A benchmark that tests how well LLMs follow explicit format instructions (e.g., "Answer in exactly 3 paragraphs", "Start each sentence with a capital letter").

    Image Captioning

    Automatic generation of text descriptions for images.

    Image Classification

    Assigning an entire image to one or more predefined categories using a machine learning model.

    Image Generation

    Image generation is the automatic creation of images by AI models based on text prompts, other images, or other inputs.

    Image Segmentation

    Dividing an image into meaningful regions or objects at the pixel level.

    Image Understanding

    AI's ability to not just recognize objects but understand the semantic context and meaning of images.

    Image-to-Image

    Models that transform an input image into a modified or transformed output image.

    Image-to-Image (img2img)

    Image-to-image transforms an input image based on a text prompt and a denoising strength parameter – from subtle changes to a complete redesign.

    Image-to-Text

    AI generation of natural language descriptions for images – from simple captions to detailed analyses.

    Image-to-Video

    AI technology that transforms static images into moving videos by adding realistic animation, camera movement, and scene development.

    ImageBind

    Meta's multimodal embedding model that unifies six modalities (image/video, text, audio, depth, thermal, and IMU motion data) in a shared vector space.

    Imitation Learning

    An ML approach where an agent learns by observing and imitating expert behavior.

    Implicit Feedback

    User signals derived from behavior (clicks, dwell time, purchases) rather than explicit ratings.

    In-Context Learning

    The ability of LLMs to learn from the context of the prompt without changing model weights – the foundation of modern prompting techniques.

    Inductive Reasoning

    A form of logical inference where general rules or patterns are derived from specific observations—the conclusion is probable but not guaranteed.

    Inference

    The process of applying a trained AI model to new inputs to generate predictions or outputs.

    Inference Engine

    The core component of an expert system that applies logical rules to a knowledge base to derive new facts or make decisions.

    Inference Optimization

    The collection of all techniques for accelerating and improving efficiency of LLM inference, including quantization, batching, caching, and hardware optimization.

    Inference-Time Compute

    A technique where AI models use additional compute time during response generation (inference) to achieve better results through longer "thinking."

    Information Extraction

    Automatically extracting structured information (entities, relations, facts) from unstructured text.

    Information Retrieval

    Finding relevant documents or information from a large collection.

    Inpainting

    Filling in missing or masked regions of an image with plausible content.

    Instance Normalization

    Instance Normalization normalizes each feature map (channel) of each sample individually – standard in style transfer and image generation.

    Instruction Tuning

    A fine-tuning method where models are trained on (instruction, response) pairs to follow natural language instructions – the step that turns base models into helpful assistants.

    Instructor Embedding

    An embedding model that uses task-specific instructions in the prompt to optimize embeddings for different tasks.

    Integrated Gradients

    XAI method that computes feature attributions by integrating gradients along a path from a baseline to the actual input.

    Intelligent Tutoring System

    An Intelligent Tutoring System (ITS) is an AI-driven learning system that personalizes instruction, feedback, and practice to a learner's needs.

    Intent Classification

    Determining the intention or goal behind a user query.

    Intent Recognition

    AI capability to recognize the intent behind a user utterance.

    Interpretability

    The degree to which humans can understand how a model arrives at its decisions.

    Interpretable Machine Learning

    ML models that are inherently understandable – their decision logic can be directly inspected without additional explanation methods.

    Inverse Reinforcement Learning (IRL)

    IRL learns the reward function from observed expert behavior – instead of specifying a reward function, it is inferred from demonstrations.

    IoU (Intersection over Union)

    A metric measuring the overlap between a predicted and ground truth region, calculated as intersection divided by union.
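
    For axis-aligned bounding boxes the calculation can be sketched as follows, assuming boxes are given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2. The coordinate convention is an assumption for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```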

    IP-Adapter

    IP-Adapter enables image prompts for diffusion models – a reference image controls style, composition, or face identity of the generation.

    Iterative Deepening

    Iterative deepening is a search strategy that repeatedly runs depth-limited search with increasing depth limits until it finds a solution or exhausts a budget.

    Iterative Prompting

    A prompting approach that refines results through multiple successive prompts.

    K

    K-Armed Bandit

    The k-armed bandit problem models choosing among k options to maximize reward while balancing exploration vs exploitation.

    K-Fold Cross-Validation

    K-fold cross-validation is an evaluation method where data is split into k (roughly) equal parts; k models are trained, each on k−1 folds and tested on the remaining fold, and the results are averaged.
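
    The splitting step can be sketched with plain index lists. This version assumes the data is already shuffled and puts any leftover samples into the first folds; the function name is illustrative.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


for train, test in k_fold_indices(10, 5):
    print("train:", train, "test:", test)
```

    Each sample appears in exactly one test fold, so every data point is used for evaluation once.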

    K-Means Clustering

    K-means is an unsupervised algorithm that partitions data into k clusters by minimizing within-cluster distance to cluster centroids.

    K-Means++

    K-means++ is an initialization method for k-means that chooses starting centroids to improve convergence and cluster quality.

    K-Shot Prompting

    K-shot prompting provides k examples in the prompt to guide the model's behavior (format, reasoning pattern, tone).

    Kernel (ML)

    In ML, a kernel is a function that measures similarity between data points, enabling algorithms to operate in implicit high-dimensional feature spaces.

    Kernel Trick

    The kernel trick allows algorithms to compute dot products in an implicit higher-dimensional space without explicitly transforming the data.

    Kling AI

    Kuaishou's Chinese text-to-video model that competes with Sora and generates realistic videos up to 2 minutes.

    KNN (k-Nearest Neighbors)

    KNN is a method that predicts outcomes based on the k most similar examples in a dataset.

    KNN Search

    KNN search retrieves the k closest vectors to a query vector under a distance metric.

    Knowledge Base (KB)

    A knowledge base is a curated repository of information (articles, FAQs, policies) designed for retrieval and reuse.

    Knowledge Cutoff

    Knowledge cutoff is the point in time after which a model's training data does not include new information.

    Knowledge Distillation

    A technique where a smaller, more efficient "student" model is trained to imitate the behavior of a larger "teacher" model, achieving similar performance with lower resource consumption.

    Knowledge Graph Embedding

    Knowledge Graph Embeddings learn low-dimensional vector representations for entities and relations of a Knowledge Graph.

    Knowledge Tracing

    Knowledge tracing models a learner's evolving mastery of skills over time using their interactions (answers, attempts, time, hints).

    KTO (Kahneman-Tversky Optimization)

    An alignment method that only needs binary feedback (good/bad) instead of pairwise preferences, inspired by Prospect Theory.

    KV Cache (Key-Value Cache)

    A caching mechanism that stores the Key and Value tensors of attention layers to avoid redundant computations during autoregressive generation.

    L

    L1 Regularization (Lasso)

    L1 regularization adds a penalty proportional to the absolute value of model weights, encouraging sparsity (many weights become exactly zero).

    L2 Regularization (Ridge)

    L2 regularization adds a penalty proportional to the square of model weights, encouraging smaller weights without forcing exact zeros.

    Label Leakage

    Label leakage describes the situation in which a machine-learning model's training dataset contains features that carry direct or indirect information about the target variable (the label) — information that simply would not be available at inference time in production.

    Label Smoothing

    Label smoothing is a training technique that replaces hard labels (0 or 1) with slightly softened targets (e.g., 0.9 and 0.1).
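
    One common formulation mixes the one-hot target with a uniform distribution over the K classes. The sketch below assumes that formulation; `epsilon` is the smoothing strength, and with two classes and `epsilon = 0.2` it reproduces the 0.9 / 0.1 targets mentioned above.

```python
def smooth_labels(one_hot, epsilon):
    """Blend a one-hot target with a uniform distribution over its K classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]

# Two classes, epsilon = 0.2: targets become roughly 0.9 and 0.1
print(smooth_labels([1.0, 0.0], epsilon=0.2))
```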

    LAMB (Layer-wise Adaptive Moments for Batch Training)

    Optimizer for extremely large batch sizes (up to 64K+) that adapts learning rates per layer, enabling stable training with massive parallelization.

    Language Model (LM)

    A language model is a model that estimates the probability of sequences of tokens, enabling tasks like prediction, generation, and scoring.

    Large Language Model (LLM)

    A large neural network trained on vast amounts of text to understand and generate human-like text.

    Large Language Model (LLM)

    A large neural network trained on massive amounts of text that can understand and generate human-like text.

    LARS (Layer-wise Adaptive Rate Scaling)

    Optimizer that combines SGD with layer-wise learning rate adaptation – enables stable training with large batch sizes for computer vision.

    Late Interaction

    A retrieval paradigm where query and document tokens are encoded independently but interact via token-level similarity only at search time.

    Latent Diffusion

    Latent diffusion performs the diffusion process in compressed latent space instead of pixel space – 10-100x faster with comparable quality.

    Latent Space

    A compressed, lower-dimensional space where a model stores internal representations of data.

    Latent Variable

    A latent variable is an unobserved variable inferred from observed data, used to explain hidden structure.

    Layer Dropping

    A compression technique that removes entire transformer layers from a trained model – the simplest way to make an LLM smaller and faster.

    Layer Normalization

    Layer normalization is a technique that normalizes activations within a layer to stabilize and speed up training in deep networks.

    Leaky ReLU

    A variant of ReLU that lets negative values pass with a small factor (e.g., 0.01) instead of setting them to 0 – mitigates the dead neuron ("dying ReLU") problem.

    Learning Objectives

    Learning objectives are clear, measurable statements of what a learner should be able to do after instruction.

    Learning Rate

    A hyperparameter that determines how much to adjust model weights at each training step.

    Learning Rate Range Test

    Diagnostic method that exponentially increases the learning rate while observing loss – reveals a usable LR range in a single short training run.

    Learning Rate Schedule

    A learning rate schedule changes the learning rate over training (warmup, decay, cosine, step, exponential).

    Learning Rate Warmup

    Training technique that slowly ramps the learning rate from near zero to the target value in the first steps/epochs.
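
    Warmup is often combined with a decay schedule. The sketch below assumes linear warmup followed by cosine decay, which is one popular combination among several; the function name and parameters are illustrative.

```python
import math

def lr_at_step(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly from peak_lr / warmup_steps up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 9, 55, 100):
    print(step, lr_at_step(step, total_steps=100, warmup_steps=10, peak_lr=1e-3))
```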

    Learning to Rank (LTR)

    ML approaches for learning optimal ranking functions for search results, recommendations, or feeds.

    Lemmatization

    Linguistically informed reduction of words to their base form (lemma) considering part of speech and context.

    Length Penalty

    Length penalty is a decoding adjustment that prevents generation algorithms (especially beam search) from unfairly preferring overly short sequences.
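
    A minimal sketch of one widely used form, the GNMT-style penalty ((5 + length) / 6)^alpha, applied to a summed log-probability. The example scores and `alpha = 0.6` are made-up values for illustration.

```python
def length_normalized_score(log_prob_sum, length, alpha=0.6):
    """Divide a summed log-probability by a GNMT-style length penalty."""
    penalty = ((5 + length) / 6) ** alpha
    return log_prob_sum / penalty

short = (-4.0, 4)    # (summed log-prob, length in tokens)
long_ = (-5.0, 10)

# Raw scores favor the short hypothesis; normalized scores favor the long one.
print(short[0] > long_[0])
print(length_normalized_score(*long_) > length_normalized_score(*short))
```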

    Leonardo AI

    An AI image generation platform focused on gaming, concept art, and professional creative workflows.

    LIME (Local Interpretable Model-agnostic Explanations)

    LIME (Local Interpretable Model-agnostic Explanations) explains an individual model prediction by fitting a simple, interpretable surrogate model around that specific input.

    Linear Attention

    Attention variants that reduce the quadratic O(N²) complexity to linear O(N) through kernel approximation or alternative computation order.

    Link Prediction

    Link Prediction predicts which connections between nodes in a graph are likely to exist or will form.

    Lion (Evolved Sign Momentum)

    Optimizer discovered by Google Brain through automated program search that updates weights using only the sign of a momentum-based update direction – simpler than Adam, often comparable results.

    Lip Sync AI

    AI technology that automatically adjusts lip movements in videos to new audio tracks so spoken words look natural.

    LiveCodeBench

    Contamination-free coding benchmark that continuously adds new programming tasks from competitions.

    Llama

    Meta's open-weight LLM family that serves as foundation for thousands of fine-tuned models and has democratized open-source AI.

    LLM Security

    The field of security research and practices specifically for Large Language Models and generative AI.

    LLM-as-a-Judge

    LLM-as-a-judge uses a model to evaluate other model outputs against rubrics like correctness, groundedness, style, and safety.

    LLM-as-Judge

    An evaluation method where an LLM evaluates the quality of outputs from another (or the same) model.

    LMSYS

    LMSYS (Large Model Systems Organization) is a research organization that operates the famous Chatbot Arena benchmark and enables LLM performance comparisons through human evaluations.

    Log-Likelihood

    Log-likelihood is the logarithm of the likelihood that a probabilistic model assigns to observed data.

    Log-Sum-Exp

    Log-sum-exp is a numerical trick for computing log(∑ᵢ exp(xᵢ)) stably without overflow/underflow.
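
    The usual trick is to subtract the maximum before exponentiating, so the largest term becomes exp(0) = 1. A minimal sketch in plain Python:

```python
import math

def log_sum_exp(xs):
    """Compute log(sum(exp(x) for x in xs)) without overflowing."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# A naive math.exp(1000.0) would raise OverflowError; the shifted version is fine.
print(log_sum_exp([1000.0, 1000.0]))  # about 1000 + log(2)
```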

    Logit

    A logit is the raw, unnormalized score a model outputs before converting to probabilities (e.g., via softmax).

    Logit Bias

    Logit bias is a technique to increase or decrease the likelihood of specific tokens during generation by adjusting their logits.
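
    To make the mechanism concrete, the sketch below adds a bias to selected logits and then applies softmax. Token IDs are plain list indices here, and the `bias` dictionary is a hypothetical stand-in for whatever format a given API expects.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_logit_bias(logits, bias):
    """Add a per-token bias (token index -> offset) to the raw logits."""
    return [z + bias.get(i, 0.0) for i, z in enumerate(logits)]

logits = [2.0, 1.0, 0.5]
print(softmax(logits))
print(softmax(apply_logit_bias(logits, {2: -100.0})))  # token 2 is effectively banned
```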

    Long Context

    Long context refers to an LLM's ability to accept and use a large number of input tokens in a single request.

    Lookahead Optimizer

    Meta-optimizer that maintains two sets of weights: "fast" weights (normal optimizer) and "slow" weights that are periodically interpolated toward the fast ones.

    LoRA (Low-Rank Adaptation)

    An efficient fine-tuning method that trains only small adapter matrices instead of the entire model, drastically reducing memory and training costs.

    LoRA Fine-Tuning

    An efficient fine-tuning method that only trains small "adapter" matrices instead of all model weights – typically <1% of parameters with comparable performance.
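
    The parameter savings follow from simple arithmetic: a full weight matrix has d_out × d_in entries, while a rank-r adapter pair has r × (d_out + d_in). The layer size and rank below are illustrative assumptions, not values from any particular model.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable parameter counts for one weight matrix."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_out + d_in)   # B is d_out x rank, A is rank x d_in
    return full, lora

full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```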

    LoRA vs Full Fine-Tuning

    A comparison between adapting a model via LoRA adapters versus updating all parameters (full fine-tuning).

    Loss Function

    A mathematical function that measures how good or bad a model's predictions are.

    Loss Landscape

    The multi-dimensional surface representing loss as a function of model parameters – the "mountain" that gradient descent descends.

    Lottery Ticket Hypothesis

    The hypothesis that a dense, randomly initialized neural network contains a small subnetwork ("winning ticket") that, trained in isolation from the same initialization, can match the accuracy of the full network.

    Lovable

    An AI platform that generates complete web applications from natural language descriptions – including frontend, backend, and deployment.

    LSTM (Long Short-Term Memory)

    LSTM is an RNN variant with gate mechanisms (forget, input, output gate) enabling learning of long-term dependencies in sequences.

    Luma AI

    An AI company specialized in 3D capture and video generation, known for Dream Machine and NeRF technology.

    M

    Machine Learning

    A subfield of AI where systems learn from data to make predictions or decisions without being explicitly programmed.

    Machine Translation

    Automatic translation of text or speech from one natural language to another using an AI system.

    Machine Unlearning

    Techniques to remove the influence of specific training data from an ML model without retraining the entire model.

    Mamba

    Mamba is a neural network architecture built on selective state space models (SSMs) designed to model long sequences efficiently with linear scaling in sequence length.

    Manus AI

    An autonomous general-purpose AI agent capable of independently executing complex tasks like research, coding, and data analysis.

    Masked Language Modeling (MLM)

    MLM is a training objective where a model predicts masked-out tokens in a text sequence (e.g., replacing words with a special [MASK] token).

    Mastery Learning

    Mastery learning is an instructional approach where learners progress only after demonstrating mastery of a skill or objective, with targeted remediation as needed.

    MATH Benchmark

    A benchmark with 12,500 competition mathematics problems (from algebra to number theory) that tests advanced mathematical reasoning.

    Matrix Factorization

    A technique for decomposing a matrix into the product of smaller matrices.

    Matryoshka Embedding

    An embedding training approach where the first N dimensions of a vector are already usable on their own – enabling flexible compression with little quality loss.

    Matryoshka Representation Learning (MRL)

    Matryoshka Representation Learning (MRL) is an embedding approach that encodes information at multiple granularities so a single embedding can be truncated to smaller dimensions while remaining useful for downstream tasks.

    Max Tokens

    An API parameter that limits the maximum number of tokens an LLM can generate in a response.

    MBPP (Mostly Basic Python Problems)

    A benchmark with 974 simple Python programming tasks that test basic programming abilities of LLMs.

    Mechanistic Interpretability

    Mechanistic interpretability is the effort to reverse engineer neural networks by identifying internal mechanisms (features, circuits, algorithms) that produce outputs.

    Mel Spectrogram

    A Mel spectrogram is a visual representation of audio frequencies on the Mel scale – the standard input for modern speech and audio AI models.

    Membership Inference Attack

    An attack that determines whether a specific data point was included in the training dataset of an ML model.

    Memory Augmentation

    Techniques for extending the effective context of LLMs beyond the token limit – enables memory of previous conversations, facts, and user preferences.

    Message Passing

    Message Passing is the fundamental computation paradigm of Graph Neural Networks where nodes exchange information with their neighbors.

    Message Passing Neural Network

    A unifying framework for GNNs where nodes receive messages from neighbors, aggregate them, and update their representations.

    Meta AI

    The AI research division of Meta (Facebook), known for open-source release of Llama and leading research in multimodality.

    Meta-Learning

    Meta-learning ("learning to learn") aims to train models or systems that adapt quickly to new tasks with limited data or few examples.

    Metaprompt

    A metaprompt is a higher-level prompt that defines the rules, structure, and constraints for generating other prompts or for a whole class of outputs.

    METEOR

    An evaluation metric for machine translation that combines unigram matching with stemming, synonyms, and word order.

    Metric Learning

    Metric learning trains models to learn a distance function (embedding space) where "similar items are close" and "dissimilar items are far apart."

    Midjourney

    The leading commercial text-to-image model, known for highly aesthetic, artistic image generation via Discord.

    Minimum Description Length

    Minimum Description Length (MDL) is a principle for model selection that prefers the model that yields the shortest total description of the model plus the data encoded under it.

    Mish Activation Function

    Mish = x · tanh(softplus(x)) – a smooth, self-regularizing activation function used in YOLOv4 and some CNNs.
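
    The formula translates directly into code. This sketch uses a numerically stable softplus; the helper names are chosen for this example.

```python
import math

def softplus(x):
    """log(1 + e^x), written so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

for x in (-2.0, 0.0, 1.0, 5.0):
    print(x, mish(x))
```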

    Mistral AI

    A French AI startup developing open-weight models, considered the European alternative to US AI companies.

    Mixed Precision Training

    Mixed precision training uses a mix of lower-precision (e.g., FP16/BF16) and single-precision (FP32) representations to speed up training while preserving accuracy.

    Mixtral

    Mistral AI's Mixture-of-Experts model that achieves GPT-3.5-level performance efficiently by activating only a portion of parameters per token.

    Mixture of Experts

    An AI architecture where a large model consists of specialized "expert" subnetworks, of which only the most relevant ones are activated for each query – enabling efficiency with high performance.

    Mixup

    Data augmentation technique that creates new training examples by linearly interpolating between two existing examples.
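
    A minimal sketch of the interpolation step. Inputs and one-hot labels are mixed with the same coefficient `lam`; in practice `lam` is typically drawn from a Beta distribution (for example with `random.betavariate`), but it is passed in explicitly here to keep the example deterministic.

```python
def mixup(x1, y1, x2, y2, lam):
    """Linearly interpolate two examples and their (one-hot) labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

x, y = mixup([0.0, 0.0], [1.0, 0.0], [2.0, 4.0], [0.0, 1.0], lam=0.25)
print(x, y)  # [1.5, 3.0] [0.25, 0.75]
```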

    MLCommons

    Industry consortium developing open benchmarks (MLPerf), datasets, and best practices for ML performance.

    MMLU (Massive Multitask Language Understanding)

    A multiple-choice benchmark with 57 subject areas (STEM, humanities, social sciences) for measuring LLM world knowledge.

    MMLU-Pro

    Extended MMLU benchmark with more challenging multiple-choice questions and reduced guessing advantage.

    MMR (Maximal Marginal Relevance)

    MMR is a retrieval diversification method that selects items that are both relevant to the query and non-redundant with each other.
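
    A greedy sketch of the selection loop, assuming precomputed similarity scores. `query_sim[i]` is the relevance of item i to the query, `pairwise_sim[i][j]` is the similarity between items, and `lam` trades relevance against redundancy; all names and numbers are illustrative.

```python
def mmr_select(query_sim, pairwise_sim, k, lam=0.5):
    """Greedily pick k items balancing relevance and novelty."""
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy = highest similarity to anything already selected
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query_sim = [0.9, 0.85, 0.5]
pairwise_sim = [
    [1.0, 0.95, 0.1],   # items 0 and 1 are near-duplicates
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_select(query_sim, pairwise_sim, k=2))  # [0, 2]
```

    Item 1 is more relevant than item 2, but it is skipped because it is nearly identical to the already selected item 0.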

    Mode Collapse

    Mode collapse occurs when a generative model produces only a limited diversity of outputs, ignoring large parts of the data distribution.

    Model Card

    A model card is a standardized documentation artifact describing a model's intended use, limitations, training data context, evaluation results, and ethical/safety considerations.

    Model Cards

    Standardized documentation for ML models describing training, capabilities, limitations, bias analyses, and recommended use cases.

    Model Collapse

    Model collapse is a degradation phenomenon where training on synthetic/model-generated data (especially repeatedly) can reduce diversity and quality, causing the model to "collapse" toward narrower outputs.

    Model Compression

    Techniques for reducing the size of ML models while maintaining performance.

    Model Distillation

    A technique where a large "teacher" model transfers its knowledge to a smaller, more efficient "student" model.

    Model Drift

    Model drift is performance degradation over time due to changes in data distributions, user behavior, environment, or upstream systems.

    Model Extraction

    Attacks that attempt to reconstruct or clone a proprietary ML model through systematic queries.

    Model Extraction Attack

    An attack where an adversary creates a functionally equivalent copy of an ML model through systematic API queries.

    Model Governance

    Processes and controls for the entire lifecycle of ML models: development, validation, deployment, monitoring, and retirement.

    Model Merging

    Techniques for combining multiple trained models into a single model that unifies the strengths of all source models – without additional training.

    Model Monitoring

    Continuous monitoring of ML models in production for performance degradation, drift, fairness, and anomalies.

    Model Serving

    The infrastructure and processes for deploying trained ML models as API endpoints for real-time or batch inference in production environments.

    Model Simplification

    Model simplification reduces complexity to improve interpretability, efficiency, robustness, or deployment feasibility.

    Model Spec

    A model spec is a written specification describing how a model should behave—including intended behavior, constraints, and principles—often used to guide training, alignment, and deployment policy.

    Model Watermarking

    Techniques for embedding invisible markers in ML models or their outputs to prove authorship or detect unauthorized use.

    Model-Based Learning

    Model-based learning learns a model of the environment (dynamics) and uses it for planning, prediction, or control.

    Model-Based Reinforcement Learning

    Model-based RL learns a model of the environment (dynamics model) and plans with this model instead of only learning from direct experience.

    Momentum

    Acceleration technique for gradient descent that accumulates past gradient directions to converge faster and escape local minima.
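
    A minimal sketch of the classic "heavy-ball" update on a one-dimensional quadratic. This is one common formulation (frameworks differ in small details), and the learning rate, `beta`, and test function are illustrative choices.

```python
def sgd_momentum(grad_fn, w, lr=0.1, beta=0.9, steps=300):
    """Gradient descent with a velocity term that accumulates past gradients."""
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad_fn(w)  # decayed running sum of gradients
        w = w - lr * v             # step along the accumulated direction
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)  # close to 3.0
```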

    Monte Carlo Dropout (MC Dropout)

    Monte Carlo Dropout estimates model uncertainty by keeping dropout active at inference time and performing multiple stochastic forward passes, then aggregating results.

    Monte Carlo Tree Search (MCTS)

    MCTS is a planning algorithm that incrementally builds a search tree through random simulations (rollouts) and uses their outcomes to identify the most promising actions.

    MT-Bench

    A multi-turn conversation benchmark for LLMs with 80 questions across 8 categories, evaluated by GPT-4-as-Judge.

    MTEB

    The Massive Text Embedding Benchmark – a comprehensive benchmark for text embedding models across 56+ datasets in 8 tasks.

    Multi-Agent System

    System of multiple specialized AI agents that collaborate to solve complex tasks that a single agent could not handle.

    Multi-Agent Systems

    Systems of multiple specialized AI agents working together – each agent has a role (researcher, writer, critic) and they communicate to solve complex tasks.

    Multi-Armed Bandit

    A sequential decision-making problem (and the family of algorithms for it) in which an agent must balance exploration and exploitation.

    Multi-Head Attention (MHA)

    Multi-Head Attention runs multiple attention computations in parallel with different learned projections and combines the results.

    Multi-Objective Optimization

    Multi-objective optimization (Pareto optimization) is optimization with multiple objectives that often conflict, where you typically seek Pareto-optimal solutions rather than one single optimum.

    Multi-Query Attention (MQA)

    Multi-Query Attention shares a single key-value head across all query heads – shrinks the KV cache by a factor equal to the number of query heads (e.g., 8x with 8 heads), usually with only a small quality loss.
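
    The saving can be checked with back-of-the-envelope arithmetic: the cache holds one key and one value vector per layer, per KV head, per token. The model dimensions below are hypothetical and chosen only to make the numbers easy to follow.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV cache size for one sequence (factor 2 = keys + values)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical model: 32 layers, 8 query heads, head_dim 128, FP16, 4096 tokens
mha = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=4096)
mqa = kv_cache_bytes(layers=32, kv_heads=1, head_dim=128, seq_len=4096)
print(mha, mqa, mha // mqa)  # 536870912 67108864 8
```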

    Multi-Teacher Distillation

    A distillation method where a student model learns from multiple specialized teacher models simultaneously – combines expertise from different domains.

    Multi-Turn Conversation

    A multi-turn conversation is an interaction where context and intent evolve across multiple exchanges rather than a single query-response.

    Multimodal

    AI systems that can process and understand multiple data types (text, image, audio, video) simultaneously.

    Multimodal AI

    AI systems that can process, understand, and generate multiple data types such as text, images, audio, and video simultaneously.

    Multimodal AI

    AI systems that jointly process text, image, audio, and video and can respond in any modality.

    Multimodal Embeddings

    Vector representations that project different data types (text, images, audio) into the same semantic space – enables cross-modal searching and understanding.

    Multimodal Model

    A multimodal model can process and/or generate across multiple data types (e.g., text, images, audio, video).

    N

    N-gram

    Contiguous sequence of N elements (characters or words) from a text.

    N-gram Blocking

    N-gram blocking is a decoding constraint that prevents a model from generating an n-gram (sequence of n tokens) that has already appeared in the generated text.
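
    A minimal sketch of the check a decoder could run before each step: given the tokens generated so far, find every next token that would complete an already seen n-gram. The function names are illustrative, and the sketch assumes n >= 2.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def banned_next_tokens(generated, n):
    """Tokens that would repeat an n-gram already present in `generated`."""
    if n < 2 or len(generated) < n:
        return set()
    prefix = tuple(generated[-(n - 1):])  # last n-1 tokens
    return {g[-1] for g in ngrams(generated, n) if g[:-1] == prefix}

generated = ["the", "cat", "sat", "on", "the"]
print(ngrams(generated, 2))
print(banned_next_tokens(generated, 2))  # {'cat'}
```

    A decoder would then set the scores of the banned tokens to negative infinity before sampling or beam expansion.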

    N-Shot Prompting

    N-shot prompting provides N examples in the prompt to teach the model the desired pattern (0-shot = instructions only; few-shot = small N).

    N+1 Tool Call Problem

    The N+1 tool call problem happens when an AI workflow makes one initial tool call and then makes N additional tool calls (often one per retrieved item), causing unnecessary latency and cost.

    NAdam (Nesterov-Accelerated Adam)

    Optimizer that integrates Nesterov momentum into Adam – combines NAG's look-ahead correction with Adam's adaptive learning rates.

    Named Entity Canonicalization

    Entity canonicalization is standardizing different surface forms of the same entity into one canonical representation (e.g., mapping "OpenAI Inc." and "Open AI" to "OpenAI").

    Named Entity Linking (NEL)

    Named Entity Linking connects an entity mention in text (e.g., "OpenAI", "Apple", "Paris") to a specific canonical entity ID in a knowledge base (internal or external).

    Named Entity Recognition (NER)

    Identifying and classifying named entities in text (people, places, organizations).

    Named Entity Recognition (NER)

    NLP task for identifying and classifying named entities in text.

    Nano Banana

    Codename for Google's image editing model (Gemini 2.5 Flash Image) enabling pixel-precise edits via prompt.

    Nano Banana 2

    Google's second-generation AI image generation model, based on Gemini 3.1 Flash Image, combining Pro quality with Flash speed.

    Narrow AI / Weak AI

    Narrow AI (also "weak AI") is AI designed to perform a specific task or a limited set of tasks, rather than general-purpose reasoning across domains.

    Natural Gradient

    Natural gradient is an optimization approach that accounts for the geometry of parameter space, often leading to more efficient steps than standard gradient descent in some probabilistic models.

    Natural Language Generation

    Natural Language Generation (NLG) is the process of producing human-readable text from data, intent, or internal representations (rules, templates, or neural models).

    Natural Language Processing (NLP)

    The field of AI concerned with the interaction between computers and human language.

    Natural Language Understanding

    NLU is the AI capability to understand the meaning, intent, and structure of natural language – not just recognizing words but grasping their meaning.

    Natural Questions (NQ)

    A question answering benchmark from Google with real search queries and Wikipedia articles as answer sources.

    Negative Cycle

    A negative cycle is a cycle in a weighted graph whose total weight is negative, allowing path cost to be reduced indefinitely by looping.
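
    Negative cycles are commonly detected with Bellman-Ford-style relaxation: if any edge can still be relaxed after |V| - 1 rounds, a negative cycle exists. A minimal sketch, with edges given as `(u, v, weight)` tuples (a representation assumed for this example):

```python
def has_negative_cycle(num_nodes, edges):
    """Return True if the directed graph contains any negative-weight cycle."""
    # Starting every node at 0 acts like a virtual source connected to all nodes
    dist = [0.0] * num_nodes
    for _ in range(num_nodes - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # One more pass: any further improvement implies a negative cycle
    return any(dist[u] + w < dist[v] for u, v, w in edges)

print(has_negative_cycle(3, [(0, 1, 1), (1, 2, -3), (2, 0, 1)]))  # True (cycle sums to -1)
print(has_negative_cycle(3, [(0, 1, 1), (1, 2, -1), (2, 0, 1)]))  # False (cycle sums to +1)
```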

    Negative Prompt

    A negative prompt describes what should NOT appear in a generated image – controls diffusion models by excluding unwanted elements.

    Negative Prompting

    Negative prompting is explicitly telling a generative model what to avoid (content, style, formatting, claims) during generation.

    Negative Transfer

    Negative transfer occurs when transferring knowledge from a pretrained model or source task hurts performance on the target task.

    Negative Weights

    Negative weights are negative edge costs in a weighted graph (i.e., an action/transition reduces total cost).

    NeRF (Neural Radiance Fields)

    NeRFs are neural methods for representing 3D scenes by learning a function that maps spatial coordinates and viewing direction to color and density, enabling novel view synthesis.

    Nesterov Accelerated Gradient (NAG)

    Improved momentum variant that computes the gradient at a "look-ahead" point instead of the current one – faster and more stable convergence.

    Neural Architecture Search (NAS)

    An AutoML approach where algorithms automatically discover the optimal neural network architecture for a given task – the "AI designs AI" approach.

    Neural Audio Codec

    Neural Audio Codecs compress audio into discrete tokens – the bridge between audio and language models that enables music and speech generation.

    Neural Code Search

    Neural code search retrieves relevant code snippets or files using embeddings and semantic matching rather than exact keyword search.

    Neural Collaborative Filtering (NCF)

    A deep learning approach using neural networks instead of classical matrix factorization for collaborative filtering.

    Neural Collapse

    Neural collapse is a phenomenon observed in deep classifiers near the end of training where learned representations and classifier weights exhibit a highly structured geometry (classes become tightly clustered and symmetrically arranged).

    Neural Embeddings

    Neural embeddings are learned vector representations of items (text, users, products, documents) such that distance in vector space reflects similarity.

    Neural Index Rebuild

    A neural index rebuild is re-generating embeddings and rebuilding vector (or hybrid) indexes after changes to content, chunking, or the embedding model.

    Neural Indexing

    Neural indexing is using learned representations and neural methods to build or optimize an index for retrieval (often in vector search or learned sparse retrieval).

    Neural IR (Neural Information Retrieval)

    Neural IR is the use of neural models (embeddings, cross-encoders, rerankers) to retrieve and rank documents based on semantic relevance.

    Neural Network

    A computational model inspired by the structure of biological neurons, consisting of interconnected nodes (neurons) in layers.

    Neural Ordinary Differential Equation (Neural ODE)

    Neural ODEs model transformations as continuous-time dynamics defined by a neural network, enabling certain efficiency and modeling properties.

    Neural Pruning

    Neural pruning removes weights, neurons, attention heads, or entire structures from a model to reduce compute/memory while trying to preserve performance.

    Neural Rendering

    Neural rendering combines neural networks with computer graphics to produce photorealistic images and videos – from 3D scene rendering to style manipulation.

    Neural Reranking

    Neural reranking uses a model (often a cross-encoder) to re-score and reorder an initial set of retrieved candidates based on deeper query–candidate understanding.

    Neural Retrieval

    Neural retrieval is retrieving relevant items using learned representations (dense embeddings and similarity search) instead of relying purely on keyword matching.

    Neural Scaling Laws

    Scaling laws describe empirical relationships showing how model performance tends to improve predictably as you increase compute, data, and/or model parameters—often following power-law-like trends.

    Neural Style Transfer (NST)

    Neural style transfer is a technique that applies the "style" of one image (textures, patterns) to the "content" of another, using neural representations.

    Neural Topic Routing

    Neural topic routing is using ML/embeddings to classify or route an input (query, pageview, conversation) into a topic, workflow, or handler based on semantic meaning.

    Neural Voice Transfer

    AI technology that transfers voice characteristics from one recording to another voice in real-time while preserving the content.

    Neuro-Symbolic "Verification Layer"

    A neuro-symbolic verification layer is a system component that checks neural outputs against symbolic constraints (rules, schemas, policies) before acting or publishing.

    Neuro-Symbolic AI

    Neuro-symbolic AI combines neural methods (LLMs, embeddings) with symbolic methods (rules, logic, knowledge graphs) to improve reliability, interpretability, and constraint satisfaction.

    Next Best Question (NBQ)

    Next Best Question is a conversational design and decisioning pattern where a system asks the single most valuable clarifying question to progress toward a correct outcome.

    Next Sentence Prediction (NSP)

    Next Sentence Prediction is a training objective where a model predicts whether one sentence likely follows another in the original text.

    NL2SQL (Natural Language to SQL)

    NL2SQL converts natural language questions into SQL queries that can be executed against a database.

    NLP (Natural Language Processing)

    Natural Language Processing (NLP) is the subfield of AI concerned with the machine processing, interpretation, and generation of natural language.

    No Free Lunch Theorem

    The No Free Lunch theorem (in optimization/learning) states that averaged over all possible problems, no one algorithm performs better than all others—performance depends on the problem distribution.

    Node2Vec

    Node2Vec is an algorithm that represents graph nodes as low-dimensional vectors based on random walks over the graph structure.

    Noise Injection

    Noise injection is deliberately adding noise during training or processing to improve robustness, generalization, or privacy.

    Noise Schedule

    A noise schedule defines how much noise is added (and later removed) at each step in a diffusion model's forward and reverse processes.

    Noisy Student Training

    Noisy Student Training is a semi-supervised learning approach where a "teacher" model labels unlabeled data, and a "student" model is trained on a mix of labeled + pseudo-labeled data with noise/augmentation.

    Nomic Embed

    Open-source embedding models from Nomic AI with full reproducibility – all training data and code are public.

    Non-Maximum Suppression (NMS)

    Non-maximum suppression is a post-processing step in object detection that removes redundant overlapping bounding boxes, keeping only the most confident ones.
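
    A greedy sketch of the standard procedure: sort boxes by confidence, then keep a box only if its overlap (IoU) with every already kept box stays at or below a threshold. Boxes are `(x1, y1, x2, y2)` tuples, and the 0.5 threshold is an illustrative default.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```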

    Non-Monotonic Logic

    A logical system where conclusions can be retracted when new information arrives that contradicts previous assumptions.

    Nonlinear Activation Function

    A nonlinear activation function introduces nonlinearity into neural networks (e.g., ReLU, GELU, tanh), enabling them to model complex relationships beyond linear transformations.

    Normalization

    Normalization is the transformation of numerical data to a unified value range (often 0–1 or mean 0 / standard deviation 1) to improve the training stability of machine learning models.
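
    The two ranges mentioned above correspond to min-max scaling and z-score standardization. A minimal sketch using the standard library; it assumes the values are not all identical, since that would cause a division by zero.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

data = [2.0, 4.0, 6.0]
print(min_max_scale(data))  # [0.0, 0.5, 1.0]
print(z_score(data))
```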

    Normalization Layer

    A normalization layer is a neural network component that normalizes activations to improve training stability and convergence (e.g., LayerNorm, RMSNorm).

    Normalizing Flow

    A normalizing flow is a generative modeling approach that transforms a simple distribution (e.g., Gaussian) into a complex one via a sequence of invertible transformations with tractable likelihoods.

    Novel Class Discovery (NCD)

    Novel class discovery finds previously unknown categories in unlabeled data while leveraging knowledge from known classes.

    NT-Xent Loss (Normalized Temperature-Scaled Cross-Entropy)

    NT-Xent is a contrastive learning loss used to train embeddings by pulling positive pairs together and pushing negatives apart, with a temperature term controlling distribution sharpness.

    O

    Object Detection

    Identification and localization of objects in images or videos.

    Observability for LLM Apps

    LLM observability extends classic observability with AI-specific signals: prompt/version tracking, retrieval evidence, tool traces, token usage, and quality/safety metrics.

    Off-Policy Evaluation (OPE)

    Estimates how a new decision policy would perform using data collected from a different (existing) policy—without deploying the new policy.

    Offline Evaluation

    Measures model/system performance using predefined datasets and metrics before production rollout.

    On-Device Inference

    Runs a model locally on a user's device (phone, laptop, edge hardware) instead of calling a cloud API.

    Once-for-All (OFA)

    A training method that trains a single "supernet" from which many specialized subnetworks can be extracted for different hardware constraints – train once, deploy everywhere.

    One-Cycle Policy (Super-Convergence)

    Learning rate schedule that first ramps up the LR (warmup) and then decreases it to a very low value – enables training in a fraction of the usual epochs.

    One-Shot Learning

    Ability to learn and generalize from a single example.

    One-Shot Prompting

    Provides a single example in the prompt to demonstrate the desired output pattern.

    Online Distillation

    A distillation variant where multiple models train simultaneously and serve as teachers to each other – no pre-trained teacher needed.

    Online Evaluation

    Measures performance on real user traffic (A/B tests, canaries, interleaving, holdouts) after deployment.

    Online Learning

    Updates a model incrementally as new data arrives, rather than retraining from scratch in large batches.

    ONNX (Open Neural Network Exchange)

    An open format for exchanging ML models between different frameworks – train in PyTorch, deploy with TensorRT or CoreML.

    Ontology

    A formal representation of concepts and relationships in a domain (entities, classes, properties, constraints).

    Open-Domain Dialogue

    Open-Domain Dialogue refers to AI systems that can freely converse about any topic – without being limited to predefined intents or domains.

    Open-Weight Model

    A model whose trained weights are publicly available, enabling self-hosting and deeper customization.

    OpenAI

    A leading AI research company and developer of ChatGPT, GPT-4, DALL-E, and the world's most widely used AI applications.

    OpenAI Codex

    OpenAI's specialized AI model for programming – the technology behind GitHub Copilot and foundation for code LLMs.

    OpenAI Embeddings

    OpenAI's commercial embedding API with text-embedding-3-small and text-embedding-3-large – the easiest path to high-quality embeddings.

    OpenAI o1

    OpenAI's first o-series model that uses explicit reasoning with chain-of-thought for complex problem-solving.

    OpenAI o3

    Advanced reasoning model from OpenAI with improved performance in mathematics, coding, and scientific reasoning.

    OpenLLM Leaderboard

    A public leaderboard by Hugging Face (officially the "Open LLM Leaderboard") that compares openly available LLMs on standardized benchmarks (MMLU, HellaSwag, etc.).

    Operationalization

    Turning a concept, model, or prototype into a repeatable, reliable, governed production capability with clear ownership, monitoring, and change control.

    Operator Fusion

    A compiler optimization that fuses multiple consecutive operations in neural networks into a single kernel – reducing memory accesses and accelerating inference.

    Optical Flow

    Computing motion vectors between consecutive video frames showing where each pixel moves.

    Optimization

    The process of finding parameter values that minimize a loss function or maximize an objective under constraints.

    Optimizer

    The algorithm that updates model parameters during training (e.g., SGD, Adam), based on gradients and configuration.

    Orchestration

    Coordinates multiple steps, services, and tools into a reliable workflow—often with state, retries, and observability.

    Orchestrator

    The system component that implements orchestration logic—deciding the next step, calling tools, managing state, and enforcing budgets/guardrails.

    ORPO (Odds Ratio Preference Optimization)

    An evolution of DPO that combines SFT and preference alignment in a single training step.

    Out-of-Distribution (OOD) Detection

    Identifies inputs that differ significantly from what a model was trained on, signaling increased uncertainty and risk.

    Outpainting

    Outpainting extends an image beyond its original borders by generating context-aware content with AI.

    Output Guardrails

    Controls applied to model outputs to enforce safety, policy, formatting, and correctness constraints before displaying or acting.

    Output Length Control

    The set of techniques used to shape response length and structure (token limits, section caps, templates, validators).

    Output Parsing

    Extracting structured fields from model output (JSON, YAML, XML, or patterns) so downstream systems can reliably use it.

    Output Token

    A token generated by a language model as part of its response.

    Over-Generation

    Producing more output than needed (too long, too verbose, too many steps), increasing cost and reducing user clarity.

    Over-Retrieval

    Retrieving too many documents/chunks for a query, increasing cost and often reducing answer quality due to noise and context dilution.

    Overfitting

    When a model fits its training data too closely, including noise and idiosyncrasies, and therefore generalizes poorly to new data.

    Overlapping Chunks

    A chunking strategy where consecutive text chunks share some repeated content (overlap) to preserve context across chunk boundaries.
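
    A minimal sketch of the idea, assuming the text has already been split into a list of tokens (or words); the function name and default sizes are illustrative only.

```python
def chunk_with_overlap(tokens, chunk_size=5, overlap=2):
    """Split `tokens` into chunks that share `overlap` items with their neighbor."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the input
    return chunks
```

    With eight tokens, a chunk size of 5, and an overlap of 2, the second chunk starts at the fourth token, so the two tokens at the boundary appear in both chunks.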

    OWASP LLM Top 10

    A standardized list of the most critical security risks for LLM applications, published by OWASP.

    P

    PagedAttention

    A memory management technique inspired by OS virtual memory that manages the KV cache in fixed-size blocks, greatly reducing GPU memory fragmentation.

    Parallel Tool Calls

    Executing multiple tool/API calls concurrently rather than sequentially, reducing end-to-end latency.

    Parameter Count

    The number of learned weights in a model, often used as a rough proxy for capacity and compute needs.

    Parameter Sharing

    A modeling technique where multiple parts of a neural network reuse the same weights instead of having separate parameters.

    Part-of-Speech Tagging

    Automatically assigning parts of speech (noun, verb, adjective, etc.) to each word in a sentence.

    Passage Reranking

    Reorders retrieved passages using a stronger relevance model (often a cross-encoder) to improve precision before generation.

    Passage Retrieval

    Finds relevant passages (chunks) of text rather than whole documents, improving precision for question answering and RAG.

    Pathfinding

    Pathfinding is the process of finding a route between nodes in a graph that optimizes an objective (shortest, cheapest, safest, fastest).

    PDDL (Planning Domain Definition Language)

    A standardized language for describing planning problems in AI that formally defines states, actions, and goals.

    PEFT (Parameter-Efficient Fine-Tuning)

    A family of techniques that adapt LLMs by training only a small subset of parameters instead of updating the entire model.

    Perceptron

    The Perceptron is the simplest form of an artificial neuron and the foundation of modern neural networks – a linear classifier that computes a weighted sum of its inputs and passes the result through an activation function (classically a step function).
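
    The classic perceptron learning rule can be sketched in a few lines. This is an illustrative toy, assuming a step activation and a linearly separable problem (logical AND); the function names and hyperparameters are made up for the example.

```python
def predict(weights, bias, x):
    """Weighted sum of inputs followed by a step activation."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights by the prediction error on each sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so the rule converges on it.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_DATA)
```

    A single perceptron cannot learn functions that are not linearly separable (such as XOR), which is one reason multi-layer networks are used.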

    Perplexity

    A language model metric derived from the average negative log-likelihood; measures how "surprised" a model is by text.
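
    Concretely, perplexity is the exponential of the average negative log-likelihood per token. The sketch below assumes the per-token natural-log probabilities have already been obtained from a model; the function name is illustrative.

```python
import math


def perplexity(token_log_probs):
    """exp(average negative log-likelihood) over a sequence of tokens."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)
```

    If a model assigns probability 0.25 to every token, its perplexity is 4: it is as uncertain as if it were choosing uniformly among four options at each step.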

    Perplexity

    An AI-first search engine that answers questions with cited, summarized answers – often positioned as a challenger to Google Search.

    Phi

    Microsoft's Small Language Models (SLMs) that show surprisingly strong performance despite small size and enable on-device AI.

    Pika Labs

    An AI video startup with user-friendly text-to-video and image-to-video generation, popular for short clips.

    Pipeline Parallelism

    A parallelization strategy that distributes different model layers across different GPUs – data flows through the GPU chain like a pipeline.

    Planning (AI Agents)

    The ability of AI agents to break down complex goals into executable steps and develop a strategy for goal achievement.

    Poisoning Attack

    An attack in which an adversary manipulates training data, retrieval corpora, or feedback signals to degrade or steer model behavior.

    Policy

    A policy is a rule or strategy that determines what actions are taken under which conditions.

    Policy Engine

    A component that enforces rules and constraints (who can do what, which tools are allowed, what outputs are permitted) at runtime.

    Policy Gradient

    Methods that optimize a policy directly by adjusting parameters in the direction that improves expected reward.

    Popularity Bias

    The systematic overrepresentation of popular items in recommendations, disadvantaging niche items and reinforcing filter bubbles.

    Pose Estimation

    Detection and localization of body joints and skeleton keypoints in images or videos.

    Positional Encoding

    A method that gives transformer models information about the position of tokens in a sequence, since they have no inherent ordering information.

    Positional Interpolation

    A technique to extend a model's usable context length by rescaling how positions are represented.

    Post-Training

    Any training stage applied after pretraining to shape a model for desired behaviors—helpfulness, safety, instruction-following.

    Post-Training Quantization (PTQ)

    Reduces model precision (e.g., FP16 → INT8/INT4) after training to lower memory use and speed up inference.

    Posterior Collapse

    Posterior collapse occurs in VAEs when the encoder's approximate posterior collapses to the prior, so the latent variables carry little information about the input and the decoder learns to ignore them.

    Pre-LN vs. Post-LN

    Refers to the placement of layer normalization in Transformer blocks: Pre-LN normalizes before attention/FFN, Post-LN after.

    Pre-Training

    The first training phase of an LLM where the model learns to understand and generate language from massive amounts of text (often trillions of tokens) – before specialized fine-tuning follows.

    Predictive Maintenance

    AI-powered prediction of machine failures before they occur to prevent unplanned downtime.

    Preference Data

    Datasets where humans (or AI judges) indicate which of two model responses is better – the training material for RLHF, DPO, and similar alignment methods.

    Preference Optimization

    Training or adjusting models using preference signals (A preferred to B) to improve alignment with desired outputs.

    Prefill

    The inference stage where the model processes the prompt to build the initial internal state before generating output tokens.

    Prefill Latency

    The time spent processing the input prompt before the model can start generating tokens.

    Prefix Cache

    Reuses computed model state (often KV cache) for repeated prompt prefixes, avoiding repeated prefill computation.

    Prefix Caching

    Prefix caching stores KV cache computations for frequently reused prompt prefixes (e.g., system prompts) and shares them between requests.

    Prefix Tuning

    A parameter-efficient adaptation technique where you learn small "prefix" vectors that steer attention layers, instead of fine-tuning all model weights.

    PReLU (Parametric Rectified Linear Unit)

    A ReLU variant with a learnable negative slope parameter – the leak factor is optimized during training.

    Pretraining

    Training a model on large-scale data (often self-supervised) to learn general representations before task-specific adaptation.

    Privacy-Preserving Machine Learning

    A set of techniques that reduce privacy risk when training or serving models.

    Product Quantization (PQ)

    A vector compression technique that approximates high-dimensional vectors using compact codes, enabling faster approximate nearest neighbor search.

    Progressive Shrinking

    A training technique that progressively shrinks a large network – first kernel, then depth, then width – to train a supernet supporting many subnetworks.

    Prompt

    The input (instructions + context + examples + constraints) provided to a language model to elicit a desired output.

    Prompt A/B Testing

    Comparing two prompt versions on real traffic to measure differences in outcomes and guardrails.

    Prompt Budget

    An explicit allocation of tokens for instructions, context, retrieved evidence, and examples.

    Prompt Caching

    An optimization technique where frequently used prompt prefixes are cached to reduce API costs and latency.

    Prompt Chaining

    Connecting multiple prompts where the output of one prompt serves as input for the next, to solve complex tasks.

    Prompt Compression

    Reduces prompt length while preserving essential constraints and context.

    Prompt Engineering

    The art and science of designing input prompts to obtain desired outputs from LLMs.

    Prompt Hardening

    Strengthening prompts and surrounding controls to resist misuse, injection, and unsafe outputs.

    Prompt Leakage

    Unintended exposure of system prompts, hidden instructions, or sensitive context—through model outputs, logs, or UI/debug tools.

    Prompt Leaking

    Techniques to extract hidden system prompts from LLM applications.

    Prompt Linting

    Automated static analysis of prompts to detect issues before deployment (conflicts, missing constraints, unsafe phrasing).

    Prompt Registry

    A system for storing, versioning, testing, and governing prompts as production artifacts.

    Prompt Regression Testing

    Running a stable evaluation suite against prompt changes to detect quality, safety, format, and cost regressions.

    Prompt Router

    Selects the best prompt template (or workflow) for a request based on intent, difficulty, risk, and context.

    Prompt Sandbox

    A safe environment to test prompts with controlled data, tools, and logs before production.

    Prompt Template

    A reusable prompt structure with variables (placeholders) that can be filled dynamically.

    Prompt Tokens

    The tokens consumed by the model's input (system instructions, user message, retrieved context, tool schemas, examples).

    Prompt Tuning

    Parameter-efficient method where only learnable token embeddings at the input are trained while the entire model stays frozen.

    Proximal Policy Optimization (PPO)

    A reinforcement learning algorithm that updates policies in a constrained way to avoid overly large, unstable changes.

    Pruning (Neural Network Pruning)

    A model compression technique that removes unimportant weights or neurons from a neural network to reduce size and accelerate inference.

    Q

    Q-Former

    A Q-Former is a query-based transformer module used in some multimodal systems to extract and compress information from one modality.

    Q-Function

    The Q-function (action-value function) maps a state-action pair to expected return: Q(s, a).

    Q-Learning

    Q-learning is a reinforcement learning method that learns a value function Q(s, a) estimating the expected return of taking action a in state s.
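
    A tabular version of the update can be sketched as follows. The dictionary-based table, the function name, and the default learning rate (alpha) and discount factor (gamma) are assumptions made for this example.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """One tabular Q-learning step; `q` maps (state, action) -> value."""
    # Off-policy target: assume the best known action is taken next.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    td_target = reward + gamma * best_next
    old_value = q.get((state, action), 0.0)
    # Move the estimate a fraction alpha toward the target.
    q[(state, action)] = old_value + alpha * (td_target - old_value)
    return q[(state, action)]
```

    For example, with Q(s1, right) = 2.0, a reward of 1.0, and the defaults above, updating Q(s0, right) from 0 gives 0.5 × (1.0 + 0.9 × 2.0) = 1.4.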

    QAT (Quantization-Aware Training)

    Quantization-aware training trains a model while simulating quantization effects, improving accuracy after quantization compared to PTQ.

    QKV (Query–Key–Value)

    QKV refers to the Query (Q), Key (K), and Value (V) matrices used in transformer attention mechanisms.

    QLoRA (Quantized LoRA)

    A combination of quantization and LoRA that enables fine-tuning of LLMs with drastically reduced memory requirements by quantizing the frozen base model (typically to 4-bit) while training only the LoRA adapters in higher precision (typically 16-bit).

    Quadratic Attention Cost

    Quadratic attention cost refers to the classic computational scaling of full self-attention, which grows roughly with the square of sequence length (O(n²)).

    Quality-of-Answer Score

    A quality-of-answer score is a composite metric that estimates how good an AI answer is (usefulness, correctness, clarity, groundedness, safety).

    Quantization-Aware Training (QAT)

    A training method that simulates quantization errors during training so the model learns to handle lower precision – higher quality than post-training quantization.

    Quantum Machine Learning (QML)

    Quantum machine learning explores using quantum computing concepts (qubits, superposition, entanglement) to accelerate or enhance certain ML computations.

    Quarantine

    Quarantine is isolating content, inputs, or events that are suspicious, unsafe, or low-trust so they cannot affect production outputs.

    Query Embeddings

    Query embeddings are vector representations of search queries used for semantic similarity matching against embedded documents/passages.

    Query Expansion

    Query expansion augments a query with additional terms or semantic signals to improve retrieval recall.

    Query Fan-Out

    Query fan-out is when one request triggers many downstream queries/tool calls to gather context or results.

    Query Federation

    Query federation executes a query across multiple systems/sources (databases, services, indexes) and combines results.

    Query Likelihood Model

    A query likelihood model is an information retrieval approach where documents are ranked by the probability that the document's language model would generate the query.

    Query Reranking

    Query reranking reorders search/retrieval results using a stronger scoring function (often a cross-encoder or LLM-based scorer) to improve relevance at the top.

    Query Rewrite

    Query rewrite is modifying a search query to improve retrieval quality (recall/precision), often by clarifying intent, expanding terms, or normalizing vocabulary.

    Query Rewriting

    Transforming a user query into a form that yields better retrieval results.

    Query Routing

    Query routing sends a query to the most appropriate engine, model, index, or workflow based on intent, confidence, and constraints.

    Query Understanding Evaluation

    Query understanding evaluation measures how well your system interprets user intent, entities, constraints, and risk level from queries.

    Query-Time Filtering

    Query-time filtering applies constraints during retrieval—such as permissions, tenant boundaries, recency windows, language, or document type.

    Question Answering (QA)

    Question Answering is a task where a system answers questions based on a corpus, knowledge base, or model knowledge.

    Question Decomposition

    Question decomposition breaks a complex question into smaller sub-questions that can be answered more reliably.

    Quota-Aware Routing

    Quota-aware routing chooses models/workflows based on remaining quota and cost budgets (e.g., route simple queries to cheaper modes when budget is low).

    Qwen

    Alibaba's open-weight LLM family that competes with Llama and Mistral in many benchmarks and offers strong multilingual capabilities.

    R

    RAG (Retrieval-Augmented Generation)

    Retrieval-Augmented Generation (RAG) is an architecture where an LLM generates an answer using retrieved external information (documents/chunks) as evidence, rather than relying only on its internal parameters.

    RAG Chunking Strategy

    A RAG chunking strategy defines how source documents are split into retrievable units (chunk size, overlap, structure preservation, metadata).

    RAG Evaluation

    The systematic evaluation of RAG systems across retrieval quality, answer relevancy, groundedness, and faithfulness.

    RAG Poisoning

    RAG poisoning is an attack or failure mode where the retrieval corpus is manipulated so that malicious or misleading content is retrieved as "evidence," degrading outputs or steering the system.

    Ragas

    Ragas is a popular evaluation approach/library for RAG systems that provides practical metrics and workflows to assess retrieval + generation quality.

    Random Search

    Hyperparameter tuning by randomly sampling from the parameter space – often more efficient than grid search for the same compute budget, especially when only a few hyperparameters matter.

    Re-Embedding

    Re-embedding is regenerating embeddings for a corpus (documents/chunks) using the same or a new embedding model, then updating the vector index accordingly.

    ReAct (Reason + Act)

    ReAct is an agentic pattern where a model alternates between reasoning and taking actions (tool calls), incorporating observations before continuing.

    ReAct (Reasoning + Acting)

    A prompting paradigm that connects reasoning (thinking) and acting (doing) in a loop – the LLM thinks aloud, executes actions, and reflects on results.

    Reasoning Model

    AI models that perform explicit thinking steps (and in some cases show them) before generating a final answer – optimized for complex reasoning.

    Reasoning Models

    A new class of LLMs (OpenAI o1, o3, DeepSeek R1) that perform explicit step-by-step reasoning before answering – the "thinking" becomes a distinct stage (shown in full by some models, summarized or hidden by others) and improves complex problem-solving.

    Recall@k

    Recall@k measures how often the needed relevant item(s) appear within the top-k retrieved results.
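
    For a single query, one common formulation is the fraction of relevant items that appear in the top k results; averaging this over many queries gives the reported metric. The function below is a sketch under that assumption, and returning 0.0 for a query with no relevant items is a choice made for this example.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items found among the top-k retrieved results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention chosen for this sketch
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)
```

    With retrieved results d3, d1, d7, d2 and relevant items d1 and d2, Recall@3 is 0.5 (only d1 is in the top three) and Recall@4 is 1.0.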

    Recency Bias

    Recency bias is a tendency to overweight more recent information—either in human judgment or in system behavior (ranking, context usage).

    Reciprocal Rank Fusion (RRF)

    RRF combines multiple ranked result lists into one by summing reciprocal ranks, improving robustness when different retrieval methods excel on different queries.
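
    The fusion step can be sketched as below. The smoothing constant k = 60 is the value commonly used in practice; the function name and the alphabetical tie-breaking are assumptions for this example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists; each list is ordered best-first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the document's score.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

    Because only ranks are used, the method needs no score normalization across retrievers, which makes it convenient for combining keyword and vector search results.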

    Recommendation Engine

    System that generates personalized recommendations based on user behavior.

    Recurrent Neural Network (RNN)

    RNNs process sequences by passing a hidden state across timesteps – the original architecture for language and time series, now largely replaced by Transformers.

    Red Teaming

    The systematic attempt to find vulnerabilities and dangerous behaviors in AI systems before they are exploited by malicious actors.

    Reflection Agent

    An agent pattern where the LLM critically evaluates its own outputs and iteratively improves them – like an internal code review.

    Regression

    ML method for predicting continuous numerical values.

    Regression Testing

    Regression testing ensures that changes (code, prompts, retrieval config, model versions) don't break existing behavior or quality.

    Regularization

    Techniques that prevent overfitting by constraining model complexity.

    Reinforcement Learning

    A learning paradigm where an agent learns by interacting with an environment to maximize rewards.

    Reinforcement Learning (RL)

    Reinforcement learning is a paradigm where an agent learns to make decisions by interacting with an environment and optimizing cumulative reward.

    Relation Extraction

    Relation Extraction identifies and classifies semantic relationships between entities in unstructured text.

    ReLU (Rectified Linear Unit)

    ReLU is one of the most widely used activation functions in deep learning: f(x) = max(0, x) – simple, fast, and helpful in mitigating vanishing gradients.

    Reparameterization Trick

    The reparameterization trick enables backpropagation through stochastic sampling operations by treating randomness as an external variable.

    Reproducibility

    Reproducibility is the ability to recreate the same (or equivalent) outputs and behavior given the same inputs, versions, and configuration.

    Reranker

    A reranker is a model that re-scores and reorders retrieved candidates (documents/chunks) to improve relevance at the top.

    Reranking

    Reordering retrieval results with a more powerful model for better relevance.

    Residual Connection

    Residual connections add a layer's input to its output, allowing gradients to flow directly through deep networks.

    ResNet

    A CNN architecture with skip connections (residual connections) that enables training of very deep networks.

    Response Generation

    AI process for generating natural language responses.

    Responsible AI

    A holistic approach to developing and deploying AI systems that prioritizes ethical principles such as fairness, transparency, privacy, and human oversight.

    RetNet (Retentive Network)

    An architecture from Microsoft combining Transformer quality with linear inference complexity through a "retention" mechanism.

    Retrieval Confidence

    Retrieval confidence is a signal estimating whether retrieved results contain sufficient, relevant evidence to answer the query reliably.

    Retrieval Drift

    Retrieval drift is a change in retrieval behavior/quality over time due to corpus updates, embedding model changes, indexing settings, query distribution shifts, or metadata changes.

    Retrieval-Augmented Generation

    An AI architecture that connects Large Language Models with external knowledge sources by retrieving relevant documents and using them as context for response generation.

    Retrieval-Augmented Generation (RAG)

    A technique that combines LLM generation with external knowledge retrieval to provide more grounded and current responses.

    Retrieval-First Policy

    A retrieval-first policy forces the system to retrieve evidence before generating substantive answers, especially for factual or high-risk queries.

    Retriever

    A retriever is the component that selects candidate documents/chunks relevant to a query (keyword, vector, hybrid, or federated).

    Retriever-Reranker Cascade

    A retriever–reranker cascade is a two-stage retrieval approach: a fast retriever generates candidates, then a slower, more accurate reranker selects the best top-k.

    Reward Hacking

    Reward hacking occurs when a model/agent finds ways to maximize reward without actually achieving the intended real-world goal.

    Reward Model

    A reward model scores model outputs according to a preference objective (helpfulness, safety, format compliance), often used in alignment-style training or evaluation.

    Right to Explanation

    The legal or ethical right of affected individuals to receive an understandable explanation for automated decisions.

    Ring Attention

    A distributed attention technique that distributes long sequences across multiple GPUs by passing KV blocks in a ring between devices.

    RLAIF (Reinforcement Learning from AI Feedback)

    RLAIF uses AI-generated critiques or preferences (often from a judge model) as feedback signals to improve model behavior, reducing reliance on human labeling.

    RLHF (Reinforcement Learning from Human Feedback)

    A training method that uses human feedback to make LLMs more helpful, safer, and better aligned – the key to "alignment" in modern ChatGPT-like models.

    RMSNorm (Root Mean Square Normalization)

    A simplified variant of layer normalization using only root mean square without mean centering – faster and standard in LLaMA/Mistral.
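
    A minimal sketch of the computation for a single vector, assuming an optional per-dimension learnable gain and a small epsilon for numerical stability (both names are illustrative):

```python
import math


def rms_norm(x, gain=None, eps=1e-6):
    """Divide by the root mean square of x; no mean subtraction."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if gain is None:
        gain = [1.0] * len(x)  # identity gain when none is supplied
    return [g * v / rms for g, v in zip(gain, x)]
```

    Unlike layer normalization, the output is not re-centered: normalizing [3.0, 4.0] yields a vector with a root mean square of about 1 but a non-zero mean.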

    RMSprop

    Adaptive optimizer that solves AdaGrad's problem by using an exponentially weighted average of squared gradients instead of their sum.
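
    For a single scalar parameter, the update can be sketched as follows; the function name and the default hyperparameters are assumptions for illustration.

```python
import math


def rmsprop_step(param, grad, avg_sq, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSprop update; returns the new parameter and running average."""
    # Exponentially weighted average of squared gradients (not a running sum).
    avg_sq = decay * avg_sq + (1 - decay) * grad * grad
    # Scale the step by the root of that average.
    param = param - lr * grad / (math.sqrt(avg_sq) + eps)
    return param, avg_sq


# Usage: minimize f(x) = x^2, whose gradient is 2x, starting from x = 1.
x, avg_sq = 1.0, 0.0
for _ in range(300):
    x, avg_sq = rmsprop_step(x, 2 * x, avg_sq)
```

    Because old squared gradients decay away, the effective step size does not shrink toward zero over time as it does with AdaGrad's ever-growing sum.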

    RNN (Recurrent Neural Network)

    A Recurrent Neural Network (RNN) is a neural network architecture for sequential data where neurons use their own output as additional input for the next time step — preserving context across sequences.

    Robotics (AI)

    The field of developing intelligent robots that use AI to autonomously perceive, plan, and execute tasks in the physical world.

    Robustness Testing

    Robustness testing evaluates how reliably a model or system performs under perturbations, edge cases, noise, or distribution shifts.

    RoPE (Rotary Position Embedding)

    A method for encoding positional information in Transformers by rotating Query and Key vectors, naturally capturing relative positions.
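
    The rotation can be sketched for a single vector as below. This example assumes an even dimension, the interleaved pairing of adjacent dimensions, and the commonly used frequency base of 10000; some implementations pair dimensions differently.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each adjacent pair of dimensions by a position-dependent angle."""
    d = len(vec)  # assumed to be even
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

    The dot product of a rotated query and key depends only on the distance between their positions: dot(rope(q, 5), rope(k, 3)) equals dot(rope(q, 12), rope(k, 10)) up to floating-point error.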

    RoPE (Rotary Positional Embeddings)

    RoPE is a positional encoding method that applies rotations to query/key vectors, enabling models to represent token positions in a way that supports relative position behavior.

    ROUGE Score

    Metrics for evaluating automatic text summarization.

    Routing Policy

    A routing policy is the rule set that decides which model/workflow/tools to use for a request based on intent, risk, confidence, and budgets.

    Runway

    A leading AI video platform with text-to-video, image-to-video, and advanced editing tools for creative professionals.

    RWKV (Receptance Weighted Key Value)

    An open-source architecture combining RNN efficiency (O(1) inference per token) with Transformer-like parallelizability during training.

    S

    S4 (Structured State Spaces)

    The groundbreaking state space architecture combining HiPPO initialization with efficient convolution computation that sparked the SSM revolution.

    Safety

    Safety in AI systems is the set of measures that prevent harmful, insecure, or policy-violating outputs and actions—especially under adversarial or ambiguous inputs.

    Safety Alignment

    Safety alignment is shaping model/system behavior so it reliably follows safety constraints (refusals, safe defaults, policy adherence) across normal and adversarial inputs.

    Safety Case

    A safety case is a structured argument—supported by evidence—that a system is acceptably safe for a specific context and risk profile.

    Safety Classifier

    A safety classifier is a model/rule system that detects unsafe content or risky intent (e.g., self-harm, hate, data exfiltration attempts, policy violations).

    Safety Evaluation

    Safety evaluation is the systematic testing of an AI system for harmful, policy-violating, insecure, or privacy-risk behavior—across normal and adversarial inputs.

    Safety Filters

    Safety filters detect and block or transform unsafe outputs (or unsafe inputs) based on policy (e.g., sexual content, violence, hate, self-harm, illegal instructions).

    Safety Guardrails

    Safety guardrails are mechanisms that constrain an AI system's behavior to reduce harm (policies, validators, permission boundaries, rate limits, refusals).

    Safety Incident Taxonomy

    A safety incident taxonomy is a structured classification system for AI safety incidents (what happened, severity, impact, root cause, mitigation).

    Safety Training

    The process of making LLMs safer through specialized training – includes RLHF, DPO, Constitutional AI, and red-teaming-based training.

    Saliency Map

    Visualization showing which input pixels or tokens have the greatest influence on model output, based on gradients.

    SAM (Segment Anything Model)

    A foundation model by Meta for universal image segmentation that can segment any object in an image with zero-shot capability.

    Sampling Steps

    Sampling steps are the number of iterative denoising iterations used during diffusion inference to generate an output.

    Sampling Temperature

    Sampling temperature scales the model's output distribution: lower temperatures make outputs more deterministic; higher temperatures increase randomness.
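
    In practice the logits are divided by the temperature before the softmax. The sketch below shows that step only; the function name is illustrative, and sampling from the resulting distribution is left out.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

    For logits [2, 1, 0], a temperature of 0.5 concentrates more probability on the first token than a temperature of 1.0, while a temperature of 2.0 flattens the distribution.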

    SARSA (State-Action-Reward-State-Action)

    SARSA is an on-policy RL algorithm that updates Q-values based on the action actually taken – unlike Q-Learning's off-policy maximum.

    Satisficing

    Satisficing is choosing a solution that is 'good enough' to meet constraints, rather than optimizing for the absolute best.

    Scalable Oversight

    Methods to monitor and correct AI systems that exceed human capabilities – how do you oversee something smarter than yourself?

    Scaled Dot-Product Attention

    The base attention computation: Attention(Q,K,V) = softmax(QK^T / √d_k) · V – the mathematical foundation of all Transformers.
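
    The formula can be traced with plain Python lists. This is a didactic sketch without masking, batching, or multiple heads; Q, K, and V are lists of row vectors, and the function names are illustrative.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs
```

    Each output row is a convex combination of the value vectors, weighted toward the values whose keys best match the query.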

    Scaling Laws

    Scaling laws are empirical relationships showing how model performance tends to improve predictably as you scale data, compute, and parameters.

    Scene Understanding

    AI ability to holistically understand complex visual scenes – objects, their relationships, context, and implicit meaning.

    Schema Drift

    Schema drift is when the expected structure of data changes over time (fields added/removed/renamed, types change, enums expand), often breaking pipelines.

    Score Matching

    Score matching learns the gradient of the log-probability density (score function) of a data distribution to generate samples via Langevin dynamics.

    SearchGPT

    OpenAI's real-time web search feature integrated into ChatGPT – combines conversation with current web information.

    Secure Aggregation

    A cryptographic protocol that allows a server to compute aggregate values from individual contributions without seeing the individual values.

    Seedance

    AI video generator by ByteDance with controversial training data origins and photorealistic results.

    Selective Prediction

    An approach where a model refuses uncertain predictions and delegates to humans or other systems.

    Self-Attention

    Attention mechanism in which each element of a sequence attends to all elements of the same sequence, producing context-aware representations.

    Self-Consistency

    Self-consistency is a technique where you sample multiple reasoning paths/answers and aggregate them (e.g., majority vote) to improve reliability.
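
    The aggregation step can be sketched as a simple majority vote. This example assumes the final answers have already been extracted from several sampled reasoning paths; the function name and the sample answers are hypothetical.

```python
from collections import Counter


def self_consistency(answers):
    """Return the most frequent answer and its share of the votes."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Final answers from five sampled reasoning paths (made-up values).
sampled_answers = ["42", "41", "42", "42", "40"]
best_answer, agreement = self_consistency(sampled_answers)
```

    The agreement share can double as a rough confidence signal: low agreement suggests the question may need more samples or a fallback.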

    Self-Distillation

    A variant of knowledge distillation where a model uses itself as teacher – the same model, or one with an identical architecture, serves as teacher for a new training run.

    Self-Play

    Self-Play is an RL training method where an agent plays against copies of itself, continuously improving through competition.

    Self-Supervised Learning

    Learning paradigm where the model generates labels from the data itself.

    SELU (Scaled Exponential Linear Unit)

    A self-normalizing activation function that pushes activations toward mean 0 and variance 1 as they pass through the layers – often reducing the need for batch/layer norm.

    Semantic Caching

    Semantic caching reuses past answers/results when a new query is semantically similar to a previous query, not necessarily identical.

    Semantic Chunking

    Semantic chunking splits documents into chunks based on meaning boundaries (topics/sections) rather than fixed token counts alone.

    Semantic Router

    A semantic router routes queries to the right workflow, toolset, or model using semantic signals (embeddings, intent classification, similarity to known categories).

    Semantic Search

    A search method that understands the meaning and context of queries rather than just matching exact keywords – enabling more natural and intelligent search results.

    Semantic Segmentation

    Pixel-level classification of image regions by object categories.

    Sentence Transformers

    A Python library and collection of models that produce semantically meaningful sentence embeddings – optimized for similarity search and clustering.

    SentencePiece

    Language-independent open-source tokenizer framework by Google that works directly on raw text without prior word segmentation.

    Sequence-to-Sequence

    A model architecture that transforms an input sequence into an output sequence of variable length.

    Session-Based Recommendation

    Recommendations based on the current user session rather than historical profiles – ideal for anonymous visitors.

    SFT (Supervised Fine-Tuning)

    Supervised fine-tuning (SFT) adapts a pretrained model using labeled input→output examples to shape behavior (format, style, task performance).

    SFT (Supervised Fine-Tuning)

    Training a pre-trained model on curated (input, output) pairs to adapt it to specific tasks or formats.

    SHAP (Shapley Additive Explanations)

    SHAP is a model explainability method based on Shapley values from cooperative game theory that attributes a prediction to individual features.

    Sharpness-Aware Minimization (SAM)

    Optimization method that minimizes not only the loss but also the "sharpness" of the loss landscape – finds flatter minima for better generalization.

    Siamese Network

    A Siamese network is a neural architecture with two (or more) identical subnetworks that learn to compare inputs by producing embeddings and measuring similarity.

    Sigmoid Function

    The Sigmoid function σ(x) = 1/(1+e^(-x)) maps any value to the range (0, 1) – historically important as activation function, today primarily for binary classification.

    Signal-to-Noise Ratio

    Signal-to-noise ratio (SNR) is the proportion of meaningful information ("signal") relative to irrelevant or misleading information ("noise").

    SiLU / Swish

    SiLU/Swish = x · σ(x) – a smooth, self-gated activation function that outperforms ReLU in many benchmarks and is the basis of SwiGLU.

    Sim-to-Real Transfer

    Transferring AI models trained in simulation to real physical systems – train in the virtual world, deploy in the real one.

    SimCLR

    SimCLR (Simple Contrastive Learning of Visual Representations) is a framework for self-supervised learning that learns visual representations by comparing augmented image versions.

    Similarity Score Calibration

    Similarity score calibration maps raw similarity scores (from embeddings/rerankers) to more reliable confidence signals (e.g., probabilities or risk bands).

    Similarity Search

    Similarity search finds items most similar to a query under a similarity metric (cosine similarity, dot product, etc.), commonly used with embeddings.
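
    A brute-force version with cosine similarity might look like the sketch below. It assumes embeddings are plain lists of floats held in a dictionary; production systems typically use an approximate nearest-neighbor index instead, and the names and vectors here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Score every item against the query and keep the k best matches."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" (illustrative values only).
catalog = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
matches = top_k([1.0, 0.1], catalog, k=2)
```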

    Similarity Thresholding

    Similarity thresholding sets cutoff values on similarity scores (embedding similarity, reranker scores) to decide actions like "use cache," "retrieve more," or "ask a clarifying question."

    SimPO (Simple Preference Optimization)

    A simplified version of DPO that works without a reference model and uses length-normalized reward.

    Simulation

    The imitation of a real or hypothetical system or process in a controlled virtual environment.

    Sinusoidal Positional Encoding

    The original positional encoding from the Transformer paper using sine and cosine functions of different frequencies.
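
    The encoding for a single position can be sketched as follows, using the formulation from the Transformer paper: even dimensions use sine, odd dimensions use cosine, and each pair of dimensions shares a frequency. The function name is hypothetical.

```python
import math


def sinusoidal_positional_encoding(position, d_model):
    """Return the d_model-dimensional encoding for one sequence position."""
    encoding = []
    for i in range(d_model):
        # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


first = sinusoidal_positional_encoding(position=0, d_model=4)
second = sinusoidal_positional_encoding(position=1, d_model=4)
```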

    Skip Connection

    Skip connections forward the input of a layer directly to the output of later layers – the core mechanism making 100+ layer deep networks trainable.

    Sliding Window Attention (SWA)

    An attention variant where each token only attends to a limited number of previous tokens (window) instead of the entire sequence.

    Slot Filling

    Extraction of specific parameters from user utterances for conversational AI.

    Small Language Model

    A Small Language Model (SLM) is a comparatively smaller LLM designed for lower latency, lower cost, and easier deployment—often used for narrow tasks or as part of a routed system.

    Small Language Models

    Language models with significantly fewer parameters than large LLMs (typically 1-7B instead of 100B+), optimized for specific tasks and capable of running locally or on edge devices.

    SMOTE (Synthetic Minority Over-sampling Technique)

    Algorithm that generates synthetic examples for the minority class by interpolating between existing data points.

    Soft Prompt

    A soft prompt is a learned vector representation (rather than human-written text) used to steer a model's behavior—often trained as a small set of prompt embeddings.

    Softmax

    Function that converts logits into a probability distribution.

    Solomonoff Induction

    Solomonoff induction is a theoretical framework for optimal prediction that combines Bayesian inference with algorithmic complexity, weighting hypotheses by how simply they describe the data.

    Sora

    OpenAI's revolutionary text-to-video model that generates photorealistic videos up to one minute from text descriptions.

    Sora 2

    The second generation of OpenAI's text-to-video model with improved quality, longer clips, and more realistic physics simulation.

    Source Attribution

    Source attribution is explicitly indicating where information came from (documents, URLs, internal systems), often via citations or links.

    Source Grounding

    Source grounding is constraining an AI system to base its answers on provided sources (retrieved documents, tools, or approved references) rather than unverified model knowledge.

    Source Separation

    Source Separation separates a mixed audio signal into individual sources – e.g., vocals, drums, bass, and instruments from a song.

    Sparse Attention

    Sparse attention reduces attention computation by allowing tokens to attend only to a subset of other tokens (patterned or learned sparsity).

    Sparse Autoencoder

    A Sparse Autoencoder (SAE) is an autoencoder trained with a sparsity constraint so that only a small subset of features activate for any given input.

    Sparse Mixture of Experts (SMoE)

    An architecture where only a small fraction of all "expert sub-networks" is activated per input – enabling huge model capacity with efficient inference.

    Sparse Model

    A neural network where only a small portion of weights or activations are used for each computation, significantly increasing efficiency.

    Sparse Retrieval

    Sparse retrieval uses sparse representations (often term-frequency based) such as BM25 to retrieve documents by lexical match.

    Sparse Training

    Training with sparsity from the start – instead of "train dense, then prune," the model stays sparse from the beginning with connections dynamically added/removed.

    Speaker Diarization

    Speaker diarization identifies "who spoke when" in an audio recording by segmenting audio into speaker-labeled turns.

    Spectral Normalization

    Spectral Normalization constrains the Lipschitz constant of network layers by normalizing with the largest singular value – standard stabilization in GANs.

    Speculative Decoding

    An inference acceleration technique where a small "draft model" quickly proposes multiple tokens and a large "verifier model" verifies them in parallel – up to 3x faster generation.

    Speech Enhancement

    Speech Enhancement improves speech recording quality by removing noise, reverb, and interference – often as preprocessing for ASR.

    Speech-to-Text

    Technology for converting spoken language into written text – the foundation for voice assistants and transcription.

    Speech-to-Text (STT)

    Speech-to-Text (STT) converts spoken audio into written text using automatic speech recognition (ASR) models.

    Stability AI

    The company behind Stable Diffusion, one of the most widely used open-source models for AI image generation.

    Stable Diffusion

    The leading open-source model for text-to-image generation, enabling local execution and fine-tuning on consumer hardware.

    State Space Model (SSM)

    A class of sequence models based on continuous state space theory offering linear scaling O(N) instead of quadratic attention O(N²).

    State Space Models (SSMs)

    State Space Models (SSMs) are sequence models that maintain a latent "state" that evolves over time to process sequential data efficiently.

    Statefulness

    Statefulness describes whether a system retains information across interactions (stateful) or treats each request independently (stateless).

    Steering Vector

    A steering vector is a direction in a model's internal representation space that, when added or applied to activations, can bias outputs toward or away from certain behaviors or attributes.

    Stemming

    Rule-based reduction of words to their stem by removing suffixes.

    Step Decay (Learning Rate)

    Simplest learning rate schedule strategy that reduces the LR by a factor after fixed intervals (epochs or steps).
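
    As a formula, the schedule is the initial rate multiplied by the factor once per completed interval. A minimal sketch, where the function name, the interval of 30 epochs, and the factor of 0.5 are illustrative assumptions:

```python
def step_decay(initial_lr, step, drop_every, factor=0.1):
    """Multiply the learning rate by `factor` once every `drop_every` steps."""
    return initial_lr * factor ** (step // drop_every)


# Halve the learning rate every 30 epochs (illustrative values).
schedule = [step_decay(0.1, epoch, drop_every=30, factor=0.5)
            for epoch in (0, 29, 30, 65)]
```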

    Stochastic Gradient Descent (SGD)

    Variant of gradient descent that uses a single example or a small mini-batch per update instead of all data – faster per step and often better generalizing.

    Stochastic Parrot

    Stochastic parrot is a critique framing that highlights how LLMs can generate fluent text by pattern-matching from training data without true understanding—raising concerns about bias, misinformation, and misuse.

    Stochastic Weight Averaging (SWA)

    Training technique that averages model weights over multiple checkpoints to find flatter minima and better generalization.

    Stop Sequence

    A stop sequence is a token/string pattern that tells a model to stop generating when encountered.
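
    The effect can be imitated after the fact by truncating text at the earliest stop sequence. This is only a sketch of the behavior: inference APIs normally halt generation itself rather than trimming the result, and the function name and example strings are hypothetical.

```python
def apply_stop_sequences(text, stop_sequences):
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match at index 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


raw_output = "Answer: 42\nUser: next question"
trimmed = apply_stop_sequences(raw_output, ["\nUser:", "###"])
```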

    Stopword Removal

    Removing high-frequency words without semantic content (the, a, is, and, of) from text before processing.

    Stratified Sampling

    Sampling method that ensures class/group proportions in the sample match the overall distribution.

    Streaming ASR

    Streaming ASR transcribes speech in near real-time as audio arrives, rather than after the full recording is complete.

    STRIPS

    STRIPS is a classical planning formalism where actions are defined by preconditions and effects (add/delete lists) over symbolic state predicates.

    Structured Output

    Structured output is requiring the model to produce outputs in a predefined structure (JSON, YAML, sections with strict headings), often enforced with validation.

    Structured Pruning

    A pruning variant that removes entire structures (neurons, filters, attention heads, layers) instead of individual weights – delivers real speedups without specialized sparse hardware.

    Style Transfer

    Style transfer modifies an image (or text) to match a target style while preserving core content.

    StyleGAN

    StyleGAN is NVIDIA's groundbreaking GAN architecture that generates photorealistic faces and images with unprecedented control over style and details.

    Subject Consistency

    The ability of an AI image generator to consistently render characters and objects across multiple images.

    Summarization

    Summarization is generating a shorter representation of content while preserving key meaning—extractive (selecting parts) or abstractive (rewriting).

    Super Resolution

    Super resolution increases the resolution of images or videos using AI – reconstructing details not present in the original.

    Superalignment

    The research problem of how to keep AI systems that are smarter than humans (superintelligence) safe and controllable.

    Superposition

    Superposition in neural networks describes how multiple features can be represented in overlapping directions within a limited-dimensional space, rather than one feature per neuron.

    Supervised Learning

    ML paradigm where the model learns from labeled examples (input-output pairs).

    Surrogate Model

    A simple, interpretable model that approximates a complex black-box model to explain its decisions.

    SWE-Bench (Software Engineering Benchmark)

    A benchmark that tests LLMs by having them resolve real issues from GitHub repositories – a widely used, realistic test of AI coding abilities.

    SwiGLU

    An activation function for Transformer FFN blocks combining Swish gating with linear projection, standard in modern LLMs like LLaMA.

    Sycophancy

    Sycophancy is an LLM behavior where the model overly agrees with the user's stated beliefs or incorrect premises instead of correcting them.

    Synthetic Media

    Umbrella term for all media content (text, image, audio, video) that has been wholly or partially created or manipulated by AI.

    System Prompt

    A special prompt category that defines the base behavior, persona, and rules for an AI session.

    T

    Talking Head Generation

    AI technology that generates a realistic video of a speaking person from a single portrait photo and audio input.

    Tanh (Hyperbolic Tangent)

    An activation function that maps values to the range (-1, 1) – zero-centered, unlike sigmoid, with stronger gradients around zero.

    Technological Singularity

    A hypothetical point at which technological progress (especially AI) becomes so rapid and profound that it fundamentally and unpredictably transforms human civilization.

    Temperature

    A parameter that controls randomness in LLM output.

    Temperature (Sampling)

    A parameter controlling the "creativity" of LLM outputs: Low values (0-0.3) produce focused, deterministic responses; high values (0.7-1.0) bring variation and surprises.
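
    One common way to apply temperature is to divide the logits by it before the softmax. The sketch below assumes the logits are a plain list of floats; the function name and the example logits are hypothetical, and a temperature of exactly 0 is usually handled separately as greedy decoding.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then normalize to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
focused = softmax_with_temperature(logits, temperature=0.2)
neutral = softmax_with_temperature(logits, temperature=1.0)
varied = softmax_with_temperature(logits, temperature=2.0)
```

    Lower temperatures concentrate probability on the top token; higher temperatures flatten the distribution, so less likely tokens are sampled more often.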

    Temperature Scaling

    A post-hoc calibration method that uses a single parameter (temperature) to adjust model confidence values.

    Temporal Difference Learning (TD)

    TD learning updates value estimates based on the difference between successive predictions – learns from incomplete episodes through bootstrapping.

    Temporal Graph Network

    A GNN for time-evolving graphs that models the evolution of nodes and edges over time.

    Tensor Parallelism

    A parallelization strategy that splits individual tensor operations (matrix multiplications) across multiple GPUs – necessary for layers too large for one GPU.

    Test-Time Training (TTT)

    A paradigm where a model adapts to each new input during inference by optimizing a self-supervised loss on the test instance – "learning while predicting".

    Text Classification

    Automatically assigning texts to predefined categories using a machine learning model.

    Text Generation

    Text generation is the automatic creation of text by AI models, typically based on a prompt or context.

    Text Normalization

    Standardizing text data by converting to a uniform form – lowercasing, Unicode normalization, character replacement, and more.

    Text Summarization

    Automatically generating a shorter version of a text while retaining the most important information.

    Text-to-3D

    Text-to-3D generates three-dimensional objects and scenes from natural language text descriptions using AI.

    Text-to-Image

    AI generation of images from text descriptions – the breakthrough that democratized creative work.

    Text-to-Speech

    Technology for converting written text into natural-sounding speech – today mostly using neural models.

    Text-to-Video

    AI technology that generates complete videos with moving images, people, and scenes from text descriptions.

    Textual Inversion

    Textual Inversion learns a new word embedding for a concept from a few images, without modifying the diffusion model itself.

    TF-IDF

    Statistical measure for evaluating the relevance of a word in a document relative to a document collection.
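
    One basic variant multiplies a term's relative frequency in the document by the log of the inverse document frequency. Many weighting and smoothing variants exist, so the sketch below is just one of them; the function name and the toy corpus are hypothetical.

```python
import math


def tf_idf(term, document, corpus):
    """Term frequency in `document` times log inverse document frequency."""
    tf = document.count(term) / len(document)
    docs_with_term = sum(1 for doc in corpus if term in doc)
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf


# Tokenized toy corpus (illustrative only).
corpus = [
    ["ai", "marketing"],
    ["ai", "search"],
    ["ai", "ai", "video"],
]
common_score = tf_idf("ai", corpus[0], corpus)
rare_score = tf_idf("marketing", corpus[0], corpus)
```

    A word that appears in every document scores zero, while a word unique to one document scores high there.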

    Thompson Sampling

    Bayesian bandit algorithm that selects actions proportionally to the probability that they are optimal.

    Throughput

    The number of tokens or requests a system can process per time unit – a key measure for ML inference efficiency.

    Time Series Foundation Model

    Pre-trained Transformer models for time series enabling zero-shot forecasting without dataset-specific training.

    Time-to-First-Token (TTFT)

    The time from request to first generated token – critical for perceived responsiveness of AI applications.

    Tokenization

    The process of breaking text into smaller units (tokens) that can be processed by language models – from whole words to syllables to individual characters.

    Tool Use

    The ability of LLMs to call external tools and APIs – from calculators to web search to databases and custom functions.

    Top-k Sampling

    A sampling parameter that restricts selection to the k most likely tokens, regardless of their absolute probabilities.

    Top-p (Nucleus Sampling)

    A sampling parameter that selects only from the smallest set of most likely tokens whose cumulative probability reaches p.
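
    In the common formulation, tokens are ranked by probability and kept until their cumulative probability reaches p; the kept probabilities are then renormalized before sampling. A minimal sketch, assuming the distribution is a token-to-probability dictionary; the function name and the example distribution are hypothetical.

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


next_token_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
nucleus = top_p_filter(next_token_probs, p=0.7)
```

    Unlike top-k, the number of surviving tokens adapts to the distribution: a confident model keeps few candidates, an uncertain one keeps many.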

    Toxicity Detection

    ML systems that automatically detect and classify toxic, offensive, or hateful content.

    Transfer Learning

    Using knowledge learned from one task to improve performance on a related task.

    Transformer

    A neural network architecture that uses self-attention to model relationships between all positions in a sequence.

    Transformer Architecture

    The revolutionary neural network architecture from 2017 ("Attention Is All You Need") that replaced RNNs and forms the foundation of all modern LLMs like GPT, Claude, Gemini.

    Transparency

    The disclosure of how AI systems work, what data they use, and how decisions are made.

    Tree of Thoughts (ToT)

    Prompting strategy where the LLM explores multiple reasoning paths in parallel, evaluates them, and selects the best – like a decision tree for thought chains.

    Triplet Loss

    A loss function for metric learning that uses anchor, positive, and negative samples to train embeddings so similar items are closer and different ones further apart.
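
    A common form of the loss is max(0, d(anchor, positive) - d(anchor, negative) + margin). The sketch below assumes Euclidean distance and a margin of 1.0; both are illustrative choices, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is farther than the positive by the margin."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_loss = triplet_loss(anchor, positive, negative=[3.0, 4.0])
hard_loss = triplet_loss(anchor, positive, negative=[0.0, 1.5])
```

    A far-away negative contributes no loss, while a negative that sits almost as close as the positive still produces a gradient signal.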

    Trust & Safety

    Trust & Safety is the practice of protecting users, platforms, and brands from harmful content, abuse, and unsafe outcomes—through policy, enforcement, and product design.

    TruthfulQA

    A benchmark that tests whether LLMs avoid popular misinformation and conspiracy theories.

    Two-Tower Model

    An architecture with two separate encoders (user tower, item tower) whose embeddings are efficiently matched via similarity search.

    U

    U-Net

    U-Net is a network architecture for image segmentation with encoder-decoder structure and skip connections.

    Ultra-Long Context Window

    An ultra-long context window is the ability to accept very large input contexts (tens or hundreds of thousands of tokens).

    Uncertainty Quantification (UQ)

    UQ estimates how uncertain a model is about an output.

    Uncertainty-Aware Routing

    Uncertainty-aware routing chooses workflows based on uncertainty signals (low-confidence → deeper retrieval).

    Underfitting

    Underfitting happens when a model is too simple to capture patterns, resulting in poor performance on both training and test data.

    Uniform Information Density

    Prompt principle: keep "importance per token" consistent, avoid low-value text.

    Unigram Model (Tokenization)

    Subword tokenization algorithm that starts with a large vocabulary and iteratively removes least useful tokens.

    Unintended Memorization

    Unintended memorization: models retain specific training examples and may reproduce them.

    Universal Embeddings

    Universal embeddings: general-purpose representations for many domains without domain-specific training.

    Unlearning (Machine Unlearning)

    Machine unlearning removes the influence of specific training data from a model (privacy, compliance).

    Unsupervised Learning

    ML paradigm where the model finds patterns in unlabeled data.

    Untrusted Input Handling

    Controls that treat external/user-provided content as potentially malicious.

    Utility Function

    A utility function maps outcomes to numeric values representing preference, enabling tradeoffs between competing objectives.

    V

    VAE (Variational Autoencoder)

    VAE stands for Variational Autoencoder, a generative model that learns a probabilistic latent space for sampling and generation.

    Value Alignment

    Value alignment is ensuring an AI system's behavior reliably matches intended human/organizational values and constraints (safety, fairness, truth-seeking, privacy).

    Value of Information (VoI)

    Value of Information (VoI) quantifies how much benefit you gain by obtaining additional information before making a decision.

    Vanishing Gradient

    Vanishing gradient is a training problem where gradients become extremely small as they propagate backward through a network, slowing or preventing learning in early layers.

    Variational Autoencoder (VAE)

    A Variational Autoencoder (VAE) is a generative model that learns a probabilistic latent space, enabling sampling and generation of new data.

    Veo 3

    Google's third-generation video generation model with native audio, longer clips, and improved physics.

    Verification

    Checking whether LLM outputs are correct, factual, and source-supported.

    Verification Layer

    A verification layer is a system component that checks whether an AI output or action meets required correctness, safety, policy, and formatting constraints before it is delivered or executed.

    Verification-First Policy

    A verification-first policy requires AI outputs and high-impact actions to pass defined verification checks before being shown to users or executed.

    Video AI

    Video AI encompasses AI technologies for automatic analysis, generation, editing, and optimization of video content.

    Vision Language Models

    AI models that can understand and process both images and text – they "see" and "read" simultaneously and can communicate about visual content.

    Vision Transformer (ViT)

    A Vision Transformer (ViT) applies transformer architectures to images by representing them as sequences of patch embeddings.

    Vision-Language Model (VLM)

    A Vision-Language Model (VLM) processes both images and text to perform tasks like image understanding, captioning, document Q&A, and multimodal reasoning.

    Visual Question Answering (VQA)

    AI systems that can answer questions about images in natural language – "How many people are in the photo?"

    Vocabulary (NLP)

    The complete set of all tokens that a language model knows and can process.

    Vocoder

    A vocoder converts Mel spectrograms or other acoustic features into audible audio waveforms – the final step in TTS pipelines.

    Voice Activity Detection

    Voice Activity Detection automatically detects whether an audio signal contains human speech – the foundation for efficient speech processing.

    Voice Agent

    Voice Agents are AI-powered speech systems that autonomously conduct natural phone or voice conversations – from outbound calls to customer service hotlines.

    Voice Cloning

    AI technology that analyzes a human voice from just seconds of audio and synthetically reproduces it to speak any text in that voice.

    VQ-VAE

    VQ-VAE is a variant of VAE that uses vector quantization to learn discrete latent representations via a learned codebook.

    W

    Warm Start

    A warm start initializes training or optimization from a previously learned state (weights, embeddings, or parameters) rather than starting from scratch.

    Watermarking

    Watermarking is adding a detectable signal to content (text, image, audio, video) to indicate origin, authenticity, or provenance—often used to mark AI-generated outputs.

    Wav2Vec

    Wav2Vec is a self-supervised learning framework from Meta for speech representations that learns from raw audio and achieves state-of-the-art ASR with minimal labeled data.

    Weak Supervision

    Weak supervision uses imperfect, noisy, or indirect signals (heuristics, rules, distant labels) to create training labels instead of manual annotation.

    Weakly Supervised Learning

    Weakly supervised learning trains models using weak supervision signals (noisy labels, partial labels, aggregated labels) rather than fully reliable labels.

    Weavy

    AI video platform with node-based editor for complex generative video workflows and multi-model pipelines.

    Web Browsing Tool

    A web browsing tool is an AI tool integration that fetches live web pages or search results to answer questions with up-to-date information.

    Web Grounding

    The ability of an AI model to access web search results in real-time to generate current and factually accurate content.

    Weight Decay

    Weight decay is a regularization technique that discourages large weights during training, often implemented as L2 regularization or decoupled weight decay (e.g., in AdamW).

    Weight Initialization

    Weight initialization determines the starting values of network parameters – critical for stable training and fast convergence.

    Weight Normalization

    Weight Normalization reparameterizes weight vectors into direction and magnitude – an alternative to batch norm without batch dependency.

    Weight Sharing

    A technique where multiple parts of a neural network use the same weights – significantly reducing parameter count and memory usage.

    WER (Word Error Rate)

    Word Error Rate (WER) measures speech recognition accuracy as the number of substitutions, deletions, and insertions needed to transform a transcript into the ground truth, divided by the number of words in the ground truth.
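
    WER is typically computed as a word-level edit distance divided by the number of reference words. A minimal sketch, assuming whitespace tokenization and no text normalization; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

    Because insertions count as errors, WER can exceed 1.0 when the transcript is much longer than the reference.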

    Whisper

    An open-source speech recognition model from OpenAI trained on 680,000 hours of multilingual audio.

    Windowed Attention

    Windowed attention restricts attention to a local token window instead of the full sequence, reducing compute and enabling longer contexts.

    WinoGrande

    A benchmark for pronominal reference resolution where small word changes flip the correct answer.

    Word Embedding

    A dense vector representation of a word that encodes its semantic meaning.

    Word Error Rate (WER)

    The standard metric for speech recognition – measures substitutions, deletions, and insertions relative to the reference.

    Word2Vec

    Word2Vec is a technique for generating word embeddings that represents words as dense vectors, where semantically similar words have similar vectors.

    WordPiece

    Subword tokenization algorithm developed by Google that maximizes training corpus likelihood.

    World Model

    An internal representation of the environment in an AI system that enables predictions about future states and the effects of actions.

    X

    x-Vector

    An x-vector is a type of speaker embedding used in speech processing to represent speaker identity characteristics in a fixed-length vector.

    xAI

    Elon Musk's AI company developing Grok – an LLM with real-time access to X (Twitter) and an uncensored, humorous style.

    XAI (Explainable AI)

    Explainable AI (XAI) is the set of methods and practices used to make an AI system's outputs more understandable—showing why a prediction, recommendation, or decision happened.

    Xavier Initialization (Glorot Initialization)

    Xavier (Glorot) initialization is a weight initialization method designed to keep activations and gradients in a healthy range as they flow through a neural network.
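
    In the uniform variant, weights are drawn from the range [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). A minimal sketch using only the standard library; the function name and the seeded generator are assumptions made for illustration, and deep learning frameworks ship their own initializers.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng=None):
    """Return a fan_in x fan_out weight matrix with Glorot-uniform values."""
    rng = rng or random.Random(0)  # fixed seed only to keep the demo repeatable
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# fan_in + fan_out = 6, so the limit is exactly 1.0 here.
weights = xavier_uniform(fan_in=4, fan_out=2)
```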

    XGBoost

    XGBoost (Extreme Gradient Boosting) is a high-performance ensemble learning algorithm that combines gradient boosting with decision trees for superior prediction accuracy.

    XLM (Cross-lingual Language Model)

    XLM refers to cross-lingual language modeling approaches and model families designed to represent and process multiple languages in a shared embedding space.

    XLM-R (Cross-lingual RoBERTa)

    XLM-R is a multilingual transformer model family often used for cross-lingual understanding tasks (classification, NER, semantic similarity).

    XLNet

    XLNet is a transformer-based language model approach that uses permutation-based training to capture bidirectional context while preserving autoregressive properties.

    xLSTM (Extended LSTM)

    A modernized LSTM variant by Sepp Hochreiter using exponential gating and matrix memory to compete with Transformers.

    XOR Problem

    The XOR problem is a classic example of data that is not linearly separable: no single linear classifier can compute XOR, but a network with one hidden layer can.
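
    Adding one hidden layer is enough to solve it. The sketch below uses hand-picked weights and a step activation rather than trained parameters: one hidden unit computes OR, the other computes AND, and the output fires when OR is true but AND is not. All names and weight values are illustrative.

```python
def step(x):
    """Threshold activation: 1 for positive input, otherwise 0."""
    return 1 if x > 0 else 0


def xor_two_layer(a, b):
    """XOR from two hidden units (OR and AND) and one output unit."""
    hidden_or = step(a + b - 0.5)   # fires if at least one input is 1
    hidden_and = step(a + b - 1.5)  # fires only if both inputs are 1
    return step(hidden_or - hidden_and - 0.5)


truth_table = {(a, b): xor_two_layer(a, b) for a in (0, 1) for b in (0, 1)}
```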
