AI Terms A-Z
Discover the most important terms in Artificial Intelligence – from Machine Learning to Deep Learning to Large Language Models. Each term is explained clearly with practical marketing examples.
A
A* Search
A* (pronounced "A-star") is a classical search algorithm that finds the shortest path between a start and a goal node in a graph by minimizing the total cost f(n) = g(n) + h(n) at every node — the sum of actual path cost so far and an estimated remaining distance (heuristic).
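A minimal sketch of A* on a toy weighted graph (the graph, heuristic values, and function names are illustrative, not from any library):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search: expand nodes in order of f(n) = g(n) + h(n).

    neighbors(n) yields (next_node, step_cost); h(n) is the heuristic.
    Returns (cost, path), or (None, None) if the goal is unreachable."""
    frontier = [(h(start), 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, None

# Toy graph with straight-line-style estimates as an admissible heuristic.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
est = {"A": 2, "B": 2, "C": 1, "D": 0}
cost, path = a_star("A", "D", lambda n: graph[n], lambda n: est[n])
```

Because the heuristic here never overestimates the true remaining cost, the search is guaranteed to return the cheapest path (A → B → C → D, cost 3) rather than the direct but more expensive A → C route.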
Abductive Logic Programming (ALP)
A framework in logic programming that allows certain premises to be left unspecified and then infers plausible explanations for observations.
Abductive Reasoning
A form of logical inference that starts from an observation and seeks the simplest and most likely explanation for it.
Ablation
In AI research, an ablation refers to the removal or disabling of a component of a system to assess that component's impact on the overall performance.
Action Language
A formal language used to describe state changes in a system – how actions affect the state of the world over time.
Action Model Learning
A machine learning approach focused on enabling an AI agent to learn the outcomes and requirements of its actions within an environment.
Action Selection
The process by which an intelligent agent decides "what to do next," choosing the next action from a set of possible actions.
Activation Function
A mathematical function used in artificial neural networks to determine the output of a node (neuron) given an input or set of inputs.
Active Learning
ML strategy where the model selects the most informative samples for labeling.
Adam Optimizer
Adaptive optimization algorithm with momentum and adaptive learning rates.
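A sketch of the Adam update on a single parameter, minimizing f(x) = x² (hyperparameter defaults follow the original Adam paper; the helper name and state layout are illustrative):

```python
import math

def adam_step(theta, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum (m) plus per-parameter scaling (v)."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad       # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2  # second moment
    m_hat = state["m"] / (1 - b1 ** state["t"])          # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.
x, state = 5.0, {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(200):
    x = adam_step(x, 2 * x, state)
```

The bias-correction terms matter early in training, when m and v are still close to their zero initialization; without them the first steps would be far too small.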
Adaptive Algorithm
An algorithm that changes its behavior or parameters in response to the problem instance or environment as it runs, aiming to improve performance on the fly.
Adaptive Learning
An educational methodology (often implemented with AI) that customizes learning content and pace to the individual needs and performance of each learner.
Adaptive Neuro-Fuzzy Inference System
A hybrid system that combines neural networks and fuzzy logic principles to create a model capable of learning from data while employing human-like reasoning.
Admissible Heuristic
A heuristic h(n) is called admissible if it never overestimates the true remaining cost from node n to the goal — i.e. it always provides an optimistic lower bound. This property guarantees that search algorithms like A* find an optimal path.
Agent Architecture
The underlying structure and components of an intelligent agent system, describing how the agent is organized internally to sense, think, and act.
AgentBench
A benchmark for evaluating LLM agents in 8 different interactive environments like websites, databases, games, and operating systems.
Agentic AI
AI systems that can autonomously pursue goals, make decisions, and take actions in the real world.
Agentic RAG
Agentic RAG is an evolution of retrieval-augmented generation in which an AI agent dynamically decides when to retrieve, which sources to query, and how many, rather than following a rigid retrieval pipeline with fixed top-k vector search.

AI Agent
An autonomous software system that uses AI to independently plan and execute tasks.
AI Ethics
The field and practice concerned with ensuring that artificial intelligence systems are developed and used in a morally responsible, fair, and safe manner.
AI Slop
Pejorative term for low-quality, mass-produced AI-generated content flooding the internet that provides no real value.
AI Watermarking
Techniques for embedding invisible markers in AI-generated content to prove its origin and enable detection of deepfakes.
AI-Complete
A problem is termed AI-complete if solving it by machine would essentially require general human-level intelligence.
Aider Polyglot Benchmark
Coding benchmark testing LLMs on real-world multi-file edits across multiple programming languages.
Algorithmic Discrimination
Algorithmic discrimination refers to the systematic disadvantaging of certain groups by algorithmic decision systems – often a result of biased training data or flawed model design.
Algorithmic Efficiency
Algorithmic efficiency measures how economically an algorithm uses computation time, memory, and energy – typically expressed in Big-O notation for scaling behavior.
Algorithmic Probability
A theoretical measure that assigns a probability to an observation by considering all possible algorithms that could produce it, weighted by their simplicity.
Alpha-Beta Pruning
An optimization technique for the minimax algorithm that prunes parts of the game tree without affecting the result.
Ant Colony Optimization
A probabilistic optimization technique inspired by the behavior of ants foraging for food and their use of pheromone trails.
Anytime Algorithm
An anytime algorithm is an algorithm that can return a valid — though not yet optimal — solution at any intermediate stage and monotonically improves solution quality with additional compute time.
Approximation Error
The difference between an exact, true value and an approximate value that is used or obtained by an algorithm or model.
ARC (AI2 Reasoning Challenge)
A multiple-choice benchmark with natural science questions at elementary and middle-school level in Easy and Challenge sets.
ARC-AGI-2
Benchmark by the ARC Prize Foundation that measures general reasoning ability of AI systems via abstract pattern tasks.
Artificial General Intelligence (AGI)
A hypothetical form of AI that possesses human-like cognitive abilities across all domains and can learn and adapt autonomously.
Artificial Neural Network (ANN)
An Artificial Neural Network (ANN) is a computational model inspired by the biological brain, consisting of layers of connected neurons that can learn to extract complex patterns from data by adjusting weights.
Assessment
Assessment is the measurement of knowledge, skill, or performance—used to diagnose current ability, provide feedback, and certify learning outcomes.
Attention Mechanism
Neural network module that weights relevant parts of the input.
Attributional Calculus
A logical framework combining predicate logic with multi-valued (fuzzy) logic to represent attributes of entities in an intuitive, human-readable way.
Audio Generation
The creation of audio content through AI models – from music to sound effects to speech and ambient sounds.
Autoencoder
A type of neural network designed to learn a compressed representation (encoding) of input data and then reconstruct the original data from this encoding.
Automated Machine Learning
The process of automating the end-to-end process of applying machine learning to real-world problems, including data preprocessing, model selection, and hyperparameter tuning.
Automated Planning
Automated planning is the AI subfield concerned with algorithms that, given an initial state, a goal state, and a set of possible actions, automatically find a sequence of actions (a plan) that achieves the goal.
AutoML (Automated Machine Learning)
AutoML automates parts of the machine learning lifecycle such as model selection, feature preprocessing, hyperparameter tuning, and validation.
B
Backpropagation
An algorithm for computing gradients in neural networks that propagates errors backwards through the network to adjust weights.
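The chain rule at the heart of backpropagation can be shown on a single linear neuron with squared-error loss, checked against a finite-difference estimate (all names here are illustrative):

```python
def forward(w, x):
    return w * x                      # single linear neuron

def loss(w, x, t):
    return (forward(w, x) - t) ** 2   # squared error against target t

def grad_backprop(w, x, t):
    """Chain rule: dL/dw = dL/dy * dy/dw = 2(y - t) * x."""
    y = forward(w, x)
    dL_dy = 2 * (y - t)   # backprop through the loss
    dy_dw = x             # backprop through the neuron
    return dL_dy * dy_dw

# Verify the analytic gradient against a numerical (finite-difference) one.
w, x, t, eps = 1.5, 2.0, 7.0, 1e-6
analytic = grad_backprop(w, x, t)
numeric = (loss(w + eps, x, t) - loss(w - eps, x, t)) / (2 * eps)
```

Real networks apply exactly this pattern layer by layer: each layer multiplies the incoming error signal by its local derivative and passes the result backwards.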
Backtracking
An algorithmic technique that systematically explores all possible solutions and returns to the last decision point when hitting dead ends.
Backward Chaining
An inference strategy that starts from the goal and works backward to find the facts and rules that would prove the goal.
Bagging
An ensemble learning method that trains multiple models on bootstrap samples and aggregates their predictions.
Batch Normalization
A technique for normalizing the inputs of each layer of a neural network across a mini-batch to stabilize training.
Batch Size
Number of training examples per gradient update.
Bayesian Optimization
Bayesian optimization is an approach to optimizing expensive black-box functions (e.g., model hyperparameters) using a probabilistic surrogate model and an acquisition function.
Beam Search
Beam search is a heuristic search algorithm that, at every search step, keeps only the k best partial solutions ("beam width") — a compromise between exhaustive breadth-first search (high quality, high cost) and greedy search (low quality, low cost).
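A toy sketch of beam search over per-step token probabilities (simplified: the probabilities here are independent of the prefix, whereas a real language model conditions each step on the sequence so far):

```python
import math

def beam_search(step_probs, k=2):
    """Keep the k best partial sequences (by log-probability) at each step."""
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for probs in step_probs:  # probs: {token: probability} for this step
        candidates = [
            (seq + [tok], score + math.log(p))
            for seq, score in beams
            for tok, p in probs.items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

# Two decoding steps over a three-token toy vocabulary.
steps = [{"a": 0.5, "b": 0.3, "c": 0.2}, {"a": 0.1, "b": 0.6, "c": 0.3}]
best_seq, best_score = beam_search(steps, k=2)[0]
```

With k=1 this degenerates to greedy decoding; as k grows toward the full vocabulary size it approaches exhaustive search.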
BERT
BERT (Bidirectional Encoder Representations from Transformers) is a language model developed by Google that processes text bidirectionally, enabling deep contextual understanding.
BERTScore
A semantic evaluation metric that uses BERT embeddings to measure similarity between generated and reference text.
BGE Embedding
BGE (BAAI General Embedding) is a family of open-source embedding models from Beijing Academy of AI that achieve top results on MTEB.
Bi-Encoder
An encoder architecture that transforms query and document independently into embeddings – enabling fast similarity search over pre-computed vectors.
Bias (AI Bias)
Systematic distortions in AI systems that can lead to unfair or inaccurate outcomes for certain groups.
Bias-Variance Tradeoff
Fundamental ML dilemma between underfitting (high bias) and overfitting (high variance).
BIG-Bench
A collaborative benchmark with 200+ tasks created by 400+ researchers to test LLM capabilities beyond existing benchmarks.
BLEU Score
Metric for automatic evaluation of translation quality.
Boosting
An ensemble learning method that sequentially combines weak learners to create a strong classifier.
Breadth-First Search (BFS)
Breadth-First Search (BFS) traverses a graph level by level, exploring all neighbors of a node before moving deeper.
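A minimal BFS sketch for shortest paths in an unweighted graph (graph and function names are illustrative):

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Level-by-level traversal: the first time we reach `goal`, the path is
    shortest in number of edges, because BFS explores nearer nodes first."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal unreachable

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
path = bfs_shortest_path(graph, "A", "E")
```

The `deque` gives O(1) pops from the front; using a plain list with `pop(0)` would turn the traversal quadratic.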
C
Calibration
The process of adjusting a model's predicted probabilities so they reflect actual event probabilities.
Chatbot
A software program that simulates conversations with humans, typically through text or voice interfaces.
Chatbot Arena
A public Elo-based leaderboard where users blindly choose between two LLMs – the most important benchmark for LLM ranking.
ChatGPT
A conversational AI system built on large language models that generates human-like responses to user prompts.
ChatGPT Agent
Autonomous mode of ChatGPT that independently executes multi-step tasks in browsers, apps, and files.
Chinchilla Optimal
The finding that for compute-optimal LLM training, the number of training tokens should scale proportionally to parameter count.
Classification
A supervised ML algorithm that assigns data to predefined categories or classes.
Claude Computer Use
Claude's capability to operate a desktop computer: mouse, keyboard, screenshots, and applications like a human user.
Claude Cowork
Collaborative multi-user mode of Claude for joint project work with shared context and role distribution.
Claude Design
Visual design mode of Claude for UI mockups, brand asset generation, and layout iteration via natural language.
Claude Opus 4.6
Anthropic's 2026 flagship LLM with extended reasoning, 1M-token context, and native computer-use capabilities.
Claude Skills
Modular system by Anthropic that bundles reusable capabilities (prompt + tools + data) for Claude.
CLIP (Contrastive Language–Image Pretraining)
A multimodal model approach that learns aligned representations of images and text by training them to match corresponding image–caption pairs.
Clustering
An unsupervised learning technique that groups data points into clusters such that items in the same cluster are more similar to each other.
Codex 5.3
OpenAI's specialized 2026 coding model for agentic software development and long-running tasks in repositories.
Cohere Embed
Cohere's commercial embedding API with special optimization for retrieval and distinction between query and document embeddings.
ColBERT
ColBERT is a late-interaction retrieval architecture that creates token-level embeddings for query and document, aggregating them via MaxSim during search.
Cold Start Problem
The problem when a system has insufficient data about a new user, item, or context to make accurate predictions or recommendations.
Collaborative Filtering
A recommendation approach that predicts a user's preferences based on the behavior of similar users or similarities between items.
Computer Vision
The AI subfield that enables computers to understand and interpret visual information.
Content-Based Filtering
Recommendations based on properties of items a user liked.
Context Engineering
The practice of designing, selecting, and structuring the information an LLM receives so it produces more reliable and relevant outputs.
Context Window
The portion of text (tokens) an LLM uses to generate its next output—typically consisting of instructions, conversation history, and retrieved content.
Contextual Bandit
A decision-making algorithm that chooses among actions using current context features, while learning from feedback to balance exploration and exploitation.
Contrastive Learning
A representation learning approach that trains models to pull similar pairs closer and push dissimilar pairs apart in embedding space.
Convolutional Neural Network (CNN)
A neural network architecture that uses convolution operations to learn hierarchical feature representations from grid-like data such as images.
Cross-Encoder
An encoder architecture that processes query and document together and outputs a relevance score – more precise than bi-encoders but slower.
Cross-Entropy Loss
Loss function for classification tasks based on information theory.
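A sketch of the underlying formula, H(p, q) = -Σ pᵢ · log(qᵢ), showing why confident wrong predictions are punished much harder than confident correct ones (function names are illustrative):

```python
import math

def cross_entropy(true_probs, pred_probs, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); eps avoids log(0)."""
    return -sum(p * math.log(q + eps) for p, q in zip(true_probs, pred_probs))

# One-hot target: class 1 is the correct class.
target = [0.0, 1.0, 0.0]
confident_right = cross_entropy(target, [0.05, 0.90, 0.05])  # small loss
confident_wrong = cross_entropy(target, [0.90, 0.05, 0.05])  # large loss
```

For a one-hot target the sum collapses to -log of the probability assigned to the true class, which is why the loss explodes as that probability approaches zero.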
Cross-Validation
A technique for evaluating model performance by training and testing on different data subsets.
Curriculum Learning
Training strategy where samples are presented in a meaningful order – from easy to hard, similar to a curriculum.
Custom GPT
GPT tailored to a specific use case with its own prompt, knowledge base, and tool set, hosted by OpenAI.
D
Data Augmentation
Techniques for artificially expanding training data through transformations.
Decision Making
Decision making is the process of selecting an action (or non-action) among alternatives based on goals, evidence, constraints, and uncertainty.
Decision Theory
Decision theory studies how agents should make choices under uncertainty, often by maximizing expected utility subject to constraints.
Decision Tree
An ML model that represents decisions as a tree structure with branches based on feature values.
Decoder
The part of a model that transforms a compressed representation back to the original format.
Decoding
The process of converting encoded data or signals back to their original or usable form, in ML specifically the token-by-token generation of outputs.
Decoding Strategy
A decoding strategy is the method used to convert a model's token probability distribution into an actual output sequence.
Deductive Reasoning
A form of logical inference where specific conclusions are drawn from general premises—if the premises are true, the conclusion is guaranteed to be true.
Deep Learning
A subfield of machine learning that uses deep neural networks with many layers to learn complex patterns from data.
Deep Reinforcement Learning
Reinforcement learning that uses deep neural networks to learn policies that choose actions to maximize long-term reward.
DeepSeek
Chinese AI startup developing powerful open-source language models, competing with Western providers at significantly lower costs.
DeepSeek R1
An open-source reasoning model from DeepSeek that competes with GPT-4 and Claude on complex thinking and coding tasks.
DeepSeek V4
Open-weight flagship by DeepSeek that reaches comparable benchmarks at 1/10 the training cost of Western models.
DeepWalk
A graph embedding algorithm that combines random walks on graphs with Word2Vec to learn node representations.
Default Reasoning
Default reasoning draws conclusions using 'defaults' that hold in typical cases, while allowing exceptions when new information arrives.
Denoising
Denoising is the process of removing noise from a signal; in diffusion models, it's the iterative transformation from noisy latents to a clean sample.
Dense Passage Retrieval
A retrieval approach using bi-encoder embeddings for query and passages – the foundation of modern semantic search.
Dense Retrieval
Retrieval method that uses dense vector representations (embeddings) to find semantically similar documents.
Depth-First Search (DFS)
Depth-First Search (DFS) traverses a graph by going as deep as possible along one path before backtracking.
Dialogue Management
Component of a conversational AI system that controls the conversation flow.
Diffusion Model
Diffusion models are generative AI models that learn to gradually remove noise from data to produce high-quality samples (images, audio, video).
Disclosure UX
Disclosure UX is the set of interface patterns that transparently communicate important system facts to users (e.g., AI involvement, limitations, data use, confidence, and provenance).
DPO (Direct Preference Optimization)
A simplified alternative to RLHF that optimizes models directly on preference data, without separate reward model or RL training.
DROP (Discrete Reasoning Over Paragraphs)
A reading comprehension benchmark that requires numerical reasoning over text passages (counting, sorting, arithmetic).
Dropout
A regularization technique that randomly deactivates neurons during training.
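A sketch of "inverted" dropout, the variant most frameworks use, on a plain list of activations (function name and values are illustrative):

```python
import random

def dropout(values, p=0.5, training=True, seed=None):
    """Inverted dropout: zero each unit with probability p during training and
    scale survivors by 1/(1-p), so the expected activation is unchanged."""
    if not training or p == 0.0:
        return list(values)  # at inference time, dropout is a no-op
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]

activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, p=0.5, seed=0)        # some units zeroed
eval_out = dropout(activations, p=0.5, training=False) # unchanged
```

Scaling at training time (rather than at inference) means the network can be deployed without any dropout-specific code path.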
E
E5 Embedding
E5 is a family of embedding models from Microsoft Research created through text-to-text contrastive training.
Early Stopping
Regularization technique that stops training when validation loss increases.
Elo Rating
A rating system for measuring relative abilities, originally from chess – now standard for LLM leaderboards.
Embedding
An embedding is a dense vector representation of discrete data (words, images, users, products) where semantically similar objects lie close together in vector space.
Emergent Abilities
Capabilities that suddenly appear in LLMs only above a certain model size, without being observable in smaller models.
Encoder
The part of a model that transforms input data into a compressed representation.
Encoder-Decoder
Architecture that encodes input into a representation and decodes output from it.
Ensemble Learning
Combining multiple models to achieve better predictions than any single model alone.
Entity Extraction
The automatic identification and classification of named entities in text.
Epoch (Machine Learning)
In machine learning, an epoch is one complete pass of a learning algorithm through the entire training dataset, i.e. the point at which every training example has been used exactly once to update the model weights.
Error Analysis
Systematic examination of model errors to identify patterns and improvement opportunities.
Evaluation Harness
A framework for systematically evaluating model performance across various metrics and test cases.
Explainability
The ability to make an AI model's decisions or predictions understandable to humans.
Explainability UX Patterns
Explainability UX patterns are interface patterns that help users understand why an AI system produced an output, what evidence it used, and what actions it took (or refused).
Explainable AI (XAI)
Explainable AI (XAI) comprises the methods and product practices that make an AI system's decisions comprehensible, traceable, and auditable for humans.
F
Faithfulness
How accurately an LLM output corresponds to the provided sources and instructions.
Feature Extraction
The process of automatically deriving relevant features from raw data.
Federated Learning
An ML approach where models are trained decentrally on local data without sharing raw data.
Feedback Loop
A system where outputs are fed back to influence future inputs or decisions.
Few-Shot Learning
A model's ability to learn and generalize from very few examples.
Fine-Tuning
Adapting a pre-trained model to a specific task by further training it on task-specific data.
Flash Attention
A memory-efficient implementation of the attention mechanism for transformers.
Forward Chaining
An inference strategy that starts from known facts and applies rules to derive new facts until the goal is reached.
Forward Pass
Computing the model output by forward propagating through all layers.
Foundation Model
A large model pre-trained on broad data that can be adapted for many downstream tasks.
Function Calling
An LLM's ability to generate structured calls to external tools or APIs.
Fuzzy Inference System
A fuzzy inference system uses fuzzy logic rules to map inputs to outputs when concepts are imprecise (e.g., "high risk," "medium demand").
G
G-Eval
An LLM evaluation framework that uses chain-of-thought reasoning and weighted probabilities for more nuanced scoring.
Gaussian Mixture Model (GMM)
A probabilistic model representing data as a mixture of Gaussian distributions.
Gemini 3.1 Pro
Google's 2026 flagship LLM with natively multimodal architecture and 2M-token context.
Gemma 4
Open-weight model family by Google for on-device and edge inference, ranging from 2B to 27B parameters.
Generalization
A model's ability to perform well on new, unseen data.
Generative Adversarial Network (GAN)
Architecture with two competing networks for generating realistic data.
Generative AI
AI models that create new content – text, images, audio, code, or structured data.
GloVe
GloVe (Global Vectors for Word Representation) is a word embedding method that uses global co-occurrence statistics of a text corpus to generate semantic word vectors.
Governance
Governance is the set of roles, rules, processes, and controls that ensure a system is used responsibly and predictably—aligned with risk, compliance, and business objectives.
GPQA (Graduate-Level Google-Proof Q&A)
A benchmark with 448 expert-level questions from physics, biology, and chemistry, so difficult that even PhDs without expertise only achieve 30%.
GPQA Diamond
High-difficulty science benchmark with PhD-level questions in biology, physics, and chemistry.
GPT (Generative Pre-trained Transformer)
A family of large language models by OpenAI based on the Transformer architecture.
GPT-5.4
OpenAI's 2026 flagship LLM with thinking mode, multimodal processing, and agent-native architecture.
Gradient Descent
An optimization algorithm that iteratively adjusts parameters in the direction of steepest descent of the loss function.
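A minimal sketch of the update rule x ← x − η · ∇f(x) on a one-dimensional function (names and learning rate are illustrative):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

On this convex function the iterate converges to the true minimum at x = 3; in deep learning the same update is applied to millions of parameters at once, with the gradient supplied by backpropagation.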
Graph Attention Network
A GNN architecture that uses attention mechanisms to adaptively weight the importance of neighboring nodes.
Graph Classification
The task of assigning an entire graph to a class based on its structure and node properties.
Graph Convolutional Network
A GNN variant that generalizes convolution operations to graphs to learn node representations.
Graph Isomorphism Network
A GNN with maximum discriminative power among message-passing architectures, theoretically grounded by the Weisfeiler-Leman test.
Graph Neural Network
A class of neural networks that operate directly on graph structures, learning node, edge, and graph-level properties.
Graph Search
Graph search is the process of exploring a graph to find a target node, a path, or an optimal solution under a defined objective (e.g., shortest path, lowest cost).
Graph Transformer
An architecture that applies Transformer attention to graph structures, enabling global node interactions.
GraphSAGE
An inductive GNN framework that learns scalable node representations by sampling and aggregating neighborhoods.
Greedy Algorithm
An algorithm that makes the locally optimal choice at each step.
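The classic coin-change example illustrates both the idea and its limits (coin systems chosen for illustration): greedy is optimal for "canonical" systems like euro denominations but can fail for arbitrary ones.

```python
def greedy_change(amount, coins):
    """At each step, take the largest coin that still fits (locally optimal)."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    return result if amount == 0 else None

# Optimal for a canonical coin system:
change = greedy_change(63, [50, 20, 10, 5, 2, 1])   # 50 + 10 + 2 + 1

# But suboptimal in general: greedy pays 6 as 4 + 1 + 1 (three coins)
# even though 3 + 3 (two coins) exists.
suboptimal = greedy_change(6, [4, 3, 1])
```

This is the defining trade-off of greedy algorithms: they are fast and simple, but the locally best choice does not always lead to a globally optimal solution.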
Greedy Best-First Search
Greedy Best-First Search expands the node that appears closest to the goal using only a heuristic score h(n), ignoring the cost accumulated so far.
Greedy Decoding
A decoding strategy that always selects the token with the highest probability – deterministic, but often repetitive.
Ground Truth
The actual, correct data or labels used as reference for model training and evaluation.
Grounding
Connecting AI outputs to real, verifiable facts and sources.
GRU (Gated Recurrent Unit)
A simplified RNN architecture with gates to control information flow.
GSM8K
A benchmark with 8,500 grade-school math problems that require multi-step reasoning.
Guardrails
Safety mechanisms that prevent AI systems from producing harmful, inappropriate, or erroneous outputs.
Guidance Scale
Guidance scale is a parameter (commonly in classifier-free guidance) that controls how strongly a diffusion model follows the text prompt versus generating more diverse outputs.
H
Hallucination
When an LLM generates information not supported by input context or reliable sources.
Hallucination Rate
The percentage of AI-generated outputs containing information not supported by facts or sources.
HellaSwag
A benchmark for common-sense reasoning where LLMs must choose the most plausible continuation of a scenario.
HELM (Holistic Evaluation of Language Models)
A comprehensive evaluation framework from Stanford that assesses LLMs on dozens of dimensions like accuracy, fairness, robustness, and efficiency simultaneously.
Heterogeneous Graph
A graph with different types of nodes and/or edges, modeling various entity types and relationships.
Heuristic
A heuristic is a practical scoring rule or estimate that guides search or decision-making toward promising options without guaranteeing optimality.
Heuristic Search
Heuristic search is a family of search algorithms that use a heuristic (a guiding estimate) to explore a problem space more efficiently than uninformed search.
High-Level Representation
A high-level representation abstracts raw data into more meaningful structures (symbols, concepts, latent variables, or summaries).
HNSW
Hierarchical Navigable Small World – a graph-based algorithm for efficient approximate nearest neighbor search.
Human Evaluation
The evaluation of AI outputs by human annotators – the gold standard for quality measurement, but expensive and slow.
Human-in-the-Loop
An AI design approach where humans are actively involved in the decision-making or training process of an AI system.
HumanEval
A benchmark for code generation with 164 Python programming tasks, evaluated by Pass@k (code must pass tests).
Hybrid AI System
A hybrid AI system combines multiple AI paradigms—typically symbolic/rule-based methods with statistical/ML models (including LLMs).
Hybrid Search
Combining keyword retrieval (BM25) with vector retrieval (embeddings) for better search results.
Hyperparameter
Configuration settings chosen before training that influence how a model learns.
Hyperparameter Optimization
The systematic process of finding the best hyperparameter settings for an ML model.
Hypothesis Generation
Hypothesis generation is producing candidate explanations (or candidate solutions) that could plausibly account for observed evidence.
I
Identity-Preference Optimization
An alignment method that extends DPO for more stable training.
IFEval (Instruction Following Evaluation)
A benchmark that tests how well LLMs follow explicit format instructions (e.g., "Answer in exactly 3 paragraphs", "Start each sentence with a capital letter").
Image Captioning
Automatic generation of text descriptions for images.
Image Generation
Image generation is the automatic creation of images by AI models based on text prompts, other images, or other inputs.
Image Segmentation
Dividing an image into meaningful regions or objects at the pixel level.
Image-to-Image
Models that transform an input image into a modified or transformed output image.
ImageBind
Meta's multimodal embedding model that unifies six modalities (image, text, audio, video, depth, thermal) in a shared vector space.
Imitation Learning
An ML approach where an agent learns by observing and imitating expert behavior.
In-Context Learning
The ability of LLMs to learn from examples in the prompt without changing the model weights.
Inductive Reasoning
A form of logical inference where general rules or patterns are derived from specific observations—the conclusion is probable but not guaranteed.
Inference
Using a trained model to make predictions on new, unseen data.
Inference Engine
The core component of an expert system that applies logical rules to a knowledge base to derive new facts or make decisions.
Information Retrieval
Finding relevant documents or information from a large collection.
Inpainting
Filling in missing or masked regions of an image with plausible content.
Instruction Tuning
Fine-tuning an LLM on instruction-response pairs to better follow instructions.
Instructor Embedding
An embedding model that uses task-specific instructions in the prompt to optimize embeddings for different tasks.
Intelligent Tutoring System
An Intelligent Tutoring System (ITS) is an AI-driven learning system that personalizes instruction, feedback, and practice to a learner's needs.
Intent Classification
Determining the intention or goal behind a user query.
Intent Recognition
AI capability to recognize the intent behind a user utterance.
Interpretability
The degree to which humans can understand how a model arrives at its decisions.
Inverse Reinforcement Learning
Inferring the reward function from observed expert behavior.
Iterative Deepening
Iterative deepening is a search strategy that repeatedly runs depth-limited search with increasing depth limits until it finds a solution or exhausts a budget.
Iterative Prompting
A prompting approach that refines results through multiple successive prompts.
J
Jaccard Similarity
A similarity measure between two sets, defined as the size of the intersection divided by the size of the union.
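The definition translates directly into code (the convention of returning 1.0 for two empty sets is one common choice, not universal):

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; here defined as 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two keyword sets sharing 2 of 4 distinct items -> similarity 0.5.
sim = jaccard({"ai", "ml", "data"}, {"ml", "data", "cloud"})
```

Because it ignores element counts and order, Jaccard similarity is a natural fit for comparing tag sets, shingled documents, or user interest profiles.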
Jailbreak
Techniques that bypass LLM safety measures to produce unwanted or harmful outputs.
Jevons Paradox
The Jevons Paradox states that technological progress increasing the efficiency of a resource often leads to higher, not lower, overall consumption of that resource – because falling costs disproportionately increase demand.
Joint Distribution
The probability distribution describing the probability of combinations of multiple random variables.
JSON Mode
A model mode that guarantees the output is valid JSON.
Judge LLM
An LLM used to evaluate and rank outputs from other LLMs.
K
K-Armed Bandit
The k-armed bandit problem models choosing among k options to maximize reward while balancing exploration vs exploitation.
K-Fold Cross-Validation
K-fold cross-validation is an evaluation method where data is split into k parts; the model trains on k−1 folds and is tested on the remaining fold.
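A sketch of the index bookkeeping behind k-fold splitting (contiguous folds without shuffling, for clarity; libraries typically shuffle first, and all names here are illustrative):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds; each fold serves once as the test
    set while the remaining k-1 folds form the training set."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits

# 10 samples, 5 folds: five (train, test) pairs of sizes (8, 2).
splits = k_fold_indices(10, k=5)
```

Every sample appears in exactly one test fold, so averaging the k test scores gives an estimate of generalization that uses all of the data.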
K-Means Clustering
K-means is an unsupervised algorithm that partitions data into k clusters by minimizing within-cluster distance to cluster centroids.
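A sketch of Lloyd's algorithm, the standard k-means procedure, on one-dimensional data for brevity (all names and data are illustrative):

```python
import random

def k_means(points, k, iters=20, seed=0):
    """Lloyd's algorithm on 1-D data: assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from random data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two clearly separated 1-D groups around 1.0 and 10.0.
data = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8]
centroids = k_means(data, k=2)
```

On real data the result depends on initialization, which is exactly the problem k-means++ addresses by spreading the starting centroids apart.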
K-Means++
K-means++ is an initialization method for k-means that chooses starting centroids to improve convergence and cluster quality.
K-Shot Prompting
K-shot prompting provides k examples in the prompt to guide the model's behavior (format, reasoning pattern, tone).
Kernel (ML)
In ML, a kernel is a function that measures similarity between data points, enabling algorithms to operate in implicit high-dimensional feature spaces.
Kernel Trick
The kernel trick allows algorithms to compute dot products in an implicit higher-dimensional space without explicitly transforming the data.
KNN (k-Nearest Neighbors)
KNN is a method that predicts outcomes based on the k most similar examples in a dataset.
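A minimal KNN classifier sketch using Euclidean distance and majority vote (dataset and names are illustrative):

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Predict the majority label among the k nearest training points."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    labels = Counter(label for _, label in nearest)
    return labels.most_common(1)[0][0]

# Toy 2-D dataset: two classes in opposite corners of the plane.
train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue")]
label = knn_classify(train, query=(2, 2), k=3)
```

There is no training phase at all: the "model" is the dataset itself, which is why KNN is called a lazy learner and why prediction cost grows with dataset size.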
KNN Search
KNN search retrieves the k closest vectors to a query vector under a distance metric.
Knowledge Base (KB)
A knowledge base is a curated repository of information (articles, FAQs, policies) designed for retrieval and reuse.
Knowledge Cutoff
Knowledge cutoff is the point in time after which a model's training data does not include new information.
Knowledge Distillation
Transferring knowledge from a large model (teacher) to a smaller model (student).
Knowledge Graph Embedding
Methods that embed entities and relations of a knowledge graph into a low-dimensional vector space.
Knowledge Tracing
Knowledge tracing models a learner's evolving mastery of skills over time using their interactions (answers, attempts, time, hints).
KTO (Kahneman-Tversky Optimization)
An alignment method that only needs binary feedback (good/bad) instead of pairwise preferences, inspired by Prospect Theory.
KV Cache (Key-Value Cache)
KV cache stores attention key/value tensors from previous tokens during transformer inference so the model doesn't recompute them each step.
L
L1 Regularization (Lasso)
L1 regularization adds a penalty proportional to the absolute value of model weights, encouraging sparsity (many weights become exactly zero).
L2 Regularization (Ridge)
L2 regularization adds a penalty proportional to the square of model weights, encouraging smaller weights without forcing exact zeros.
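The two penalty terms side by side, as a sketch (λ and the weight values are illustrative; the penalty is added to the data loss during training):

```python
def l1_penalty(weights, lam):
    """Lasso term: lam * sum of absolute weights (drives weights to zero)."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Ridge term: lam * sum of squared weights (shrinks weights smoothly)."""
    return lam * sum(w * w for w in weights)

w = [0.5, -1.0, 0.0, 2.0]
# total_loss = data_loss + penalty (data_loss omitted here)
assert l1_penalty(w, 0.1) == 0.1 * 3.5
assert l2_penalty(w, 0.1) == 0.1 * 5.25
```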
Label Leakage
Label leakage occurs when a model's training data contains features that carry direct or indirect information about the target variable (the label), information that would not be available at inference time in production.
Label Smoothing
Label smoothing is a training technique that replaces hard labels (0 or 1) with slightly softened targets (e.g., 0.9 and 0.1).
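A sketch of the common formulation y′ = (1 − ε)·y + ε/K for K classes (the exact softened values depend on the chosen ε and class count):

```python
def smooth_labels(one_hot, eps=0.1):
    """Replace hard 0/1 targets with eps-smoothed values that still sum to 1."""
    k = len(one_hot)
    return [y * (1 - eps) + eps / k for y in one_hot]

# 3-class one-hot target: the hot class drops below 1, the rest rise above 0
smoothed = smooth_labels([0, 1, 0])
assert smoothed[1] > smoothed[0] == smoothed[2]
```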
Language Model (LM)
A language model is a model that estimates the probability of sequences of tokens, enabling tasks like prediction, generation, and scoring.
Large Language Model (LLM)
A large neural network trained on vast amounts of text to understand and generate human-like text.
Late Interaction
A retrieval paradigm where query and document tokens are encoded independently but interact via token-level similarity only at search time.
Latent Space
A compressed, lower-dimensional space where a model stores internal representations of data.
Latent Variable
A latent variable is an unobserved variable inferred from observed data, used to explain hidden structure.
Layer Normalization
Layer normalization is a technique that normalizes activations within a layer to stabilize and speed up training in deep networks.
Learning Objectives
Learning objectives are clear, measurable statements of what a learner should be able to do after instruction.
Learning Rate
A hyperparameter that determines how much to adjust model weights at each training step.
Learning Rate Schedule
A learning rate schedule changes the learning rate over training (warmup, decay, cosine, step, exponential).
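A sketch of one popular combination, linear warmup followed by cosine decay to zero (function name and defaults are illustrative):

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-3, warmup_steps=100):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

# Warmup ramps up, decay ends at zero
assert lr_at_step(99, 1000) == 1e-3   # end of warmup: full base_lr
assert lr_at_step(1000, 1000) == 0.0  # end of training: decayed to zero
```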
Learning-to-Rank (LTR)
Learning-to-rank trains models to order results (documents, products, answers) by relevance for a given query.
Length Penalty
Length penalty is a decoding adjustment that prevents generation algorithms (especially beam search) from unfairly preferring overly short sequences.
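One widely used formulation (from Google's GNMT system) divides a candidate's log-probability by ((5 + length)/6)^α; a sketch with illustrative scores:

```python
def length_normalized_score(log_prob, length, alpha=0.6):
    """GNMT-style length penalty: log_prob / ((5 + length) / 6) ** alpha."""
    return log_prob / (((5 + length) / 6) ** alpha)

# Raw log-prob favors the shorter candidate (-4.0 > -4.5),
# but after normalization the longer candidate wins
short = length_normalized_score(-4.0, 4)
long_ = length_normalized_score(-4.5, 12)
assert long_ > short
```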
LIME (Local Interpretable Model-agnostic Explanations)
LIME (Local Interpretable Model-agnostic Explanations) explains an individual model prediction by fitting a simple, interpretable surrogate model around that specific input.
Link Prediction
The task of predicting missing or future edges in a graph.
LiveCodeBench
Contamination-free coding benchmark that continuously adds new programming tasks from competitions.
LLM-as-a-Judge
LLM-as-a-judge uses a model to evaluate other model outputs against rubrics like correctness, groundedness, style, and safety.
LLMOps
The operational discipline of building, deploying, and governing LLM-based systems end-to-end—covering prompts, retrieval, tools, evaluation, safety, and cost.
LMSYS
LMSYS (Large Model Systems Organization) is a research organization that operates the Chatbot Arena benchmark, enabling LLM performance comparisons through human evaluations.
Log-Likelihood
Log-likelihood is the logarithm of the likelihood that a probabilistic model assigns to observed data.
Log-Sum-Exp
Log-sum-exp is a numerical trick for computing log(Σᵢ exp(xᵢ)) stably without overflow/underflow.
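A sketch of the stable computation, which factors out the maximum before exponentiating:

```python
import math

def log_sum_exp(xs):
    """Compute log(sum(exp(x))) without overflow by factoring out max(xs)."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# A naive sum of exp(1000.0) would overflow; the stable version does not
assert abs(log_sum_exp([0.0, 0.0]) - math.log(2)) < 1e-12
assert abs(log_sum_exp([1000.0, 1000.0]) - (1000.0 + math.log(2))) < 1e-9
```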
Logit
A logit is the raw, unnormalized score a model outputs before converting to probabilities (e.g., via softmax).
Logit Bias
Logit bias is a technique to increase or decrease the likelihood of specific tokens during generation by adjusting their logits.
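A sketch with a hypothetical 3-token vocabulary, showing how adding a positive bias to one token's logit raises its probability after the softmax:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (stable via max-subtraction)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]        # raw scores for tokens 0, 1, 2
bias = [0.0, 0.0, 2.0]          # boost token 2 with a logit bias of +2.0
biased = [l + b for l, b in zip(logits, bias)]

# Token 2 is now more likely than before the bias was applied
assert softmax(biased)[2] > softmax(logits)[2]
```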
Long Context
Long context refers to an LLM's ability to accept and use a large number of input tokens in a single request.
LoRA vs Full Fine-Tuning
A comparison between adapting a model via LoRA adapters versus updating all parameters (full fine-tuning).
Loss Function
A mathematical function that measures how good or bad a model's predictions are.
M
Machine Learning
A subfield of AI where systems learn from data to make predictions or decisions without being explicitly programmed.
Mamba
Mamba is a neural network architecture built on selective state space models (SSMs) designed to model long sequences efficiently with linear scaling in sequence length.
Manus AI
An autonomous general-purpose AI agent capable of independently executing complex tasks like research, coding, and data analysis.
Masked Language Modeling (MLM)
MLM is a training objective where a model predicts masked-out tokens in a text sequence (e.g., replacing words with a special [MASK] token).
Mastery Learning
Mastery learning is an instructional approach where learners progress only after demonstrating mastery of a skill or objective, with targeted remediation as needed.
MATH Benchmark
A benchmark with 12,500 competition mathematics problems (from algebra to number theory) that tests advanced mathematical reasoning.
Matrix Factorization
A technique for decomposing a matrix into the product of smaller matrices.
Matryoshka Embedding
An embedding training approach where the first N dimensions of a vector are already usable – enabling flexible compression with only minor quality loss.
Matryoshka Representation Learning (MRL)
Matryoshka Representation Learning (MRL) is an embedding approach that encodes information at multiple granularities so a single embedding can be truncated to smaller dimensions while remaining useful for downstream tasks.
Max Tokens
An API parameter that limits the maximum number of tokens an LLM can generate in a response.
MBPP (Mostly Basic Python Problems)
A benchmark with 974 simple Python programming tasks that test basic programming abilities of LLMs.
Mechanistic Interpretability
Mechanistic interpretability is the effort to reverse engineer neural networks by identifying internal mechanisms (features, circuits, algorithms) that produce outputs.
Message Passing Neural Network
A unifying framework for GNNs where nodes receive messages from neighbors, aggregate them, and update their representations.
Meta-Learning
Meta-learning ("learning to learn") aims to train models or systems that adapt quickly to new tasks with limited data or few examples.
Metaprompt
A metaprompt is a higher-level prompt that defines the rules, structure, and constraints for generating other prompts or for a whole class of outputs.
METEOR
An evaluation metric for machine translation that combines unigram matching with stemming, synonyms, and word order.
Metric Learning
Metric learning trains models to learn a distance function (embedding space) where "similar items are close" and "dissimilar items are far apart."
Minimum Description Length
Minimum Description Length (MDL) is a principle for model selection that prefers the model that yields the shortest total description of the model plus the data encoded under it.
Mixed Precision Training
Mixed precision training uses a mix of lower-precision (e.g., FP16/BF16) and single-precision (FP32) representations to speed up training while preserving accuracy.
Mixture of Experts (MoE)
MoE is a model architecture where different "expert" sub-networks specialize, and a router selects which experts handle each token/input.
MLCommons
Industry consortium developing open benchmarks (MLPerf), datasets, and best practices for ML performance.
MMLU (Massive Multitask Language Understanding)
A multiple-choice benchmark with 57 subject areas (STEM, humanities, social sciences) for measuring LLM world knowledge.
MMLU-Pro
Extended MMLU benchmark with more challenging multiple-choice questions and reduced guessing advantage.
MMR (Maximal Marginal Relevance)
MMR is a retrieval diversification method that selects items that are both relevant to the query and non-redundant with each other.
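A sketch of greedy MMR selection (λ trades off relevance against redundancy; the similarity inputs are illustrative):

```python
def mmr(query_sim, pairwise_sim, k, lam=0.7):
    """Greedily select k items balancing relevance and diversity.

    query_sim: item-to-query similarities
    pairwise_sim: item-to-item similarity matrix
    """
    selected, candidates = [], list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy = similarity to the closest already-selected item
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Items 0 and 1 are near-duplicates; MMR picks 0, then skips 1 in favor of 2
q_sims = [0.9, 0.88, 0.6]
p_sims = [[1, 0.95, 0.1], [0.95, 1, 0.1], [0.1, 0.1, 1]]
assert mmr(q_sims, p_sims, 2, lam=0.5) == [0, 2]
```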
Model Card
A model card is a standardized documentation artifact describing a model's intended use, limitations, training data context, evaluation results, and ethical/safety considerations.
Model Collapse
Model collapse is a degradation phenomenon where training on synthetic/model-generated data (especially repeatedly) can reduce diversity and quality, causing the model to "collapse" toward narrower outputs.
Model Compression
Techniques for reducing the size of ML models while maintaining performance.
Model Drift
Model drift is performance degradation over time due to changes in data distributions, user behavior, environment, or upstream systems.
Model Monitoring
Continuous monitoring of ML model performance and behavior in production.
Model Simplification
Model simplification reduces complexity to improve interpretability, efficiency, robustness, or deployment feasibility.
Model Spec
A model spec is a written specification describing how a model should behave—including intended behavior, constraints, and principles—often used to guide training, alignment, and deployment policy.
Model-Based Learning
Model-based learning learns a model of the environment (dynamics) and uses it for planning, prediction, or control.
Monte Carlo Dropout (MC Dropout)
Monte Carlo Dropout estimates model uncertainty by keeping dropout active at inference time and performing multiple stochastic forward passes, then aggregating results.
MT-Bench
A multi-turn conversation benchmark for LLMs with 80 questions across 8 categories, evaluated by GPT-4-as-Judge.
MTEB
The Massive Text Embedding Benchmark – a comprehensive benchmark for text embedding models across 56+ datasets in 8 tasks.
Multi-Armed Bandit
An algorithm for sequential decision-making that balances exploration and exploitation.
Multi-Objective Optimization
Multi-objective optimization (Pareto optimization) is optimization with multiple objectives that often conflict, where you typically seek Pareto-optimal solutions rather than one single optimum.
Multi-Turn Conversation
A multi-turn conversation is an interaction where context and intent evolve across multiple exchanges rather than a single query-response.
Multimodal
AI systems that can process and understand multiple data types (text, image, audio, video) simultaneously.
Multimodal AI
AI systems that jointly process text, image, audio, and video and can respond in any modality.
Multimodal Model
A multimodal model can process and/or generate across multiple data types (e.g., text, images, audio, video).
N
N-gram Blocking
N-gram blocking is a decoding constraint that prevents a model from generating an n-gram (sequence of n tokens) that has already appeared in the generated text.
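A sketch of the check a decoder could perform before sampling the next token (the helper name is hypothetical):

```python
def blocked_tokens(generated, n):
    """Return the set of next tokens that would repeat an existing n-gram."""
    seen = set()
    for i in range(len(generated) - n + 1):
        seen.add(tuple(generated[i:i + n]))
    prefix = tuple(generated[-(n - 1):]) if n > 1 else ()
    return {ng[-1] for ng in seen if ng[:-1] == prefix}

# "the cat" was already followed by "sat", so generating "sat" again
# would repeat the 3-gram "the cat sat" and is blocked
tokens = ["the", "cat", "sat", "the", "cat"]
assert blocked_tokens(tokens, 3) == {"sat"}
```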
N-Shot Prompting
N-shot prompting provides N examples in the prompt to teach the model the desired pattern (0-shot = instructions only; few-shot = small N).
N+1 Tool Call Problem
The N+1 tool call problem happens when an AI workflow makes one initial tool call and then makes N additional tool calls (often one per retrieved item), causing unnecessary latency and cost.
Named Entity Canonicalization
Entity canonicalization is standardizing different surface forms of the same entity into one canonical representation (e.g., "OpenAI Inc.", "OpenAI", "Open AI").
Named Entity Linking (NEL)
Named Entity Linking connects an entity mention in text (e.g., "OpenAI", "Apple", "Paris") to a specific canonical entity ID in a knowledge base (internal or external).
Named Entity Recognition (NER)
Identifying and classifying named entities in text (people, places, organizations).
Nano Banana
Codename for Google's image editing model (Gemini 2.5 Flash Image) enabling pixel-precise edits via prompt.
Nano Banana 2
Google's second-generation AI image generation model, based on Gemini 3.1 Flash Image, combining Pro quality with Flash speed.
Narrow AI / Weak AI
Narrow AI (also "weak AI") is AI designed to perform a specific task or a limited set of tasks, rather than general-purpose reasoning across domains.
Natural Gradient
Natural gradient is an optimization approach that accounts for the geometry of parameter space, often leading to more efficient steps than standard gradient descent in some probabilistic models.
Natural Language Generation
Natural Language Generation (NLG) is the process of producing human-readable text from data, intent, or internal representations (rules, templates, or neural models).
Natural Language Processing (NLP)
The field of AI concerned with the interaction between computers and human language.
Natural Questions (NQ)
A question answering benchmark from Google with real search queries and Wikipedia articles as answer sources.
Negative Cycle
A negative cycle is a cycle in a weighted graph whose total weight is negative, allowing path cost to be reduced indefinitely by looping.
Negative Prompting
Negative prompting is explicitly telling a generative model what to avoid (content, style, formatting, claims) during generation.
Negative Transfer
Negative transfer occurs when transferring knowledge from a pretrained model or source task hurts performance on the target task.
Negative Weights
Negative weights are negative edge costs in a weighted graph (i.e., an action/transition reduces total cost).
NeRF (Neural Radiance Fields)
NeRFs are neural methods for representing 3D scenes by learning a function that maps spatial coordinates and viewing direction to color and density, enabling novel view synthesis.
Neural Architecture Search
Automatic search for optimal neural network architectures.
Neural Code Search
Neural code search retrieves relevant code snippets or files using embeddings and semantic matching rather than exact keyword search.
Neural Collapse
Neural collapse is a phenomenon observed in deep classifiers near the end of training where learned representations and classifier weights exhibit a highly structured geometry (classes become tightly clustered and symmetrically arranged).
Neural Embeddings
Neural embeddings are learned vector representations of items (text, users, products, documents) such that distance in vector space reflects similarity.
Neural Index Rebuild
A neural index rebuild is re-generating embeddings and rebuilding vector (or hybrid) indexes after changes to content, chunking, or the embedding model.
Neural Indexing
Neural indexing is using learned representations and neural methods to build or optimize an index for retrieval (often in vector search or learned sparse retrieval).
Neural IR (Neural Information Retrieval)
Neural IR is the use of neural models (embeddings, cross-encoders, rerankers) to retrieve and rank documents based on semantic relevance.
Neural Network
A computational model inspired by the structure of biological neurons, consisting of interconnected nodes (neurons) in layers.
Neural Ordinary Differential Equation (Neural ODE)
Neural ODEs model transformations as continuous-time dynamics defined by a neural network, enabling certain efficiency and modeling properties.
Neural Pruning
Neural pruning removes weights, neurons, attention heads, or entire structures from a model to reduce compute/memory while trying to preserve performance.
Neural Reranking
Neural reranking uses a model (often a cross-encoder) to re-score and reorder an initial set of retrieved candidates based on deeper query–candidate understanding.
Neural Retrieval
Neural retrieval is retrieving relevant items using learned representations (dense embeddings and similarity search) instead of relying purely on keyword matching.
Neural Scaling Laws
Scaling laws describe empirical relationships showing how model performance tends to improve predictably as you increase compute, data, and/or model parameters—often following power-law-like trends.
Neural Style Transfer (NST)
Neural style transfer is a technique that applies the "style" of one image (textures, patterns) to the "content" of another, using neural representations.
Neural Topic Routing
Neural topic routing is using ML/embeddings to classify or route an input (query, pageview, conversation) into a topic, workflow, or handler based on semantic meaning.
Neuro-Symbolic "Verification Layer"
A neuro-symbolic verification layer is a system component that checks neural outputs against symbolic constraints (rules, schemas, policies) before acting or publishing.
Neuro-Symbolic AI
Neuro-symbolic AI combines neural methods (LLMs, embeddings) with symbolic methods (rules, logic, knowledge graphs) to improve reliability, interpretability, and constraint satisfaction.
Next Best Question (NBQ)
Next Best Question is a conversational design and decisioning pattern where a system asks the single most valuable clarifying question to progress toward a correct outcome.
Next Sentence Prediction (NSP)
Next Sentence Prediction is a training objective where a model predicts whether one sentence likely follows another in the original text.
NL2SQL (Natural Language to SQL)
NL2SQL converts natural language questions into SQL queries that can be executed against a database.
NLP (Natural Language Processing)
Natural Language Processing (NLP) is the subfield of AI concerned with the machine processing, interpretation, and generation of natural language.
No Free Lunch Theorem
The No Free Lunch theorem (in optimization/learning) states that averaged over all possible problems, no one algorithm performs better than all others—performance depends on the problem distribution.
Node2Vec
An algorithm that learns dense vector representations for graph nodes through biased random walks.
Noise Injection
Noise injection is deliberately adding noise during training or processing to improve robustness, generalization, or privacy.
Noise Schedule
A noise schedule defines how much noise is added (and later removed) at each step in a diffusion model's forward and reverse processes.
Noisy Student Training
Noisy Student Training is a semi-supervised learning approach where a "teacher" model labels unlabeled data, and a "student" model is trained on a mix of labeled + pseudo-labeled data with noise/augmentation.
Nomic Embed
Open-source embedding models from Nomic AI with full reproducibility – all training data and code are public.
Non-Maximum Suppression (NMS)
Non-maximum suppression is a post-processing step in object detection that removes redundant overlapping bounding boxes, keeping only the most confident ones.
Non-Monotonic Logic
A logical system where conclusions can be retracted when new information arrives that contradicts previous assumptions.
Nonlinear Activation Function
A nonlinear activation function introduces nonlinearity into neural networks (e.g., ReLU, GELU, tanh), enabling them to model complex relationships beyond linear transformations.
Normalization
Normalization is the transformation of numerical data to a unified value range (often 0–1 or mean 0 / standard deviation 1) to improve the training stability of machine learning models.
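Sketches of the two transformations mentioned above, min-max scaling to 0–1 and z-score standardization:

```python
def min_max(xs):
    """Scale values linearly to the 0-1 range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Shift values to mean 0 and scale to standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs]

assert min_max([10, 20, 30]) == [0.0, 0.5, 1.0]
```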
Normalization Layer
A normalization layer is a neural network component that normalizes activations to improve training stability and convergence (e.g., LayerNorm, RMSNorm).
Normalizing Flow
A normalizing flow is a generative modeling approach that transforms a simple distribution (e.g., Gaussian) into a complex one via a sequence of invertible transformations with tractable likelihoods.
Novel Class Discovery (NCD)
Novel class discovery finds previously unknown categories in unlabeled data while leveraging knowledge from known classes.
NT-Xent Loss (Normalized Temperature-Scaled Cross-Entropy)
NT-Xent is a contrastive learning loss used to train embeddings by pulling positive pairs together and pushing negatives apart, with a temperature term controlling distribution sharpness.
O
Object Detection
Identification and localization of objects in images or videos.
Observability for LLM Apps
LLM observability extends classic observability with AI-specific signals: prompt/version tracking, retrieval evidence, tool traces, token usage, and quality/safety metrics.
Off-Policy Evaluation (OPE)
Estimates how a new decision policy would perform using data collected from a different (existing) policy—without deploying the new policy.
Offline Evaluation
Measures model/system performance using predefined datasets and metrics before production rollout.
On-Device Inference
Runs a model locally on a user's device (phone, laptop, edge hardware) instead of calling a cloud API.
One-Shot Learning
Ability to learn and generalize from a single example.
One-Shot Prompting
Provides a single example in the prompt to demonstrate the desired output pattern.
Online Evaluation
Measures performance on real user traffic (A/B tests, canaries, interleaving, holdouts) after deployment.
Online Learning
Updates a model incrementally as new data arrives, rather than retraining from scratch in large batches.
Ontology
A formal representation of concepts and relationships in a domain (entities, classes, properties, constraints).
Open-Weight Model
A model whose trained weights are publicly available, enabling self-hosting and deeper customization.
OpenAI Embeddings
OpenAI's commercial embedding API with text-embedding-3-small and text-embedding-3-large – the easiest path to high-quality embeddings.
OpenAI o1
OpenAI's first o-series model that uses explicit reasoning with chain-of-thought for complex problem-solving.
OpenAI o3
Advanced reasoning model from OpenAI with improved performance in mathematics, coding, and scientific reasoning.
OpenLLM Leaderboard
A public leaderboard by Hugging Face that compares open-source LLMs on standardized benchmarks (MMLU, HellaSwag, etc.).
Operationalization
Turning a concept, model, or prototype into a repeatable, reliable, governed production capability with clear ownership, monitoring, and change control.
Optimization
The process of finding parameter values that minimize a loss function or maximize an objective under constraints.
Optimizer
The algorithm that updates model parameters during training (e.g., SGD, Adam), based on gradients and configuration.
Orchestration
Coordinates multiple steps, services, and tools into a reliable workflow—often with state, retries, and observability.
Orchestrator
The system component that implements orchestration logic—deciding the next step, calling tools, managing state, and enforcing budgets/guardrails.
ORPO (Odds Ratio Preference Optimization)
An evolution of DPO that combines SFT and preference alignment in a single training step.
Out-of-Distribution (OOD) Detection
Identifies inputs that differ significantly from what a model was trained on, signaling increased uncertainty and risk.
Output Guardrails
Controls applied to model outputs to enforce safety, policy, formatting, and correctness constraints before displaying or acting.
Output Length Control
The set of techniques used to shape response length and structure (token limits, section caps, templates, validators).
Output Parsing
Extracting structured fields from model output (JSON, YAML, XML, or patterns) so downstream systems can reliably use it.
Output Token
A token generated by a language model as part of its response.
Over-Generation
Producing more output than needed (too long, too verbose, too many steps), increasing cost and reducing user clarity.
Over-Retrieval
Retrieving too many documents/chunks for a query, increasing cost and often reducing answer quality due to noise and context dilution.
Overfitting
When a model learns training data too well and generalizes poorly to new data.
Overlapping Chunks
A chunking strategy where consecutive text chunks share some repeated content (overlap) to preserve context across chunk boundaries.
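A sketch of fixed-size chunking with overlap (assumes overlap < size; integers stand in for tokens here):

```python
def chunk_text(tokens, size, overlap):
    """Split a token list into chunks of `size`, sharing `overlap` tokens."""
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

# 10 tokens, chunk size 4, overlap 2: each window strides forward by 2
assert chunk_text(list(range(10)), 4, 2) == [
    [0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]
]
```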
P
Paged Attention
An inference optimization that manages the KV cache in "pages" (blocks) to reduce memory fragmentation and improve throughput for serving LLMs.
Parallel Tool Calls
Executing multiple tool/API calls concurrently rather than sequentially, reducing end-to-end latency.
Parameter Count
The number of learned weights in a model, often used as a rough proxy for capacity and compute needs.
Parameter Sharing
A modeling technique where multiple parts of a neural network reuse the same weights instead of having separate parameters.
Parameter-Efficient Fine-Tuning (PEFT)
A set of techniques that adapt a pretrained model to a task by training only a small subset of parameters (or additional small modules).
Passage Reranking
Reorders retrieved passages using a stronger relevance model (often a cross-encoder) to improve precision before generation.
Passage Retrieval
Finds relevant passages (chunks) of text rather than whole documents, improving precision for question answering and RAG.
Pathfinding
Pathfinding is the process of finding a route between nodes in a graph that optimizes an objective (shortest, cheapest, safest, fastest).
PDDL (Planning Domain Definition Language)
A standardized language for describing planning problems in AI that formally defines states, actions, and goals.
Perceptron
The Perceptron is the simplest form of an artificial neuron and the foundation of modern neural networks – a linear classifier that computes a weighted sum of its inputs and passes it through an activation function.
Perplexity
A language model metric derived from the average negative log-likelihood; measures how "surprised" a model is by text.
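A sketch computing perplexity from per-token log-probabilities:

```python
import math

def perplexity(token_log_probs):
    """Exponential of the average negative log-likelihood per token."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# A model assigning probability 0.25 to every token has perplexity 4:
# it is as "surprised" as if choosing uniformly among 4 options each step
log_probs = [math.log(0.25)] * 5
assert abs(perplexity(log_probs) - 4.0) < 1e-9
```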
Planning
An AI field concerned with the automatic generation of action sequences to get from an initial state to a goal state.
Poisoning Attack
An attack when an adversary manipulates training data, retrieval corpora, or feedback signals to degrade model behavior.
Policy
A policy is a rule or strategy that determines what actions are taken under which conditions.
Policy Engine
A component that enforces rules and constraints (who can do what, which tools are allowed, what outputs are permitted) at runtime.
Policy Gradient
Methods that optimize a policy directly by adjusting parameters in the direction that improves expected reward.
Positional Encoding
How a transformer model represents token order (position) so it can distinguish "A then B" from "B then A."
Positional Interpolation
A technique to extend a model's usable context length by rescaling how positions are represented.
Post-Training
Any training stage applied after pretraining to shape a model for desired behaviors—helpfulness, safety, instruction-following.
Post-Training Quantization (PTQ)
Reduces model precision (e.g., FP16 → INT8/INT4) after training to lower memory use and speed up inference.
Preference Optimization
Training or adjusting models using preference signals (A preferred to B) to improve alignment with desired outputs.
Prefill
The inference stage where the model processes the prompt to build the initial internal state before generating output tokens.
Prefill Latency
The time spent processing the input prompt before the model can start generating tokens.
Prefix Cache
Reuses computed model state (often KV cache) for repeated prompt prefixes, avoiding repeated prefill computation.
Prefix Tuning
A parameter-efficient adaptation technique where you learn small "prefix" vectors that steer attention layers, instead of fine-tuning all model weights.
Pretraining
Training a model on large-scale data (often self-supervised) to learn general representations before task-specific adaptation.
Privacy-Preserving Machine Learning
A set of techniques that reduce privacy risk when training or serving models.
Product Quantization (PQ)
A vector compression technique that approximates high-dimensional vectors using compact codes, enabling faster approximate nearest neighbor search.
Prompt
The input (instructions + context + examples + constraints) provided to a language model to elicit a desired output.
Prompt A/B Testing
Comparing two prompt versions on real traffic to measure differences in outcomes and guardrails.
Prompt Budget
An explicit allocation of tokens for instructions, context, retrieved evidence, and examples.
Prompt Caching
Stores reusable prompt components or responses to reduce repeated compute, cost, and latency.
Prompt Chaining
A pattern where multiple prompts are run sequentially, where output from one step becomes input to the next.
Prompt Compression
Reduces prompt length while preserving essential constraints and context.
Prompt Engineering
The art and science of designing input prompts to obtain desired outputs from LLMs.
Prompt Hardening
Strengthening prompts and surrounding controls to resist misuse, injection, and unsafe outputs.
Prompt Injection
An attack where malicious or untrusted text attempts to override instructions or manipulate an LLM system.
Prompt Leakage
Unintended exposure of system prompts, hidden instructions, or sensitive context—through model outputs, logs, or UI/debug tools.
Prompt Linting
Automated static analysis of prompts to detect issues before deployment (conflicts, missing constraints, unsafe phrasing).
Prompt Registry
A system for storing, versioning, testing, and governing prompts as production artifacts.
Prompt Regression Testing
Running a stable evaluation suite against prompt changes to detect quality, safety, format, and cost regressions.
Prompt Router
Selects the best prompt template (or workflow) for a request based on intent, difficulty, risk, and context.
Prompt Sandbox
A safe environment to test prompts with controlled data, tools, and logs before production.
Prompt Template
A reusable prompt structure with variables (placeholders) that can be filled dynamically.
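A sketch using Python's standard-library `string.Template` (the template text and variable names are illustrative):

```python
from string import Template

# A hypothetical support-reply template; $variables are filled at runtime
tpl = Template(
    "You are a support assistant for $product.\n"
    "Answer the customer question using only the context below.\n"
    "Context: $context\n"
    "Question: $question"
)

prompt = tpl.substitute(
    product="AcmeCRM",
    context="Refunds are processed within 5 business days.",
    question="How long do refunds take?",
)
assert "AcmeCRM" in prompt and "$product" not in prompt
```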
Prompt Tokens
The tokens consumed by the model's input (system instructions, user message, retrieved context, tool schemas, examples).
Prompt Tuning
Parameter-efficient method where only learnable token embeddings at the input are trained while the entire model stays frozen.
Proximal Policy Optimization (PPO)
A reinforcement learning algorithm that updates policies in a constrained way to avoid overly large, unstable changes.
Pruning
The removal of unnecessary or unimportant components from a model or search tree to increase efficiency or reduce overfitting.
Q
Q-Former
A Q-Former is a query-based transformer module used in some multimodal systems to extract and compress information from one modality.
Q-Function
The Q-function (action-value function) maps a state-action pair to expected return: Q(s, a).
Q-Learning
Q-learning is a reinforcement learning method that learns a value function Q(s, a) estimating the expected return of taking action a in state s.
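A sketch of one tabular update, Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s′,·) − Q(s,a)) (the states, actions, and values are illustrative):

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step to the Q-table `q` in place."""
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q

q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 1.0, "right": 0.0}}
q_update(q, "s0", "right", reward=0.5, next_state="s1")

# Q(s0, right) moved 10% of the way toward 0.5 + 0.9 * 1.0 = 1.4
assert abs(q["s0"]["right"] - 0.14) < 1e-9
```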
QAT (Quantization-Aware Training)
Quantization-aware training trains a model while simulating quantization effects, improving accuracy after quantization compared to PTQ.
QKV (Query–Key–Value)
QKV refers to the Query (Q), Key (K), and Value (V) matrices used in transformer attention mechanisms.
QLoRA (Quantized Low-Rank Adaptation)
QLoRA is a fine-tuning approach that combines quantization with LoRA to adapt large models with lower memory usage.
Quadratic Attention Cost
Quadratic attention cost refers to the classic computational scaling of full self-attention, which grows roughly with the square of sequence length (O(n²)).
Quality-of-Answer Score
A quality-of-answer score is a composite metric that estimates how good an AI answer is (usefulness, correctness, clarity, groundedness, safety).
Quantization
Reducing numerical precision of model weights to decrease memory and compute requirements.
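A toy illustration of symmetric int8 quantization; the scale and rounding scheme here are simplified assumptions, and production schemes (per-channel scales, zero points) are more involved:

```python
def quantize_int8(values):
    # Symmetric quantization: map the largest magnitude to 127.
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

q, scale = quantize_int8([1.0, -0.5, 0.25])
restored = dequantize(q, scale)   # close to the originals, with small rounding error
```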
Quantum Machine Learning (QML)
Quantum machine learning explores using quantum computing concepts (qubits, superposition, entanglement) to accelerate or enhance certain ML computations.
Quarantine
Quarantine is isolating content, inputs, or events that are suspicious, unsafe, or low-trust so they cannot affect production outputs.
Query Embeddings
Query embeddings are vector representations of search queries used for semantic similarity matching against embedded documents/passages.
Query Expansion
Query expansion augments a query with additional terms or semantic signals to improve retrieval recall.
Query Fan-Out
Query fan-out is when one request triggers many downstream queries/tool calls to gather context or results.
Query Federation
Query federation executes a query across multiple systems/sources (databases, services, indexes) and combines results.
Query Likelihood Model
A query likelihood model is an information retrieval approach where documents are ranked by the probability that the document's language model would generate the query.
Query Reranking
Query reranking reorders search/retrieval results using a stronger scoring function (often a cross-encoder or LLM-based scorer) to improve relevance at the top.
Query Rewrite
Query rewrite is modifying a search query to improve retrieval quality (recall/precision), often by clarifying intent, expanding terms, or normalizing vocabulary.
Query Routing
Query routing sends a query to the most appropriate engine, model, index, or workflow based on intent, confidence, and constraints.
Query Understanding Evaluation
Query understanding evaluation measures how well your system interprets user intent, entities, constraints, and risk level from queries.
Query-Time Filtering
Query-time filtering applies constraints during retrieval—such as permissions, tenant boundaries, recency windows, language, or document type.
Question Answering (QA)
Question Answering is a task where a system answers questions based on a corpus, knowledge base, or model knowledge.
Question Decomposition
Question decomposition breaks a complex question into smaller sub-questions that can be answered more reliably.
Quota-Aware Routing
Quota-aware routing chooses models/workflows based on remaining quota and cost budgets (e.g., route simple queries to cheaper modes when budget is low).
R
RAG (Retrieval-Augmented Generation)
Retrieval-Augmented Generation (RAG) is an architecture where an LLM generates an answer using retrieved external information (documents/chunks) as evidence, rather than relying only on its internal parameters.
RAG Chunking Strategy
A RAG chunking strategy defines how source documents are split into retrievable units (chunk size, overlap, structure preservation, metadata).
RAG Evaluation
The systematic evaluation of RAG systems across retrieval quality, answer relevancy, groundedness, and faithfulness.
RAG Poisoning
RAG poisoning is an attack or failure mode where the retrieval corpus is manipulated so that malicious or misleading content is retrieved as "evidence," degrading outputs or steering the system.
Ragas
Ragas is a popular evaluation approach/library for RAG systems that provides practical metrics and workflows to assess retrieval + generation quality.
Re-Embedding
Re-embedding is regenerating embeddings for a corpus (documents/chunks) using the same or a new embedding model, then updating the vector index accordingly.
ReAct (Reason + Act)
ReAct is an agentic pattern where a model alternates between reasoning and taking actions (tool calls), incorporating observations before continuing.
Recall@k
Recall@k measures how often the needed relevant item(s) appear within the top-k retrieved results.
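The metric can be computed in a few lines; the document IDs in the example are hypothetical:

```python
def recall_at_k(retrieved, relevant, k):
    # Fraction of relevant items that appear among the top-k retrieved results.
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

recall_at_k(["d3", "d1", "d7"], ["d1", "d9"], k=2)  # 1 of 2 relevant items found
```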
Recency Bias
Recency bias is a tendency to overweight more recent information—either in human judgment or in system behavior (ranking, context usage).
Reciprocal Rank Fusion (RRF)
RRF combines multiple ranked result lists into one by summing reciprocal ranks, improving robustness when different retrieval methods excel on different queries.
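A minimal sketch using the commonly cited constant k = 60; the two input rankings are invented:

```python
def rrf(rankings, k=60):
    # Each document's fused score is the sum of 1 / (k + rank) across all lists.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["a", "b", "c"]
vector_results = ["b", "c", "a"]
rrf([bm25_results, vector_results])   # "b" wins: ranked high in both lists
```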
Recommendation Engine
System that generates personalized recommendations based on user behavior.
Red Teaming
Red teaming is adversarial testing that intentionally tries to break a system to discover vulnerabilities and failure modes (security, safety, reliability).
Regression
ML method for predicting continuous numerical values.
Regression Testing
Regression testing ensures that changes (code, prompts, retrieval config, model versions) don't break existing behavior or quality.
Regularization
Techniques that prevent overfitting by constraining model complexity.
Reinforcement Learning (RL)
Reinforcement learning is a paradigm where an agent learns to make decisions by interacting with an environment and optimizing cumulative reward.
Reproducibility
Reproducibility is the ability to recreate the same (or equivalent) outputs and behavior given the same inputs, versions, and configuration.
Reranker
A reranker is a model that re-scores and reorders retrieved candidates (documents/chunks) to improve relevance at the top.
Reranking
Reordering retrieval results with a more powerful model for better relevance.
Response Generation
AI process for generating natural language responses.
Responsible AI
Responsible AI is designing and operating AI systems to be safe, fair, privacy-aware, transparent, and accountable throughout their lifecycle.
Retrieval Confidence
Retrieval confidence is a signal estimating whether retrieved results contain sufficient, relevant evidence to answer the query reliably.
Retrieval Drift
Retrieval drift is a change in retrieval behavior/quality over time due to corpus updates, embedding model changes, indexing settings, query distribution shifts, or metadata changes.
Retrieval-First Policy
A retrieval-first policy forces the system to retrieve evidence before generating substantive answers, especially for factual or high-risk queries.
Retriever
A retriever is the component that selects candidate documents/chunks relevant to a query (keyword, vector, hybrid, or federated).
Retriever-Reranker Cascade
A retriever–reranker cascade is a two-stage retrieval approach: a fast retriever generates candidates, then a slower, more accurate reranker selects the best top-k.
Reward Hacking
Reward hacking occurs when a model/agent finds ways to maximize reward without actually achieving the intended real-world goal.
Reward Model
A reward model scores model outputs according to a preference objective (helpfulness, safety, format compliance), often used in alignment-style training or evaluation.
RLAIF (Reinforcement Learning from AI Feedback)
RLAIF uses AI-generated critiques or preferences (often from a judge model) as feedback signals to improve model behavior, reducing reliance on human labeling.
RLHF (Reinforcement Learning from Human Feedback)
RLHF is a post-training approach that uses human preference data to align model behavior toward desired outputs.
RNN (Recurrent Neural Network)
A Recurrent Neural Network (RNN) is a neural network architecture for sequential data where neurons use their own output as additional input for the next time step — preserving context across sequences.
Robustness Testing
Robustness testing evaluates how reliably a model or system performs under perturbations, edge cases, noise, or distribution shifts.
RoPE (Rotary Positional Embeddings)
RoPE is a positional encoding method that applies rotations to query/key vectors, enabling models to represent token positions in a way that supports relative position behavior.
ROUGE Score
A family of metrics for evaluating automatic text summarization by measuring n-gram and subsequence overlap with reference summaries.
Routing Policy
A routing policy is the rule set that decides which model/workflow/tools to use for a request based on intent, risk, confidence, and budgets.
S
Safety
Safety in AI systems is the set of measures that prevent harmful, insecure, or policy-violating outputs and actions—especially under adversarial or ambiguous inputs.
Safety Alignment
Safety alignment is shaping model/system behavior so it reliably follows safety constraints (refusals, safe defaults, policy adherence) across normal and adversarial inputs.
Safety Case
A safety case is a structured argument—supported by evidence—that a system is acceptably safe for a specific context and risk profile.
Safety Classifier
A safety classifier is a model/rule system that detects unsafe content or risky intent (e.g., self-harm, hate, data exfiltration attempts, policy violations).
Safety Evaluation
Safety evaluation is the systematic testing of an AI system for harmful, policy-violating, insecure, or privacy-risk behavior—across normal and adversarial inputs.
Safety Filters
Safety filters detect and block or transform unsafe outputs (or unsafe inputs) based on policy (e.g., sexual content, violence, hate, self-harm, illegal instructions).
Safety Guardrails
Safety guardrails are mechanisms that constrain an AI system's behavior to reduce harm (policies, validators, permission boundaries, rate limits, refusals).
Safety Incident Taxonomy
A safety incident taxonomy is a structured classification system for AI safety incidents (what happened, severity, impact, root cause, mitigation).
Sampling Steps
Sampling steps are the number of iterative denoising iterations used during diffusion inference to generate an output.
Sampling Temperature
Sampling temperature scales the model's output distribution: lower temperatures make outputs more deterministic; higher temperatures increase randomness.
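A small sketch of how temperature scaling is applied to logits before the softmax:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by T before softmax; small T sharpens toward argmax, large T flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

p_sharp = softmax_with_temperature([2.0, 1.0], temperature=0.5)
p_flat = softmax_with_temperature([2.0, 1.0], temperature=2.0)
```

At the lower temperature, the top token receives a noticeably larger share of the probability mass.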
Satisficing
Satisficing is choosing a solution that is "good enough" to meet constraints, rather than optimizing for the absolute best.
Scaling Laws
Scaling laws are empirical relationships showing how model performance tends to improve predictably as you scale data, compute, and parameters.
Schema Drift
Schema drift is when the expected structure of data changes over time (fields added/removed/renamed, types change, enums expand), often breaking pipelines.
Schema Validation
Schema validation checks that structured data conforms to a defined schema (types, required fields, enums, constraints).
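A minimal, hand-rolled validator for illustration; real systems typically use libraries such as JSON Schema or Pydantic:

```python
def validate(record, schema):
    # Check that every required field exists and its value matches the expected type.
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors

validate({"id": 1, "name": "x"}, {"id": int, "name": str})  # no errors
validate({"id": "1"}, {"id": int, "name": str})             # two errors
```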
Seedance
AI video generator by ByteDance with controversial training data origins and photorealistic results.
Self-Attention
Attention mechanism where input elements are related to each other.
Self-Consistency
Self-consistency is a technique where you sample multiple reasoning paths/answers and aggregate them (e.g., majority vote) to improve reliability.
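The aggregation step can be as simple as a majority vote over sampled answers:

```python
from collections import Counter

def self_consistency(answers):
    # Return the most frequent answer among the sampled reasoning paths.
    return Counter(answers).most_common(1)[0][0]

self_consistency(["42", "41", "42"])   # majority answer wins
```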
Self-Supervised Learning
Learning paradigm where the model generates labels from the data itself.
Semantic Caching
Semantic caching reuses past answers/results when a new query is semantically similar to a previous query, not necessarily identical.
Semantic Chunking
Semantic chunking splits documents into chunks based on meaning boundaries (topics/sections) rather than fixed token counts alone.
Semantic Router
A semantic router routes queries to the right workflow, toolset, or model using semantic signals (embeddings, intent classification, similarity to known categories).
Semantic Search
A search method that understands the meaning of queries and documents, rather than relying solely on keyword matching.
Semantic Segmentation
Pixel-level classification of image regions by object categories.
Sentence Transformers
A Python library and collection of models that produce semantically meaningful sentence embeddings – optimized for similarity search and clustering.
SFT (Supervised Fine-Tuning)
Supervised fine-tuning (SFT) adapts a pretrained model using labeled input→output examples to shape behavior (format, style, task performance).
SHAP (Shapley Additive Explanations)
SHAP is a model explainability method based on Shapley values from cooperative game theory that attributes a prediction to individual features.
Siamese Network
A Siamese network is a neural architecture with two (or more) identical subnetworks that learn to compare inputs by producing embeddings and measuring similarity.
Signal-to-Noise Ratio
Signal-to-noise ratio (SNR) is the proportion of meaningful information ("signal") relative to irrelevant or misleading information ("noise").
SimCLR
SimCLR (Simple Contrastive Learning of Visual Representations) is a framework for self-supervised learning that learns visual representations by comparing augmented image versions.
Similarity Score Calibration
Similarity score calibration maps raw similarity scores (from embeddings/rerankers) to more reliable confidence signals (e.g., probabilities or risk bands).
Similarity Search
Similarity search finds items most similar to a query under a similarity metric (cosine similarity, dot product, etc.), commonly used with embeddings.
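A toy nearest-neighbor search over cosine similarity; at scale, real systems use approximate nearest-neighbor (ANN) indexes instead of this linear scan:

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(query, corpus):
    # Linear scan: return the corpus key with the highest similarity to the query.
    return max(corpus, key=lambda k: cosine_similarity(query, corpus[k]))

nearest([0.9, 0.1], {"x": [1.0, 0.0], "y": [0.0, 1.0]})   # "x" is closest
```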
Similarity Thresholding
Similarity thresholding sets cutoff values on similarity scores (embedding similarity, reranker scores) to decide actions like "use cache," "retrieve more," or "ask a clarifying question."
SimPO (Simple Preference Optimization)
A simplified version of DPO that works without a reference model and uses length-normalized reward.
Simulation
The imitation of a real or hypothetical system or process in a controlled virtual environment.
Sliding Window Attention
Sliding window attention restricts attention to a moving window of nearby tokens rather than the full sequence, reducing compute and memory costs.
Slot Filling
Extraction of specific parameters from user utterances for conversational AI.
Small Language Model
A Small Language Model (SLM) is a comparatively smaller LLM designed for lower latency, lower cost, and easier deployment—often used for narrow tasks or as part of a routed system.
Soft Prompt
A soft prompt is a learned vector representation (rather than human-written text) used to steer a model's behavior—often trained as a small set of prompt embeddings.
Softmax
A function that converts a vector of logits into a probability distribution.
Solomonoff Induction
Solomonoff induction is a theoretical framework for optimal prediction that combines Bayesian inference with algorithmic complexity, weighting hypotheses by how simply they describe the data.
Sora 2
The second generation of OpenAI's text-to-video model with improved quality, longer clips, and more realistic physics simulation.
Source Attribution
Source attribution is explicitly indicating where information came from (documents, URLs, internal systems), often via citations or links.
Source Grounding
Source grounding is constraining an AI system to base its answers on provided sources (retrieved documents, tools, or approved references) rather than unverified model knowledge.
Sparse Attention
Sparse attention reduces attention computation by allowing tokens to attend only to a subset of other tokens (patterned or learned sparsity).
Sparse Autoencoder
A Sparse Autoencoder (SAE) is an autoencoder trained with a sparsity constraint so that only a small subset of features activate for any given input.
Sparse Retrieval
Sparse retrieval uses sparse representations (often term-frequency based) such as BM25 to retrieve documents by lexical match.
Speaker Diarization
Speaker diarization identifies "who spoke when" in an audio recording by segmenting audio into speaker-labeled turns.
Speculative Decoding
Speculative decoding accelerates LLM generation by using a smaller "draft" model to propose tokens that a larger model then verifies/accepts in batches.
Speech-to-Text (STT)
Speech-to-Text (STT) converts spoken audio into written text using automatic speech recognition (ASR) models.
State Space Models (SSMs)
State Space Models (SSMs) are sequence models that maintain a latent "state" that evolves over time to process sequential data efficiently.
Statefulness
Statefulness describes whether a system retains information across interactions (stateful) or treats each request independently (stateless).
Steering Vector
A steering vector is a direction in a model's internal representation space that, when added or applied to activations, can bias outputs toward or away from certain behaviors or attributes.
Stochastic Parrot
Stochastic parrot is a critique framing that highlights how LLMs can generate fluent text by pattern-matching from training data without true understanding—raising concerns about bias, misinformation, and misuse.
Stop Sequence
A stop sequence is a token/string pattern that tells a model to stop generating when encountered.
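A sketch of client-side truncation at the earliest stop sequence; the example strings are invented:

```python
def truncate_at_stop(text, stop_sequences):
    # Cut generated text at the earliest occurrence of any stop sequence.
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

truncate_at_stop("Answer: 42\nUser: next", ["\nUser:", "###"])   # "Answer: 42"
```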
Streaming ASR
Streaming ASR transcribes speech in near real-time as audio arrives, rather than after the full recording is complete.
STRIPS
STRIPS is a classical planning formalism where actions are defined by preconditions and effects (add/delete lists) over symbolic state predicates.
Structured Output
Structured output is requiring the model to produce outputs in a predefined structure (JSON, YAML, sections with strict headings), often enforced with validation.
Style Transfer
Style transfer modifies an image (or text) to match a target style while preserving core content.
Subject Consistency
The ability of an AI image generator to consistently render characters and objects across multiple images.
Summarization
Summarization is generating a shorter representation of content while preserving key meaning—extractive (selecting parts) or abstractive (rewriting).
Superposition
Superposition in neural networks describes how multiple features can be represented in overlapping directions within a limited-dimensional space, rather than one feature per neuron.
Supervised Learning
ML paradigm where the model learns from labeled examples (input-output pairs).
SWE-Bench (Software Engineering Benchmark)
A benchmark that tests LLMs by having them solve real bug reports from GitHub repositories – the most realistic test for AI coding abilities.
Sycophancy
Sycophancy is an LLM behavior where the model overly agrees with the user's stated beliefs or incorrect premises instead of correcting them.
Synthetic Data
Synthetic data is artificially generated data used to train, test, or evaluate systems when real data is scarce, sensitive, or costly.
System Prompt
A system prompt is the highest-priority instruction layer that defines the model's role, boundaries, and policies for a session.
T
Technological Singularity
A hypothetical point at which technological progress (especially AI) becomes so rapid and profound that it fundamentally and unpredictably transforms human civilization.
Temperature
A sampling parameter that controls randomness in LLM output: lower values make responses more deterministic, higher values more varied.
Temporal Graph Network
A GNN for time-evolving graphs that models the evolution of nodes and edges over time.
Text Generation
Text generation is the automatic creation of text by AI models, typically based on a prompt or context.
Text-to-Image
Text-to-image is generating images from text prompts using generative models (commonly diffusion-based or transformer-based approaches).
Text-to-Speech
Technology for converting written text into natural-sounding speech – today mostly using neural models.
Tokenization
Breaking text into smaller units (tokens) for processing by NLP models.
Top-k Sampling
A sampling parameter that restricts selection to the k most likely tokens, regardless of their absolute probabilities.
Top-p (Nucleus Sampling)
A sampling parameter that samples only from the smallest set of most likely tokens whose cumulative probability reaches the threshold p.
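A minimal sketch of the nucleus truncation step, using an invented toy distribution:

```python
def nucleus_filter(probs, p=0.9):
    # Keep the smallest set of highest-probability tokens whose cumulative mass reaches p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        total += prob
        if total >= p:
            break
    return kept

nucleus_filter({"the": 0.5, "a": 0.3, "cat": 0.15, "zzz": 0.05}, p=0.9)
```

The low-probability tail ("zzz") is excluded; sampling then happens only among the kept tokens after renormalization.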
Transfer Learning
Using knowledge learned from one task to improve performance on a related task.
Transformer
A neural network architecture that uses self-attention to model relationships between all positions in a sequence.
Tree of Thoughts (ToT)
Prompting strategy where the LLM explores multiple reasoning paths in parallel, evaluates them, and selects the best – like a decision tree for thought chains.
Triplet Loss
A loss function for metric learning that uses anchor, positive, and negative samples to train embeddings so similar items are closer and different ones further apart.
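A sketch of the hinge-style triplet loss over squared Euclidean distances (embeddings here are toy 2-D vectors):

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge loss: zero once the positive is at least `margin` closer than the negative.
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin, 0.0)

triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0])   # easy triplet: loss is 0
triplet_loss([0.0, 0.0], [0.0, 2.0], [1.0, 0.0])   # hard triplet: positive loss
```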
Trust & Safety
Trust & Safety is the practice of protecting users, platforms, and brands from harmful content, abuse, and unsafe outcomes—through policy, enforcement, and product design.
TruthfulQA
A benchmark that tests whether LLMs avoid popular misinformation and conspiracy theories.
U
U-Net
U-Net is a network architecture for image segmentation with an encoder-decoder structure and skip connections.
Ultra-Long Context Window
An ultra-long context window is the ability to accept very large input contexts (tens or hundreds of thousands of tokens).
Uncertainty Quantification (UQ)
UQ estimates how uncertain a model is about an output.
Uncertainty-Aware Routing
Uncertainty-aware routing chooses workflows based on uncertainty signals (low-confidence → deeper retrieval).
Underfitting
Underfitting happens when a model is too simple to capture the underlying patterns, resulting in poor performance on both training and test data.
Uniform Information Density
A prompting principle: keep the information value per token consistent and cut low-value filler text.
Unintended Memorization
Unintended memorization occurs when a model retains specific training examples and may reproduce them verbatim.
Universal Embeddings
Universal embeddings are general-purpose vector representations that transfer across many domains without domain-specific training.
Unlearning (Machine Unlearning)
Machine unlearning removes the influence of specific training data from a model (privacy, compliance).
Unsupervised Learning
ML paradigm where the model finds patterns in unlabeled data.
Untrusted Input Handling
Controls that treat external/user-provided content as potentially malicious.
Utility Function
A utility function maps outcomes to numeric values representing preference, enabling tradeoffs between competing objectives.
V
Value Alignment
Value alignment is ensuring an AI system's behavior reliably matches intended human/organizational values and constraints (safety, fairness, truth-seeking, privacy).
Value of Information (VoI)
Value of Information (VoI) quantifies how much benefit you gain by obtaining additional information before making a decision.
Vanishing Gradient
Vanishing gradient is a training problem where gradients become extremely small as they propagate backward through a network, slowing or preventing learning in early layers.
Variational Autoencoder (VAE)
A Variational Autoencoder (VAE) is a generative model that learns a probabilistic latent space, enabling sampling and generation of new data.
Veo 3
Google's third-generation video generation model with native audio, longer clips, and improved physics.
Verification
Checking whether LLM outputs are correct, factual, and source-supported.
Verification Layer
A verification layer is a system component that checks whether an AI output or action meets required correctness, safety, policy, and formatting constraints before it is delivered or executed.
Verification-First Policy
A verification-first policy requires AI outputs and high-impact actions to pass defined verification checks before being shown to users or executed.
Video AI
Video AI encompasses AI technologies for automatic analysis, generation, editing, and optimization of video content.
Vision Transformer (ViT)
A Vision Transformer (ViT) applies transformer architectures to images by representing them as sequences of patch embeddings.
Vision-Language Model (VLM)
A Vision-Language Model (VLM) processes both images and text to perform tasks like image understanding, captioning, document Q&A, and multimodal reasoning.
VQ-VAE
VQ-VAE is a variant of VAE that uses vector quantization to learn discrete latent representations via a learned codebook.
W
Warm Start
A warm start initializes training or optimization from a previously learned state (weights, embeddings, or parameters) rather than starting from scratch.
Watermarking
Watermarking is adding a detectable signal to content (text, image, audio, video) to indicate origin, authenticity, or provenance—often used to mark AI-generated outputs.
Weak Supervision
Weak supervision uses imperfect, noisy, or indirect signals (heuristics, rules, distant labels) to create training labels instead of manual annotation.
Weakly Supervised Learning
Weakly supervised learning trains models using weak supervision signals (noisy labels, partial labels, aggregated labels) rather than fully reliable labels.
Weavy
AI video platform with node-based editor for complex generative video workflows and multi-model pipelines.
Web Browsing Tool
A web browsing tool is an AI tool integration that fetches live web pages or search results to answer questions with up-to-date information.
Web Grounding
The ability of an AI model to access web search results in real-time to generate current and factually accurate content.
Weight Decay
Weight decay is a regularization technique that discourages large weights during training, often implemented as L2 regularization or decoupled weight decay (e.g., in AdamW).
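A minimal sketch of a single gradient step with decoupled weight decay, in the spirit of AdamW but without the momentum terms; the learning rate and decay factor are toy values:

```python
def sgd_step_with_decay(w, grad, lr=0.1, weight_decay=0.01):
    # Decoupled decay: shrink each weight directly, separate from the gradient term.
    return [(1 - lr * weight_decay) * wi - lr * gi for wi, gi in zip(w, grad)]

sgd_step_with_decay([1.0], [0.0])   # no gradient: weight shrinks slightly to 0.999
sgd_step_with_decay([1.0], [1.0])   # decay plus gradient step
```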
WER (Word Error Rate)
Word Error Rate (WER) measures speech recognition accuracy as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference.
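WER can be computed with word-level Levenshtein distance; a minimal dynamic-programming sketch:

```python
def wer(reference, hypothesis):
    # Word error rate: edit distance over word sequences, normalized by reference length.
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn hyp[:j] into ref[:i]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[-1][-1] / len(ref)

wer("the cat sat", "the dog sat")   # one substitution out of three words
```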
Whisper
An open-source speech recognition model from OpenAI trained on 680,000 hours of multilingual audio.
Windowed Attention
Windowed attention restricts attention to a local token window instead of the full sequence, reducing compute and enabling longer contexts.
WinoGrande
A benchmark for pronominal reference resolution where small word changes flip the correct answer.
Word Embedding
A dense vector representation of a word that encodes its semantic meaning.
Word2Vec
Word2Vec is a technique for generating word embeddings that represents words as dense vectors, where semantically similar words have similar vectors.
World Model
An internal representation of the environment in an AI system that enables predictions about future states and the effects of actions.
X
x-Vector
An x-vector is a type of speaker embedding used in speech processing to represent speaker identity characteristics in a fixed-length vector.
XAI (Explainable AI)
Explainable AI (XAI) is the set of methods and practices used to make an AI system's outputs more understandable—showing why a prediction, recommendation, or decision happened.
Xavier Initialization (Glorot Initialization)
Xavier (Glorot) initialization is a weight initialization method designed to keep activations and gradients in a healthy range as they flow through a neural network.
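A sketch of the uniform variant, where weights are drawn from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)); the fixed seed is only for reproducibility of the example:

```python
import math
import random

def xavier_uniform(fan_in, fan_out, seed=0):
    # Glorot uniform: bound chosen so activation variance stays roughly constant per layer.
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

W = xavier_uniform(64, 32)   # 64x32 weight matrix, entries within +/- 0.25
```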
XGBoost
XGBoost (Extreme Gradient Boosting) is a high-performance ensemble learning algorithm that combines gradient boosting with decision trees, known for strong accuracy on tabular data.
XLM (Cross-lingual Language Model)
XLM refers to cross-lingual language modeling approaches and model families designed to represent and process multiple languages in a shared embedding space.
XLM-R (Cross-lingual RoBERTa)
XLM-R is a multilingual transformer model family often used for cross-lingual understanding tasks (classification, NER, semantic similarity).
XLNet
XLNet is a transformer-based language model approach that uses permutation-based training to capture bidirectional context while preserving autoregressive properties.
XOR Problem
The XOR problem is a classic demonstration that a single linear classifier (such as a perceptron) cannot learn the XOR function, because its outputs are not linearly separable; it was a key motivation for multi-layer networks.
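A tiny two-layer threshold network that does compute XOR, illustrating how a hidden layer resolves the problem; the weights are chosen by hand rather than learned:

```python
def step(x):
    return 1 if x > 0 else 0

def xor_mlp(a, b):
    # Hidden layer computes OR and NAND; the output unit ANDs them together.
    h1 = step(a + b - 0.5)      # OR(a, b)
    h2 = step(-a - b + 1.5)     # NAND(a, b)
    return step(h1 + h2 - 1.5)  # AND(h1, h2) == XOR(a, b)
```

No single threshold unit can separate {(0,1), (1,0)} from {(0,0), (1,1)} with one line, but the hidden layer transforms the inputs into a linearly separable space.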
Z
ZeRO (Zero Redundancy Optimizer)
ZeRO is a set of techniques for training very large models efficiently by partitioning optimizer states, gradients, and parameters across devices—reducing memory redundancy.
Zero-Shot Classification
Zero-shot classification assigns labels to text without training a task-specific classifier, usually by using natural language label descriptions.
Zero-Shot Learning
Zero-shot learning is when a model performs a task without having seen any task-specific labeled examples, relying on generalization from pretraining.
Zero-Shot Prompting
Zero-shot prompting is prompting a model with instructions and constraints without providing explicit examples.
Zero-Shot vs Few-Shot
Zero-shot uses no examples; few-shot includes a small number of examples in the prompt to steer behavior.
ZKML (Zero-Knowledge Machine Learning)
ZKML refers to applying zero-knowledge proof techniques to machine learning so one can prove properties about ML inference/training without revealing sensitive inputs or model internals.