
    Structured Pruning

    Also known as:
    Channel Pruning
    Filter Pruning
    Block Pruning
    Group Pruning
    Updated: 2/11/2026

A pruning variant that removes entire structures (neurons, filters, attention heads, layers) instead of individual weights, delivering real speedups without specialized sparse hardware.

    Quick Summary

Structured pruning removes entire neurons, filters, or attention heads, delivering real speedups on standard hardware with no sparse-computation support required.

    Explanation

Unlike unstructured pruning, which zeros individual weights, structured pruning removes contiguous blocks: entire convolutional filters, attention heads, or even whole layers. The result is a genuinely smaller dense model that needs no sparse representation to realize its savings.
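A minimal sketch of the idea, assuming a numpy weight tensor and L1-norm importance scoring (the function name and `keep_ratio` parameter are illustrative, not from any specific library):

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Keep the conv filters with the largest L1 norms; drop the rest entirely.

    `weights` has shape (out_channels, in_channels, kH, kW).
    Returns a genuinely smaller dense tensor plus the surviving indices.
    """
    n_keep = max(1, int(weights.shape[0] * keep_ratio))
    norms = np.abs(weights).sum(axis=(1, 2, 3))   # L1 norm per filter
    keep = np.sort(np.argsort(norms)[-n_keep:])   # indices of survivors
    return weights[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))               # a toy conv layer
pruned, kept = prune_filters(w, keep_ratio=0.25)
print(pruned.shape)                               # (16, 32, 3, 3) – smaller, still dense
```

Note that in a real network the next layer's input channels must be sliced with the same `kept` indices so the shapes stay consistent.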

    Marketing Relevance

Structured pruning is the most practically relevant pruning method: standard hardware (GPUs, CPUs) benefits directly from smaller dense models, with no sparse-computation support needed.

    Example

LLM-Shearing (2023) selectively removes attention heads and FFN dimensions from Llama-2 7B, producing a 1.3B-parameter model that outperforms same-sized models trained from scratch.

    Common Pitfalls

Structured pruning has coarser granularity than unstructured pruning, so it typically achieves less compression at the same accuracy. Deciding which structures are safely removable is a harder optimization problem, and the pruned model usually needs retraining or fine-tuning to recover accuracy.

    Origin & History

Li et al. (2016) introduced filter pruning for CNNs. For transformers, Michel et al. (2019) studied attention-head pruning and showed that many heads can be removed with little loss in accuracy. LLM-Shearing (2023) scaled structured pruning to LLMs.

    Comparisons & Differences

    Structured Pruning vs. Unstructured Pruning

Unstructured pruning removes individual weights, allowing higher compression ratios but requiring sparse hardware or software support to translate into speed; structured pruning removes entire blocks, yielding real speedups on standard hardware.
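The contrast can be made concrete with a toy weight matrix (all names and the 50% pruning budget are illustrative):

```python
import numpy as np

w = np.arange(16, dtype=float).reshape(4, 4)      # toy 4x4 weight matrix

# Unstructured: zero the smallest individual weights. The shape stays (4, 4),
# so dense hardware does the same work unless it can exploit the zeros.
threshold = np.quantile(np.abs(w), 0.5)
unstructured = np.where(np.abs(w) >= threshold, w, 0.0)

# Structured: drop whole rows (e.g. entire neurons). The shape shrinks to
# (2, 4), so every subsequent matmul is genuinely cheaper on any hardware.
row_norms = np.abs(w).sum(axis=1)
structured = w[np.sort(np.argsort(row_norms)[-2:])]

print(unstructured.shape, structured.shape)       # (4, 4) (2, 4)
```

Both variants remove half the weights, but only the structured version changes the tensor shape and thus the actual compute.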

    Structured Pruning vs. Knowledge Distillation

Structured pruning trims an existing model; knowledge distillation trains a new, smaller model from scratch to mimic the original.
