
    BPE (Byte Pair Encoding)

    Updated: 2/10/2026

    Subword tokenization algorithm that iteratively merges frequent character pairs to build a compact vocabulary.

    Quick Summary

    BPE builds a subword vocabulary by iteratively merging the most frequent character pairs; it is the basis for GPT's tokenizers (tiktoken) and most modern LLM tokenizers.

    Explanation

    BPE starts from a vocabulary of individual characters and repeatedly merges the most frequent adjacent pair into a new token. Related words therefore share subwords: "low", "lower", and "lowest" all contain the subword "low". GPT models apply BPE at the byte level via the tiktoken library.
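    To make the merge loop concrete, here is a minimal training sketch on a toy corpus (the three-word corpus, the end-of-word marker "</w>", and the merge count are assumptions made up for this example; production tokenizers such as tiktoken work on bytes over far larger corpora):

        # Toy BPE training: repeatedly merge the most frequent adjacent symbol pair.
        from collections import Counter

        def get_pair_counts(vocab):
            """Count adjacent symbol pairs across all words, weighted by word frequency."""
            pairs = Counter()
            for word, freq in vocab.items():
                symbols = word.split()
                for a, b in zip(symbols, symbols[1:]):
                    pairs[(a, b)] += freq
            return pairs

        def merge_pair(pair, vocab):
            """Rewrite every word so that occurrences of `pair` become one merged symbol."""
            new_vocab = {}
            for word, freq in vocab.items():
                symbols = word.split()
                merged, i = [], 0
                while i < len(symbols):
                    if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                        merged.append(symbols[i] + symbols[i + 1])
                        i += 2
                    else:
                        merged.append(symbols[i])
                        i += 1
                new_vocab[" ".join(merged)] = freq
            return new_vocab

        # Words pre-split into characters, with an end-of-word marker.
        corpus = Counter(["low"] * 5 + ["lower"] * 2 + ["lowest"] * 3)
        vocab = {" ".join(word) + " </w>": freq for word, freq in corpus.items()}

        for step in range(6):  # the number of merges is the vocabulary-size knob
            pairs = get_pair_counts(vocab)
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            vocab = merge_pair(best, vocab)
            print(f"merge {step + 1}: {best}")

        print(vocab)  # "low" has emerged as a subword shared by low/lower/lowest

    After the second merge, "low" exists as its own token; the merge count chosen in the loop corresponds to the vocabulary-size hyperparameter discussed under Common Pitfalls.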

    Marketing Relevance

    BPE is the tokenizer standard for GPT models and the foundation for efficient text processing in LLMs.

    Common Pitfalls

    The vocabulary size (i.e. the number of merges) must be chosen as a hyperparameter. Greedy merging does not always find the optimal segmentation. Not all languages benefit equally; languages with rich morphology or non-Latin scripts often end up with more tokens per word.

    Origin & History

    BPE originally comes from data compression (Gage, 1994). Sennrich et al. adapted it for neural machine translation in 2016. OpenAI has used BPE for all GPT models, and its tiktoken library (2022) provides a fast BPE implementation.

    Comparisons & Differences

    BPE (Byte Pair Encoding) vs. WordPiece

    BPE merges by frequency; WordPiece maximizes training corpus likelihood. BPE is used by GPT, WordPiece by BERT.
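    To see GPT's BPE in practice, a short sketch with OpenAI's tiktoken library (assuming the package is installed; "cl100k_base" is the encoding used by GPT-4-era models, and the sample words are arbitrary) prints how words are split into token IDs and subword pieces:

        import tiktoken

        enc = tiktoken.get_encoding("cl100k_base")

        for word in ["low", "lower", "lowest", "tokenization"]:
            ids = enc.encode(word)
            pieces = [enc.decode([i]) for i in ids]  # decode each ID to its subword piece
            print(f"{word!r} -> {ids} -> {pieces}")

    The exact splits depend on the chosen encoding; frequent words typically map to a single token, while rarer words fall back to several subword pieces.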

    BPE (Byte Pair Encoding) vs. SentencePiece

    SentencePiece is a tokenization framework that can use BPE or Unigram as its underlying algorithm; BPE itself is a specific algorithm.
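    As a sketch of that framework-vs-algorithm distinction, SentencePiece selects the algorithm via a training option (assuming the sentencepiece package is installed; "corpus.txt", the model prefix, and the vocabulary size are placeholders chosen for the example):

        import sentencepiece as spm

        # Train a BPE model; switching model_type to "unigram" keeps everything
        # else the same, which is the point of the framework/algorithm split.
        spm.SentencePieceTrainer.train(
            input="corpus.txt",   # placeholder path to a plain-text training corpus
            model_prefix="bpe_demo",
            vocab_size=1000,      # must be compatible with the corpus size
            model_type="bpe",
        )

        sp = spm.SentencePieceProcessor(model_file="bpe_demo.model")
        print(sp.encode("lowest", out_type=str))  # subword pieces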
