    Detokenization (Detokenisierung)

    Updated: 2/11/2026

    The process of converting tokens back into readable text – the reverse of tokenization.

    Quick Summary

    Detokenization converts token sequences back into readable text: it removes subword markers and reconstructs whitespace correctly.

    Explanation

    Detokenization must correctly reconstruct whitespace, punctuation, and special characters. With subword tokenization, markers such as "▁" (SentencePiece, word start) or "##" (WordPiece, word continuation) must be removed or merged away.
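The marker handling above can be sketched in a few lines. These are minimal, library-free illustrations (the function names are my own, not any library's API) of how the two marker conventions are undone:

```python
def detok_sentencepiece(tokens):
    """SentencePiece marks word starts with '\u2581' (the lower one eighth
    block). Join all tokens, turn each marker into a space, and strip the
    leading space produced by the first word's marker."""
    return "".join(tokens).replace("\u2581", " ").lstrip()

def detok_wordpiece(tokens):
    """WordPiece marks word *continuations* with '##'. Glue those onto the
    previous token; everything else starts a new word."""
    out = []
    for tok in tokens:
        if tok.startswith("##") and out:
            out[-1] += tok[2:]
        else:
            out.append(tok)
    return " ".join(out)

print(detok_sentencepiece(["\u2581Deto", "ken", "ization", "\u2581works"]))
# → "Detokenization works"
print(detok_wordpiece(["Deto", "##ken", "##ization", "works"]))
# → "Detokenization works"
```

Note the asymmetry: SentencePiece marks where words begin, WordPiece marks where they continue, so the reconstruction logic differs even though the result is the same.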

    Marketing Relevance

    Detokenization is essential for correctly displaying LLM outputs in applications.

    Common Pitfalls

    Whitespace reconstruction with subword tokens is complex. Special characters and Unicode can cause problems, since a multi-byte character may be split across several tokens. Streaming detokenization must handle partial tokens that do not yet form a complete character.
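The partial-token problem can be illustrated with a small sketch. Assuming a byte-level tokenizer whose tokens decode to raw UTF-8 fragments (the `StreamingDetokenizer` class here is hypothetical, not a real library API), Python's incremental decoder buffers incomplete multi-byte sequences until the remaining bytes arrive:

```python
import codecs

class StreamingDetokenizer:
    """Sketch of streaming detokenization: yield text only once the
    received bytes form complete UTF-8 characters."""

    def __init__(self):
        # The incremental decoder holds back trailing partial sequences.
        self._dec = codecs.getincrementaldecoder("utf-8")()

    def feed(self, token_bytes: bytes) -> str:
        # Returns only the fully decodable prefix; buffers the rest.
        return self._dec.decode(token_bytes)

sd = StreamingDetokenizer()
# "é" is two bytes (0xC3 0xA9); here they arrive in separate tokens.
print(repr(sd.feed(b"caf\xc3")))  # → 'caf' (partial byte held back)
print(repr(sd.feed(b"\xa9")))     # → 'é'
```

Without such buffering, a chat UI streaming token-by-token would briefly render replacement characters (�) whenever a character is split across tokens.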

    Origin & History

    Detokenization was trivial with word-level tokenization. Subword tokenization (BPE, 2016) made detokenization more complex. SentencePiece solved the problem with the "▁" marker for word starts. Streaming detokenization became critical for chat interfaces (ChatGPT, 2022).

    Comparisons & Differences

    Detokenization vs. Tokenization

    Tokenization splits text into tokens; detokenization reassembles tokens into readable text. The round trip is not always lossless.
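A toy example makes the lossiness concrete. With naive whitespace tokenization (a deliberate simplification; real subword tokenizers have analogous but subtler losses), the exact spacing of the input cannot be recovered:

```python
def tokenize(text):
    return text.split()       # collapses runs of whitespace

def detokenize(tokens):
    return " ".join(tokens)   # reinserts single spaces only

original = "hello,\tworld  !"
roundtrip = detokenize(tokenize(original))
print(roundtrip)              # → "hello, world !"
print(roundtrip == original)  # → False: the tab and double space are lost
```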
