Two-Tower Model
An architecture with two separate encoders (user tower, item tower) whose embeddings are efficiently matched via similarity search.
Two-tower models encode users and items with separate networks and match them via similarity search, which makes them the standard retrieval architecture for catalogs with billions of items.
Explanation
Each tower independently maps its inputs (user features on one side, item features on the other) to embeddings in a shared vector space, and relevance is scored with a simple similarity function such as the dot product. At serving time, item embeddings are precomputed offline and indexed for approximate nearest neighbor (ANN) search, so answering a query requires only a single user-tower forward pass plus an ANN lookup. This decoupling is what lets the approach scale to billions of items.
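The serving pattern above can be sketched in a few lines. This is a minimal illustration, not a real system: random linear projections stand in for trained towers, and a brute-force dot-product scan stands in for an ANN library such as FAISS or ScaNN. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # embedding dimension (illustrative)

# Stand-ins for trained neural towers: plain random projections.
W_user = rng.normal(size=(128, DIM))  # user feature dim -> embedding
W_item = rng.normal(size=(32, DIM))   # item feature dim -> embedding

def encode_user(features: np.ndarray) -> np.ndarray:
    v = features @ W_user
    return v / np.linalg.norm(v)  # L2-normalize -> dot product = cosine

def encode_items(features: np.ndarray) -> np.ndarray:
    v = features @ W_item
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Offline: precompute the full item index once.
item_index = encode_items(rng.normal(size=(10_000, 32)))

# Online: one user-tower forward pass, then top-k by similarity.
# (Brute-force scan here; production systems use an ANN index.)
def retrieve(user_features: np.ndarray, k: int = 10) -> np.ndarray:
    scores = item_index @ encode_user(user_features)
    return np.argsort(scores)[::-1][:k]

top = retrieve(rng.normal(size=128))
```

Note that the item index never has to be touched when the user changes; only the cheap user-side encoding is recomputed per request.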
Marketing Relevance
Two-tower is the standard architecture for candidate generation in large-scale recommender systems (YouTube, Google, Meta).
Example
Google Search uses two-tower for ad retrieval: user context and ad features are encoded separately, then matched via ANN.
Common Pitfalls
The dot-product interaction can only combine what each tower encodes independently, so it is less expressive than cross-attention between user and item features; for this reason retrieval is often followed by a more expressive re-ranking stage. Training quality also hinges on the negative sampling strategy, e.g., in-batch negatives with correction for item popularity bias.
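A common negative-sampling scheme is in-batch negatives: for a batch of (user, positive item) pairs, every other user's positive item serves as a negative, and the model is trained with a softmax over the batch. The following is a hedged numpy sketch of that loss (the function name, batch size, and temperature are illustrative assumptions, not a reference implementation).

```python
import numpy as np

def in_batch_softmax_loss(user_emb: np.ndarray, item_emb: np.ndarray,
                          temperature: float = 0.05) -> float:
    """Rows are aligned (user i's positive is item i); every other
    item in the batch acts as a negative for user i."""
    logits = user_emb @ item_emb.T / temperature    # (B, B) score matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries are each user's true positive.
    return -float(np.mean(np.diag(log_probs)))

rng = np.random.default_rng(1)
u = rng.normal(size=(8, 64)); u /= np.linalg.norm(u, axis=1, keepdims=True)
i = rng.normal(size=(8, 64)); i /= np.linalg.norm(i, axis=1, keepdims=True)
loss = in_batch_softmax_loss(u, i)
```

Because in-batch negatives are drawn from positives, popular items are oversampled as negatives; in practice the logits are often corrected by subtracting the log of each item's sampling probability.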
Origin & History
YouTube (Covington et al., 2016) popularized the architecture for neural candidate generation. Google published a sampling-bias-corrected dual encoder for large-scale retrieval in 2019, and broad adoption across Google and Meta production systems subsequently made two-tower the de facto industry standard for the retrieval stage.
Comparisons & Differences
Two-Tower Model vs. Cross-Encoder
A cross-encoder processes user and item features jointly in one network, which is more accurate but requires a full forward pass per candidate; a two-tower model encodes them separately, so scoring reduces to lookups against precomputed item embeddings (fast and scalable, at some accuracy cost).
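The cost difference can be made concrete by counting per-query encoder calls. In this hypothetical sketch, a trivial identity function stands in for an expensive neural forward pass; the point is the call counts, not the outputs.

```python
import numpy as np

N_ITEMS = 5_000
encoder_calls = {"two_tower": 0, "cross_encoder": 0}

def tower(x):  # stand-in for an expensive neural forward pass
    return x

# Item embeddings are precomputed offline for the two-tower model.
item_emb = np.stack([tower(np.ones(8)) for _ in range(N_ITEMS)])

# Two-tower: one encoder call per query, then cheap dot products.
def two_tower_score(user_feat):
    encoder_calls["two_tower"] += 1
    return item_emb @ tower(user_feat)

# Cross-encoder: one joint forward pass per (user, item) pair.
def cross_encoder_score(user_feat):
    scores = []
    for item in item_emb:
        encoder_calls["cross_encoder"] += 1
        scores.append(float(tower(np.concatenate([user_feat, item])).sum()))
    return np.array(scores)

two_tower_score(np.ones(8))
cross_encoder_score(np.ones(8))
```

With a real model, the gap is one forward pass versus N forward passes per query, which is why cross-encoders are typically reserved for re-ranking a small retrieved candidate set.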