Chapter 25: Query Evaluation Strategies (WAND/BMW)
Introduction
Top-k retrieval—finding the highest-scoring documents for a query—dominates search engine workloads. Naive evaluation scores every document containing any query term, an approach with complexity proportional to the sum of posting list lengths. For queries with millions of matching documents, this becomes prohibitively expensive.
The Weak AND (WAND) algorithm and its Block-Max variant (BMW) exploit a fundamental observation: we do not need to score documents that cannot enter the top-k. By maintaining score upper bounds and a dynamic threshold, these algorithms skip large portions of posting lists without sacrificing result quality.
This chapter explores Cognica's production implementation of WAND and BMW: the theoretical foundations of safe document skipping, the engineering of efficient upper bound computation, and the data-driven heuristics for algorithm selection.
25.1 The Top-K Problem
25.1.1 Problem Definition
Given:
- A query $q = \{t_1, \ldots, t_n\}$ of terms
- A collection of documents $D$
- A scoring function $s(d, t)$
- A result size $k$
Find the $k$ documents with the highest scores $\mathrm{score}(d) = \sum_{t \in q} s(d, t)$.
25.1.2 Naive Evaluation
Document-at-a-time (DAAT) evaluation processes documents in sorted order:
for each document d in union(posting_lists):
    score = sum of s(d, t) for each term t in d
    if score > threshold:
        insert d into top-k heap
        update threshold
Complexity: $O\!\left(\sum_{t \in q} |L_t|\right)$, where $|L_t|$ is the posting list length for term $t$.
For a 5-term query with 100,000 postings each, this evaluates 500,000 score contributions—most for documents that never enter the top-k.
25.1.3 The Upper Bound Insight
WAND exploits term-level upper bounds. If the maximum possible score contribution from term $t$ is $\mathrm{UB}_t$, then a document $d$ can score at most:

$$\mathrm{UB}(d) = \sum_{t \in q \cap d} \mathrm{UB}_t$$

Documents whose upper bound falls below the current $k$-th highest score can be safely skipped.
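The skip decision itself is a single comparison. A minimal standalone sketch (the names are illustrative, not the production API):

```cpp
#include <vector>

// Can a document covered by these query terms possibly beat the current
// top-k threshold? If not, it is safe to skip without scoring it.
bool could_enter_top_k(const std::vector<float>& term_upper_bounds,
                       float threshold) {
    float ub = 0.0f;
    for (float t : term_upper_bounds) {
        ub += t;  // UB(d) = sum of per-term score maxima
    }
    return ub >= threshold;
}
```

Everything that follows in this chapter is machinery for evaluating this test cheaply and with the tightest bounds possible.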
25.2 WAND Algorithm
25.2.1 Algorithm Overview
WAND maintains posting list iterators sorted by current document ID. The algorithm identifies a "pivot" document and determines whether it could potentially enter the top-k based on cumulative upper bounds.
25.2.2 Pivot Selection
The pivot is the first document where cumulative upper bounds reach the threshold:
int WANDExecutor::find_pivot(
    const std::vector<TermState>& sorted_states,
    float threshold) {
    float cumulative_ub = 0.0f;
    for (size_t i = 0; i < sorted_states.size(); ++i) {
        size_t term_idx = sorted_states[i].term_index;
        cumulative_ub += scorers_[term_idx]->get_upper_bound();
        if (cumulative_ub >= threshold) {
            return static_cast<int>(i);  // Pivot found
        }
    }
    return -1;  // No pivot: algorithm terminates
}
Termination Condition: When the sum of all upper bounds cannot reach the threshold, no document can enter the top-k, and evaluation terminates.
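A worked example makes the pivot concrete. The following standalone sketch runs the same prefix-sum loop over plain (current document, upper bound) pairs, already sorted by document ID:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using DocID = std::uint32_t;

// Standalone sketch of WAND pivot selection (not the production class).
// Returns the index of the first state whose cumulative upper bound
// reaches the threshold, or -1 if no document can qualify.
int find_pivot(const std::vector<std::pair<DocID, float>>& sorted_states,
               float threshold) {
    float cumulative_ub = 0.0f;
    for (std::size_t i = 0; i < sorted_states.size(); ++i) {
        cumulative_ub += sorted_states[i].second;
        if (cumulative_ub >= threshold) {
            return static_cast<int>(i);
        }
    }
    return -1;  // even all terms together cannot reach the threshold
}
```

With terms at documents 3, 7, and 12 carrying upper bounds 1.0, 1.5, and 2.5, a threshold of 2.0 is first reached at index 1, so document 7 becomes the pivot; no document before 7 can qualify.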
25.2.3 Document Evaluation
size_t WANDExecutor::execute() {
    while (state_manager_->is_valid()) {
        float threshold = collector_->get_threshold();
        auto sorted_states = state_manager_->get_sorted_states();
        int pivot_idx = find_pivot(sorted_states, threshold);
        if (pivot_idx < 0) break;

        DocID pivot_doc = sorted_states[pivot_idx].current_doc;

        // Advance terms before pivot to pivot document
        advance_to_pivot(sorted_states, pivot_idx);

        // Collect terms now at pivot
        collect_terms_at_pivot(pivot_doc);

        // Verify upper bound still exceeds threshold
        float sum_upper_bounds = 0.0f;
        for (const auto& state : terms_at_pivot_buf_) {
            sum_upper_bounds += scorers_[state.term_index]->get_upper_bound();
        }

        if (sum_upper_bounds >= threshold) {
            // Compute exact score
            float score = compute_exact_score(pivot_doc, terms_at_pivot_buf_);
            stats_.docs_scored++;
            collector_->try_insert(pivot_doc, score);
        } else {
            stats_.docs_skipped++;
        }

        // Advance past pivot
        advance_past_pivot(pivot_doc);
    }
    return stats_.docs_scored;
}
25.2.4 Iterator Advancement
When terms before the pivot do not contain the pivot document, they advance to or past it:
void WANDExecutor::advance_to_pivot(
    const std::vector<TermState>& sorted_states,
    int pivot_idx) {
    DocID pivot_doc = sorted_states[pivot_idx].current_doc;
    // Terms 0 to pivot_idx-1 must advance to pivot_doc
    for (int i = 0; i < pivot_idx; ++i) {
        size_t term_idx = sorted_states[i].term_index;
        auto& iterator = iterators_[term_idx];
        if (iterator->doc_id() < pivot_doc) {
            iterator->advance(pivot_doc);
        }
    }
    state_manager_->refresh();
}
25.3 Upper Bound Computation
25.3.1 BM25 Score Analysis
The BM25 scoring formula:

$$\mathrm{score}(d, t) = \mathrm{idf}(t) \cdot \frac{tf \cdot (k_1 + 1)}{tf + k_1 \cdot \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}$$

As term frequency $tf \to \infty$:

$$\mathrm{score}(d, t) \to \mathrm{idf}(t) \cdot (k_1 + 1)$$

However, Cognica uses a numerically stable reformulation:

$$\mathrm{score}(d, t) = \mathrm{idf}(t) \cdot \frac{tf}{tf + K}$$

where $K = k_1 \cdot \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)$. As $tf \to \infty$:

$$\mathrm{score}(d, t) \to \mathrm{idf}(t)$$

This yields a tight upper bound: $\mathrm{UB}_t = \mathrm{boost} \cdot \mathrm{idf}(t)$.
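The saturation behavior can be checked numerically with the reformulated score. A sketch with illustrative parameter values ($k_1 = 1.2$, $b = 0.75$):

```cpp
#include <cmath>

// Reformulated BM25 term score: idf * tf / (tf + K). It increases
// monotonically in tf and saturates at idf, which is why boost * idf
// is a tight upper bound.
float bm25_term_score(float tf, float idf, float k1, float b,
                      float doc_len, float avg_doc_len) {
    const float K = k1 * (1.0f - b + b * doc_len / avg_doc_len);
    return idf * tf / (tf + K);
}
```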
25.3.2 WAND Scorer Implementation
class BM25WANDScorer final : public WANDScorer {
public:
    BM25WANDScorer(BM25SimScorer sim_scorer, float boost,
                   float idf, float k1)
        : sim_scorer_(std::move(sim_scorer)),
          boost_(boost),
          upper_bound_(boost * idf) {  // Tight upper bound
    }

    auto get_upper_bound() const -> float override {
        return upper_bound_;
    }

    auto score(float freq, float norm) const -> float override {
        return sim_scorer_.score(freq, norm);
    }

private:
    BM25SimScorer sim_scorer_;
    float boost_;
    float upper_bound_;
};
25.3.3 IDF Computation
The IDF component uses the Robertson-Sparck Jones formula:
float BM25Similarity::compute_idf(int64_t doc_freq,
                                  int64_t doc_count) const {
    // IDF = ln((N - df + 0.5) / (df + 0.5) + 1)
    auto numerator = static_cast<float>(doc_count - doc_freq) + 0.5f;
    auto denominator = static_cast<float>(doc_freq) + 0.5f;
    return std::log(numerator / denominator + 1.0f);
}
Properties:
- Rare terms ($df \ll N$): High IDF, high upper bound
- Common terms ($df \approx N$): Low IDF, low upper bound
- IDF is always positive due to the $+1$ term inside the logarithm
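These properties can be verified directly with a standalone copy of the formula:

```cpp
#include <cmath>
#include <cstdint>

// Robertson-Sparck Jones IDF with the +1 shift that keeps it positive.
float compute_idf(std::int64_t doc_freq, std::int64_t doc_count) {
    const float num = static_cast<float>(doc_count - doc_freq) + 0.5f;
    const float den = static_cast<float>(doc_freq) + 0.5f;
    return std::log(num / den + 1.0f);
}
```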
25.4 Block-Max WAND (BMW)
25.4.1 Motivation
WAND's term-level upper bounds are loose—they assume maximum term frequency across the entire collection. Documents in regions with lower term frequencies have tighter actual bounds.
BMW partitions posting lists into blocks and precomputes per-block maximum scores, enabling finer-grained skipping.
25.4.2 Block Structure
struct BlockInfo {
    DocID start_doc;  // First document in block
    DocID end_doc;    // Last document in block
    float max_score;  // Maximum BM25 score in block

    bool contains(DocID doc_id) const {
        return doc_id >= start_doc && doc_id <= end_doc;
    }
};
Block Generation:
std::vector<BlockInfo> generate_blocks(
    const std::shared_ptr<Postings>& postings,
    float idf, float k1, float b, float boost,
    float avg_doc_size,
    size_t block_size = 128) {
    const size_t num_docs = postings->doc_ids.size();
    const size_t num_blocks = (num_docs + block_size - 1) / block_size;

    std::vector<BlockInfo> blocks;
    blocks.reserve(num_blocks);

    for (size_t block_idx = 0; block_idx < num_blocks; ++block_idx) {
        const size_t start_idx = block_idx * block_size;
        const size_t end_idx = std::min(start_idx + block_size, num_docs);

        BlockInfo block;
        block.start_doc = postings->doc_ids[start_idx];
        block.end_doc = postings->doc_ids[end_idx - 1];
        block.max_score = 0.0f;

        // Compute max BM25 score in this block
        for (size_t i = start_idx; i < end_idx; ++i) {
            const float tf = static_cast<float>(postings->term_freqs[i]);
            const float norm = postings->field_norms[i];
            const float score = compute_bm25(tf, norm, idf, k1, b,
                                             boost, avg_doc_size);
            block.max_score = std::max(block.max_score, score);
        }
        blocks.push_back(block);
    }
    return blocks;
}
25.4.3 Block-Max Term State
struct BlockMaxTermState {
    DocID current_doc;      // Current document ID
    size_t term_index;      // Term identifier
    DocID block_end;        // End of current block
    float block_max_score;  // Max score in current block
    bool exhausted;         // Iterator exhausted flag
};
25.4.4 BMW Pivot Selection
BMW uses block-level upper bounds instead of term-level:
int BMWExecutor::find_pivot_bmw(
    const std::vector<BlockMaxTermState>& sorted_states,
    float threshold) {
    float cumulative_block_max = 0.0f;
    for (size_t i = 0; i < sorted_states.size(); ++i) {
        // Use block-max score, not term-level upper bound
        cumulative_block_max += sorted_states[i].block_max_score;
        if (cumulative_block_max >= threshold) {
            return static_cast<int>(i);
        }
    }
    return -1;  // No pivot found
}
Key Difference: Block-max scores are typically much smaller than term-level upper bounds, enabling more aggressive skipping.
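The gap between the two kinds of bound can be illustrated directly: each per-block maximum is at most the term-level maximum, and usually much smaller. A standalone sketch over a posting list's per-document scores:

```cpp
#include <algorithm>
#include <vector>

// For a posting list's per-document scores, the term-level upper bound is
// the global maximum; block-max bounds are maxima over fixed-size blocks.
std::vector<float> block_maxima(const std::vector<float>& scores,
                                std::size_t block_size) {
    std::vector<float> maxima;
    for (std::size_t start = 0; start < scores.size(); start += block_size) {
        const std::size_t end = std::min(start + block_size, scores.size());
        maxima.push_back(*std::max_element(scores.begin() + start,
                                           scores.begin() + end));
    }
    return maxima;
}
```

If one document carries an outlier score, the term-level bound is loose everywhere, while only the one block containing the outlier pays for it.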
25.4.5 Block Index Caching
Efficient block lookup is critical for BMW performance:
void BlockMaxTermStateManager::update_block_info(size_t term_index) {
    auto& state = states_[term_index];
    DocID doc_id = state.current_doc;

    // 1. Check cached block (O(1) - ~99% hit rate)
    int cached_idx = current_block_indices_[term_index];
    if (cached_idx >= 0) {
        const auto& cached_block = blocks_[term_index][cached_idx];
        if (cached_block.contains(doc_id)) {
            state.block_end = cached_block.end_doc;
            state.block_max_score = cached_block.max_score;
            return;  // Cache hit
        }
    }

    // 2. Check next block (O(1) - sequential access pattern)
    if (cached_idx + 1 < static_cast<int>(blocks_[term_index].size())) {
        const auto& next_block = blocks_[term_index][cached_idx + 1];
        if (next_block.contains(doc_id)) {
            current_block_indices_[term_index] = cached_idx + 1;
            state.block_end = next_block.end_doc;
            state.block_max_score = next_block.max_score;
            return;
        }
    }

    // 3. Binary search (O(log B) - rare, <0.1% of lookups)
    int block_idx = binary_search_block(term_index, doc_id);
    if (block_idx >= 0) {
        current_block_indices_[term_index] = block_idx;
        const auto& block = blocks_[term_index][block_idx];
        state.block_end = block.end_doc;
        state.block_max_score = block.max_score;
    }
}
25.4.6 Block-Level Skipping
BMW can skip entire blocks when their maximum score is insufficient:
DocID BMWExecutor::next_candidate() {
    // Find minimum block end across all terms
    DocID min_block_end = kNoMoreDocs;
    for (const auto& state : states_) {
        if (!state.exhausted) {
            min_block_end = std::min(min_block_end, state.block_end);
        }
    }

    // If current pivot cannot reach threshold, skip to next block
    float threshold = collector_->get_threshold();
    float block_sum = compute_block_max_sum();
    if (block_sum < threshold) {
        // Skip all terms to min_block_end + 1
        return min_block_end + 1;
    }
    return current_pivot_doc_;
}
25.5 Top-K Collection
25.5.1 Min-Heap Structure
The TopKCollector maintains a min-heap of documents:
struct ScoredDoc {
    DocID doc_id;
    float score;

    // Min-heap comparator: lowest score at top
    bool operator>(const ScoredDoc& other) const {
        if (score != other.score) {
            return score > other.score;
        }
        return doc_id < other.doc_id;  // Tie-break by doc_id
    }
};

class TopKCollector {
public:
    explicit TopKCollector(size_t k, float min_score = 0.f)
        : k_(k), min_score_(min_score) {}

    bool try_insert(DocID doc_id, float score) {
        if (score < min_score_) {
            return false;
        }
        if (heap_.size() < k_) {
            heap_.push({doc_id, score});
            return true;
        }
        const auto& worst = heap_.top();
        if (score > worst.score ||
            (score == worst.score && doc_id < worst.doc_id)) {
            heap_.pop();
            heap_.push({doc_id, score});
            return true;
        }
        return false;
    }

    float get_threshold() const {
        if (heap_.size() < k_) {
            return min_score_;
        }
        return heap_.top().score;
    }

private:
    size_t k_;
    float min_score_;
    std::priority_queue<ScoredDoc, std::vector<ScoredDoc>,
                        std::greater<ScoredDoc>> heap_;
};
25.5.2 Dynamic Threshold
The threshold increases as better documents are found:
| Heap State | Threshold | Effect |
|---|---|---|
| Size $< k$ | `min_score_` | Accept any qualified document |
| Size $= k$ | $k$-th highest score | Prune aggressively |
This dynamic threshold enables progressive pruning: early in query evaluation, most documents qualify; as the heap fills with high-scoring documents, the threshold rises, and more documents are skipped.
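The rising-threshold behavior can be demonstrated with a stripped-down tracker (a sketch, not the production `TopKCollector`): a size-$k$ min-heap over scores whose top element is the current threshold.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Minimal threshold tracker: a size-k min-heap over scores. The threshold
// (k-th highest score so far) only ever rises as better scores arrive.
class ThresholdTracker {
public:
    explicit ThresholdTracker(std::size_t k) : k_(k) {}

    void insert(float score) {
        if (heap_.size() < k_) {
            heap_.push(score);
        } else if (score > heap_.top()) {
            heap_.pop();  // evict the current k-th highest
            heap_.push(score);
        }
    }

    float threshold() const {
        return heap_.size() < k_ ? 0.0f : heap_.top();
    }

private:
    std::size_t k_;
    std::priority_queue<float, std::vector<float>, std::greater<float>> heap_;
};
```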
25.5.3 Tie-Breaking
Deterministic tie-breaking ensures reproducible results:
bool operator<(const ScoredDoc& other) const {
    if (score != other.score) {
        return score > other.score;  // Descending by score
    }
    return doc_id < other.doc_id;  // Ascending by doc_id
}
When two documents have identical scores, the one with lower doc_id ranks higher. This provides:
- Determinism: Same query always produces same ranking
- Stability: Repeated queries do not randomly reorder ties
- Efficiency: Single comparison chain, no secondary data needed
25.6 State Management
25.6.1 Term State Structure
struct TermState {
    size_t term_index;  // Index into scorer array
    DocID current_doc;  // Current iterator position
    float upper_bound;  // Term-level upper bound

    bool operator<(const TermState& other) const {
        return current_doc < other.current_doc;
    }
};
25.6.2 State Manager
The TermStateManager coordinates iterator state across terms:
class TermStateManager {
public:
    std::vector<TermState> get_sorted_states() {
        if (!sorted_cache_valid_) {
            sorted_states_ = states_;
            std::sort(sorted_states_.begin(), sorted_states_.end());
            sorted_cache_valid_ = true;
        }
        return sorted_states_;
    }

    void advance_term(size_t term_idx, DocID target) {
        auto& iterator = iterators_[term_idx];
        if (iterator->doc_id() < target) {
            iterator->advance(target);
        }
        states_[term_idx].current_doc = iterator->doc_id();
        sorted_cache_valid_ = false;
    }

    void refresh() {
        for (size_t i = 0; i < states_.size(); ++i) {
            states_[i].current_doc = iterators_[i]->doc_id();
        }
        sorted_cache_valid_ = false;
    }

    bool is_valid() const {
        return std::any_of(iterators_.begin(), iterators_.end(),
                           [](const auto& it) { return it->is_valid(); });
    }

private:
    std::vector<TermState> states_;
    std::vector<TermState> sorted_states_;
    std::vector<std::shared_ptr<DocIDSetIterator>> iterators_;
    bool sorted_cache_valid_ = false;
};
25.6.3 Iterator Interface
class DocIDSetIterator {
public:
    virtual ~DocIDSetIterator() = default;  // polymorphic base needs virtual dtor

    virtual auto doc_id() const -> DocID = 0;
    virtual auto is_valid() const -> bool = 0;
    virtual auto next() -> DocID = 0;
    virtual auto advance(DocID target) -> DocID = 0;
    virtual auto index() const -> size_t = 0;
};
The advance(target) method is critical for WAND efficiency—it skips directly to the target document or the next document after it, without iterating through intermediate postings.
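A sketch of `advance(target)` over an uncompressed, sorted in-memory posting list shows the idea; production iterators over compressed lists typically achieve the same effect with skip pointers.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

using DocID = std::uint32_t;
constexpr DocID kNoMoreDocs = std::numeric_limits<DocID>::max();

// Binary search jumps straight to the first doc >= target instead of
// stepping through every intermediate posting.
struct SimpleIterator {
    const std::vector<DocID>& docs;
    std::size_t pos = 0;

    DocID doc_id() const {
        return pos < docs.size() ? docs[pos] : kNoMoreDocs;
    }

    DocID advance(DocID target) {
        auto it = std::lower_bound(docs.begin() + pos, docs.end(), target);
        pos = static_cast<std::size_t>(it - docs.begin());
        return doc_id();
    }
};
```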
25.7 Performance Analysis
25.7.1 Complexity
WAND:
- Best case: roughly $O(k)$ documents scored when the threshold rises quickly
- Worst case: $O\!\left(\sum_{t} |L_t|\right)$ when all documents score above the threshold
- Typical: proportional to $S$, where $S \ll \sum_{t} |L_t|$ is the number of documents actually scored
BMW:
- Additional overhead: $O(|L_t| / B)$ block metadata per term, for block size $B$
- Benefit: Tighter upper bounds reduce $S$ significantly
- Block lookup: Amortized $O(1)$ with caching
25.7.2 Benchmark Results
Small Posting Lists (100-250 documents):
| Terms | Docs/Term | k | WAND (µs) | BMW (µs) | Speedup |
|---|---|---|---|---|---|
| 2 | 100 | 10 | 18.3 | 10.6 | 1.73x |
| 5 | 250 | 10 | 89.3 | 70.8 | 1.26x |
Medium Posting Lists (500-1000 documents):
| Terms | Docs/Term | k | WAND (µs) | BMW (µs) | Speedup |
|---|---|---|---|---|---|
| 2 | 500 | 10 | 49.1 | 43.9 | 1.12x |
| 5 | 500 | 10 | 147.8 | 121.8 | 1.21x |
| 5 | 1000 | 10 | 287.5 | 262.4 | 1.10x |
25.7.3 Pruning Efficiency
| Algorithm | Docs | Terms | Scored | Skipped | Efficiency |
|---|---|---|---|---|---|
| WAND | 500 | 5 | 144 | 572 | 79.9% |
| BMW | 500 | 5 | 144 | 1070 | 88.1% |
BMW achieves higher efficiency by skipping documents at the block level before individual evaluation.
25.7.4 Skip Rate Analysis
The skip rate measures the fraction of posting list entries not scored:

$$\text{skip rate} = \frac{\text{docs skipped}}{\text{docs scored} + \text{docs skipped}}$$
Typical skip rates:
- Short queries (2-3 terms): 70-85%
- Medium queries (4-6 terms): 80-90%
- Long queries (7+ terms): 85-95%
Higher skip rates translate directly to lower query latency.
25.8 Algorithm Selection
25.8.1 Selection Heuristics
Cognica uses data-driven heuristics to choose between WAND and BMW:
bool should_use_bmw(size_t avg_posting_size,
                    size_t num_terms,
                    size_t k) {
    // Small lists: BMW 20-43% faster
    if (avg_posting_size <= 250) {
        return true;
    }
    // Medium lists: BMW 7-17% faster
    if (avg_posting_size <= 1000) {
        return true;
    }
    // Large lists with few terms: BMW beneficial
    if (avg_posting_size <= 5000 && num_terms <= 3) {
        return true;
    }
    // Large k: BMW 37% faster
    if (k >= 100) {
        return true;
    }
    // Default: WAND for very large lists
    return false;
}
25.8.2 Decision Rationale
| Condition | Recommendation | Reason |
|---|---|---|
| Small lists ($\le 250$ docs) | BMW | Block overhead amortized |
| Medium lists ($\le 1000$ docs) | BMW | Tighter bounds beneficial |
| Large lists, few terms | BMW | Block skipping effective |
| Large lists, many terms | WAND | Lower per-doc overhead |
| Large $k$ ($\ge 100$) | BMW | More pruning opportunities |
25.8.3 Memory Considerations
BMW Block Overhead:
For a term with 10,000 postings and block size 128:
- Blocks: $\lceil 10{,}000 / 128 \rceil = 79$
- Memory: $79 \times 12 = 948$ bytes, assuming a 12-byte `BlockInfo` (two 4-byte `DocID`s plus a 4-byte `float`)
This overhead is typically negligible compared to the posting list itself.
25.9 Integration with Query Pipeline
25.9.1 Boolean Query Integration
auto BooleanWeight::create_scorer(LeafReaderContext* ctx)
    -> ScorerType {
    // Collect term weights and scorers
    auto term_weights = collect_term_weights(must_clauses_);

    // Determine algorithm
    auto avg_posting_size = compute_avg_posting_size(term_weights);
    bool use_bmw = should_use_bmw(avg_posting_size, term_weights.size(),
                                  search_options_.top_k);

    // Create appropriate executor
    if (use_bmw) {
        return create_bmw_scorer(term_weights, ctx);
    } else {
        return create_wand_scorer(term_weights, ctx);
    }
}
25.9.2 Hybrid Search Integration
For hybrid queries combining text and vector search:
// Text component uses WAND/BMW
auto text_scorer = create_wand_or_bmw(text_weights);

// Vector component uses direct HNSW
auto vector_scorer = create_vector_scorer(vector_weight);

// Combine with disjunction
return DisjunctionScorer{{text_scorer, vector_scorer}};
The WAND/BMW optimization applies only to the text component; vector search uses HNSW's native top-k retrieval.
25.10 Correctness Verification
25.10.1 Invariants
WAND and BMW maintain critical invariants:
- Safety: Every skipped document provably cannot enter the top-k
- Completeness: Every document in the true top-k appears in the result
- Ordering: Results are sorted by score, descending
25.10.2 Verification Strategy
std::vector<ScoredDoc> brute_force_top_k(
    const std::vector<std::shared_ptr<Postings>>& posting_lists,
    const std::vector<std::shared_ptr<WANDScorer>>& scorers,
    size_t k) {
    // Score ALL documents
    std::unordered_map<DocID, float> scores;
    for (size_t t = 0; t < posting_lists.size(); ++t) {
        const auto& postings = posting_lists[t];
        for (size_t i = 0; i < postings->doc_ids.size(); ++i) {
            DocID doc_id = postings->doc_ids[i];
            float tf = postings->term_freqs[i];
            float norm = postings->field_norms[i];
            scores[doc_id] += scorers[t]->score(tf, norm);
        }
    }

    // Sort and take top-k
    std::vector<ScoredDoc> results;
    for (const auto& [doc_id, score] : scores) {
        results.push_back({doc_id, score});
    }
    std::sort(results.begin(), results.end());
    if (results.size() > k) {
        results.resize(k);
    }
    return results;
}

// Verify WAND == brute force
TEST(WANDCorrectness, MatchesBruteForce) {
    wand_executor.execute();  // populates the top-k collector
    auto wand_results = collector.sorted_results();  // accessor name illustrative
    auto brute_results = brute_force_top_k(...);

    ASSERT_EQ(wand_results.size(), brute_results.size());
    for (size_t i = 0; i < wand_results.size(); ++i) {
        EXPECT_EQ(wand_results[i].doc_id, brute_results[i].doc_id);
        EXPECT_FLOAT_EQ(wand_results[i].score, brute_results[i].score);
    }
}
25.11 Bayesian BM25 Upper Bounds
25.11.1 Modified Upper Bounds for Probabilistic Scoring
Standard BM25 upper bounds (Section 25.3) do not directly transfer to Bayesian BM25 (Chapter 20) because the composite prior is document-dependent. A document with a lower BM25 score but higher prior can achieve a higher posterior probability.
The safe Bayesian WAND upper bound accounts for the worst case across both dimensions:

$$\mathrm{UB}_t^{\mathrm{Bayes}} = \sigma\!\left(\ell_t^{\max} + \pi_{\max}\right)$$

where $\ell_t^{\max}$ is the maximum likelihood contribution for term $t$ and $\pi_{\max}$ is the global prior upper bound.

Since $\pi_{\max}$ is a fixed constant, this bound can be precomputed at index time alongside the standard BM25 upper bound, incurring no additional runtime cost.
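Assuming the Chapter 20 posterior is a sigmoid over the sum of a likelihood term and a prior term (the symbol names here are illustrative), the bound computation is a one-liner, and its monotonicity in the likelihood is what makes precomputation safe:

```cpp
#include <cmath>

float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Hypothetical sketch: the safe Bayesian WAND bound combines the per-term
// maximum likelihood contribution with the global prior upper bound. Both
// inputs are fixed at index time, so the bound is precomputable.
float bayesian_upper_bound(float max_likelihood, float prior_max) {
    return sigmoid(max_likelihood + prior_max);
}
```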
25.11.2 Block-Max Bayesian Bounds
For BMW, the block-max upper bound applies the posterior formula with the block-local maximum BM25 score:

$$\mathrm{UB}_{t,b}^{\mathrm{Bayes}} = \sigma\!\left(\ell_{t,b}^{\max} + \pi_{\max}\right)$$

Since $\mathrm{UB}_{t,b}^{\mathrm{Bayes}} \le \mathrm{UB}_t^{\mathrm{Bayes}}$ for all blocks $b$, Bayesian BMW pruning remains safe and exact while achieving higher skip rates than Bayesian WAND.
25.12 WAND/BMW as Exact Neural Pruning
25.12.1 The Neural Interpretation
Chapter 23 reveals that the multi-signal hybrid search computation has the structure of a feedforward neural network. Under this interpretation, WAND and BMW are exact neural pruning algorithms — they skip neurons (documents) that provably cannot affect the top-k output, with formal safety guarantees.
The correspondence:
| IR Concept | Neural Concept |
|---|---|
| Document | Input sample |
| Term score | Neuron activation |
| Upper bound | Maximum possible activation |
| Threshold | Current minimum output |
| WAND skip | Exact neuron pruning |
25.12.2 Safety Requires Bounded Activations
Theorem (Necessary Conditions for Exact Pruning). Exact safe pruning requires:
- Boundedness: The activation function must have a finite upper bound, so maximum possible contributions can be precomputed
- Monotonicity: The activation must be monotonically increasing, so upper bounds on inputs translate to upper bounds on outputs
Proof sketch. If the activation function is unbounded (like ReLU, $\max(0, x)$, which grows without limit), no finite upper bound exists for any input, making it impossible to guarantee that a pruned document could not have entered the top-k. With the sigmoid activation (bounded by 1), the maximum contribution of any term is:

$$\mathrm{UB}_t = \sigma\!\left(\ell_t^{\max} + \pi_{\max}\right) < 1$$

This bound is finite, tight, and computable at index time.

This result explains why WAND/BMW are fundamentally compatible with Bayesian BM25 (sigmoid activation) but incompatible with unbounded scoring functions. The boundedness property is a direct consequence of the sigmoid's probabilistic origin — relevance probabilities live in $[0, 1]$ by definition.
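The contrast can be made concrete: sigmoid contributions are capped no matter how large the input, while ReLU contributions grow without bound, so no finite precomputed upper bound can ever cover them (illustrative sketch):

```cpp
#include <algorithm>
#include <cmath>

// Bounded activation: a precomputed UB of 1 is always safe.
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Unbounded activation: exceeds any fixed UB for a large enough input.
float relu(float x) { return std::max(0.0f, x); }
```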
25.12.3 Skip Rate Analysis Under Bayesian Scoring
Bayesian BM25 scoring changes the skip rate characteristics:
| Query Type | BM25 Skip Rate | Bayesian BM25 Skip Rate |
|---|---|---|
| Rare terms (IDF > 5) | 90-99% | 85-98% |
| Common terms (IDF < 2) | 10-30% | 8-25% |
| Mixed queries | 50-80% | 45-75% |
The slight reduction in skip rates is due to the document-dependent prior: the global $\pi_{\max}$ is looser than the true per-document prior, making the upper bound less tight. However, the difference is typically small (3-5 percentage points) because the prior range is bounded and the worst-case prior is rarely achieved.
25.13 Summary
This chapter explored query evaluation strategies for efficient top-k retrieval:
- WAND algorithm: Uses term-level upper bounds to identify pivot documents; skips documents whose cumulative upper bound cannot reach the threshold
- Block-Max WAND: Partitions posting lists into blocks with precomputed maximum scores; enables finer-grained skipping with block-level upper bounds
- Upper bound computation: BM25's convergence to $\mathrm{idf}(t)$ as $tf \to \infty$ provides tight, easily computed upper bounds
- Dynamic threshold: Min-heap collector maintains the $k$-th highest score; threshold rises as better documents are found
- Block caching: Three-tier lookup (cached block, next block, binary search) achieves 99%+ cache hit rate
- Algorithm selection: Data-driven heuristics choose WAND or BMW based on posting list size, term count, and result size
- Bayesian upper bounds: Modified bounds account for document-dependent priors in Bayesian BM25; precomputable at index time with no runtime overhead
- Exact neural pruning: WAND/BMW constitute provably exact pruning for the sigmoid-based neural structure derived in Chapter 23; safety requires bounded activations, which the sigmoid provides and ReLU does not
- Performance: Skip rates of 70-95% reduce scoring work dramatically; BMW provides 10-73% speedup on typical workloads
These optimizations transform top-k retrieval from an operation proportional to total posting list size to one proportional to the documents actually scored — typically orders of magnitude smaller. Combined with the hybrid search architecture of Chapter 24, Cognica provides efficient, high-quality retrieval across both lexical and semantic signals.