Chapter 25: Query Evaluation Strategies (WAND/BMW)
Introduction
Top-k retrieval—finding the highest-scoring documents for a query—dominates search engine workloads. Naive evaluation scores every document containing any query term, an approach with complexity proportional to the sum of posting list lengths. For queries with millions of matching documents, this becomes prohibitively expensive.
The Weak AND (WAND) algorithm and its Block-Max variant (BMW) exploit a fundamental observation: we do not need to score documents that cannot enter the top-k. By maintaining score upper bounds and a dynamic threshold, these algorithms skip large portions of posting lists without sacrificing result quality.
This chapter explores Cognica's production implementation of WAND and BMW: the theoretical foundations of safe document skipping, the engineering of efficient upper bound computation, and the data-driven heuristics for algorithm selection.
25.1 The Top-K Problem
25.1.1 Problem Definition
Given:
- A query $q = \{t_1, \ldots, t_n\}$ of terms
- A collection of documents $D$
- A scoring function $s(d, t)$
- A result size $k$
Find the $k$ documents with the highest scores $\mathrm{score}(d) = \sum_{t \in q} s(d, t)$.
25.1.2 Naive Evaluation
Document-at-a-time (DAAT) evaluation processes documents in sorted order:
for each document d in union(posting_lists):
    score = sum of s(d, t) for each term t in d
    if score > threshold:
        insert d into top-k heap
        update threshold
Complexity: $O\!\left(\sum_{t \in q} |L_t|\right)$, where $|L_t|$ is the posting list length for term $t$.
For a 5-term query with 100,000 postings each, this evaluates 500,000 score contributions—most for documents that never enter the top-k.
25.1.3 The Upper Bound Insight
WAND exploits term-level upper bounds. If the maximum possible score contribution from term $t$ is $\mathrm{UB}_t$, then a document $d$ can score at most:

$$\mathrm{UB}(d) = \sum_{t \in q \cap d} \mathrm{UB}_t$$

Documents whose upper bound falls below the current $k$-th highest score can be safely skipped.
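The skip decision itself is a single comparison. A minimal standalone sketch (the names are illustrative, not the production API):

```cpp
#include <vector>

// Can a document covered by these query terms possibly beat the current
// top-k threshold? If not, it is safe to skip without scoring it.
bool could_enter_top_k(const std::vector<float>& term_upper_bounds,
                       float threshold) {
    float ub = 0.0f;
    for (float t : term_upper_bounds) {
        ub += t;  // UB(d) = sum of per-term score maxima
    }
    return ub >= threshold;
}
```

Everything that follows in this chapter is machinery for evaluating this test cheaply and with the tightest bounds possible.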
25.2 WAND Algorithm
25.2.1 Algorithm Overview
WAND maintains posting list iterators sorted by current document ID. The algorithm identifies a "pivot" document and determines whether it could potentially enter the top-k based on cumulative upper bounds.
25.2.2 Pivot Selection
The pivot is the first document where cumulative upper bounds reach the threshold:
int WANDExecutor::find_pivot(
    const std::vector<TermState>& sorted_states,
    float threshold) {
    float cumulative_ub = 0.0f;
    for (size_t i = 0; i < sorted_states.size(); ++i) {
        size_t term_idx = sorted_states[i].term_index;
        cumulative_ub += scorers_[term_idx]->get_upper_bound();
        if (cumulative_ub >= threshold) {
            return static_cast<int>(i);  // Pivot found
        }
    }
    return -1;  // No pivot: algorithm terminates
}
Termination Condition: When the sum of all upper bounds cannot reach the threshold, no document can enter the top-k, and evaluation terminates.
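A worked example makes the pivot concrete. The following standalone sketch runs the same prefix-sum loop over plain (current document, upper bound) pairs, already sorted by document ID:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using DocID = std::uint32_t;

// Standalone sketch of WAND pivot selection (not the production class).
// Returns the index of the first state whose cumulative upper bound
// reaches the threshold, or -1 if no document can qualify.
int find_pivot(const std::vector<std::pair<DocID, float>>& sorted_states,
               float threshold) {
    float cumulative_ub = 0.0f;
    for (std::size_t i = 0; i < sorted_states.size(); ++i) {
        cumulative_ub += sorted_states[i].second;
        if (cumulative_ub >= threshold) {
            return static_cast<int>(i);
        }
    }
    return -1;  // even all terms together cannot reach the threshold
}
```

With terms at documents 3, 7, and 12 carrying upper bounds 1.0, 1.5, and 2.5, a threshold of 2.0 is first reached at index 1, so document 7 becomes the pivot; no document before 7 can qualify.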
25.2.3 Document Evaluation
size_t WANDExecutor::execute() {
    while (state_manager_->is_valid()) {
        float threshold = collector_->get_threshold();
        auto sorted_states = state_manager_->get_sorted_states();
        int pivot_idx = find_pivot(sorted_states, threshold);
        if (pivot_idx < 0) break;

        DocID pivot_doc = sorted_states[pivot_idx].current_doc;

        // Advance terms before pivot to pivot document
        advance_to_pivot(sorted_states, pivot_idx);

        // Collect terms now at pivot
        collect_terms_at_pivot(pivot_doc);

        // Verify upper bound still exceeds threshold
        float sum_upper_bounds = 0.0f;
        for (const auto& state : terms_at_pivot_buf_) {
            sum_upper_bounds += scorers_[state.term_index]->get_upper_bound();
        }

        if (sum_upper_bounds >= threshold) {
            // Compute exact score
            float score = compute_exact_score(pivot_doc, terms_at_pivot_buf_);
            stats_.docs_scored++;
            collector_->try_insert(pivot_doc, score);
        } else {
            stats_.docs_skipped++;
        }

        // Advance past pivot
        advance_past_pivot(pivot_doc);
    }
    return stats_.docs_scored;
}
25.2.4 Iterator Advancement
When terms before the pivot do not contain the pivot document, they advance to or past it:
void WANDExecutor::advance_to_pivot(
    const std::vector<TermState>& sorted_states,
    int pivot_idx) {
    DocID pivot_doc = sorted_states[pivot_idx].current_doc;
    // Terms 0 to pivot_idx-1 must advance to pivot_doc
    for (int i = 0; i < pivot_idx; ++i) {
        size_t term_idx = sorted_states[i].term_index;
        auto& iterator = iterators_[term_idx];
        if (iterator->doc_id() < pivot_doc) {
            iterator->advance(pivot_doc);
        }
    }
    state_manager_->refresh();
}
25.3 Upper Bound Computation
25.3.1 BM25 Score Analysis
The BM25 scoring formula:

$$\mathrm{score}(d, t) = \mathrm{idf}(t) \cdot \frac{tf \cdot (k_1 + 1)}{tf + k_1 \cdot \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}$$

As term frequency $tf \to \infty$:

$$\mathrm{score}(d, t) \to \mathrm{idf}(t) \cdot (k_1 + 1)$$

However, Cognica uses a numerically stable reformulation:

$$\mathrm{score}(d, t) = \mathrm{idf}(t) \cdot \frac{tf}{tf + K}$$

where $K = k_1 \cdot \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)$. As $tf \to \infty$:

$$\mathrm{score}(d, t) \to \mathrm{idf}(t)$$

This yields a tight upper bound: $\mathrm{UB}_t = \mathrm{boost} \cdot \mathrm{idf}(t)$.
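The saturation behavior can be checked numerically with the reformulated score. A sketch with illustrative parameter values ($k_1 = 1.2$, $b = 0.75$):

```cpp
#include <cmath>

// Reformulated BM25 term score: idf * tf / (tf + K). It increases
// monotonically in tf and saturates at idf, which is why boost * idf
// is a tight upper bound.
float bm25_term_score(float tf, float idf, float k1, float b,
                      float doc_len, float avg_doc_len) {
    const float K = k1 * (1.0f - b + b * doc_len / avg_doc_len);
    return idf * tf / (tf + K);
}
```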
25.3.2 WAND Scorer Implementation
class BM25WANDScorer final : public WANDScorer {
public:
    BM25WANDScorer(BM25SimScorer sim_scorer, float boost,
                   float idf, float k1)
        : sim_scorer_(std::move(sim_scorer)),
          boost_(boost),
          upper_bound_(boost * idf) {  // Tight upper bound
    }

    auto get_upper_bound() const -> float override {
        return upper_bound_;
    }

    auto score(float freq, float norm) const -> float override {
        return sim_scorer_.score(freq, norm);
    }

private:
    BM25SimScorer sim_scorer_;
    float boost_;
    float upper_bound_;
};
25.3.3 IDF Computation
The IDF component uses the Robertson-Sparck Jones formula:
float BM25Similarity::compute_idf(int64_t doc_freq,
                                  int64_t doc_count) const {
    // IDF = ln((N - df + 0.5) / (df + 0.5) + 1)
    auto numerator = static_cast<float>(doc_count - doc_freq) + 0.5f;
    auto denominator = static_cast<float>(doc_freq) + 0.5f;
    return std::log(numerator / denominator + 1.0f);
}
Properties:
- Rare terms ($df \ll N$): High IDF, high upper bound
- Common terms ($df \approx N$): Low IDF, low upper bound
- IDF is always positive due to the $+1$ term inside the logarithm
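These properties can be verified directly with a standalone copy of the formula:

```cpp
#include <cmath>
#include <cstdint>

// Robertson-Sparck Jones IDF with the +1 shift that keeps it positive.
float compute_idf(std::int64_t doc_freq, std::int64_t doc_count) {
    const float num = static_cast<float>(doc_count - doc_freq) + 0.5f;
    const float den = static_cast<float>(doc_freq) + 0.5f;
    return std::log(num / den + 1.0f);
}
```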
25.4 Block-Max WAND (BMW)
25.4.1 Motivation
WAND's term-level upper bounds are loose—they assume maximum term frequency across the entire collection. Documents in regions with lower term frequencies have tighter actual bounds.
BMW partitions posting lists into blocks and precomputes per-block maximum scores, enabling finer-grained skipping.
25.4.2 Block Structure
struct BlockInfo {
    DocID start_doc;  // First document in block
    DocID end_doc;    // Last document in block
    float max_score;  // Maximum BM25 score in block

    bool contains(DocID doc_id) const {
        return doc_id >= start_doc && doc_id <= end_doc;
    }
};
Block Generation:
std::vector<BlockInfo> generate_blocks(
    const std::shared_ptr<Postings>& postings,
    float idf, float k1, float b, float boost,
    float avg_doc_size,
    size_t block_size = 128) {
    const size_t num_docs = postings->doc_ids.size();
    const size_t num_blocks = (num_docs + block_size - 1) / block_size;

    std::vector<BlockInfo> blocks;
    blocks.reserve(num_blocks);

    for (size_t block_idx = 0; block_idx < num_blocks; ++block_idx) {
        const size_t start_idx = block_idx * block_size;
        const size_t end_idx = std::min(start_idx + block_size, num_docs);

        BlockInfo block;
        block.start_doc = postings->doc_ids[start_idx];
        block.end_doc = postings->doc_ids[end_idx - 1];
        block.max_score = 0.0f;

        // Compute max BM25 score in this block
        for (size_t i = start_idx; i < end_idx; ++i) {
            const float tf = static_cast<float>(postings->term_freqs[i]);
            const float norm = postings->field_norms[i];
            const float score = compute_bm25(tf, norm, idf, k1, b,
                                             boost, avg_doc_size);
            block.max_score = std::max(block.max_score, score);
        }
        blocks.push_back(block);
    }
    return blocks;
}
25.4.3 Block-Max Term State
struct BlockMaxTermState {
    DocID current_doc;      // Current document ID
    size_t term_index;      // Term identifier
    DocID block_end;        // End of current block
    float block_max_score;  // Max score in current block
    bool exhausted;         // Iterator exhausted flag
};
25.4.4 BMW Pivot Selection
BMW uses block-level upper bounds instead of term-level:
int BMWExecutor::find_pivot_bmw(
    const std::vector<BlockMaxTermState>& sorted_states,
    float threshold) {
    float cumulative_block_max = 0.0f;
    for (size_t i = 0; i < sorted_states.size(); ++i) {
        // Use block-max score, not term-level upper bound
        cumulative_block_max += sorted_states[i].block_max_score;
        if (cumulative_block_max >= threshold) {
            return static_cast<int>(i);
        }
    }
    return -1;  // No pivot found
}
Key Difference: Block-max scores are typically much smaller than term-level upper bounds, enabling more aggressive skipping.
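The gap between the two kinds of bound can be illustrated directly: each per-block maximum is at most the term-level maximum, and usually much smaller. A standalone sketch over a posting list's per-document scores:

```cpp
#include <algorithm>
#include <vector>

// For a posting list's per-document scores, the term-level upper bound is
// the global maximum; block-max bounds are maxima over fixed-size blocks.
std::vector<float> block_maxima(const std::vector<float>& scores,
                                std::size_t block_size) {
    std::vector<float> maxima;
    for (std::size_t start = 0; start < scores.size(); start += block_size) {
        const std::size_t end = std::min(start + block_size, scores.size());
        maxima.push_back(*std::max_element(scores.begin() + start,
                                           scores.begin() + end));
    }
    return maxima;
}
```

If one document carries an outlier score, the term-level bound is loose everywhere, while only the one block containing the outlier pays for it.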
25.4.5 Block Index Caching
Efficient block lookup is critical for BMW performance:
void BlockMaxTermStateManager::update_block_info(size_t term_index) {
    auto& state = states_[term_index];
    DocID doc_id = state.current_doc;

    // 1. Check cached block (O(1) - ~99% hit rate)
    int cached_idx = current_block_indices_[term_index];
    if (cached_idx >= 0) {
        const auto& cached_block = blocks_[term_index][cached_idx];
        if (cached_block.contains(doc_id)) {
            state.block_end = cached_block.end_doc;
            state.block_max_score = cached_block.max_score;
            return;  // Cache hit
        }
    }

    // 2. Check next block (O(1) - sequential access pattern)
    if (cached_idx + 1 < static_cast<int>(blocks_[term_index].size())) {
        const auto& next_block = blocks_[term_index][cached_idx + 1];
        if (next_block.contains(doc_id)) {
            current_block_indices_[term_index] = cached_idx + 1;
            state.block_end = next_block.end_doc;
            state.block_max_score = next_block.max_score;
            return;
        }
    }

    // 3. Binary search (O(log B) - rare, <0.1% of lookups)
    int block_idx = binary_search_block(term_index, doc_id);
    if (block_idx >= 0) {
        current_block_indices_[term_index] = block_idx;
        const auto& block = blocks_[term_index][block_idx];
        state.block_end = block.end_doc;
        state.block_max_score = block.max_score;
    }
}
25.4.6 Block-Level Skipping
BMW can skip entire blocks when their maximum score is insufficient:
DocID BMWExecutor::next_candidate() {
    // Find minimum block end across all terms
    DocID min_block_end = kNoMoreDocs;
    for (const auto& state : states_) {
        if (!state.exhausted) {
            min_block_end = std::min(min_block_end, state.block_end);
        }
    }

    // If current pivot cannot reach threshold, skip to next block
    float threshold = collector_->get_threshold();
    float block_sum = compute_block_max_sum();
    if (block_sum < threshold) {
        // Skip all terms to min_block_end + 1
        return min_block_end + 1;
    }
    return current_pivot_doc_;
}
25.5 Top-K Collection
25.5.1 Min-Heap Structure
The TopKCollector maintains a min-heap of documents:
struct ScoredDoc {
    DocID doc_id;
    float score;

    // Min-heap comparator: lowest score at top
    bool operator>(const ScoredDoc& other) const {
        if (score != other.score) {
            return score > other.score;
        }
        return doc_id < other.doc_id;  // Tie-break by doc_id
    }
};

class TopKCollector {
public:
    explicit TopKCollector(size_t k, float min_score = 0.f)
        : k_(k), min_score_(min_score) {}

    bool try_insert(DocID doc_id, float score) {
        if (score < min_score_) {
            return false;
        }
        if (heap_.size() < k_) {
            heap_.push({doc_id, score});
            return true;
        }
        const auto& worst = heap_.top();
        if (score > worst.score ||
            (score == worst.score && doc_id < worst.doc_id)) {
            heap_.pop();
            heap_.push({doc_id, score});
            return true;
        }
        return false;
    }

    float get_threshold() const {
        if (heap_.size() < k_) {
            return min_score_;
        }
        return heap_.top().score;
    }

private:
    size_t k_;
    float min_score_;
    std::priority_queue<ScoredDoc, std::vector<ScoredDoc>,
                        std::greater<ScoredDoc>> heap_;
};
25.5.2 Dynamic Threshold
The threshold increases as better documents are found:
| Heap State | Threshold | Effect |
|---|---|---|
| Size $< k$ | `min_score_` | Accept any qualified document |
| Size $= k$ | $k$-th highest score | Prune aggressively |
This dynamic threshold enables progressive pruning: early in query evaluation, most documents qualify; as the heap fills with high-scoring documents, the threshold rises, and more documents are skipped.
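The rising-threshold behavior can be demonstrated with a stripped-down tracker (a sketch, not the production `TopKCollector`): a size-$k$ min-heap over scores whose top element is the current threshold.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Minimal threshold tracker: a size-k min-heap over scores. The threshold
// (k-th highest score so far) only ever rises as better scores arrive.
class ThresholdTracker {
public:
    explicit ThresholdTracker(std::size_t k) : k_(k) {}

    void insert(float score) {
        if (heap_.size() < k_) {
            heap_.push(score);
        } else if (score > heap_.top()) {
            heap_.pop();  // evict the current k-th highest
            heap_.push(score);
        }
    }

    float threshold() const {
        return heap_.size() < k_ ? 0.0f : heap_.top();
    }

private:
    std::size_t k_;
    std::priority_queue<float, std::vector<float>, std::greater<float>> heap_;
};
```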
25.5.3 Tie-Breaking
Deterministic tie-breaking ensures reproducible results:
bool operator<(const ScoredDoc& other) const {
    if (score != other.score) {
        return score > other.score;  // Descending by score
    }
    return doc_id < other.doc_id;  // Ascending by doc_id
}
When two documents have identical scores, the one with lower doc_id ranks higher. This provides:
- Determinism: Same query always produces same ranking
- Stability: Repeated queries do not randomly reorder ties
- Efficiency: Single comparison chain, no secondary data needed
25.6 State Management
25.6.1 Term State Structure
struct TermState {
    size_t term_index;  // Index into scorer array
    DocID current_doc;  // Current iterator position
    float upper_bound;  // Term-level upper bound

    bool operator<(const TermState& other) const {
        return current_doc < other.current_doc;
    }
};
25.6.2 State Manager
The TermStateManager coordinates iterator state across terms:
class TermStateManager {
public:
    std::vector<TermState> get_sorted_states() {
        if (!sorted_cache_valid_) {
            sorted_states_ = states_;
            std::sort(sorted_states_.begin(), sorted_states_.end());
            sorted_cache_valid_ = true;
        }
        return sorted_states_;
    }

    void advance_term(size_t term_idx, DocID target) {
        auto& iterator = iterators_[term_idx];
        if (iterator->doc_id() < target) {
            iterator->advance(target);
        }
        states_[term_idx].current_doc = iterator->doc_id();
        sorted_cache_valid_ = false;
    }

    void refresh() {
        for (size_t i = 0; i < states_.size(); ++i) {
            states_[i].current_doc = iterators_[i]->doc_id();
        }
        sorted_cache_valid_ = false;
    }

    bool is_valid() const {
        return std::any_of(iterators_.begin(), iterators_.end(),
                           [](const auto& it) { return it->is_valid(); });
    }

private:
    std::vector<TermState> states_;
    std::vector<TermState> sorted_states_;
    std::vector<std::shared_ptr<DocIDSetIterator>> iterators_;
    bool sorted_cache_valid_ = false;
};
25.6.3 Iterator Interface
class DocIDSetIterator {
public:
    virtual ~DocIDSetIterator() = default;  // polymorphic base needs virtual dtor

    virtual auto doc_id() const -> DocID = 0;
    virtual auto is_valid() const -> bool = 0;
    virtual auto next() -> DocID = 0;
    virtual auto advance(DocID target) -> DocID = 0;
    virtual auto index() const -> size_t = 0;
};
The advance(target) method is critical for WAND efficiency—it skips directly to the target document or the next document after it, without iterating through intermediate postings.
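A sketch of `advance(target)` over an uncompressed, sorted in-memory posting list shows the idea; production iterators over compressed lists typically achieve the same effect with skip pointers.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

using DocID = std::uint32_t;
constexpr DocID kNoMoreDocs = std::numeric_limits<DocID>::max();

// Binary search jumps straight to the first doc >= target instead of
// stepping through every intermediate posting.
struct SimpleIterator {
    const std::vector<DocID>& docs;
    std::size_t pos = 0;

    DocID doc_id() const {
        return pos < docs.size() ? docs[pos] : kNoMoreDocs;
    }

    DocID advance(DocID target) {
        auto it = std::lower_bound(docs.begin() + pos, docs.end(), target);
        pos = static_cast<std::size_t>(it - docs.begin());
        return doc_id();
    }
};
```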
25.7 Performance Analysis
25.7.1 Complexity
WAND:
- Best case: roughly $O(k)$ documents scored when the threshold rises quickly
- Worst case: $O\!\left(\sum_{t} |L_t|\right)$ when all documents score above the threshold
- Typical: proportional to $S$, where $S \ll \sum_{t} |L_t|$ is the number of documents actually scored
BMW:
- Additional overhead: $O(|L_t| / B)$ block metadata per term, for block size $B$
- Benefit: Tighter upper bounds reduce $S$ significantly
- Block lookup: Amortized $O(1)$ with caching
25.7.2 Benchmark Results
Small Posting Lists (100-250 documents):
| Terms | Docs/Term | k | WAND (µs) | BMW (µs) | Speedup |
|---|---|---|---|---|---|
| 2 | 100 | 10 | 18.3 | 10.6 | 1.73x |
| 5 | 250 | 10 | 89.3 | 70.8 | 1.26x |
Medium Posting Lists (500-1000 documents):
| Terms | Docs/Term | k | WAND (µs) | BMW (µs) | Speedup |
|---|---|---|---|---|---|
| 2 | 500 | 10 | 49.1 | 43.9 | 1.12x |
| 5 | 500 | 10 | 147.8 | 121.8 | 1.21x |
| 5 | 1000 | 10 | 287.5 | 262.4 | 1.10x |
25.7.3 Pruning Efficiency
| Algorithm | Docs | Terms | Scored | Skipped | Efficiency |
|---|---|---|---|---|---|
| WAND | 500 | 5 | 144 | 572 | 79.9% |
| BMW | 500 | 5 | 144 | 1070 | 88.1% |
BMW achieves higher efficiency by skipping documents at the block level before individual evaluation.
25.7.4 Skip Rate Analysis
The skip rate measures the fraction of posting list entries not scored:

$$\text{skip rate} = \frac{\text{docs skipped}}{\text{docs scored} + \text{docs skipped}}$$
Typical skip rates:
- Short queries (2-3 terms): 70-85%
- Medium queries (4-6 terms): 80-90%
- Long queries (7+ terms): 85-95%
Higher skip rates translate directly to lower query latency.
25.8 Algorithm Selection
25.8.1 Selection Heuristics
Cognica uses data-driven heuristics to choose between WAND and BMW:
bool should_use_bmw(size_t avg_posting_size,
                    size_t num_terms,
                    size_t k) {
    // Small lists: BMW 20-43% faster
    if (avg_posting_size <= 250) {
        return true;
    }
    // Medium lists: BMW 7-17% faster
    if (avg_posting_size <= 1000) {
        return true;
    }
    // Large lists with few terms: BMW beneficial
    if (avg_posting_size <= 5000 && num_terms <= 3) {
        return true;
    }
    // Large k: BMW 37% faster
    if (k >= 100) {
        return true;
    }
    // Default: WAND for very large lists
    return false;
}
25.8.2 Decision Rationale
| Condition | Recommendation | Reason |
|---|---|---|
| Small lists ($\le 250$ docs) | BMW | Block overhead amortized |
| Medium lists ($\le 1000$ docs) | BMW | Tighter bounds beneficial |
| Large lists, few terms | BMW | Block skipping effective |
| Large lists, many terms | WAND | Lower per-doc overhead |
| Large $k$ ($\ge 100$) | BMW | More pruning opportunities |
25.8.3 Memory Considerations
BMW Block Overhead:
For a term with 10,000 postings and block size 128:
- Blocks: $\lceil 10{,}000 / 128 \rceil = 79$
- Memory: $79 \times 12 = 948$ bytes, assuming a 12-byte `BlockInfo` (two 4-byte `DocID`s plus a 4-byte `float`)
This overhead is typically negligible compared to the posting list itself.
25.9 Integration with Query Pipeline
25.9.1 Boolean Query Integration
auto BooleanWeight::create_scorer(LeafReaderContext* ctx)
    -> ScorerType {
    // Collect term weights and scorers
    auto term_weights = collect_term_weights(must_clauses_);

    // Determine algorithm
    auto avg_posting_size = compute_avg_posting_size(term_weights);
    bool use_bmw = should_use_bmw(avg_posting_size, term_weights.size(),
                                  search_options_.top_k);

    // Create appropriate executor
    if (use_bmw) {
        return create_bmw_scorer(term_weights, ctx);
    } else {
        return create_wand_scorer(term_weights, ctx);
    }
}
25.9.2 Hybrid Search Integration
For hybrid queries combining text and vector search:
// Text component uses WAND/BMW
auto text_scorer = create_wand_or_bmw(text_weights);

// Vector component uses direct HNSW
auto vector_scorer = create_vector_scorer(vector_weight);

// Combine with disjunction
return DisjunctionScorer{{text_scorer, vector_scorer}};
The WAND/BMW optimization applies only to the text component; vector search uses HNSW's native top-k retrieval.
25.10 Correctness Verification
25.10.1 Invariants
WAND and BMW maintain critical invariants:
- Safety: Every skipped document provably cannot enter the top-k
- Completeness: Every document in the true top-k appears in the result
- Ordering: Results are sorted by score, descending
25.10.2 Verification Strategy
std::vector<ScoredDoc> brute_force_top_k(
    const std::vector<std::shared_ptr<Postings>>& posting_lists,
    const std::vector<std::shared_ptr<WANDScorer>>& scorers,
    size_t k) {
    // Score ALL documents
    std::unordered_map<DocID, float> scores;
    for (size_t t = 0; t < posting_lists.size(); ++t) {
        const auto& postings = posting_lists[t];
        for (size_t i = 0; i < postings->doc_ids.size(); ++i) {
            DocID doc_id = postings->doc_ids[i];
            float tf = postings->term_freqs[i];
            float norm = postings->field_norms[i];
            scores[doc_id] += scorers[t]->score(tf, norm);
        }
    }

    // Sort and take top-k
    std::vector<ScoredDoc> results;
    for (const auto& [doc_id, score] : scores) {
        results.push_back({doc_id, score});
    }
    std::sort(results.begin(), results.end());
    if (results.size() > k) {
        results.resize(k);
    }
    return results;
}

// Verify WAND == brute force
TEST(WANDCorrectness, MatchesBruteForce) {
    wand_executor.execute();  // populates the top-k collector
    auto wand_results = collector.sorted_results();  // accessor name illustrative
    auto brute_results = brute_force_top_k(...);

    ASSERT_EQ(wand_results.size(), brute_results.size());
    for (size_t i = 0; i < wand_results.size(); ++i) {
        EXPECT_EQ(wand_results[i].doc_id, brute_results[i].doc_id);
        EXPECT_FLOAT_EQ(wand_results[i].score, brute_results[i].score);
    }
}
25.11 Bayesian BM25 Upper Bounds
25.11.1 Modified Upper Bounds for Probabilistic Scoring
Standard BM25 upper bounds (Section 25.3) do not directly transfer to Bayesian BM25 (Chapter 20) because the composite prior is document-dependent. A document with a lower BM25 score but higher prior can achieve a higher posterior probability.
The safe Bayesian WAND upper bound accounts for the worst case across both dimensions:

$$\mathrm{UB}_t^{\mathrm{Bayes}} = \sigma\!\left(\ell_t^{\max} + \pi_{\max}\right)$$

where $\ell_t^{\max}$ is the maximum likelihood contribution for term $t$ and $\pi_{\max}$ is the global prior upper bound.

Since $\pi_{\max}$ is a fixed constant, this bound can be precomputed at index time alongside the standard BM25 upper bound, incurring no additional runtime cost.
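Assuming the Chapter 20 posterior is a sigmoid over the sum of a likelihood term and a prior term (the symbol names here are illustrative), the bound computation is a one-liner, and its monotonicity in the likelihood is what makes precomputation safe:

```cpp
#include <cmath>

float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Hypothetical sketch: the safe Bayesian WAND bound combines the per-term
// maximum likelihood contribution with the global prior upper bound. Both
// inputs are fixed at index time, so the bound is precomputable.
float bayesian_upper_bound(float max_likelihood, float prior_max) {
    return sigmoid(max_likelihood + prior_max);
}
```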
25.11.2 Block-Max Bayesian Bounds
For BMW, the block-max upper bound applies the posterior formula with the block-local maximum BM25 score:

$$\mathrm{UB}_{t,b}^{\mathrm{Bayes}} = \sigma\!\left(\ell_{t,b}^{\max} + \pi_{\max}\right)$$

Since $\mathrm{UB}_{t,b}^{\mathrm{Bayes}} \le \mathrm{UB}_t^{\mathrm{Bayes}}$ for all blocks $b$, Bayesian BMW pruning remains safe and exact while achieving higher skip rates than Bayesian WAND.
25.12 WAND/BMW as Exact Neural Pruning
25.12.1 The Neural Interpretation
Chapter 23 reveals that the multi-signal hybrid search computation has the structure of a feedforward neural network. Under this interpretation, WAND and BMW are exact neural pruning algorithms — they skip neurons (documents) that provably cannot affect the top-k output, with formal safety guarantees.
The correspondence:
| IR Concept | Neural Concept |
|---|---|
| Document | Input sample |
| Term score | Neuron activation |
| Upper bound | Maximum possible activation |
| Threshold | Current minimum output |
| WAND skip | Exact neuron pruning |
25.12.2 Safety Requires Bounded Activations
Theorem (Necessary Conditions for Exact Pruning). Exact safe pruning requires:
- Boundedness: The activation function must have a finite upper bound, so maximum possible contributions can be precomputed
- Monotonicity: The activation must be monotonically increasing, so upper bounds on inputs translate to upper bounds on outputs
Proof sketch. If the activation function is unbounded (like ReLU, $\max(0, x)$, which grows without limit), no finite upper bound exists for any input, making it impossible to guarantee that a pruned document could not have entered the top-k. With the sigmoid activation (bounded by 1), the maximum contribution of any term is:

$$\mathrm{UB}_t = \sigma\!\left(\ell_t^{\max} + \pi_{\max}\right) < 1$$

This bound is finite, tight, and computable at index time.

This result explains why WAND/BMW are fundamentally compatible with Bayesian BM25 (sigmoid activation) but incompatible with unbounded scoring functions. The boundedness property is a direct consequence of the sigmoid's probabilistic origin — relevance probabilities live in $[0, 1]$ by definition.
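The contrast can be made concrete: sigmoid contributions are capped no matter how large the input, while ReLU contributions grow without bound, so no finite precomputed upper bound can ever cover them (illustrative sketch):

```cpp
#include <algorithm>
#include <cmath>

// Bounded activation: a precomputed UB of 1 is always safe.
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Unbounded activation: exceeds any fixed UB for a large enough input.
float relu(float x) { return std::max(0.0f, x); }
```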
25.12.3 Skip Rate Analysis Under Bayesian Scoring
Bayesian BM25 scoring changes the skip rate characteristics:
| Query Type | BM25 Skip Rate | Bayesian BM25 Skip Rate |
|---|---|---|
| Rare terms (IDF > 5) | 90-99% | 85-98% |
| Common terms (IDF < 2) | 10-30% | 8-25% |
| Mixed queries | 50-80% | 45-75% |
The slight reduction in skip rates is due to the document-dependent prior: the global $\pi_{\max}$ is looser than the true per-document prior, making the upper bound less tight. However, the difference is typically small (3-5 percentage points) because the prior range is bounded and the worst-case prior is rarely achieved.
25.13 Summary
This chapter explored query evaluation strategies for efficient top-k retrieval:
- WAND algorithm: Uses term-level upper bounds to identify pivot documents; skips documents whose cumulative upper bound cannot reach the threshold
- Block-Max WAND: Partitions posting lists into blocks with precomputed maximum scores; enables finer-grained skipping with block-level upper bounds
- Upper bound computation: BM25's convergence to $\mathrm{idf}(t)$ as $tf \to \infty$ provides tight, easily computed upper bounds
- Dynamic threshold: Min-heap collector maintains the $k$-th highest score; threshold rises as better documents are found
- Block caching: Three-tier lookup (cached block, next block, binary search) achieves 99%+ cache hit rate
- Algorithm selection: Data-driven heuristics choose WAND or BMW based on posting list size, term count, and result size
- Bayesian upper bounds: Modified bounds account for document-dependent priors in Bayesian BM25; precomputable at index time with no runtime overhead
- Exact neural pruning: WAND/BMW constitute provably exact pruning for the sigmoid-based neural structure derived in Chapter 23; safety requires bounded activations, which the sigmoid provides and ReLU does not
- Performance: Skip rates of 70-95% reduce scoring work dramatically; BMW provides 10-73% speedup on typical workloads
These optimizations transform top-k retrieval from an operation proportional to total posting list size to one proportional to the documents actually scored — typically orders of magnitude smaller. Combined with the hybrid search architecture of Chapter 24, Cognica provides efficient, high-quality retrieval across both lexical and semantic signals.