Chapter 25: Query Evaluation Strategies (WAND/BMW)

Introduction

Top-k retrieval—finding the k highest-scoring documents for a query—dominates search engine workloads. Naive evaluation scores every document containing any query term, an approach with complexity proportional to the sum of posting list lengths. For queries with millions of matching documents, this becomes prohibitively expensive.

The Weak AND (WAND) algorithm and its Block-Max variant (BMW) exploit a fundamental observation: we do not need to score documents that cannot enter the top-k. By maintaining score upper bounds and a dynamic threshold, these algorithms skip large portions of posting lists without sacrificing result quality.

This chapter explores Cognica's production implementation of WAND and BMW: the theoretical foundations of safe document skipping, the engineering of efficient upper bound computation, and the data-driven heuristics for algorithm selection.

25.1 The Top-K Problem

25.1.1 Problem Definition

Given:

  • A query Q = \{t_1, t_2, \ldots, t_m\} of m terms
  • A collection of N documents
  • A scoring function \text{score}(d, Q) = \sum_{t \in Q \cap d} s(d, t)
  • A result size k

Find the k documents with highest scores.

25.1.2 Naive Evaluation

Document-at-a-time (DAAT) evaluation processes documents in sorted order:

for each document d in union(posting_lists):
    score = sum of s(d, t) for each term t in d
    if score > threshold:
        insert d into top-k heap
        update threshold

Complexity: O(\sum_{t \in Q} |L_t|), where |L_t| is the posting list length for term t.

For a 5-term query with 100,000 postings each, this evaluates 500,000 score contributions—most for documents that never enter the top-k.
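
The naive strategy can be sketched as follows. This is an illustrative stand-in, not the production code: `Posting` pairs a document ID with a precomputed per-term score contribution, and an ordered map takes the place of a real cursor-merge.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using DocID = uint32_t;
struct Posting { DocID doc_id; float score; };

// Naive DAAT: every posting in every list contributes to its document's
// accumulator, regardless of whether the document can enter the top-k.
std::vector<std::pair<DocID, float>> naive_daat(
    const std::vector<std::vector<Posting>>& posting_lists) {
  std::map<DocID, float> acc;  // ordered by doc ID
  for (const auto& list : posting_lists) {
    for (const auto& p : list) {
      acc[p.doc_id] += p.score;  // touched once per posting: O(sum |L_t|)
    }
  }
  return {acc.begin(), acc.end()};
}
```

The cost is exactly the total number of postings touched, which is what WAND and BMW avoid.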

25.1.3 The Upper Bound Insight

WAND exploits term-level upper bounds. If the maximum possible score contribution from term t is U_t, then a document can score at most:

\text{score}(d, Q) \leq \sum_{t \in Q \cap d} U_t

Documents whose upper bound falls below the current k-th highest score can be safely skipped.

25.2 WAND Algorithm

25.2.1 Algorithm Overview

WAND maintains posting list iterators sorted by current document ID. The algorithm identifies a "pivot" document and determines whether it could potentially enter the top-k based on cumulative upper bounds.


25.2.2 Pivot Selection

The pivot is the first document where cumulative upper bounds reach the threshold:

int WANDExecutor::find_pivot(
    const std::vector<TermState>& sorted_states,
    float threshold) {

  float cumulative_ub = 0.0f;

  for (size_t i = 0; i < sorted_states.size(); ++i) {
    size_t term_idx = sorted_states[i].term_index;
    cumulative_ub += scorers_[term_idx]->get_upper_bound();

    if (cumulative_ub >= threshold) {
      return static_cast<int>(i);  // Pivot found
    }
  }
  return -1;  // No pivot: algorithm terminates
}

Termination Condition: When the sum of all upper bounds cannot reach the threshold, no document can enter the top-k, and evaluation terminates.
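
A toy run of pivot selection, using the same cumulative-sum rule as `find_pivot` above. The free function and `State` struct here are standalone stand-ins for the member function and its scorer lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// States are assumed sorted by current doc ID; each carries the
// term-level upper bound of its term.
struct State { unsigned current_doc; float upper_bound; };

int find_pivot(const std::vector<State>& sorted_states, float threshold) {
  float cumulative_ub = 0.0f;
  for (size_t i = 0; i < sorted_states.size(); ++i) {
    cumulative_ub += sorted_states[i].upper_bound;
    if (cumulative_ub >= threshold) {
      return static_cast<int>(i);  // first position that could reach threshold
    }
  }
  return -1;  // even all bounds combined fall short: terminate
}
```

With upper bounds {1.0, 2.0, 3.0} and threshold 2.5, the pivot is index 1 (1.0 + 2.0 ≥ 2.5); with threshold 10, no pivot exists and evaluation stops.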

25.2.3 Document Evaluation

size_t WANDExecutor::execute() {
  while (state_manager_->is_valid()) {
    float threshold = collector_->get_threshold();
    auto sorted_states = state_manager_->get_sorted_states();

    int pivot_idx = find_pivot(sorted_states, threshold);
    if (pivot_idx < 0) break;

    DocID pivot_doc = sorted_states[pivot_idx].current_doc;

    // Advance terms before pivot to pivot document
    advance_to_pivot(sorted_states, pivot_idx);

    // Collect terms now at pivot
    collect_terms_at_pivot(pivot_doc);

    // Verify upper bound still exceeds threshold
    float sum_upper_bounds = 0.0f;
    for (const auto& state : terms_at_pivot_buf_) {
      sum_upper_bounds += scorers_[state.term_index]->get_upper_bound();
    }

    if (sum_upper_bounds >= threshold) {
      // Compute exact score
      float score = compute_exact_score(pivot_doc, terms_at_pivot_buf_);
      stats_.docs_scored++;
      collector_->try_insert(pivot_doc, score);
    } else {
      stats_.docs_skipped++;
    }

    // Advance past pivot
    advance_past_pivot(pivot_doc);
  }

  return stats_.docs_scored;
}

25.2.4 Iterator Advancement

When terms before the pivot do not contain the pivot document, they advance to or past it:

void WANDExecutor::advance_to_pivot(
    const std::vector<TermState>& sorted_states,
    int pivot_idx) {

  DocID pivot_doc = sorted_states[pivot_idx].current_doc;

  // Terms 0 to pivot_idx-1 must advance to pivot_doc
  for (int i = 0; i < pivot_idx; ++i) {
    size_t term_idx = sorted_states[i].term_index;
    auto& iterator = iterators_[term_idx];

    if (iterator->doc_id() < pivot_doc) {
      iterator->advance(pivot_doc);
    }
  }

  state_manager_->refresh();
}

25.3 Upper Bound Computation

25.3.1 BM25 Score Analysis

The BM25 scoring formula:

\text{score}(d, t) = \text{IDF}(t) \cdot \frac{f_{t,d} \cdot (k_1 + 1)}{f_{t,d} + k_1 \cdot (1 - b + b \cdot |d|/\text{avgdl})}

As term frequency f_{t,d} \to \infty:

\lim_{f_{t,d} \to \infty} \text{score}(d, t) = \text{IDF}(t) \cdot (k_1 + 1)

However, Cognica uses a numerically stable reformulation:

\text{score} = w - \frac{w}{1 + f \cdot \text{inv\_norm}}

where w = \text{boost} \cdot \text{IDF}. As f \to \infty:

\lim_{f \to \infty} \text{score} = w - 0 = w

This yields a tight upper bound: U_t = \text{boost} \cdot \text{IDF}(t).

25.3.2 WAND Scorer Implementation

class BM25WANDScorer final : public WANDScorer {
public:
  BM25WANDScorer(BM25SimScorer sim_scorer, float boost,
                 float idf, float k1)
      : sim_scorer_(std::move(sim_scorer)),
        boost_(boost),
        upper_bound_(boost * idf) {  // Tight upper bound
  }

  auto get_upper_bound() const -> float override {
    return upper_bound_;
  }

  auto score(float freq, float norm) const -> float override {
    return sim_scorer_.score(freq, norm);
  }

private:
  BM25SimScorer sim_scorer_;
  float boost_;
  float upper_bound_;
};

25.3.3 IDF Computation

The IDF component uses the Robertson-Sparck Jones formula:

float BM25Similarity::compute_idf(int64_t doc_freq,
                                  int64_t doc_count) const {
  // IDF = ln((N - df + 0.5) / (df + 0.5) + 1)
  auto numerator = static_cast<float>(doc_count - doc_freq) + 0.5f;
  auto denominator = static_cast<float>(doc_freq) + 0.5f;
  return std::log(numerator / denominator + 1.0f);
}

Properties:

  • Rare terms (\text{df} \ll N): High IDF, high upper bound
  • Common terms (\text{df} \approx N): Low IDF, low upper bound
  • IDF is always positive due to the +1 term
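
These properties can be checked numerically; the free function below restates the `compute_idf` member shown above so the example is self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Robertson-Sparck Jones IDF, as in BM25Similarity::compute_idf:
// IDF = ln((N - df + 0.5) / (df + 0.5) + 1)
float compute_idf(int64_t doc_freq, int64_t doc_count) {
  float numerator = static_cast<float>(doc_count - doc_freq) + 0.5f;
  float denominator = static_cast<float>(doc_freq) + 0.5f;
  return std::log(numerator / denominator + 1.0f);
}
```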

25.4 Block-Max WAND (BMW)

25.4.1 Motivation

WAND's term-level upper bounds are loose—they assume maximum term frequency across the entire collection. Documents in regions with lower term frequencies have tighter actual bounds.

BMW partitions posting lists into blocks and precomputes per-block maximum scores, enabling finer-grained skipping.

25.4.2 Block Structure

struct BlockInfo {
  DocID start_doc;    // First document in block
  DocID end_doc;      // Last document in block
  float max_score;    // Maximum BM25 score in block

  bool contains(DocID doc_id) const {
    return doc_id >= start_doc && doc_id <= end_doc;
  }
};

Block Generation:

std::vector<BlockInfo> generate_blocks(
    const std::shared_ptr<Postings>& postings,
    float idf, float k1, float b, float boost,
    float avg_doc_size,
    size_t block_size = 128) {

  const size_t num_docs = postings->doc_ids.size();
  const size_t num_blocks = (num_docs + block_size - 1) / block_size;

  std::vector<BlockInfo> blocks;
  blocks.reserve(num_blocks);

  for (size_t block_idx = 0; block_idx < num_blocks; ++block_idx) {
    const size_t start_idx = block_idx * block_size;
    const size_t end_idx = std::min(start_idx + block_size, num_docs);

    BlockInfo block;
    block.start_doc = postings->doc_ids[start_idx];
    block.end_doc = postings->doc_ids[end_idx - 1];
    block.max_score = 0.0f;

    // Compute max BM25 score in this block
    for (size_t i = start_idx; i < end_idx; ++i) {
      const float tf = static_cast<float>(postings->term_freqs[i]);
      const float norm = postings->field_norms[i];
      const float score = compute_bm25(tf, norm, idf, k1, b,
                                       boost, avg_doc_size);
      block.max_score = std::max(block.max_score, score);
    }

    blocks.push_back(block);
  }

  return blocks;
}

25.4.3 Block-Max Term State

struct BlockMaxTermState {
  DocID current_doc;       // Current document ID
  size_t term_index;       // Term identifier
  DocID block_end;         // End of current block
  float block_max_score;   // Max score in current block
  bool exhausted;          // Iterator exhausted flag
};

25.4.4 BMW Pivot Selection

BMW uses block-level upper bounds instead of term-level:

int BMWExecutor::find_pivot_bmw(
    const std::vector<BlockMaxTermState>& sorted_states,
    float threshold) {

  float cumulative_block_max = 0.0f;

  for (size_t i = 0; i < sorted_states.size(); ++i) {
    // Use block-max score, not term-level upper bound
    cumulative_block_max += sorted_states[i].block_max_score;

    if (cumulative_block_max >= threshold) {
      return static_cast<int>(i);
    }
  }

  return -1;  // No pivot found
}

Key Difference: Block-max scores are typically much smaller than term-level upper bounds, enabling more aggressive skipping.
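
A small illustration of why block-max bounds are tighter: the term-level bound is the maximum over the whole posting list, while each block keeps only its local maximum. The scores below are arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Partition per-posting scores into fixed-size blocks and record each
// block's local maximum; the term-level bound is the global maximum.
std::vector<float> block_maxima(const std::vector<float>& scores,
                                size_t block_size) {
  std::vector<float> maxima;
  for (size_t start = 0; start < scores.size(); start += block_size) {
    size_t end = std::min(start + block_size, scores.size());
    maxima.push_back(*std::max_element(scores.begin() + start,
                                       scores.begin() + end));
  }
  return maxima;
}
```

A single high-scoring posting inflates the term-level bound everywhere, but only inflates the one block that contains it.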

25.4.5 Block Index Caching

Efficient block lookup is critical for BMW performance:

void BlockMaxTermStateManager::update_block_info(size_t term_index) {
  auto& state = states_[term_index];
  DocID doc_id = state.current_doc;

  // 1. Check cached block (O(1) - ~99% hit rate)
  int cached_idx = current_block_indices_[term_index];
  if (cached_idx >= 0) {
    const auto& cached_block = blocks_[term_index][cached_idx];
    if (cached_block.contains(doc_id)) {
      state.block_end = cached_block.end_doc;
      state.block_max_score = cached_block.max_score;
      return;  // Cache hit
    }
  }

  // 2. Check next block (O(1) - sequential access pattern)
  if (cached_idx + 1 < static_cast<int>(blocks_[term_index].size())) {
    const auto& next_block = blocks_[term_index][cached_idx + 1];
    if (next_block.contains(doc_id)) {
      current_block_indices_[term_index] = cached_idx + 1;
      state.block_end = next_block.end_doc;
      state.block_max_score = next_block.max_score;
      return;
    }
  }

  // 3. Binary search (O(log B) - rare, <0.1% of lookups)
  int block_idx = binary_search_block(term_index, doc_id);
  if (block_idx >= 0) {
    current_block_indices_[term_index] = block_idx;
    const auto& block = blocks_[term_index][block_idx];
    state.block_end = block.end_doc;
    state.block_max_score = block.max_score;
  }
}

25.4.6 Block-Level Skipping

BMW can skip entire blocks when their maximum score is insufficient:

DocID BMWExecutor::next_candidate() {
  // Find minimum block end across all terms
  DocID min_block_end = kNoMoreDocs;
  for (const auto& state : states_) {
    if (!state.exhausted) {
      min_block_end = std::min(min_block_end, state.block_end);
    }
  }

  // If current pivot cannot reach threshold, skip to next block
  float threshold = collector_->get_threshold();
  float block_sum = compute_block_max_sum();

  if (block_sum < threshold) {
    // Skip all terms to min_block_end + 1
    return min_block_end + 1;
  }

  return current_pivot_doc_;
}

25.5 Top-K Collection

25.5.1 Min-Heap Structure

The TopKCollector maintains a min-heap of k documents:

struct ScoredDoc {
  DocID doc_id;
  float score;

  // Min-heap comparator: lowest score at top
  bool operator>(const ScoredDoc& other) const {
    if (score != other.score) {
      return score > other.score;
    }
    return doc_id < other.doc_id;  // Tie-break by doc_id
  }
};

class TopKCollector {
public:
  explicit TopKCollector(size_t k, float min_score = 0.f)
      : k_(k), min_score_(min_score) {}

  bool try_insert(DocID doc_id, float score) {
    if (score < min_score_) {
      return false;
    }

    if (heap_.size() < k_) {
      heap_.push({doc_id, score});
      return true;
    }

    const auto& worst = heap_.top();
    if (score > worst.score ||
        (score == worst.score && doc_id < worst.doc_id)) {
      heap_.pop();
      heap_.push({doc_id, score});
      return true;
    }

    return false;
  }

  float get_threshold() const {
    if (heap_.size() < k_) {
      return min_score_;
    }
    return heap_.top().score;
  }

private:
  size_t k_;
  float min_score_;
  std::priority_queue<ScoredDoc, std::vector<ScoredDoc>,
                      std::greater<ScoredDoc>> heap_;
};

25.5.2 Dynamic Threshold

The threshold increases as better documents are found:

| Heap State | Threshold | Effect |
|---|---|---|
| \lvert H \rvert < k | min_score_ | Accept any qualified document |
| \lvert H \rvert = k | k-th highest score | Prune aggressively |

This dynamic threshold enables progressive pruning: early in query evaluation, most documents qualify; as the heap fills with high-scoring documents, the threshold rises, and more documents are skipped.
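
The threshold dynamics can be demonstrated with a compact sketch that keeps only the scores (doc IDs and tie-breaking omitted); `ThresholdSketch` is an illustrative reduction of the TopKCollector above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Size-k min-heap of scores. Until the heap holds k entries the threshold
// stays at min_score; afterwards it is the k-th highest score seen so far.
class ThresholdSketch {
 public:
  explicit ThresholdSketch(size_t k, float min_score = 0.0f)
      : k_(k), min_score_(min_score) {}

  void insert(float score) {
    if (heap_.size() < k_) {
      heap_.push(score);
    } else if (score > heap_.top()) {
      heap_.pop();            // evict the current k-th highest
      heap_.push(score);
    }
  }

  float threshold() const {
    return heap_.size() < k_ ? min_score_ : heap_.top();
  }

 private:
  size_t k_;
  float min_score_;
  std::priority_queue<float, std::vector<float>, std::greater<float>> heap_;
};
```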

25.5.3 Tie-Breaking

Deterministic tie-breaking ensures reproducible results:

bool operator<(const ScoredDoc& other) const {
  if (score != other.score) {
    return score > other.score;  // Descending by score
  }
  return doc_id < other.doc_id;  // Ascending by doc_id
}

When two documents have identical scores, the one with lower doc_id ranks higher. This provides:

  • Determinism: Same query always produces same ranking
  • Stability: Repeated queries do not randomly reorder ties
  • Efficiency: Single comparison chain, no secondary data needed

25.6 State Management

25.6.1 Term State Structure

struct TermState {
  size_t term_index;    // Index into scorer array
  DocID current_doc;    // Current iterator position
  float upper_bound;    // Term-level upper bound

  bool operator<(const TermState& other) const {
    return current_doc < other.current_doc;
  }
};

25.6.2 State Manager

The TermStateManager coordinates iterator state across terms:

class TermStateManager {
public:
  std::vector<TermState> get_sorted_states() {
    if (!sorted_cache_valid_) {
      sorted_states_ = states_;
      std::sort(sorted_states_.begin(), sorted_states_.end());
      sorted_cache_valid_ = true;
    }
    return sorted_states_;
  }

  void advance_term(size_t term_idx, DocID target) {
    auto& iterator = iterators_[term_idx];
    if (iterator->doc_id() < target) {
      iterator->advance(target);
    }
    states_[term_idx].current_doc = iterator->doc_id();
    sorted_cache_valid_ = false;
  }

  void refresh() {
    for (size_t i = 0; i < states_.size(); ++i) {
      states_[i].current_doc = iterators_[i]->doc_id();
    }
    sorted_cache_valid_ = false;
  }

  bool is_valid() const {
    return std::any_of(iterators_.begin(), iterators_.end(),
        [](const auto& it) { return it->is_valid(); });
  }

private:
  std::vector<TermState> states_;
  std::vector<TermState> sorted_states_;
  std::vector<std::shared_ptr<DocIDSetIterator>> iterators_;
  bool sorted_cache_valid_ = false;
};

25.6.3 Iterator Interface

class DocIDSetIterator {
public:
  virtual auto doc_id() const -> DocID = 0;
  virtual auto is_valid() const -> bool = 0;
  virtual auto next() -> DocID = 0;
  virtual auto advance(DocID target) -> DocID = 0;
  virtual auto index() const -> size_t = 0;
};

The advance(target) method is critical for WAND efficiency—it skips directly to the target document or the next document after it, without iterating through intermediate postings.
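
A minimal iterator over an in-memory sorted posting list shows how `advance` can skip with binary search. This sketch implements only the methods it needs; `VectorDocIterator` is an assumption for illustration, not the production iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using DocID = uint32_t;
constexpr DocID kNoMoreDocs = std::numeric_limits<DocID>::max();

// advance(target) binary-searches to the first doc >= target instead of
// walking every intermediate posting.
class VectorDocIterator {
 public:
  explicit VectorDocIterator(std::vector<DocID> docs)
      : docs_(std::move(docs)) {}

  DocID doc_id() const {
    return pos_ < docs_.size() ? docs_[pos_] : kNoMoreDocs;
  }

  bool is_valid() const { return pos_ < docs_.size(); }

  DocID advance(DocID target) {
    auto it = std::lower_bound(docs_.begin() + pos_, docs_.end(), target);
    pos_ = static_cast<size_t>(it - docs_.begin());
    return doc_id();
  }

 private:
  std::vector<DocID> docs_;
  size_t pos_ = 0;
};
```

On compressed postings, the same interface is typically backed by skip lists or block offsets rather than `std::lower_bound`, but the contract is identical.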

25.7 Performance Analysis

25.7.1 Complexity

WAND:

  • Best case: O(k \log k + m) when the threshold rises quickly
  • Worst case: O(\sum |L_t|) when all documents score above the threshold
  • Typical: O(k \log k \cdot m + S), where S is the number of documents scored

BMW:

  • Additional overhead: O(B) block metadata entries per term
  • Benefit: Tighter upper bounds reduce S significantly
  • Block lookup: Amortized O(1) with caching

25.7.2 Benchmark Results

Small Posting Lists (100-250 documents):

| Terms | Docs/Term | k | WAND (µs) | BMW (µs) | Speedup |
|---|---|---|---|---|---|
| 2 | 100 | 10 | 18.3 | 10.6 | 1.73x |
| 5 | 250 | 10 | 89.3 | 70.8 | 1.26x |

Medium Posting Lists (500-1000 documents):

| Terms | Docs/Term | k | WAND (µs) | BMW (µs) | Speedup |
|---|---|---|---|---|---|
| 2 | 500 | 10 | 49.1 | 43.9 | 1.12x |
| 5 | 500 | 10 | 147.8 | 121.8 | 1.21x |
| 5 | 1000 | 10 | 287.5 | 262.4 | 1.10x |

25.7.3 Pruning Efficiency

| Algorithm | Docs | Terms | Scored | Skipped | Efficiency |
|---|---|---|---|---|---|
| WAND | 500 | 5 | 144 | 572 | 79.9% |
| BMW | 500 | 5 | 144 | 1070 | 88.1% |

BMW achieves higher efficiency by skipping documents at the block level before individual evaluation.

25.7.4 Skip Rate Analysis

The skip rate measures the fraction of posting list entries not scored:

\text{Skip Rate} = 1 - \frac{\text{Documents Scored}}{\sum |L_t|}

Typical skip rates:

  • Short queries (2-3 terms): 70-85%
  • Medium queries (4-6 terms): 80-90%
  • Long queries (7+ terms): 85-95%

Higher skip rates translate directly to lower query latency.
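
The formula above, computed directly; the workload numbers in the check are illustrative, chosen to land in the typical 80-90% band for medium queries.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Skip rate = 1 - (documents scored / total posting entries).
double skip_rate(size_t docs_scored, size_t total_postings) {
  return 1.0 - static_cast<double>(docs_scored) /
                   static_cast<double>(total_postings);
}
```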

25.8 Algorithm Selection

25.8.1 Selection Heuristics

Cognica uses data-driven heuristics to choose between WAND and BMW:

bool should_use_bmw(size_t avg_posting_size,
                    size_t num_terms,
                    size_t k) {

  // Small lists: BMW 20-43% faster
  if (avg_posting_size <= 250) {
    return true;
  }

  // Medium lists: BMW 7-17% faster
  if (avg_posting_size <= 1000) {
    return true;
  }

  // Large lists with few terms: BMW beneficial
  if (avg_posting_size <= 5000 && num_terms <= 3) {
    return true;
  }

  // Large k: BMW 37% faster
  if (k >= 100) {
    return true;
  }

  // Default: WAND for very large lists
  return false;
}

25.8.2 Decision Rationale

| Condition | Recommendation | Reason |
|---|---|---|
| Small lists (\leq 250) | BMW | Block overhead amortized |
| Medium lists (\leq 1000) | BMW | Tighter bounds beneficial |
| Large lists, few terms | BMW | Block skipping effective |
| Large lists, many terms | WAND | Lower per-doc overhead |
| Large k (\geq 100) | BMW | More pruning opportunities |

25.8.3 Memory Considerations

BMW Block Overhead:

\text{Memory} = \text{Terms} \times \text{Blocks/Term} \times \text{sizeof(BlockInfo)}

For a term with 10,000 postings and block size 128:

  • Blocks: \lceil 10000/128 \rceil = 79
  • Memory: 79 \times 16 = 1264 bytes

This overhead is typically negligible compared to the posting list itself.
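
The arithmetic above, checked directly. The 16-byte size is an assumption (two 4-byte DocIDs and one 4-byte float, padded to 16); the actual sizeof(BlockInfo) depends on the platform ABI.

```cpp
#include <cassert>
#include <cstddef>

// Block overhead: ceil(postings / block_size) blocks of block_info_bytes each.
size_t bmw_block_overhead(size_t postings, size_t block_size,
                          size_t block_info_bytes) {
  size_t num_blocks = (postings + block_size - 1) / block_size;
  return num_blocks * block_info_bytes;
}
```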

25.9 Integration with Query Pipeline

25.9.1 Boolean Query Integration

auto BooleanWeight::create_scorer(LeafReaderContext* ctx)
    -> ScorerType {

  // Collect term weights and scorers
  auto term_weights = collect_term_weights(must_clauses_);

  // Determine algorithm
  auto avg_posting_size = compute_avg_posting_size(term_weights);
  bool use_bmw = should_use_bmw(avg_posting_size, term_weights.size(),
                                 search_options_.top_k);

  // Create appropriate executor
  if (use_bmw) {
    return create_bmw_scorer(term_weights, ctx);
  } else {
    return create_wand_scorer(term_weights, ctx);
  }
}

25.9.2 Hybrid Search Integration

For hybrid queries combining text and vector search:

// Text component uses WAND/BMW
auto text_scorer = create_wand_or_bmw(text_weights);

// Vector component uses direct HNSW
auto vector_scorer = create_vector_scorer(vector_weight);

// Combine with disjunction
return DisjunctionScorer{{text_scorer, vector_scorer}};

The WAND/BMW optimization applies only to the text component; vector search uses HNSW's native top-k retrieval.

25.10 Correctness Verification

25.10.1 Invariants

WAND and BMW maintain critical invariants:

  1. Safety: Every document in the result could potentially have entered top-k
  2. Completeness: Every document in true top-k is in the result
  3. Ordering: Results are sorted by score descending

25.10.2 Verification Strategy

std::vector<ScoredDoc> brute_force_top_k(
    const std::vector<std::shared_ptr<Postings>>& posting_lists,
    const std::vector<std::shared_ptr<WANDScorer>>& scorers,
    size_t k) {

  // Score ALL documents
  std::unordered_map<DocID, float> scores;

  for (size_t t = 0; t < posting_lists.size(); ++t) {
    const auto& postings = posting_lists[t];
    for (size_t i = 0; i < postings->doc_ids.size(); ++i) {
      DocID doc_id = postings->doc_ids[i];
      float tf = postings->term_freqs[i];
      float norm = postings->field_norms[i];
      scores[doc_id] += scorers[t]->score(tf, norm);
    }
  }

  // Sort and take top-k
  std::vector<ScoredDoc> results;
  for (const auto& [doc_id, score] : scores) {
    results.push_back({doc_id, score});
  }

  std::sort(results.begin(), results.end());
  if (results.size() > k) {
    results.resize(k);
  }

  return results;
}

// Verify WAND == brute force
TEST(WANDCorrectness, MatchesBruteForce) {
  auto wand_results = wand_executor.execute();
  auto brute_results = brute_force_top_k(...);

  ASSERT_EQ(wand_results.size(), brute_results.size());
  for (size_t i = 0; i < wand_results.size(); ++i) {
    EXPECT_EQ(wand_results[i].doc_id, brute_results[i].doc_id);
    EXPECT_FLOAT_EQ(wand_results[i].score, brute_results[i].score);
  }
}

25.11 Bayesian BM25 Upper Bounds

25.11.1 Modified Upper Bounds for Probabilistic Scoring

Standard BM25 upper bounds (Section 25.3) do not directly transfer to Bayesian BM25 (Chapter 20) because the composite prior is document-dependent. A document with a lower BM25 score but higher prior can achieve a higher posterior probability.

The safe Bayesian WAND upper bound accounts for the worst case across both dimensions:

\text{UB}_{\text{Bayes}}(t) = \frac{L_{\max}(t) \cdot p_{\max}}{L_{\max}(t) \cdot p_{\max} + (1 - L_{\max}(t)) \cdot (1 - p_{\max})}

where L_{\max}(t) = \sigma(\alpha \cdot (\text{UB}_{\text{BM25}}(t) - \beta)) is the maximum likelihood for term t and p_{\max} = 0.9 is the global prior upper bound.

Since pmaxp_{\max} is a fixed constant, this bound can be precomputed at index time alongside the standard BM25 upper bound, incurring no additional runtime cost.
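
A sketch of this index-time precomputation. The alpha and beta calibration parameters come from the Bayesian BM25 model of Chapter 20; the values used in the check below are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Posterior upper bound for Bayesian WAND: plug the maximum likelihood
// L_max = sigmoid(alpha * (UB_BM25 - beta)) and the global prior cap
// p_max into the posterior formula. All inputs are index-time constants.
float bayes_upper_bound(float ub_bm25, float alpha, float beta,
                        float p_max = 0.9f) {
  float l_max = sigmoid(alpha * (ub_bm25 - beta));
  return (l_max * p_max) /
         (l_max * p_max + (1.0f - l_max) * (1.0f - p_max));
}
```

Because the posterior is monotone in the likelihood and the likelihood is monotone in the BM25 bound, the resulting bound inherits the ordering of the standard BM25 upper bounds.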

25.11.2 Block-Max Bayesian Bounds

For BMW, the block-max upper bound applies the posterior formula with the block-local maximum BM25 score:

\text{BlockMax}_{\text{Bayes}}(t, j) = \frac{L_j(t) \cdot p_{\max}}{L_j(t) \cdot p_{\max} + (1 - L_j(t)) \cdot (1 - p_{\max})}

where L_j(t) = \sigma(\alpha \cdot (\text{BlockMax}_{\text{BM25}}(t, j) - \beta)) is the likelihood evaluated at the block-local maximum BM25 score.

Since \text{BlockMax}_{\text{BM25}}(t, j) \leq \text{UB}_{\text{BM25}}(t) for all blocks, Bayesian BMW pruning remains safe and exact while achieving higher skip rates than Bayesian WAND.

25.12 WAND/BMW as Exact Neural Pruning

25.12.1 The Neural Interpretation

Chapter 23 reveals that the multi-signal hybrid search computation has the structure of a feedforward neural network. Under this interpretation, WAND and BMW are exact neural pruning algorithms — they skip neurons (documents) that provably cannot affect the top-k output, with formal safety guarantees.

The correspondence:

| IR Concept | Neural Concept |
|---|---|
| Document | Input sample |
| Term score | Neuron activation |
| Upper bound | Maximum possible activation |
| Threshold | Current minimum output |
| WAND skip | Exact neuron pruning |

25.12.2 Safety Requires Bounded Activations

Theorem (Necessary Conditions for Exact Pruning). Exact safe pruning requires:

  1. Boundedness: The activation function must have a finite upper bound, so maximum possible contributions can be precomputed
  2. Monotonicity: The activation must be monotonically increasing, so upper bounds on inputs translate to upper bounds on outputs

Proof sketch. If the activation function is unbounded (like ReLU, \max(0, x), which grows without bound as x \to \infty), no finite upper bound exists for any input, making it impossible to guarantee that a pruned document could not have entered the top-k. With the sigmoid activation (bounded by 1), the maximum contribution of any term is:

\sigma(\alpha \cdot (\text{UB}_{\text{BM25}}(t) - \beta)) < 1

This bound is finite, tight, and computable at index time. \square

This result explains why WAND/BMW are fundamentally compatible with Bayesian BM25 (sigmoid activation) but incompatible with unbounded scoring functions. The boundedness property is a direct consequence of the sigmoid's probabilistic origin: relevance probabilities live in [0, 1] by definition.

25.12.3 Skip Rate Analysis Under Bayesian Scoring

Bayesian BM25 scoring changes the skip rate characteristics:

| Query Type | BM25 Skip Rate | Bayesian BM25 Skip Rate |
|---|---|---|
| Rare terms (IDF > 5) | 90-99% | 85-98% |
| Common terms (IDF < 2) | 10-30% | 8-25% |
| Mixed queries | 50-80% | 45-75% |
The slight reduction in skip rates is due to the document-dependent prior: the global p_{\max} = 0.9 is looser than the true per-document prior, making the upper bound less tight. However, the difference is typically small (3-5 percentage points) because the prior range [0.1, 0.9] is bounded and the worst-case prior is rarely achieved.

25.13 Summary

This chapter explored query evaluation strategies for efficient top-k retrieval:

  1. WAND algorithm: Uses term-level upper bounds to identify pivot documents; skips documents whose cumulative upper bound cannot reach the threshold

  2. Block-Max WAND: Partitions posting lists into blocks with precomputed maximum scores; enables finer-grained skipping with block-level upper bounds

  3. Upper bound computation: BM25's convergence to \text{weight} = \text{boost} \cdot \text{IDF} provides tight, easily computed upper bounds

  4. Dynamic threshold: Min-heap collector maintains the k-th highest score; threshold rises as better documents are found

  5. Block caching: Three-tier lookup (cached block, next block, binary search) achieves 99%+ cache hit rate

  6. Algorithm selection: Data-driven heuristics choose WAND or BMW based on posting list size, term count, and result size

  7. Bayesian upper bounds: Modified bounds account for document-dependent priors in Bayesian BM25; precomputable at index time with no runtime overhead

  8. Exact neural pruning: WAND/BMW constitute provably exact pruning for the sigmoid-based neural structure derived in Chapter 23; safety requires bounded activations, which the sigmoid provides and ReLU does not

  9. Performance: Skip rates of 70-95% reduce scoring work dramatically; BMW provides 10-73% speedup on typical workloads

These optimizations transform top-k retrieval from an operation proportional to total posting list size to one proportional to the documents actually scored — typically orders of magnitude smaller. Combined with the hybrid search architecture of Chapter 24, Cognica provides efficient, high-quality retrieval across both lexical and semantic signals.

Copyright (c) 2023-2026 Cognica, Inc.