Chapter 24: Hybrid Search Architecture

Introduction

Modern information retrieval demands both precision and semantic understanding. Lexical search excels at exact matching—a query for "PostgreSQL wire protocol" retrieves documents containing those exact terms. Semantic search excels at conceptual matching—a query about "database communication standards" retrieves relevant documents even without term overlap. Neither approach alone satisfies all user needs.

Hybrid search combines these complementary paradigms, fusing lexical relevance signals with semantic similarity scores. This chapter explores Cognica's hybrid search architecture: the theoretical foundations of multi-signal fusion, the engineering challenges of combining scores from different distributions, and the implementation patterns that enable efficient unified ranking.

24.1 The Hybrid Search Problem

24.1.1 Complementary Search Paradigms

Lexical Search (BM25) leverages term frequency statistics:

  • Matches exact terms and variants (through stemming)
  • Respects term rarity through IDF weighting
  • Output range: $[0, +\infty)$ (unbounded positive reals)
  • Strengths: Precision, interpretability, no training required

Semantic Search (Vector) leverages learned embeddings:

  • Matches conceptual similarity regardless of term overlap
  • Captures synonymy, paraphrase, and semantic relationships
  • Output range: $[0, 1]$ after similarity conversion
  • Strengths: Recall, semantic understanding, cross-lingual capability

24.1.2 The Score Fusion Challenge

Combining these signals poses fundamental challenges:

  1. Incompatible ranges: BM25 scores are unbounded; vector similarities are bounded
  2. Different distributions: BM25 follows a power-law; similarities may be approximately normal
  3. Semantic mismatch: High BM25 does not imply high semantic relevance and vice versa

A naive approach of simply summing scores favors whichever signal has larger magnitude, producing arbitrary rankings.

24.1.3 Fusion Strategies

Three principal approaches address score fusion:

Rank Fusion (RRF): Combines rankings rather than scores:

$$\text{RRF}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)}$$

where $k$ is a smoothing constant (typically 60). RRF is robust but discards score magnitude information.
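To make the formula concrete, here is a minimal standalone sketch of RRF fusion (the `rrf_fuse` helper is hypothetical, not part of Cognica's API):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Reciprocal Rank Fusion: each ranking lists doc ids best-first, and a
// document's fused score sums 1 / (k + rank) across all rankings.
std::map<std::string, double> rrf_fuse(
    const std::vector<std::vector<std::string>>& rankings, int k = 60) {
  std::map<std::string, double> fused;
  for (const auto& ranking : rankings) {
    for (std::size_t i = 0; i < ranking.size(); ++i) {
      // Ranks are 1-based; k damps the dominance of the very top ranks.
      fused[ranking[i]] += 1.0 / (k + static_cast<double>(i + 1));
    }
  }
  return fused;
}
```

A document ranked well by both signals outranks one ranked first by only a single signal, which is exactly the robustness (and the loss of magnitude information) noted above.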

Linear Combination: Weighted sum after normalization:

$$\text{score}(d) = \alpha \cdot \text{norm}(\text{BM25}(d)) + (1-\alpha) \cdot \text{sim}(d)$$

Requires careful tuning of $\alpha$ and the normalization strategy.

Probabilistic Fusion: Interprets scores as probabilities and combines using probability theory:

$$P(R \mid d) = P(R \mid \text{text}) \cdot P(R \mid \text{vector})$$

Cognica implements probabilistic fusion with Bayesian calibration, enabling principled combination of heterogeneous signals.

24.2 Query Composition Architecture

24.2.1 Query Type Hierarchy

All query types derive from a common interface:

class Query {
public:
  virtual auto rewrite() -> std::shared_ptr<Query> = 0;
  virtual auto create_weight(Transaction* txn,
                             const IndexSearcher* searcher) const
      -> std::shared_ptr<Weight> = 0;
  virtual void visit(QueryVisitor* visitor) const = 0;
  virtual auto transform(QueryTransformer* transformer)
      -> std::shared_ptr<Query> = 0;
};

Concrete implementations include:

Query Type       | Purpose                | Score Source
-----------------|------------------------|-------------------
TermQuery        | Single term match      | BM25
PhraseQuery      | Exact phrase match     | BM25
DenseVectorQuery | Vector similarity      | HNSW distance
BooleanQuery     | Compound queries       | Scorer composition
FixedScoreQuery  | Constant score filter  | Fixed value

24.2.2 Boolean Query Structure

The BooleanQuery class enables arbitrary query composition:

struct BooleanClause;  // defined below

class BooleanQuery final : public Query {
public:
  enum class Occur {
    kMust,     // Required (AND semantics)
    kShould,   // Optional (OR semantics)
    kMustNot,  // Excluded (NOT semantics)
    kFilter    // Required but not scored
  };

private:
  int32_t min_should_match_;
  std::vector<BooleanClause> clauses_;
  std::unordered_map<Occur, std::vector<std::shared_ptr<Query>>>
      occur_query_map_;
};

struct BooleanClause {
  std::shared_ptr<Query> query;
  BooleanQuery::Occur occur;
};

Hybrid Query Example:

A query combining text and vector search:

auto hybrid_query = BooleanQuery::Builder{}
    .add(TermQuery("machine"), BooleanQuery::Occur::kMust)
    .add(TermQuery("learning"), BooleanQuery::Occur::kMust)
    .add(DenseVectorQuery(embedding), BooleanQuery::Occur::kShould)
    .set_min_should_match(0)
    .build();

This requires documents to contain both "machine" and "learning" while optionally boosting those with similar embeddings.


24.2.3 Weight and Scorer Creation

Query execution follows a two-phase pattern:

Phase 1: Weight Creation

Each query creates a Weight object containing precomputed statistics:

auto DenseVectorQuery::create_weight(Transaction* txn,
                                     const IndexSearcher* searcher) const
    -> std::shared_ptr<Weight> {

  const auto& context = searcher->get_context();
  const auto* hnsw_index = context->get_hnsw_index();

  // Search parameters
  auto top_k = static_cast<int32_t>(
      options_->top_k.value_or(100));
  auto ef_search = get_field_option_value_<int32_t>("ef_search");

  // Execute HNSW search
  auto result = hnsw_index->search(term_, top_k, ef_search);

  // Sort by doc_id for merge compatibility
  ranges::sort(
      ranges::views::zip(result.doc_ids, result.distances),
      std::less<>{},
      [](const auto& row) { return std::get<0>(row); });

  return std::make_shared<DenseVectorWeight>(
      std::move(result.doc_ids),
      std::move(result.distances));
}

Phase 2: Scorer Creation

Weights create Scorer objects for actual scoring:

auto DenseVectorWeight::create_scorer(LeafReaderContext* ctx)
    -> ScorerType {
  return DenseVectorScorer{doc_ids_, distances_};
}

24.3 Scorer Composition

24.3.1 Type-Erased Scorer Interface

Cognica uses type erasure for zero-overhead polymorphism:

using ScorerType = te::poly<Scorer>;

class Scorer {
public:
  auto doc_id() const -> DocID;
  auto score() const -> std::optional<float>;
  auto iterator() const -> std::shared_ptr<DocIDSetIterator>;
  auto is_probabilistic() const -> bool;
};

The te::poly template provides:

  • Value semantics (copyable, movable)
  • 32-byte inline storage (avoids heap allocation)
  • No virtual function dispatch overhead
  • Compile-time concept checking

24.3.2 Conjunction Scorer (AND)

The ConjunctionScorer requires all child scorers to match. In probabilistic mode, it uses the log-odds conjunction framework (Chapter 23) to avoid the conjunction shrinkage problem where naive probability multiplication drives scores toward zero:

class ConjunctionScorer final {
public:
  auto score() const -> std::optional<float> {
    // Non-probabilistic mode: sum scores
    if (!is_probabilistic_) {
      auto total = 0.f;
      for (const auto& scorer : scorers_) {
        if (auto s = scorer.score(); s.has_value()) {
          total += s.value();
        }
      }
      return total;
    }

    // Probabilistic mode: log-odds conjunction
    auto n = static_cast<float>(scorers_.size());
    auto logit_sum = 0.0f;
    for (const auto& scorer : scorers_) {
      if (auto s = scorer.score(); s.has_value()) {
        auto prob = std::clamp(s.value(), 1e-7f, 1.0f - 1e-7f);
        logit_sum += std::log(prob / (1.0f - prob));  // logit
      }
    }

    // sqrt(n) confidence scaling (alpha = 0.5)
    auto adjusted = logit_sum / std::sqrt(n);
    return 1.0f / (1.0f + std::exp(-adjusted));  // sigmoid
  }

private:
  std::vector<ScorerType> scorers_;
  bool is_probabilistic_;
};

Non-Probabilistic Mode: Simple sum of scores:

$$\text{score}_{\text{AND}} = \sum_{i=1}^{n} s_i$$

Probabilistic Mode: Log-odds conjunction with $\sqrt{n}$ confidence scaling:

$$P_{\text{final}} = \sigma\!\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \text{logit}(P_i)\right)$$

where $\text{logit}(p) = \ln\frac{p}{1-p}$ and $\sigma$ is the sigmoid function.

This framework aggregates evidence in log-odds space, the natural parameter space of Bernoulli random variables, where Bayesian updates are additive. The $\sqrt{n}$ scaling prevents the combined score from collapsing toward 0 or 1 as the number of signals grows (see Chapter 23 for the full derivation).

By contrast, the naive product rule $P = \prod_i P_i$ suffers from exponential shrinkage: even when all signals strongly indicate relevance (e.g., $P_i = 0.9$), the product $0.9^5 = 0.59$ suggests weak relevance, a semantic mismatch between evidence accumulation and joint satisfaction.
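The shrinkage effect is easy to check numerically with standalone helpers (a sketch mirroring the scorer logic above, not the production code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Naive product rule vs. sqrt(n)-scaled log-odds conjunction over a set
// of calibrated probabilities.
float logit(float p) { return std::log(p / (1.0f - p)); }
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

float product_rule(const std::vector<float>& probs) {
  float p = 1.0f;
  for (float x : probs) p *= x;
  return p;
}

float log_odds_conjunction(const std::vector<float>& probs) {
  float sum = 0.0f;
  for (float p : probs) sum += logit(p);
  // sqrt(n) confidence scaling keeps the result in a discriminative range.
  return sigmoid(sum / std::sqrt(static_cast<float>(probs.size())));
}
```

Five signals at 0.9 each: the product collapses to about 0.59, while the log-odds conjunction stays above 0.99, matching the intuition that five strong signals together constitute strong evidence.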

24.3.3 Disjunction Scorer (OR)

The DisjunctionScorer matches documents with any child scorer:

class DisjunctionScorer final {
public:
  auto score() const -> std::optional<float> {
    // Non-probabilistic: sum of matching scorers
    if (!is_probabilistic_) {
      auto total = 0.f;
      for (const auto& scorer : scorers_) {
        if (scorer.doc_id() == current_doc_id_) {
          if (auto s = scorer.score(); s.has_value()) {
            total += s.value();
          }
        }
      }
      return total;
    }

    // Probabilistic: log-odds mean (arithmetic mean of logits)
    auto sum_logit = 0.0f;
    auto n = 0;
    for (const auto& scorer : scorers_) {
      if (scorer.doc_id() == current_doc_id_) {
        if (auto s = scorer.score(); s.has_value()) {
          auto prob = std::clamp(s.value(), 1e-7f, 1.0f - 1e-7f);
          sum_logit += std::log(prob / (1.0f - prob));
          ++n;
        }
      }
    }
    if (n == 0) return std::nullopt;
    auto mean_log_odds = sum_logit / static_cast<float>(n);
    return 1.0f / (1.0f + std::exp(-mean_log_odds));
  }

private:
  std::vector<ScorerType> scorers_;
  DocID current_doc_id_;
  bool is_probabilistic_;
};

Probabilistic Disjunction uses the log-odds mean, the arithmetic mean of logits:

$$P_{\text{OR}} = \sigma\!\left(\frac{1}{n} \sum_{i=1}^{n} \text{logit}(P_i)\right)$$

The Noisy-OR formula $P = 1 - \prod_i (1 - P_i)$ saturates toward 1.0 when individual probabilities are high, destroying discrimination among results. The log-odds mean prevents this by normalizing fully by $n$, keeping the output in a discriminative range regardless of how many scorers contribute. When $n = 1$, the identity property holds: $\sigma(\text{logit}(P_1)) = P_1$.

Note that the disjunction scorer uses full normalization by $n$ (arithmetic mean), while the conjunction scorer (Section 24.3.2) uses $\sqrt{n}$ confidence scaling. The difference reflects their semantic roles: conjunction amplifies agreement among required signals, while disjunction averages the evidence from whichever signals match.
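The saturation contrast can be verified directly (standalone sketch; neither function is the production scorer):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Noisy-OR vs. log-odds mean for combining matching disjunct probabilities.
float noisy_or(const std::vector<float>& probs) {
  float miss = 1.0f;
  for (float p : probs) miss *= (1.0f - p);  // P(no signal fires)
  return 1.0f - miss;
}

float log_odds_mean(const std::vector<float>& probs) {
  float sum = 0.0f;
  for (float p : probs) sum += std::log(p / (1.0f - p));
  float mean = sum / static_cast<float>(probs.size());
  return 1.0f / (1.0f + std::exp(-mean));  // sigmoid of mean logit
}
```

Noisy-OR pushes three 0.9 signals to 0.999, near saturation, while the log-odds mean returns 0.9; with a single scorer the identity property $\sigma(\text{logit}(P_1)) = P_1$ holds.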

24.3.4 Scorer Mode Selection

The is_probabilistic() method determines combination mode:

auto ConjunctionScorer::is_probabilistic() const -> bool {
  // All scorers must be probabilistic
  return std::all_of(scorers_.begin(), scorers_.end(),
      [](const auto& s) { return s.is_probabilistic(); });
}

auto DisjunctionScorer::is_probabilistic() const -> bool {
  // All scorers must be probabilistic
  return std::all_of(scorers_.begin(), scorers_.end(),
      [](const auto& s) { return s.is_probabilistic(); });
}

This enables automatic mode selection: when all component scorers produce calibrated probabilities, the composite uses probabilistic combination; otherwise, it falls back to score summation.

24.3.5 Required-Optional Scorer (SHOULD)

The ReqOptScorer handles Boolean SHOULD clauses: a required scorer combined with an optional scorer that boosts the score when it matches. In probabilistic mode, it uses the log-odds mean with $\sqrt{n}$ confidence scaling:

auto ReqOptScorer::score_prob_() const -> std::optional<float> {
  auto req_prob = std::clamp(req_score.value(), 1e-7f, 1.0f - 1e-7f);
  auto sum_logit = std::log(req_prob / (1.0f - req_prob));
  auto n = 1;

  // If optional scorer matches current document, add its evidence
  if (opt_scorer_doc == curr_doc) {
    auto opt_prob = std::clamp(opt_score.value(), 1e-7f, 1.0f - 1e-7f);
    sum_logit += std::log(opt_prob / (1.0f - opt_prob));
    ++n;
  }

  auto adjusted_log_odds = sum_logit / std::sqrt(static_cast<float>(n));
  return 1.0f / (1.0f + std::exp(-adjusted_log_odds));
}

When the optional scorer does not match, $n = 1$ and $\sqrt{1} = 1$, so the required score is returned unchanged (identity property). When it does match, both signals combine via the log-odds mean with $\sqrt{2}$ confidence scaling.

24.3.6 Required-Excluded Scorer (NOT)

The ReqExclScorer handles Boolean NOT (MUST_NOT) clauses and supports two modes:

Hard NOT (non-probabilistic): Documents matching the excluded clause are completely removed from iteration via RelativeComplementDISI. This is the traditional Boolean NOT.

Soft NOT (probabilistic): Documents matching the excluded clause remain in iteration but receive a penalty via logit subtraction. In log-odds space, NOT is sign negation: $\text{logit}(1 - P) = -\text{logit}(P)$. The combined evidence is:

$$P_{\text{soft NOT}} = \sigma\!\left(\frac{\text{logit}(P_{\text{req}}) - \text{logit}(P_{\text{excl}})}{\sqrt{2}}\right)$$

When the excluded scorer does not match a document, the required score is returned unchanged. Soft NOT mode is used automatically when the scorers are probabilistic, routing through the full scorer rather than just the iterator.

24.3.7 Boost Scorer

The BoostScorer applies a multiplicative boost to a child scorer. In probabilistic mode, it operates in log-odds space to preserve the $[0, 1]$ probability range:

auto score() const -> std::optional<float> {
  if (!scorer_->is_probabilistic()) {
    return scorer_->score().transform([boost](auto s) { return s * boost; });
  }

  // Log-odds space: sigmoid(boost * logit(P))
  return scorer_->score().transform([boost](auto score) -> float {
    auto prob = std::clamp(score, 1e-7f, 1.0f - 1e-7f);
    auto logit = std::log(prob / (1.0f - prob));
    return 1.0f / (1.0f + std::exp(-boost * logit));
  });
}

The transformation $\sigma(\text{boost} \cdot \text{logit}(P))$ exponentiates the odds ratio: $\text{odds}_{\text{new}} = \text{odds}^{\text{boost}}$. Key properties:

  • Output is always in $(0, 1)$ for any boost $> 0$
  • $P = 0.5$ is a fixed point (neutral evidence stays neutral)
  • boost $> 1$ pushes $P$ away from 0.5 (amplifies confidence)
  • boost $< 1$ pushes $P$ toward 0.5 (dampens confidence)

Multiplying a probability directly by a boost $> 1$ can push it above 1.0, which then gets clamped and destroys score discrimination. The log-odds approach avoids this entirely.
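These properties can be checked with a standalone version of the boost transform (`boost_prob` is an illustrative helper, not the BoostScorer itself):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Boost in log-odds space: sigmoid(boost * logit(p)), i.e. odds^boost.
float boost_prob(float p, float boost) {
  float prob = std::clamp(p, 1e-7f, 1.0f - 1e-7f);
  float lo = std::log(prob / (1.0f - prob));
  return 1.0f / (1.0f + std::exp(-boost * lo));
}
```

For example, boost = 2 turns odds of 4:1 ($P = 0.8$) into 16:1 ($P \approx 0.941$), whereas direct multiplication ($0.8 \times 2 = 1.6$) would overflow the probability range and be clamped.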

24.3.8 Vector Score Calibration

The DenseVectorScorer maps cosine similarity to a calibrated probability using logistic (Platt) scaling:

$$P = \sigma(\kappa \cdot s) = \frac{1}{1 + e^{-\kappa \cdot s}}$$

where $s = 1 - d$ is the cosine similarity ($d$ being the cosine distance) and $\kappa = 2$ is the Fisher-matched scaling constant. The midpoint $P = 0.5$ at $s = 0$ preserves the neutral-prior property for orthogonal vectors.

The linear mapping $P = (1 + s)/2$ produces probabilities in the range 0.925 to 0.975 for typical vector search results (cosine distances 0.05 to 0.15), causing score saturation when combined with other probabilistic signals. The sigmoid with $\kappa = 2$ compresses extreme similarities toward moderate probabilities while preserving the same slope at $s = 0$ as the linear mapping, preventing saturation without sacrificing discriminative power near the midpoint.
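Comparing the two mappings numerically (illustrative standalone functions, not the DenseVectorScorer itself):

```cpp
#include <cassert>
#include <cmath>

// Map a cosine distance d to a relevance probability two ways.
float linear_calibration(float d) {
  float s = 1.0f - d;        // cosine similarity
  return (1.0f + s) / 2.0f;  // linear mapping
}

float sigmoid_calibration(float d) {
  float s = 1.0f - d;
  return 1.0f / (1.0f + std::exp(-2.0f * s));  // Platt scaling, kappa = 2
}
```

At a typical distance of 0.05 the linear mapping yields 0.975, near saturation, while the sigmoid yields roughly 0.87; orthogonal vectors ($d = 1$, $s = 0$) map to the neutral 0.5 under both.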

24.4 Score Normalization

24.4.1 The Normalization Problem

BM25 and vector scores require normalization before combination:

Score Type | Range           | Distribution
-----------|-----------------|----------------------
BM25       | $[0, +\infty)$  | Power-law tail
Vector L2  | $[0, 2]$        | Dataset-dependent
Vector IP  | $[-1, 1]$       | Approximately normal

24.4.2 Softmax Normalization

Converts scores to a probability distribution:

auto compute_score_prob_softmax(const std::vector<float>& values,
                                float temperature = 1.f)
    -> std::vector<float> {
  auto probs = std::vector<float>{};
  probs.reserve(values.size());

  // Numerical stability: subtract max
  auto max_value = *std::max_element(values.cbegin(), values.cend());

  // Temperature-scaled exponentials
  std::transform(values.cbegin(), values.cend(),
                 std::back_inserter(probs),
                 [max_value, temperature](auto value) {
                   return std::exp((value - max_value) / temperature);
                 });

  // Normalize to sum to 1
  auto sum = std::accumulate(probs.cbegin(), probs.cend(), 0.f);
  if (sum > 0.f) {
    std::transform(probs.cbegin(), probs.cend(), probs.begin(),
                   [sum](auto p) { return p / sum; });
  }

  return probs;
}

Softmax Formula:

$$p_i = \frac{\exp((s_i - s_{\max}) / T)}{\sum_{j=1}^{n} \exp((s_j - s_{\max}) / T)}$$

Temperature Effects:

Temperature $T$ | Distribution | Use Case
----------------|--------------|----------------------
$T \to 0$       | One-hot      | Hard selection
$T = 1$         | Standard     | Balanced
$T > 1$         | Smoothed     | Uncertainty modeling

24.4.3 Min-Max Normalization

Linear scaling to a target range:

auto compute_min_max_norm(const std::vector<float>& values,
                          float target_min = 0.f,
                          float target_max = 1.f)
    -> std::vector<float> {
  auto [min_it, max_it] = std::minmax_element(
      values.begin(), values.end());

  auto min_val = *min_it;
  auto max_val = *max_it;
  auto range = max_val - min_val;

  if (range < 1e-10f) {
    // Degenerate case: all values equal
    auto mid = (target_min + target_max) / 2.f;
    return std::vector<float>(values.size(), mid);
  }

  auto target_range = target_max - target_min;
  auto normalized = std::vector<float>{};
  normalized.reserve(values.size());

  std::transform(values.cbegin(), values.cend(),
                 std::back_inserter(normalized),
                 [=](auto v) {
                   return (v - min_val) / range * target_range + target_min;
                 });

  return normalized;
}

Formula:

$$\text{norm}(s) = \frac{s - s_{\min}}{s_{\max} - s_{\min}} \cdot (t_{\max} - t_{\min}) + t_{\min}$$

24.4.4 Sigmoid Transform

Maps unbounded scores to $(0, 1)$:

auto compute_sigmoid_transform(const std::vector<float>& values)
    -> std::vector<float> {
  auto transformed = std::vector<float>{};
  transformed.reserve(values.size());

  std::transform(values.cbegin(), values.cend(),
                 std::back_inserter(transformed),
                 [](auto value) {
                   return 1.f / (1.f + std::exp(-value));
                 });

  return transformed;
}

Formula:

$$\sigma(s) = \frac{1}{1 + e^{-s}}$$

24.5 Unified Probabilistic Fusion

24.5.1 The Calibration Foundation

Hybrid search in Cognica builds on three calibration layers, each detailed in its own chapter:

  1. Bayesian BM25 (Chapter 20): Transforms unbounded BM25 scores into calibrated probabilities via a sigmoid likelihood model and three-term posterior decomposition
  2. Vector Score Calibration (Chapter 22): Transforms vector similarity scores into relevance probabilities via likelihood ratios over index-derived distance distributions
  3. Log-Odds Conjunction (Chapter 23): Combines calibrated probabilities through additive aggregation in log-odds space, resolving the conjunction shrinkage problem

24.5.2 The Complete Log-Odds Decomposition

When both lexical and semantic signals are available, the unified posterior for a document is:

$$\text{logit}\,P(R \mid s_{\text{bm25}}, d_{\text{vec}}) = \underbrace{\log \frac{\hat{f}_R(d_{\text{vec}})}{f_G(d_{\text{vec}})}}_{\text{vector evidence}} + \underbrace{\alpha(s_{\text{bm25}} - \beta)}_{\text{lexical evidence}} + \underbrace{\text{logit}\,P_{\text{base}}}_{\text{corpus prior}}$$

Each term contributes independent Bayesian evidence in log-odds space. This additive structure means:

  • Adding a new signal type (e.g., a click-through model) requires only computing its own calibrated evidence term
  • No retuning of existing signal weights is needed
  • The computation has the structure of a feedforward neural network (Chapter 23)
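As a sketch of this additivity, the three evidence terms can be summed and squashed in a few lines (all parameter values here are illustrative placeholders, not Cognica's fitted calibration):

```cpp
#include <cassert>
#include <cmath>

float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Additive log-odds fusion: each signal contributes one evidence term.
float fused_posterior(float vector_log_lr,     // log f_R(d_vec) / f_G(d_vec)
                      float bm25_score,        // raw BM25 score
                      float alpha, float beta, // lexical calibration params
                      float base_prior) {      // corpus-level P(R)
  float logit_prior = std::log(base_prior / (1.0f - base_prior));
  float logit_posterior = vector_log_lr                // vector evidence
                        + alpha * (bm25_score - beta)  // lexical evidence
                        + logit_prior;                 // corpus prior
  return sigmoid(logit_posterior);
}
```

With all-neutral inputs the posterior sits at the prior midpoint; each additional positive evidence term raises it independently of the others, which is the extensibility property noted above.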

24.5.3 Signal Integration Pipeline


24.5.4 Comparison with Alternative Fusion Methods

Method             | Preserves Scores | Principled              | Extensible     | Requires Tuning
-------------------|------------------|-------------------------|----------------|----------------
RRF                | No (rank-based)  | No ($k = 60$ arbitrary) | Limited        | No
Linear combination | Yes              | No (ad-hoc $\alpha$)    | Limited        | Yes
Log-odds fusion    | Yes              | Yes (Bayesian)          | Yes (additive) | No

The log-odds fusion approach is the only method that preserves score magnitude information, has a principled Bayesian derivation, and extends naturally to additional signals without parameter retuning.

24.6 Result Ranking and Merging

24.6.1 Top-K Collection

The TopKCollector maintains the $k$ highest-scoring documents using a min-heap:

struct ScoredDoc {
  DocID doc_id;
  float score;

  // Min-heap comparator: lowest score at top for efficient eviction
  bool operator>(const ScoredDoc& other) const {
    if (score != other.score) {
      return score > other.score;
    }
    // Tie-breaking: prefer lower doc_id
    return doc_id < other.doc_id;
  }
};

class TopKCollector {
public:
  bool try_insert(DocID doc_id, float score) {
    if (heap_.size() < k_) {
      heap_.push({doc_id, score});
      return true;
    }

    const auto& worst = heap_.top();
    if (score > worst.score ||
        (score == worst.score && doc_id < worst.doc_id)) {
      heap_.pop();
      heap_.push({doc_id, score});
      return true;
    }
    return false;
  }

  auto get_threshold() const -> float {
    return heap_.size() < k_ ? min_score_ : heap_.top().score;
  }

private:
  size_t k_;
  float min_score_;
  std::priority_queue<ScoredDoc, std::vector<ScoredDoc>,
                      std::greater<ScoredDoc>> heap_;
};

24.6.2 Tie-Breaking Strategy

Deterministic tie-breaking ensures reproducible results:

bool operator<(const ScoredDoc& other) const {
  if (score != other.score) {
    return score > other.score;  // Descending by score
  }
  return doc_id < other.doc_id;  // Ascending by doc_id for ties
}

Properties:

  • Deterministic: Same query always produces same ordering
  • Stable: Documents with equal scores maintain consistent relative order
  • Efficient: Single comparison chain, no secondary sorting

24.6.3 Dynamic Threshold for Pruning

The collector provides a dynamic threshold for WAND optimization:

auto get_threshold() const -> float {
  if (heap_.size() < k_) {
    return min_score_;  // Accept any document above minimum
  }
  return heap_.top().score;  // k-th highest score
}

This threshold enables early termination: documents whose upper-bound score cannot exceed the threshold are skipped without full scoring.
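A score-only sketch of the collector shows the threshold rising as better documents arrive (simplified from the TopKCollector above; `MiniTopK` is illustrative, not the production class):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Score-only top-k: the min-heap top is the current pruning threshold.
struct MiniTopK {
  explicit MiniTopK(std::size_t k) : k_(k) {}

  void insert(float score) {
    if (heap_.size() < k_) {
      heap_.push(score);
    } else if (score > heap_.top()) {
      heap_.pop();  // evict the current k-th best
      heap_.push(score);
    }
  }

  float threshold() const {
    // Until the heap fills, accept anything above the minimum (0 here).
    return heap_.size() < k_ ? 0.0f : heap_.top();
  }

  std::size_t k_;
  std::priority_queue<float, std::vector<float>, std::greater<float>> heap_;
};
```

With $k = 2$, the threshold stays at 0 until two documents arrive, then tracks the second-best score, letting WAND skip any candidate whose upper bound falls below it.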

24.7 Hybrid Search Execution

24.7.1 Execution Pipeline


24.7.2 Step-by-Step Execution

Step 1: Query Parsing

// Input: "machine learning" AND vector_similar(embedding)
auto query = BooleanQuery::Builder{}
    .add(TermQuery("machine"), BooleanQuery::Occur::kMust)
    .add(TermQuery("learning"), BooleanQuery::Occur::kMust)
    .add(DenseVectorQuery(embedding), BooleanQuery::Occur::kShould)
    .build();

Step 2: Weight Creation

// TermQuery weights lookup postings and compute IDF
auto term_weight = term_query->create_weight(txn, searcher);
// term_weight contains: postings list, IDF, collection stats

// DenseVectorQuery weight executes HNSW search
auto vector_weight = vector_query->create_weight(txn, searcher);
// vector_weight contains: doc_ids, distances, probabilities

Step 3: Scorer Creation

// Create leaf scorers
auto term_scorer = term_weight->create_scorer(leaf_ctx);
// term_scorer: BM25SimScorer or BayesianBM25SimScorer

auto vector_scorer = vector_weight->create_scorer(leaf_ctx);
// vector_scorer: returns sigmoid(kappa * (1 - distance)) for matching docs

Step 4: Scorer Composition

// MUST clauses: conjunction
auto must_scorers = std::vector<ScorerType>{};
must_scorers.push_back(std::move(term_scorer_machine));
must_scorers.push_back(std::move(term_scorer_learning));
auto conjunction = ConjunctionScorer{std::move(must_scorers)};

// Combine with SHOULD clauses: disjunction
auto all_scorers = std::vector<ScorerType>{};
all_scorers.push_back(std::move(conjunction));
all_scorers.push_back(std::move(vector_scorer));
auto final_scorer = DisjunctionScorer{std::move(all_scorers)};

Step 5: Result Collection

auto collector = TopKCollector{k, min_score};

while (final_scorer.advance() != kNoMoreDocs) {
  auto doc_id = final_scorer.doc_id();
  if (auto score = final_scorer.score(); score.has_value()) {
    collector.try_insert(doc_id, score.value());
  }
}

auto results = collector.extract_sorted();

24.7.3 Score Combination Example

Consider a document matching both text and vector queries:

Component              | Raw Score         | Probabilistic? | Calibrated P
-----------------------|-------------------|----------------|-------------
TermScorer("machine")  | 3.2 (BM25)        | Yes (Bayesian) | 0.78
TermScorer("learning") | 2.8 (BM25)        | Yes (Bayesian) | 0.72
DenseVectorScorer      | 0.85 (similarity) | Yes            | 0.81

Conjunction (MUST terms), log-odds mean with $\sqrt{n}$ scaling:

$$\text{logit}(0.78) = 1.266, \quad \text{logit}(0.72) = 0.944$$

$$P_{\text{must}} = \sigma\!\left(\frac{1.266 + 0.944}{\sqrt{2}}\right) = \sigma(1.563) = 0.827$$

ReqOpt (MUST + SHOULD vector), log-odds mean with $\sqrt{n}$ scaling:

$$\text{logit}(0.827) = 1.563, \quad \text{logit}(0.81) = 1.448$$

$$P_{\text{final}} = \sigma\!\left(\frac{1.563 + 1.448}{\sqrt{2}}\right) = \sigma(2.129) = 0.894$$

Compare this with the naive product rule, which would give $P = 0.78 \times 0.72 \times 0.81 = 0.455$, suggesting weak relevance despite all three signals strongly indicating relevance.
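The worked numbers can be reproduced with a standalone helper (`combine2` sketches the two-input log-odds mean with $\sqrt{2}$ scaling used in both steps above):

```cpp
#include <cassert>
#include <cmath>

float logit(float p) { return std::log(p / (1.0f - p)); }
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Log-odds mean of two probabilities with sqrt(2) confidence scaling.
float combine2(float a, float b) {
  return sigmoid((logit(a) + logit(b)) / std::sqrt(2.0f));
}
```

`combine2(0.78, 0.72)` recovers the 0.827 conjunction value, and combining that with 0.81 recovers the 0.894 final score, both well above the 0.455 naive product.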

24.8 Configuration and Options

24.8.1 Search Options

struct SearchOptions {
  std::optional<int64_t> top_k;           // Result count
  float min_score = 0.f;                   // Score threshold
  std::unordered_map<std::string, float>
      field_boosts;                        // Per-field weights
  std::unordered_map<std::string,
      std::vector<FieldOption>> field_options;  // Field-specific
  bool use_wand = true;                    // WAND optimization
};

24.8.2 Field-Specific Options

struct FieldOption {
  std::string name;        // Option name
  FieldOptionValue value;  // Variant type
};

// For dense vector queries:
auto ef_search = get_field_option_value_<int32_t>("ef_search");
auto top_k = get_field_option_value_<int32_t>("top_k");
auto random_seed = get_field_option_value_<int32_t>("random_seed");

24.8.3 Similarity Selection

class SimilarityFactory {
public:
  // Returns a type-erased similarity (cf. te::poly<Scorer> in Section 24.3.1).
  static auto create(SimilarityType type) -> te::poly<Similarity> {
    switch (type) {
      case SimilarityType::kBM25:
        return BM25Similarity{};
      case SimilarityType::kBayesianBM25:
        return BayesianBM25Similarity{};
      case SimilarityType::kConstant:
        return ConstantSimilarity{};
    }
    return BM25Similarity{};  // unreachable; silences missing-return warnings
  }
};

24.9 Numerical Stability Patterns

24.9.1 Log-Odds Clamping

All probabilistic scorers clamp probabilities to $[\epsilon, 1 - \epsilon]$ with $\epsilon = 10^{-7}$ before computing logits. This epsilon is chosen so that $1.0\text{f} - \epsilon$ remains representable as a float32 value distinct from $1.0\text{f}$:

auto sum_logit = 0.0f;
for (const auto& scorer : scorers_) {
  if (auto s = scorer.score(); s.has_value()) {
    // 1e-7 is the smallest epsilon where (1.0f - epsilon) is
    // representable as a float distinct from 1.0f
    auto prob = std::clamp(s.value(), 1e-7f, 1.0f - 1e-7f);
    sum_logit += std::log(prob / (1.0f - prob));  // logit
  }
}

Using a smaller epsilon (e.g., $10^{-10}$) causes $1.0\text{f} - 10^{-10}\text{f}$ to round back to $1.0\text{f}$, because the float32 spacing just below 1.0 (one unit in the last place, ULP) is approximately $5.96 \times 10^{-8}$. The clamp then passes $p = 1.0$ through unchanged, making $\text{logit}(1.0) = +\infty$, which poisons the entire log-odds sum and can produce NaN or saturated scores.
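The rounding behavior, and the finiteness the clamp guarantees, can be confirmed directly (standalone sketch):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Clamp before logit so that even p = 1.0 or p = 0.0 yields a finite result.
float safe_logit(float p) {
  p = std::clamp(p, 1e-7f, 1.0f - 1e-7f);
  return std::log(p / (1.0f - p));
}
```

With epsilon $10^{-7}$ the clamp bound is a genuine float below 1.0, so the logit of any input stays finite; with epsilon $10^{-10}$ the upper bound would itself round to 1.0f and the protection would silently vanish.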

24.9.2 Log-Odds Space Arithmetic

The log-odds mean framework (Sections 24.3.2--24.3.7) operates in logit space rather than probability space, which provides natural numerical stability:

  • Logit range: $\text{logit}(10^{-7}) \approx -16.1$ to $\text{logit}(1 - 10^{-7}) \approx 16.1$
  • Sum of $n$ logits: $[-16.1n, +16.1n]$ (representable in float32 for practical $n$)
  • Mean of logits: always within $[-16.1, +16.1]$ regardless of $n$
  • Final sigmoid: maps the bounded logit range back to $(0, 1)$

This avoids the underflow problem of log-space probability multiplication and the saturation problem of Noisy-OR.

24.9.3 Softmax Stability

Subtracting the maximum prevents overflow in exponentials:

// Without max-subtraction, exp() of a large score overflows:
// std::exp(1000.f) evaluates to +inf.
auto max_val = *std::max_element(values.begin(), values.end());

// With max-subtraction, the largest exponent is exp(0) = 1
std::transform(values.begin(), values.end(), ...,
    [max_val, T](auto v) {
      return std::exp((v - max_val) / T);
    });

Mathematical Equivalence:

$$\frac{e^{s_i}}{\sum_j e^{s_j}} = \frac{e^{s_i - s_{\max}}}{\sum_j e^{s_j - s_{\max}}}$$

24.10 Performance Considerations

24.10.1 Query Planning

The query optimizer considers hybrid query characteristics:

Query Pattern  | Optimization
---------------|---------------------------------------------
Text-only MUST | WAND with BM25 upper bounds
Vector-only    | Direct HNSW top-k
Hybrid AND     | Intersect text matches, re-rank with vector
Hybrid OR      | Union with dynamic threshold

24.10.2 Early Termination

WAND enables early termination for text queries:

auto threshold = collector.get_threshold();

// Skip document if upper bound cannot exceed threshold
auto upper_bound = compute_upper_bound(doc_id);
if (upper_bound < threshold) {
  continue;  // Skip to next candidate
}

24.10.3 Vector Pre-filtering

For selective text predicates, pre-filter before vector search:

// Collect doc_ids matching text predicate
auto text_matches = execute_text_query(text_query);

// Filter HNSW results to text matches
auto vector_results = hnsw_index->search_with_filter(
    query_vector, top_k, text_matches);

24.11 Summary

This chapter explored Cognica's hybrid search architecture:

  1. Score fusion challenge: BM25 and vector scores have incompatible ranges and distributions; principled fusion requires probabilistic calibration rather than ad-hoc normalization

  2. Query composition: Boolean queries combine text and vector queries with AND/OR/NOT semantics; scorers compose hierarchically through type-erased interfaces

  3. Unified log-odds framework: All Boolean scorers (conjunction, disjunction, required-optional, required-excluded, and boost) operate in log-odds space. Conjunction uses $\sqrt{n}$ confidence scaling; disjunction uses the arithmetic mean; soft NOT uses logit subtraction; and boost exponentiates the odds ratio. This replaces the earlier mix of Noisy-OR, conditional probability, and direct multiplication

  4. Score saturation prevention: Three layers of saturation are addressed: float32 epsilon clamping at $10^{-7}$ (not $10^{-10}$), log-odds space boosting (not probability multiplication), and arithmetic-mean normalization for disjunction (not $\sqrt{n}$)

  5. Calibrated signal sources: Bayesian BM25 (Chapter 20) uses log-compressed sigmoid scoring with a neutral prior; vector calibration uses logistic (Platt) scaling with $\kappa = 2$. Both produce calibrated probabilities that combine additively in log-odds space

  6. Numerical stability: Log-odds arithmetic with $\epsilon = 10^{-7}$ prevents both underflow and float32 precision loss; max-subtraction prevents overflow in softmax

  7. Result ranking: Min-heap maintains top-k with dynamic threshold; deterministic tie-breaking ensures reproducibility

  8. Execution pipeline: Query parsing, weight creation, scorer composition, and result collection form a clean separation of concerns

The next chapter examines query evaluation strategies — WAND and Block-Max WAND algorithms that exploit score upper bounds to skip irrelevant documents without full scoring.

Copyright (c) 2023-2026 Cognica, Inc.