Chapter 24: Hybrid Search Architecture
Introduction
Modern information retrieval demands both precision and semantic understanding. Lexical search excels at exact matching—a query for "PostgreSQL wire protocol" retrieves documents containing those exact terms. Semantic search excels at conceptual matching—a query about "database communication standards" retrieves relevant documents even without term overlap. Neither approach alone satisfies all user needs.
Hybrid search combines these complementary paradigms, fusing lexical relevance signals with semantic similarity scores. This chapter explores Cognica's hybrid search architecture: the theoretical foundations of multi-signal fusion, the engineering challenges of combining scores from different distributions, and the implementation patterns that enable efficient unified ranking.
24.1 The Hybrid Search Problem
24.1.1 Complementary Search Paradigms
Lexical Search (BM25) leverages term frequency statistics:
- Matches exact terms and variants (through stemming)
- Respects term rarity through IDF weighting
- Output range: $[0, \infty)$ (unbounded positive reals)
- Strengths: Precision, interpretability, no training required
Semantic Search (Vector) leverages learned embeddings:
- Matches conceptual similarity regardless of term overlap
- Captures synonymy, paraphrase, and semantic relationships
- Output range: $[0, 1]$ after similarity conversion
- Strengths: Recall, semantic understanding, cross-lingual capability
24.1.2 The Score Fusion Challenge
Combining these signals poses fundamental challenges:
- Incompatible ranges: BM25 scores are unbounded; vector similarities are bounded in $[0, 1]$
- Different distributions: BM25 follows a power-law; similarities may be approximately normal
- Semantic mismatch: High BM25 does not imply high semantic relevance and vice versa
A naive approach of simply summing scores favors whichever signal has larger magnitude, producing arbitrary rankings.
24.1.3 Fusion Strategies
Three principal approaches address score fusion:
Rank Fusion (RRF): Combines rankings rather than scores:

$$\text{RRF}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)}$$

where $k$ is a smoothing constant (typically 60). Robust but discards score magnitude information.
Linear Combination: Weighted sum after normalization:

$$s(d) = \alpha \cdot \hat{s}_{\text{lex}}(d) + (1 - \alpha) \cdot \hat{s}_{\text{sem}}(d)$$

Requires careful tuning of $\alpha$ and the normalization strategy.
Probabilistic Fusion: Interprets scores as probabilities and combines them using probability theory:

$$P(\text{rel} \mid d) = \sigma\!\left(\sum_i \operatorname{logit} P_i(\text{rel} \mid d)\right)$$
Cognica implements probabilistic fusion with Bayesian calibration, enabling principled combination of heterogeneous signals.
24.2 Query Composition Architecture
24.2.1 Query Type Hierarchy
All query types derive from a common interface:
class Query {
public:
virtual auto rewrite() -> std::shared_ptr<Query> = 0;
virtual auto create_weight(Transaction* txn,
const IndexSearcher* searcher) const
-> std::shared_ptr<Weight> = 0;
virtual void visit(QueryVisitor* visitor) const = 0;
virtual auto transform(QueryTransformer* transformer)
-> std::shared_ptr<Query> = 0;
};
Concrete implementations include:
| Query Type | Purpose | Score Source |
|---|---|---|
| TermQuery | Single term match | BM25 |
| PhraseQuery | Exact phrase match | BM25 |
| DenseVectorQuery | Vector similarity | HNSW distance |
| BooleanQuery | Compound queries | Scorer composition |
| FixedScoreQuery | Constant score filter | Fixed value |
24.2.2 Boolean Query Structure
The BooleanQuery class enables arbitrary query composition:
class BooleanQuery final : public Query {
public:
enum class Occur {
kMust, // Required (AND semantics)
kShould, // Optional (OR semantics)
kMustNot, // Excluded (NOT semantics)
kFilter // Required but not scored
};
private:
int32_t min_should_match_;
std::vector<BooleanClause> clauses_;
std::unordered_map<Occur, std::vector<std::shared_ptr<Query>>>
occur_query_map_;
};
struct BooleanClause {
std::shared_ptr<Query> query;
Occur occur;
};
Hybrid Query Example:
A query combining text and vector search:
auto hybrid_query = BooleanQuery::Builder{}
.add(TermQuery("machine"), Occur::kMust)
.add(TermQuery("learning"), Occur::kMust)
.add(DenseVectorQuery(embedding), Occur::kShould)
.set_min_should_match(0)
.build();
This requires documents to contain both "machine" and "learning" while optionally boosting those with similar embeddings.
24.2.3 Weight and Scorer Creation
Query execution follows a two-phase pattern:
Phase 1: Weight Creation
Each query creates a Weight object containing precomputed statistics:
auto DenseVectorQuery::create_weight(Transaction* txn,
const IndexSearcher* searcher) const
-> std::shared_ptr<Weight> {
const auto& context = searcher->get_context();
const auto* hnsw_index = context->get_hnsw_index();
// Search parameters
auto top_k = static_cast<int32_t>(
options_->top_k.value_or(100));
auto ef_search = get_field_option_value_<int32_t>("ef_search");
// Execute HNSW search
auto result = hnsw_index->search(term_, top_k, ef_search);
// Sort by doc_id for merge compatibility
ranges::sort(
ranges::views::zip(result.doc_ids, result.distances),
std::less<>{},
[](const auto& row) { return std::get<0>(row); });
return std::make_shared<DenseVectorWeight>(
std::move(result.doc_ids),
std::move(result.distances));
}
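The zip-sort above keeps `doc_ids` and `distances` aligned while ordering by document ID. As a standalone sketch using only the standard library (without `ranges::views::zip`), the same effect can be achieved by sorting an index permutation and applying it to both arrays:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Sort doc_ids ascending and reorder distances to stay aligned,
// mirroring the zip-sort used for merge compatibility.
inline void sort_by_doc_id(std::vector<std::uint64_t>& doc_ids,
                           std::vector<float>& distances) {
  std::vector<std::size_t> order(doc_ids.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&](std::size_t a, std::size_t b) {
              return doc_ids[a] < doc_ids[b];
            });

  std::vector<std::uint64_t> sorted_ids(doc_ids.size());
  std::vector<float> sorted_dists(distances.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    sorted_ids[i] = doc_ids[order[i]];
    sorted_dists[i] = distances[order[i]];
  }
  doc_ids = std::move(sorted_ids);
  distances = std::move(sorted_dists);
}
```

Sorting by doc_id here is what allows the downstream scorers to merge postings and vector results with a single linear pass.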
Phase 2: Scorer Creation
Weights create Scorer objects for actual scoring:
auto DenseVectorWeight::create_scorer(LeafReaderContext* ctx)
-> ScorerType {
return DenseVectorScorer{doc_ids_, distances_};
}
24.3 Scorer Composition
24.3.1 Type-Erased Scorer Interface
Cognica uses type erasure for zero-overhead polymorphism:
using ScorerType = te::poly<Scorer>;
class Scorer {
public:
auto doc_id() const -> DocID;
auto score() const -> std::optional<float>;
auto iterator() const -> std::shared_ptr<DocIDSetIterator>;
auto is_probabilistic() const -> bool;
};
The te::poly template provides:
- Value semantics (copyable, movable)
- 32-byte inline storage (avoids heap allocation)
- No virtual function dispatch overhead
- Compile-time concept checking
24.3.2 Conjunction Scorer (AND)
The ConjunctionScorer requires all child scorers to match. In probabilistic mode, it uses the log-odds conjunction framework (Chapter 23) to avoid the conjunction shrinkage problem where naive probability multiplication drives scores toward zero:
class ConjunctionScorer final {
public:
auto score() const -> std::optional<float> {
// Non-probabilistic mode: sum scores
if (!is_probabilistic_) {
auto total = 0.f;
for (const auto& scorer : scorers_) {
if (auto s = scorer.score(); s.has_value()) {
total += s.value();
}
}
return total;
}
// Probabilistic mode: log-odds conjunction
auto n = static_cast<float>(scorers_.size());
auto logit_sum = 0.0f;
for (const auto& scorer : scorers_) {
if (auto s = scorer.score(); s.has_value()) {
auto prob = std::clamp(s.value(), 1e-7f, 1.0f - 1e-7f);
logit_sum += std::log(prob / (1.0f - prob)); // logit
}
}
// sqrt(n) confidence scaling (alpha = 0.5)
auto adjusted = logit_sum / std::sqrt(n);
return 1.0f / (1.0f + std::exp(-adjusted)); // sigmoid
}
private:
std::vector<ScorerType> scorers_;
bool is_probabilistic_;
};
Non-Probabilistic Mode: Simple sum of scores:

$$s = \sum_{i=1}^{n} s_i$$

Probabilistic Mode: Log-odds conjunction with confidence scaling:

$$P = \sigma\!\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \operatorname{logit}(P_i)\right)$$

where $\operatorname{logit}(p) = \ln\frac{p}{1-p}$ and $\sigma(x) = \frac{1}{1+e^{-x}}$ is the sigmoid function.
This framework aggregates evidence in log-odds space — the natural parameter space of Bernoulli random variables — where Bayesian updates are additive. The $1/\sqrt{n}$ scaling prevents the combined score from collapsing toward 0 or 1 as the number of signals grows (see Chapter 23 for the full derivation).
The naive product rule is a special case that arises when $\alpha = 0$ (no confidence scaling), but it suffers from exponential shrinkage: even when all signals strongly indicate relevance (e.g., $P_i = 0.8$ for $n = 10$ signals, giving $0.8^{10} \approx 0.11$), the product suggests weak relevance — a semantic mismatch between evidence accumulation and joint satisfaction.
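The contrast can be checked numerically. This standalone sketch (not Cognica's API) compares the naive product with the sqrt(n)-scaled log-odds conjunction used in the code above:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

inline float logit(float p) { return std::log(p / (1.0f - p)); }
inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Naive product rule: shrinks toward zero as n grows.
inline float naive_product(const std::vector<float>& probs) {
  float product = 1.0f;
  for (float p : probs) product *= p;
  return product;
}

// Log-odds conjunction with sqrt(n) confidence scaling.
inline float log_odds_conjunction(const std::vector<float>& probs) {
  float logit_sum = 0.0f;
  for (float p : probs) logit_sum += logit(p);
  return sigmoid(logit_sum / std::sqrt(static_cast<float>(probs.size())));
}
```

For ten signals at P = 0.8, the product collapses to about 0.11 while the scaled conjunction stays well above 0.8, matching the shrinkage argument above. With a single signal, the conjunction returns the input unchanged (identity property).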
24.3.3 Disjunction Scorer (OR)
The DisjunctionScorer matches documents with any child scorer:
class DisjunctionScorer final {
public:
auto score() const -> std::optional<float> {
// Non-probabilistic: sum of matching scorers
if (!is_probabilistic_) {
auto total = 0.f;
for (const auto& scorer : scorers_) {
if (scorer.doc_id() == current_doc_id_) {
if (auto s = scorer.score(); s.has_value()) {
total += s.value();
}
}
}
return total;
}
// Probabilistic: log-odds mean (arithmetic mean of logits)
auto sum_logit = 0.0f;
auto n = 0;
for (const auto& scorer : scorers_) {
if (scorer.doc_id() == current_doc_id_) {
if (auto s = scorer.score(); s.has_value()) {
auto prob = std::clamp(s.value(), 1e-7f, 1.0f - 1e-7f);
sum_logit += std::log(prob / (1.0f - prob));
++n;
}
}
}
if (n == 0) return std::nullopt;
auto mean_log_odds = sum_logit / static_cast<float>(n);
return 1.0f / (1.0f + std::exp(-mean_log_odds));
}
private:
std::vector<ScorerType> scorers_;
DocID current_doc_id_;
bool is_probabilistic_;
};
Probabilistic Disjunction uses the log-odds mean — the arithmetic mean of logits:

$$P = \sigma\!\left(\frac{1}{n} \sum_{i=1}^{n} \operatorname{logit}(P_i)\right)$$

The Noisy-OR formula $1 - \prod_i (1 - P_i)$ saturates toward 1.0 when individual probabilities are high, destroying discrimination among results. The log-odds mean prevents this by normalizing fully by $n$, keeping the output in a discriminative range regardless of how many scorers contribute. When $n = 1$, the identity property holds: $P = \sigma(\operatorname{logit}(P_1)) = P_1$.
Note that the disjunction scorer uses full normalization by $n$ (arithmetic mean), while the conjunction scorer (Section 24.3.2) uses $1/\sqrt{n}$ confidence scaling. The difference reflects their semantic roles: conjunction amplifies agreement among required signals, while disjunction averages the evidence from whichever signals match.
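The saturation behavior is easy to demonstrate. This standalone sketch compares Noisy-OR with the log-odds mean for several strong signals:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

inline float logit_(float p) { return std::log(p / (1.0f - p)); }
inline float sigmoid_(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Noisy-OR: 1 - prod(1 - p_i). Saturates toward 1 as signals accumulate.
inline float noisy_or(const std::vector<float>& probs) {
  float prod = 1.0f;
  for (float p : probs) prod *= (1.0f - p);
  return 1.0f - prod;
}

// Log-odds mean: arithmetic mean of logits, mapped back via sigmoid.
inline float log_odds_mean(const std::vector<float>& probs) {
  float sum = 0.0f;
  for (float p : probs) sum += logit_(p);
  return sigmoid_(sum / static_cast<float>(probs.size()));
}
```

Five signals at 0.9 drive Noisy-OR to 0.99999, erasing discrimination among results, while the log-odds mean returns 0.9 (to float precision).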
24.3.4 Scorer Mode Selection
The is_probabilistic() method determines combination mode:
auto ConjunctionScorer::is_probabilistic() const -> bool {
// All scorers must be probabilistic
return std::all_of(scorers_.begin(), scorers_.end(),
[](const auto& s) { return s.is_probabilistic(); });
}
auto DisjunctionScorer::is_probabilistic() const -> bool {
// All scorers must be probabilistic
return std::all_of(scorers_.begin(), scorers_.end(),
[](const auto& s) { return s.is_probabilistic(); });
}
This enables automatic mode selection: when all component scorers produce calibrated probabilities, the composite uses probabilistic combination; otherwise, it falls back to score summation.
24.3.5 Required-Optional Scorer (SHOULD)
The ReqOptScorer handles Boolean SHOULD clauses — a required scorer combined with an optional scorer that boosts the score when it matches. In probabilistic mode, it uses the log-odds mean with confidence scaling:
auto ReqOptScorer::score_prob_() const -> std::optional<float> {
auto req_prob = std::clamp(req_score.value(), 1e-7f, 1.0f - 1e-7f);
auto sum_logit = std::log(req_prob / (1.0f - req_prob));
auto n = 1;
// If optional scorer matches current document, add its evidence
if (opt_scorer_doc == curr_doc) {
auto opt_prob = std::clamp(opt_score.value(), 1e-7f, 1.0f - 1e-7f);
sum_logit += std::log(opt_prob / (1.0f - opt_prob));
++n;
}
auto adjusted_log_odds = sum_logit / std::sqrt(static_cast<float>(n));
return 1.0f / (1.0f + std::exp(-adjusted_log_odds));
}
When the optional scorer does not match, $n = 1$ and $\sqrt{n} = 1$, so the required score is returned unchanged (identity property). When it does match, both signals combine via the log-odds mean with $1/\sqrt{2}$ confidence scaling.
24.3.6 Required-Excluded Scorer (NOT)
The ReqExclScorer handles Boolean NOT (MUST_NOT) clauses and supports two modes:
Hard NOT (non-probabilistic): Documents matching the excluded clause are completely removed from iteration via RelativeComplementDISI. This is the traditional Boolean NOT.
Soft NOT (probabilistic): Documents matching the excluded clause remain in iteration but receive a penalty via logit subtraction. In log-odds space, NOT is sign negation: $\operatorname{logit}(1 - p) = -\operatorname{logit}(p)$. The combined evidence is:

$$P = \sigma\bigl(\operatorname{logit}(P_{\text{req}}) - \operatorname{logit}(P_{\text{excl}})\bigr)$$
When the excluded scorer does not match a document, the required score is returned unchanged. Soft NOT mode is used automatically when the scorers are probabilistic, routing through the full scorer rather than just the iterator.
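A minimal sketch of the soft-NOT combination (a hypothetical free function, not the `ReqExclScorer` itself):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Soft NOT: subtract the excluded signal's logit from the required one.
// The caller skips this call entirely when the excluded clause does not
// match the current document (the required score passes through).
inline float soft_not(float req_prob, float excl_prob) {
  auto clamp = [](float p) { return std::clamp(p, 1e-7f, 1.0f - 1e-7f); };
  auto logit = [](float p) { return std::log(p / (1.0f - p)); };
  float combined = logit(clamp(req_prob)) - logit(clamp(excl_prob));
  return 1.0f / (1.0f + std::exp(-combined));  // sigmoid
}
```

An excluded probability of 0.5 is neutral evidence (logit 0) and leaves the required score unchanged; a strongly matching excluded clause pushes the score down without removing the document from iteration.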
24.3.7 Boost Scorer
The BoostScorer applies a multiplicative boost to a child scorer. In probabilistic mode, it operates in log-odds space to preserve the probability range:
auto score() const -> std::optional<float> {
  if (!scorer_->is_probabilistic()) {
    return scorer_->score().transform(
        [boost = boost_](auto s) { return s * boost; });
  }
  // Log-odds space: sigmoid(boost * logit(P))
  return scorer_->score().transform([boost = boost_](auto score) -> float {
    auto prob = std::clamp(score, 1e-7f, 1.0f - 1e-7f);
    auto logit = std::log(prob / (1.0f - prob));
    return 1.0f / (1.0f + std::exp(-boost * logit));
  });
}
The transformation exponentiates the odds ratio: $\frac{P'}{1 - P'} = \left(\frac{P}{1 - P}\right)^{\text{boost}}$. Key properties:
- Output is always in $(0, 1)$ for any boost $> 0$
- $P = 0.5$ is a fixed point (neutral evidence stays neutral)
- boost $> 1$ pushes $P$ away from 0.5 (amplifies confidence)
- boost $< 1$ pushes $P$ toward 0.5 (dampens confidence)
Multiplying a probability directly by a boost can push it above 1.0, which then gets clamped and destroys score discrimination. The log-odds space approach avoids this entirely.
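These properties can be verified with a standalone sketch of the odds-ratio boost:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Odds-ratio boost: sigmoid(boost * logit(p)).
// Output stays in (0, 1); p = 0.5 is a fixed point for any boost.
inline float odds_boost(float p, float boost) {
  p = std::clamp(p, 1e-7f, 1.0f - 1e-7f);
  float logit = std::log(p / (1.0f - p));
  return 1.0f / (1.0f + std::exp(-boost * logit));
}
```

A multiplicative boost of 2.0 applied to P = 0.7 would give 1.4 (clamped to 1.0, destroying discrimination); the odds-ratio form yields roughly 0.84 and preserves the ordering among documents.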
24.3.8 Vector Score Calibration
The DenseVectorScorer maps cosine similarity to a calibrated probability using logistic (Platt) scaling:

$$P(\text{rel} \mid s) = \sigma(\kappa s) = \frac{1}{1 + e^{-\kappa s}}$$

where $s$ is the cosine similarity (1 minus cosine distance) and $\kappa$ is the Fisher-matched scaling constant. The midpoint $\sigma(0) = 0.5$ at $s = 0$ preserves the neutral-prior property for orthogonal vectors.
The linear mapping $P = (1 + s)/2$ produces probabilities in the range 0.925--0.975 for typical vector search results (cosine distances 0.05--0.15), causing score saturation when combined with other probabilistic signals. The sigmoid with $\kappa = 2$ (since $\sigma'(0) = \kappa/4$, matching the linear mapping's slope of $1/2$) compresses extreme similarities toward moderate probabilities while preserving the same slope at $s = 0$ as the linear mapping — preventing saturation without sacrificing discriminative power near the midpoint.
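The saturation difference can be sketched numerically. This standalone example assumes $\kappa = 2$, the value implied by matching the linear mapping's slope at $s = 0$ (an assumption derived here, not quoted from Cognica's source):

```cpp
#include <cassert>
#include <cmath>

// Linear mapping: P = (1 + s) / 2 for cosine similarity s in [-1, 1].
inline float linear_map(float s) { return (1.0f + s) / 2.0f; }

// Platt-style sigmoid calibration with slope matched at s = 0:
// d/ds sigmoid(kappa * s) at 0 is kappa / 4, so kappa = 2 matches
// the linear mapping's slope of 1/2. (kappa = 2 is an assumption.)
inline float sigmoid_map(float s, float kappa = 2.0f) {
  return 1.0f / (1.0f + std::exp(-kappa * s));
}
```

For similarities 0.85--0.95 (distances 0.05--0.15), the linear map yields 0.925--0.975 while the sigmoid yields roughly 0.85--0.87, leaving headroom for other signals to differentiate results.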
24.4 Score Normalization
24.4.1 The Normalization Problem
BM25 and vector scores require normalization before combination:
| Score Type | Range | Distribution |
|---|---|---|
| BM25 | $[0, \infty)$ | Power-law tail |
| Vector L2 | $[0, \infty)$ | Dataset-dependent |
| Vector IP | $(-\infty, +\infty)$ | Approximately normal |
24.4.2 Softmax Normalization
Converts scores to a probability distribution:
auto compute_score_prob_softmax(const std::vector<float>& values,
float temperature = 1.f)
-> std::vector<float> {
auto probs = std::vector<float>{};
probs.reserve(values.size());
// Numerical stability: subtract max
auto max_value = *std::max_element(values.cbegin(), values.cend());
// Temperature-scaled exponentials
std::transform(values.cbegin(), values.cend(),
std::back_inserter(probs),
[max_value, temperature](auto value) {
return std::exp((value - max_value) / temperature);
});
// Normalize to sum to 1
auto sum = std::accumulate(probs.cbegin(), probs.cend(), 0.f);
if (sum > 0.f) {
std::transform(probs.cbegin(), probs.cend(), probs.begin(),
[sum](auto p) { return p / sum; });
}
return probs;
}
Softmax Formula:

$$p_i = \frac{e^{(s_i - \max_j s_j)/T}}{\sum_k e^{(s_k - \max_j s_j)/T}}$$
Temperature Effects:
| Temperature | Distribution | Use Case |
|---|---|---|
| $T \to 0$ | One-hot | Hard selection |
| $T = 1$ | Standard | Balanced |
| $T > 1$ | Smoothed | Uncertainty modeling |
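The temperature effect can be checked with a minimal standalone softmax (a compact re-implementation of the normalization function above):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Minimal temperature-scaled softmax (max-subtracted for stability).
inline std::vector<float> softmax(const std::vector<float>& v, float t) {
  float max_v = *std::max_element(v.begin(), v.end());
  std::vector<float> out;
  out.reserve(v.size());
  for (float x : v) out.push_back(std::exp((x - max_v) / t));
  float sum = std::accumulate(out.begin(), out.end(), 0.0f);
  for (float& p : out) p /= sum;
  return out;
}
```

With scores {3, 1, 0}: T = 0.1 concentrates nearly all probability mass on the top score, while T = 10 spreads it almost uniformly across the three entries.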
24.4.3 Min-Max Normalization
Linear scaling to a target range:
auto compute_min_max_norm(const std::vector<float>& values,
float target_min = 0.f,
float target_max = 1.f)
-> std::vector<float> {
auto [min_it, max_it] = std::minmax_element(
values.begin(), values.end());
auto min_val = *min_it;
auto max_val = *max_it;
auto range = max_val - min_val;
if (range < 1e-10f) {
// Degenerate case: all values equal
auto mid = (target_min + target_max) / 2.f;
return std::vector<float>(values.size(), mid);
}
auto target_range = target_max - target_min;
auto normalized = std::vector<float>{};
normalized.reserve(values.size());
std::transform(values.cbegin(), values.cend(),
std::back_inserter(normalized),
[=](auto v) {
return (v - min_val) / range * target_range + target_min;
});
return normalized;
}
Formula:

$$\hat{v} = \frac{v - v_{\min}}{v_{\max} - v_{\min}} \cdot (t_{\max} - t_{\min}) + t_{\min}$$
24.4.4 Sigmoid Transform
Maps unbounded scores to $(0, 1)$:
auto compute_sigmoid_transform(const std::vector<float>& values)
-> std::vector<float> {
auto transformed = std::vector<float>{};
transformed.reserve(values.size());
std::transform(values.cbegin(), values.cend(),
std::back_inserter(transformed),
[](auto value) {
return 1.f / (1.f + std::exp(-value));
});
return transformed;
}
Formula:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
24.5 Unified Probabilistic Fusion
24.5.1 The Calibration Foundation
Hybrid search in Cognica builds on three calibration layers, each detailed in its own chapter:
- Bayesian BM25 (Chapter 20): Transforms unbounded BM25 scores into calibrated probabilities via a sigmoid likelihood model and three-term posterior decomposition
- Vector Score Calibration (Chapter 22): Transforms vector similarity scores into relevance probabilities via likelihood ratios over index-derived distance distributions
- Log-Odds Conjunction (Chapter 23): Combines calibrated probabilities through additive aggregation in log-odds space, resolving the conjunction shrinkage problem
24.5.2 The Complete Log-Odds Decomposition
When both lexical and semantic signals are available, the unified posterior for a document is:

$$\operatorname{logit} P(\text{rel} \mid d) = \operatorname{logit} P(\text{rel}) + \Delta_{\text{BM25}}(d) + \Delta_{\text{vec}}(d)$$

where each $\Delta$ term is the calibrated log-odds evidence contributed by that signal.
Each term contributes independent Bayesian evidence in log-odds space. This additive structure means:
- Adding a new signal type (e.g., a click-through model) requires only computing its own calibrated evidence term
- No retuning of existing signal weights is needed
- The computation has the structure of a feedforward neural network (Chapter 23)
24.5.3 Signal Integration Pipeline
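In outline, the pipeline calibrates each signal independently and sums the evidence in logit space. The sketch below illustrates the flow; the struct, field names, and neutral prior are illustrative assumptions, not Cognica's exact interfaces:

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Each signal contributes a calibrated probability when it matches;
// fusion sums logits and applies a sigmoid. Missing signals simply
// contribute no evidence (additive structure).
struct SignalEvidence {
  std::optional<float> bm25_prob;    // from Bayesian BM25 (Chapter 20)
  std::optional<float> vector_prob;  // from vector calibration (Chapter 22)
};

inline float fuse(const SignalEvidence& e, float prior = 0.5f) {
  auto logit = [](float p) { return std::log(p / (1.0f - p)); };
  float total = logit(prior);  // neutral prior 0.5 contributes logit 0
  if (e.bm25_prob) total += logit(*e.bm25_prob);
  if (e.vector_prob) total += logit(*e.vector_prob);
  return 1.0f / (1.0f + std::exp(-total));
}
```

When only the text signal is present, the fused score equals the BM25 probability unchanged; adding a new signal type means adding one more optional evidence term, with no retuning of the existing ones.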
24.5.4 Comparison with Alternative Fusion Methods
| Method | Preserves Scores | Principled | Extensible | Requires Tuning |
|---|---|---|---|---|
| RRF | No (rank-based) | No ($k$ arbitrary) | Limited | No |
| Linear combination | Yes | No (ad-hoc $\alpha$) | Limited | Yes |
| Log-odds fusion | Yes | Yes (Bayesian) | Yes (additive) | No |
The log-odds fusion approach is the only method that preserves score magnitude information, has a principled Bayesian derivation, and extends naturally to additional signals without parameter retuning.
24.6 Result Ranking and Merging
24.6.1 Top-K Collection
The TopKCollector maintains the highest-scoring documents using a min-heap:
struct ScoredDoc {
DocID doc_id;
float score;
// Min-heap comparator: lowest score at top for efficient eviction
bool operator>(const ScoredDoc& other) const {
if (score != other.score) {
return score > other.score;
}
// Tie-breaking: prefer lower doc_id
return doc_id < other.doc_id;
}
};
class TopKCollector {
public:
bool try_insert(DocID doc_id, float score) {
if (heap_.size() < k_) {
heap_.push({doc_id, score});
return true;
}
const auto& worst = heap_.top();
if (score > worst.score ||
(score == worst.score && doc_id < worst.doc_id)) {
heap_.pop();
heap_.push({doc_id, score});
return true;
}
return false;
}
auto get_threshold() const -> float {
return heap_.size() < k_ ? min_score_ : heap_.top().score;
}
private:
size_t k_;
float min_score_;
std::priority_queue<ScoredDoc, std::vector<ScoredDoc>,
std::greater<ScoredDoc>> heap_;
};
24.6.2 Tie-Breaking Strategy
Deterministic tie-breaking ensures reproducible results:
bool operator<(const ScoredDoc& other) const {
if (score != other.score) {
return score > other.score; // Descending by score
}
return doc_id < other.doc_id; // Ascending by doc_id for ties
}
Properties:
- Deterministic: Same query always produces same ordering
- Stable: Documents with equal scores maintain consistent relative order
- Efficient: Single comparison chain, no secondary sorting
24.6.3 Dynamic Threshold for Pruning
The collector provides a dynamic threshold for WAND optimization:
auto get_threshold() const -> float {
if (heap_.size() < k_) {
return min_score_; // Accept any document above minimum
}
return heap_.top().score; // k-th highest score
}
This threshold enables early termination: documents whose upper-bound score cannot exceed the threshold are skipped without full scoring.
24.7 Hybrid Search Execution
24.7.1 Execution Pipeline
24.7.2 Step-by-Step Execution
Step 1: Query Parsing
// Input: "machine learning" AND vector_similar(embedding)
auto query = BooleanQuery::Builder{}
.add(TermQuery("machine"), Occur::kMust)
.add(TermQuery("learning"), Occur::kMust)
.add(DenseVectorQuery(embedding), Occur::kShould)
.build();
Step 2: Weight Creation
// TermQuery weights lookup postings and compute IDF
auto term_weight = term_query->create_weight(txn, searcher);
// term_weight contains: postings list, IDF, collection stats
// DenseVectorQuery weight executes HNSW search
auto vector_weight = vector_query->create_weight(txn, searcher);
// vector_weight contains: doc_ids, distances, probabilities
Step 3: Scorer Creation
// Create leaf scorers
auto term_scorer = term_weight->create_scorer(leaf_ctx);
// term_scorer: BM25SimScorer or BayesianBM25SimScorer
auto vector_scorer = vector_weight->create_scorer(leaf_ctx);
// vector_scorer: returns sigmoid(kappa * (1 - distance)) for matching docs
Step 4: Scorer Composition
// MUST clauses: conjunction
auto must_scorers = std::vector<ScorerType>{};
must_scorers.push_back(std::move(term_scorer_machine));
must_scorers.push_back(std::move(term_scorer_learning));
auto conjunction = ConjunctionScorer{std::move(must_scorers)};
// Combine with SHOULD clauses: disjunction
auto all_scorers = std::vector<ScorerType>{};
all_scorers.push_back(std::move(conjunction));
all_scorers.push_back(std::move(vector_scorer));
auto final_scorer = DisjunctionScorer{std::move(all_scorers)};
Step 5: Result Collection
auto collector = TopKCollector{k, min_score};
while (final_scorer.advance() != kNoMoreDocs) {
auto doc_id = final_scorer.doc_id();
if (auto score = final_scorer.score(); score.has_value()) {
collector.try_insert(doc_id, score.value());
}
}
auto results = collector.extract_sorted();
24.7.3 Score Combination Example
Consider a document matching both text and vector queries:
| Component | Raw Score | Probabilistic? | Calibrated P |
|---|---|---|---|
| TermScorer("machine") | 3.2 (BM25) | Yes (Bayesian) | 0.78 |
| TermScorer("learning") | 2.8 (BM25) | Yes (Bayesian) | 0.72 |
| DenseVectorScorer | 0.85 (similarity) | Yes | 0.81 |
Conjunction (MUST terms) — log-odds mean with $1/\sqrt{2}$ scaling:

$$P_{\text{must}} = \sigma\!\left(\frac{\operatorname{logit}(0.78) + \operatorname{logit}(0.72)}{\sqrt{2}}\right) = \sigma\!\left(\frac{1.27 + 0.94}{1.41}\right) \approx 0.83$$

ReqOpt (MUST + SHOULD vector) — log-odds mean with $1/\sqrt{2}$ scaling:

$$P_{\text{final}} = \sigma\!\left(\frac{\operatorname{logit}(0.83) + \operatorname{logit}(0.81)}{\sqrt{2}}\right) = \sigma\!\left(\frac{1.59 + 1.45}{1.41}\right) \approx 0.89$$

Compare this with the naive product rule, which would give $0.78 \times 0.72 \times 0.81 \approx 0.45$ — suggesting weak relevance despite all three signals strongly indicating relevance.
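The arithmetic for this example can be verified directly with a standalone two-signal combiner:

```cpp
#include <cassert>
#include <cmath>

inline float logit(float p) { return std::log(p / (1.0f - p)); }
inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Two-signal log-odds combination with sqrt(2) confidence scaling.
inline float combine2(float p1, float p2) {
  return sigmoid((logit(p1) + logit(p2)) / std::sqrt(2.0f));
}
```

`combine2(0.78, 0.72)` yields roughly 0.83 for the MUST conjunction; combining that with the vector probability 0.81 yields roughly 0.89, versus about 0.45 for the naive product of all three probabilities.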
24.8 Configuration and Options
24.8.1 Search Options
struct SearchOptions {
std::optional<int64_t> top_k; // Result count
float min_score = 0.f; // Score threshold
std::unordered_map<std::string, float>
field_boosts; // Per-field weights
std::unordered_map<std::string,
std::vector<FieldOption>> field_options; // Field-specific
bool use_wand = true; // WAND optimization
};
24.8.2 Field-Specific Options
struct FieldOption {
std::string name; // Option name
FieldOptionValue value; // Variant type
};
// For dense vector queries:
auto ef_search = get_field_option_value_<int32_t>("ef_search");
auto top_k = get_field_option_value_<int32_t>("top_k");
auto random_seed = get_field_option_value_<int32_t>("random_seed");
24.8.3 Similarity Selection
class SimilarityFactory {
 public:
  // Returns a type-erased similarity; te::poly<Similarity> here follows
  // the ScorerType pattern (the exact alias name is assumed).
  static auto create(SimilarityType type) -> te::poly<Similarity> {
    switch (type) {
      case SimilarityType::kBM25:
        return BM25Similarity{};
      case SimilarityType::kBayesianBM25:
        return BayesianBM25Similarity{};
      case SimilarityType::kConstant:
        return ConstantSimilarity{};
    }
  }
};
24.9 Numerical Stability Patterns
24.9.1 Log-Odds Clamping
All probabilistic scorers clamp probabilities to $[\epsilon, 1 - \epsilon]$ where $\epsilon = 10^{-7}$ before computing logits. This epsilon is the smallest value where $1 - \epsilon$ is representable as a float32 value distinct from $1.0$:
auto sum_logit = 0.0f;
for (const auto& scorer : scorers_) {
if (auto s = scorer.score(); s.has_value()) {
// 1e-7 is the smallest epsilon where (1.0f - epsilon) is
// representable as a float distinct from 1.0f
auto prob = std::clamp(s.value(), 1e-7f, 1.0f - 1e-7f);
sum_logit += std::log(prob / (1.0f - prob)); // logit
}
}
Using a smaller epsilon (e.g., $10^{-8}$) causes $1 - \epsilon$ to round to $1.0$ due to the float32 unit of least precision (ULP) at 1.0 being approximately $1.2 \times 10^{-7}$. This makes $\log(p / (1 - p)) = \log(p / 0) = \infty$, which poisons the entire log-odds sum and can produce NaN or saturated scores.
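The ULP claim is directly checkable:

```cpp
#include <cassert>
#include <cmath>

// With float32, 1.0f - 1e-8f rounds back to exactly 1.0f, so the
// complement (1 - p) becomes 0 and the logit blows up. 1e-7f is the
// smallest decimal epsilon that survives the subtraction.
inline bool epsilon_survives(float epsilon) {
  float complement = 1.0f - epsilon;  // force float32 rounding
  return complement != 1.0f;
}

inline float logit_of(float p) {
  float q = 1.0f - p;  // force float32 rounding of the complement
  return std::log(p / q);
}
```

The test below confirms that an epsilon of 1e-8 vanishes under float32 subtraction while 1e-7 does not, and that the resulting logit goes to infinity in the former case.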
24.9.2 Log-Odds Space Arithmetic
The log-odds mean framework (Sections 24.3.2--24.3.7) operates in logit space rather than probability space, which provides natural numerical stability:
- Logit range: $\operatorname{logit}(10^{-7}) \approx -16.1$ to $\operatorname{logit}(1 - 10^{-7}) \approx +16.1$
- Sum of logits: bounded by $16.1\,n$ in magnitude (representable in float32 for practical $n$)
- Mean of logits: always within $[-16.1, +16.1]$ regardless of $n$
- Final sigmoid: maps the bounded logit range back to $(0, 1)$
This avoids the underflow problem of log-space probability multiplication and the saturation problem of Noisy-OR.
24.9.3 Softmax Stability
Subtracting the maximum prevents overflow in exponentials:
// Unstable: exp(1000) overflows
auto max_val = *std::max_element(values.begin(), values.end());
// Stable: exp(0) = 1
std::transform(values.begin(), values.end(), ...,
[max_val, T](auto v) {
return std::exp((v - max_val) / T);
});
Mathematical Equivalence:

$$\frac{e^{(v_i - m)/T}}{\sum_j e^{(v_j - m)/T}} = \frac{e^{v_i/T}\, e^{-m/T}}{e^{-m/T} \sum_j e^{v_j/T}} = \frac{e^{v_i/T}}{\sum_j e^{v_j/T}}, \qquad m = \max_j v_j$$
24.10 Performance Considerations
24.10.1 Query Planning
The query optimizer considers hybrid query characteristics:
| Query Pattern | Optimization |
|---|---|
| Text-only MUST | WAND with BM25 upper bounds |
| Vector-only | Direct HNSW top-k |
| Hybrid AND | Intersect text matches, re-rank with vector |
| Hybrid OR | Union with dynamic threshold |
24.10.2 Early Termination
WAND enables early termination for text queries:
auto threshold = collector.get_threshold();
// Skip document if upper bound cannot exceed threshold
auto upper_bound = compute_upper_bound(doc_id);
if (upper_bound < threshold) {
continue; // Skip to next candidate
}
24.10.3 Vector Pre-filtering
For selective text predicates, pre-filter before vector search:
// Collect doc_ids matching text predicate
auto text_matches = execute_text_query(text_query);
// Filter HNSW results to text matches
auto vector_results = hnsw_index->search_with_filter(
query_vector, top_k, text_matches);
24.11 Summary
This chapter explored Cognica's hybrid search architecture:
- Score fusion challenge: BM25 and vector scores have incompatible ranges and distributions; principled fusion requires probabilistic calibration rather than ad-hoc normalization
- Query composition: Boolean queries combine text and vector queries with AND/OR/NOT semantics; scorers compose hierarchically through type-erased interfaces
- Unified log-odds framework: All Boolean scorers — conjunction, disjunction, required-optional, required-excluded, and boost — operate in log-odds space. Conjunction uses $1/\sqrt{n}$ confidence scaling; disjunction uses the arithmetic mean of logits; soft NOT uses logit subtraction; and boost exponentiates the odds ratio. This replaces the earlier mix of Noisy-OR, conditional probability, and direct multiplication
- Score saturation prevention: Three layers of saturation are addressed: float32 epsilon clamping at $10^{-7}$ (not $10^{-8}$), log-odds space boosting (not probability multiplication), and arithmetic mean normalization for disjunction (not Noisy-OR)
- Calibrated signal sources: Bayesian BM25 (Chapter 20) uses log-compressed sigmoid scoring with a neutral prior; vector calibration uses logistic (Platt) scaling with a Fisher-matched $\kappa$. Both produce calibrated probabilities that combine additively in log-odds space
- Numerical stability: Log-odds space arithmetic with $\epsilon = 10^{-7}$ prevents both underflow and float32 precision loss; max-subtraction prevents overflow in softmax
- Result ranking: Min-heap maintains top-k with dynamic threshold; deterministic tie-breaking ensures reproducibility
- Execution pipeline: Query parsing, weight creation, scorer composition, and result collection form a clean separation of concerns
The next chapter examines query evaluation strategies — WAND and Block-Max WAND algorithms that exploit score upper bounds to skip irrelevant documents without full scoring.