Chapter 21: Vector Search and HNSW Index
Introduction
The emergence of deep learning has fundamentally transformed information retrieval. Dense vector embeddings—learned representations that capture semantic meaning in continuous vector spaces—enable similarity search that transcends lexical matching. A query about "automobile repairs" can match documents discussing "car maintenance" because their vector representations lie close together in embedding space, even though they share no common terms.
This chapter explores Cognica's vector search infrastructure, built around the Hierarchical Navigable Small World (HNSW) algorithm. We examine the theoretical foundations of approximate nearest neighbor search, the mathematical properties that make HNSW effective, and the engineering decisions that enable production-scale deployment. The implementation supports multiple backends, quantization strategies, and hybrid indexing approaches, providing flexibility for diverse workloads from small embeddings to billion-scale indices.
21.1 Vector Similarity Search Foundations
21.1.1 The Curse of Dimensionality
Exact nearest neighbor search in high-dimensional spaces faces fundamental computational barriers. For $n$ vectors in $d$ dimensions, a naive linear scan requires $O(nd)$ distance computations per query. Tree-based methods like KD-trees degrade to linear scan when $d$ grows beyond roughly 10-20 due to the curse of dimensionality—the phenomenon where high-dimensional spaces become increasingly sparse, rendering spatial partitioning ineffective.
The curse manifests mathematically in the concentration of distances. For uniformly distributed points in a $d$-dimensional hypercube, the ratio of the maximum to the minimum distance from a query point converges to 1 as $d$ increases:

$$\lim_{d \to \infty} \mathbb{E}\left[\frac{d_{\max}}{d_{\min}}\right] = 1$$
This concentration renders distance-based discrimination meaningless for exact methods in high dimensions.
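The concentration effect is easy to observe empirically. The following standalone sketch (illustrative only, not part of Cognica) samples random points in the unit hypercube and compares the max/min distance ratio at low and high dimensionality:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <random>
#include <vector>

// Ratio of max to min distance from the origin corner to n points
// sampled uniformly in the d-dimensional unit hypercube.
double distance_ratio(int n, int d, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  double min_dist = std::numeric_limits<double>::max();
  double max_dist = 0.0;
  for (int i = 0; i < n; ++i) {
    double sq = 0.0;
    for (int j = 0; j < d; ++j) {
      double x = uniform(rng);  // Query is the origin, so offset == x
      sq += x * x;
    }
    double dist = std::sqrt(sq);
    min_dist = std::min(min_dist, dist);
    max_dist = std::max(max_dist, dist);
  }
  return max_dist / min_dist;
}
```

At d=2 the ratio is typically large; at d=512 it collapses toward 1, which is precisely why spatial pruning stops working.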
21.1.2 Approximate Nearest Neighbor Search
Approximate Nearest Neighbor (ANN) algorithms trade exactness for efficiency, accepting a small probability of missing the true nearest neighbors in exchange for sublinear query time. The quality-performance tradeoff is characterized by recall—the fraction of true nearest neighbors returned:

$$\text{Recall@}k = \frac{|A \cap G|}{|G|}$$

where $A$ denotes the approximate result set and $G$ the true $k$ nearest neighbors.
Modern ANN algorithms achieve 95-99% recall with query times 100-1000x faster than exact search, making semantic similarity search practical at scale.
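Recall is straightforward to measure when ground truth is available. A minimal sketch (function name is illustrative):

```cpp
#include <unordered_set>
#include <vector>

// Recall@k: fraction of the true nearest neighbors that appear
// in the approximate result set.
double recall_at_k(const std::vector<int>& approx,
                   const std::vector<int>& exact) {
  std::unordered_set<int> truth(exact.begin(), exact.end());
  std::size_t hits = 0;
  for (int id : approx) {
    hits += truth.count(id);
  }
  return static_cast<double>(hits) / static_cast<double>(truth.size());
}
```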
21.1.3 Distance Metrics
Cognica supports three distance metrics for vector similarity:
enum class MetricType : int32_t {
kInnerProduct, // Maximum inner product search (MIPS)
kL2, // Squared Euclidean distance
kL1, // Manhattan distance
};
Squared Euclidean Distance (L2)
The most common metric for embedding similarity:

$$d_{L2}(x, y) = \|x - y\|_2^2 = \sum_{i=1}^{d} (x_i - y_i)^2$$

Squared distance avoids the square root computation while preserving ranking order.
Inner Product (IP)
For normalized embeddings, inner product equals cosine similarity:

$$s_{IP}(x, y) = \langle x, y \rangle = \sum_{i=1}^{d} x_i y_i$$

When $\|x\| = \|y\| = 1$:

$$\langle x, y \rangle = \frac{\langle x, y \rangle}{\|x\| \, \|y\|} = \cos \theta$$
The implementation converts inner product similarity to distance for consistent semantics:
if (options_.metric_type == MetricType::kInnerProduct) {
std::transform(dists.cbegin(), dists.cend(), dists.begin(),
[](auto dist) noexcept {
return 1.f - dist; // Convert similarity to distance
});
}
Manhattan Distance (L1)
Less common but useful for sparse embeddings:

$$d_{L1}(x, y) = \sum_{i=1}^{d} |x_i - y_i|$$
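All three metrics can be sketched directly (illustrative scalar code, not the vectorized kernels a production index would use):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Squared Euclidean distance: sum of squared coordinate differences.
float l2_sq(const std::vector<float>& x, const std::vector<float>& y) {
  float sum = 0.f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    float diff = x[i] - y[i];
    sum += diff * diff;
  }
  return sum;
}

// Inner product: equals cosine similarity for unit-norm vectors.
float inner_product(const std::vector<float>& x,
                    const std::vector<float>& y) {
  float sum = 0.f;
  for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
  return sum;
}

// Manhattan (L1) distance: sum of absolute coordinate differences.
float l1(const std::vector<float>& x, const std::vector<float>& y) {
  float sum = 0.f;
  for (std::size_t i = 0; i < x.size(); ++i) sum += std::fabs(x[i] - y[i]);
  return sum;
}
```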
21.2 HNSW Algorithm Theory
21.2.1 Navigable Small World Graphs
The HNSW algorithm builds on the small-world network phenomenon—the observation that most nodes in real-world networks can be reached through a small number of hops. A navigable small world (NSW) graph augments this property with greedy routing: starting from any node, repeatedly moving to the neighbor closest to the target eventually reaches the target or its neighborhood.
For an NSW graph with $n$ nodes, the expected number of hops for greedy search scales as:

$$\mathbb{E}[\text{hops}] = O(\log n)$$

provided the graph maintains appropriate connectivity structure.
21.2.2 Hierarchical Structure
HNSW extends NSW with a hierarchical layer structure that provides logarithmic entry point selection. The hierarchy consists of $L + 1$ layers where:
- Layer 0 contains all vectors
- Each successive layer contains approximately $1/M$ of the nodes from the layer below
- The maximum layer for a node is determined probabilistically at insertion time
The layer assignment follows an exponential distribution:

$$l = \lfloor -\ln(U) \cdot m_L \rfloor, \quad U \sim \text{Uniform}(0, 1)$$

where $m_L = 1 / \ln(M)$ is the level multiplier. This ensures that on average:

$$P(l \geq k) = e^{-k / m_L} = M^{-k}$$
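The sampling rule above reduces to a few lines; the uniform draw is an explicit parameter here so the result is deterministic:

```cpp
#include <cmath>

// Assign an HNSW layer from a uniform draw u in (0, 1]:
// l = floor(-ln(u) * m_L), with m_L = 1 / ln(M).
int assign_layer(double u, int M) {
  double m_L = 1.0 / std::log(static_cast<double>(M));
  return static_cast<int>(std::floor(-std::log(u) * m_L));
}
```

With M=32, roughly 1 in 32 draws lands at layer 1 or above, 1 in 1024 at layer 2 or above, and so on—so almost all nodes live only at layer 0.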
21.2.3 Search Algorithm
HNSW search proceeds in two phases:
Phase 1: Coarse Search (Layers $L$ down to 1)
Starting from the entry point at the top layer, greedy search descends through layers:
function SEARCH_LAYER(q, entry, ef, layer):
candidates = min-heap({entry})
visited = {entry}
result = max-heap({entry})
while candidates not empty:
c = candidates.pop_min()
f = result.peek_max()
if distance(c, q) > distance(f, q):
break // All remaining candidates are farther
for neighbor in get_neighbors(c, layer):
if neighbor not in visited:
visited.add(neighbor)
f = result.peek_max()
if distance(neighbor, q) < distance(f, q) or |result| < ef:
candidates.push(neighbor)
result.push(neighbor)
if |result| > ef:
result.pop_max()
return result
Phase 2: Fine Search (Layer 0)
At the base layer, the search expands with larger beam width (ef_search) to improve recall.
21.2.4 Key Parameters
M (Connectivity)
The maximum number of bidirectional connections per node. Cognica sets:
- connectivity = M for layers 1 and above
- connectivity_base = 2M for layer 0
config.connectivity = options_.M; // Default: 32
config.connectivity_base = options_.M * 2; // Layer 0: 64
Higher M increases recall but also memory usage and construction time.
ef_construction
The beam width during index construction. Larger values improve graph quality at the cost of construction time:
config.expansion_add = options_.ef_construction; // Default: 40
ef_search
The beam width during search. This parameter directly controls the recall-latency tradeoff:
config.expansion_search = options_.ef_search; // Default: 16
The relationship between ef_search and recall is approximately:

$$\text{Recall} \approx 1 - e^{-c \cdot ef_{\text{search}} / k}$$

where $c$ is a dataset-dependent constant and $k$ is the number of neighbors requested.
21.3 Index Architecture
21.3.1 Index Type Hierarchy
Cognica supports three HNSW index variants:
enum class HNSWIndexType : int32_t {
kHNSW, // Pure HNSW graph index
kIVF_HNSW, // Inverted File + HNSW hybrid
kHNSW_USEARCH, // USearch native implementation
};
The configuration structure encapsulates all index parameters:
struct HNSWOptions {
HNSWIndexType index_type = HNSWIndexType::kHNSW;
// Core HNSW parameters
int32_t dims = 512; // Vector dimensionality
int32_t M = 32; // Max connections per node
int32_t ef_construction = 40; // Construction beam width
int32_t ef_search = 16; // Search beam width
// IVF parameters (for kIVF_HNSW)
int32_t num_inv_lists = 65536; // Number of partitions
int32_t num_segments = 32; // PQ code segments
int32_t num_bits = 8; // Bits per PQ segment
int32_t num_probe = 10; // Partitions to search
MetricType metric_type = MetricType::kL2;
QuantizerType quantizer_type = QuantizerType::kFlat;
bool normalize = false;
int32_t shards = 1;
};
21.3.2 Quantization Strategies
Vector quantization reduces memory footprint and can improve search speed through cache efficiency:
enum class QuantizerType : int32_t {
kFlat, // Full precision (32-bit float)
k4Bit, // 4-bit scalar quantization
k6Bit, // 6-bit scalar quantization
k8Bit, // 8-bit scalar quantization
k8BitDirect, // Direct 8-bit (no scaling)
k16Bit, // 16-bit precision (float16)
};
Scalar Quantization
Maps each dimension independently to a fixed-bit representation:

$$q(x_i) = \text{round}\left(\frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \cdot (2^b - 1)\right)$$

where $b$ is the number of bits. The memory savings are substantial:
| Quantizer | Bits/Dim | Memory (512D) | Relative |
|---|---|---|---|
| kFlat | 32 | 2048 bytes | 100% |
| k16Bit | 16 | 1024 bytes | 50% |
| k8Bit | 8 | 512 bytes | 25% |
| k4Bit | 4 | 256 bytes | 12.5% |
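An 8-bit scalar quantizer can be sketched as follows; the fixed [lo, hi] range here stands in for whatever per-index statistics a real implementation would train:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Map each dimension to [0, 255] using a shared [lo, hi] range.
std::vector<uint8_t> quantize8(const std::vector<float>& v,
                               float lo, float hi) {
  std::vector<uint8_t> out;
  out.reserve(v.size());
  float scale = 255.f / (hi - lo);
  for (float x : v) {
    float q = std::round((std::clamp(x, lo, hi) - lo) * scale);
    out.push_back(static_cast<uint8_t>(q));
  }
  return out;
}

// Reconstruct an approximate float vector from the codes.
std::vector<float> dequantize8(const std::vector<uint8_t>& codes,
                               float lo, float hi) {
  std::vector<float> out;
  out.reserve(codes.size());
  float step = (hi - lo) / 255.f;
  for (uint8_t c : codes) out.push_back(lo + c * step);
  return out;
}
```

The round-trip error per dimension is bounded by one quantization step, (hi - lo) / 255.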
Product Quantization (PQ)
For IVF_HNSW indices, Product Quantization provides aggressive compression:
enum class PQTrainType : int32_t {
kDefault, // Standard k-means training
kHotStart, // Pre-initialized centroids
kShared, // Shared codebook across segments
kHypercube, // Hypercube initialization
kHypercubePCA, // Hypercube + PCA rotation
};
PQ divides the vector into $m$ segments and quantizes each segment to one of $2^b$ centroids:

$$x \approx \left[c_1(x^{(1)}),\; c_2(x^{(2)}),\; \ldots,\; c_m(x^{(m)})\right]$$

Memory per vector: $m \cdot b$ bits. With the defaults (num_segments = 32, num_bits = 8), each vector occupies 32 bytes—a 64x reduction from 2048 bytes at 512 dimensions—enabling billion-scale indices in memory.
21.3.3 USearch Backend Integration
The primary backend uses the USearch library for efficient HNSW operations:
class HNSWIndexProxyUSearch final : public HNSWIndexProxy {
private:
std::unique_ptr<unum::usearch::index_dense_t> index_;
HNSWOptions options_;
// Buffered insertion for batching
std::vector<float> vectors_;
std::vector<DocID> doc_ids_;
absl::flat_hash_set<DocID> doc_ids_set_;
// Training thresholds
static constexpr auto kMinTrainingSetSize = 10'000;
static constexpr auto kBatchSizeForBulkLoading = 100;
};
Index initialization configures USearch with Cognica's options:
HNSWIndexProxyUSearch::HNSWIndexProxyUSearch(...)
: index_([&]() {
using namespace unum::usearch;
// Map metric type
auto metric_type = metric_kind_t::l2sq_k;
if (options_.metric_type == MetricType::kInnerProduct) {
metric_type = metric_kind_t::ip_k;
}
// Map scalar precision
auto scalar_type = scalar_kind_t::f32_k;
switch (options_.quantizer_type) {
case QuantizerType::k8Bit:
scalar_type = scalar_kind_t::i8_k;
break;
case QuantizerType::k16Bit:
scalar_type = scalar_kind_t::f16_k;
break;
default:
break; // Remaining quantizers keep full precision
}
// Configure HNSW parameters
auto config = index_dense_config_t {};
config.connectivity = options_.M;
config.connectivity_base = options_.M * 2;
config.expansion_add = options_.ef_construction;
config.expansion_search = options_.ef_search;
auto metric = index_dense_t::metric_t {
static_cast<size_t>(options_.dims),
metric_type,
scalar_type,
};
return std::make_unique<index_dense_t>(
index_dense_t::make(metric, config));
}()) {}
21.3.4 FAISS Backend Integration
For advanced use cases, FAISS provides additional capabilities:
class HNSWIndexProxyFaiss final : public HNSWIndexProxy {
private:
std::vector<std::unique_ptr<faiss::Index>> shards_hnsw_;
std::vector<std::unique_ptr<faiss::IndexIDMap>> shards_index_;
std::unique_ptr<faiss::IndexShards> index_;
std::unique_ptr<DynamicRoaringBitSet> deleted_doc_ids_;
};
FAISS integration enables:
- Sharded indices for parallel search
- Custom inverted list storage via KV database
- IVF+PQ compression for billion-scale indices
21.4 Vector Storage and Analysis
21.4.1 Vector Encoding
Vectors are stored as raw binary float arrays for efficient access:
struct DenseVectorCodec {
static auto encode(const std::vector<float>& vector,
std::optional<int32_t> dims = std::nullopt)
-> std::expected<std::string, Status> {
auto vector_dims = static_cast<int32_t>(vector.size());
// Validate dimension alignment
if (dims.has_value() && vector_dims % dims.value()) {
return std::unexpected {
Status::InvalidArgument("Invalid dimensions")
};
}
auto output = std::string {};
auto size = sizeof(float) * vector_dims;
output.reserve(size);
encoding::put_binary(&output, vector.data(), size);
return output;
}
static int32_t get_dim(const std::string_view& data) {
return static_cast<int32_t>(data.size() / sizeof(float));
}
};
The binary format stores vectors contiguously without headers, enabling zero-copy access when aligned.
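The codec's behavior can be approximated with a plain memcpy-based sketch (encoding::put_binary and the Status plumbing are omitted; the function names are illustrative):

```cpp
#include <cstring>
#include <string>
#include <vector>

// Serialize floats as a raw byte string with no header.
std::string encode_vector(const std::vector<float>& v) {
  std::string out(v.size() * sizeof(float), '\0');
  std::memcpy(out.data(), v.data(), out.size());
  return out;
}

// Recover the dimensionality and contents from the raw bytes.
std::vector<float> decode_vector(const std::string& data) {
  std::vector<float> v(data.size() / sizeof(float));
  std::memcpy(v.data(), data.data(), data.size());
  return v;
}
```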
21.4.2 Dense Vector Analyzer
The analyzer pipeline converts JSON arrays to binary vector format:
class DenseVectorAnalyzer final : public Analyzer {
public:
Tokens tokenize(const Value& value) const override {
if (!value.IsArray()) {
THROW_EXCEPTION(std::invalid_argument(
"Vector must be an array"));
}
const auto& array = value.GetArray();
auto vectors = std::vector<float> {};
// Support 1D vectors: [v1, v2, v3, ...]
if (is_valid_1d_(value)) {
vectors.reserve(array.Size());
for (const auto& e : array) {
vectors.emplace_back(e.GetFloat());
}
}
// Support 2D vectors: [[v1, v2], [v3, v4], ...]
else if (is_valid_2d_(value)) {
vectors.reserve(array.Size() * dims_);
for (const auto& v : array) {
for (const auto& e : v.GetArray()) {
vectors.emplace_back(e.GetFloat());
}
}
}
// Validate dimensions
if (vectors.size() % static_cast<uint32_t>(dims_)) {
THROW_EXCEPTION(std::invalid_argument(
fmt::format("Invalid dimension: expected {}, got {}",
dims_, vectors.size())));
}
auto output = DenseVectorCodec::encode(vectors, dims_);
if (!output.has_value()) {
THROW_EXCEPTION(std::invalid_argument("Failed to encode vector"));
}
auto size = static_cast<int32_t>(output->size());
return Tokens {
Token {
TokenType::kDenseVector,
std::move(output.value()),
std::nullopt,
0,
Offset {0, size, size},
Offset {0, size, size},
},
};
}
private:
int32_t dims_;
};
21.5 K-NN Query Execution
21.5.1 Query Processing Pipeline
Vector queries flow through the standard FTS query pipeline:
class DenseVectorQuery final : public Query {
public:
auto create_weight(Transaction* txn,
const IndexSearcher* searcher) const
-> std::shared_ptr<Weight> override {
const auto& context = searcher->get_context();
const auto* hnsw_index = context->get_hnsw_index();
// Configure search parameters
constexpr auto kMaxSearchSize = 100uz;
auto top_k = static_cast<int32_t>(
options_->top_k.value_or(kMaxSearchSize));
auto ef_search = get_field_option_value_<int32_t>("ef_search");
// Execute HNSW search
auto result = hnsw_index->search(term_, top_k, ef_search);
// Sort by doc_id for boolean query integration
ranges::sort(
ranges::views::zip(result.doc_ids, result.distances),
std::less<> {},
[](const auto& row) { return std::get<0>(row); });
return std::make_shared<DenseVectorWeight>(
index_stats.value(),
std::move(result.doc_ids),
std::move(result.distances));
}
};
21.5.2 Search Result Structure
Search results include both distances and probabilistic scores:
struct HNSWSearchResult {
std::vector<DocID> doc_ids {}; // Document identifiers
std::vector<float> distances {}; // Distances to query
std::vector<float> probs {}; // Probability scores
};
21.5.3 K-NN Search Implementation
The search algorithm orchestrates buffer flushing, dimension validation, and result conversion:
HNSWSearchResult HNSWIndexProxyUSearch::search(
const std::string_view& vector,
int32_t k,
std::optional<int32_t> ef_search) const {
// Flush buffered vectors for consistency
if (options.db.fts.hnsw_index.flush_before_search) {
const_cast<HNSWIndexProxyUSearch*>(this)->flush(true);
}
// Validate query dimensions
auto dim = DenseVectorCodec::get_dim(vector);
if (dim != static_cast<int32_t>(index_->dimensions())) {
THROW_EXCEPTION(std::runtime_error(
fmt::format("Dimension mismatch: expected {}, got {}",
index_->dimensions(), dim)));
}
// Execute USearch search
const float* query = reinterpret_cast<const float*>(vector.data());
auto result = index_->search(query, k);
// Convert to internal format
auto doc_ids = std::vector<DocID> {};
auto dists = std::vector<float> {};
for (auto i = 0uz; i < result.size(); ++i) {
doc_ids.emplace_back(result[i].member.key);
dists.emplace_back(result[i].distance);
}
// Convert IP similarity to distance
if (options_.metric_type == MetricType::kInnerProduct) {
std::transform(dists.cbegin(), dists.cend(), dists.begin(),
[](auto d) { return 1.f - d; });
}
// Compute softmax probabilities
auto probs = compute_softmax_probs(dists);
return {
std::move(doc_ids),
std::move(dists),
std::move(probs),
};
}
21.5.4 Probabilistic Scoring via Softmax
Distances are converted to probability distributions using temperature-scaled softmax:
auto compute_score_prob_softmax(const std::vector<float>& values,
float temperature = 1.f)
-> std::vector<float> {
auto probs = std::vector<float> {};
probs.reserve(values.size());
// Numerically stable softmax: subtract max
auto max_value = *std::max_element(values.cbegin(), values.cend());
// Compute scaled exponentials
std::transform(values.cbegin(), values.cend(),
std::back_inserter(probs),
[max_value, temperature](auto value) {
return std::exp((value - max_value) / temperature);
});
// Normalize
auto sum = std::accumulate(probs.cbegin(), probs.cend(), 0.f);
if (sum > 0.f) {
std::transform(probs.cbegin(), probs.cend(), probs.begin(),
[sum](auto p) { return p / sum; });
}
return probs;
}
The softmax formula for converting distances to probabilities:

$$p_i = \frac{\exp(-d_i / \tau)}{\sum_{j} \exp(-d_j / \tau)}$$

where $d_i$ is the distance to the $i$-th result and $\tau$ is the temperature parameter.
Temperature Effects:
| Temperature | Distribution | Interpretation |
|---|---|---|
| τ → 0 | One-hot | Winner-take-all |
| τ = 1 | Standard | Balanced confidence |
| τ → ∞ | Uniform | Maximum entropy |
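The temperature effect is easy to verify numerically. This standalone sketch mirrors the stable softmax above but negates distances so that closer results receive higher probability (the sign convention is an assumption of this example):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Temperature-scaled softmax over negated distances.
std::vector<float> softmax_probs(const std::vector<float>& dists,
                                 float temperature) {
  std::vector<float> scores;
  scores.reserve(dists.size());
  for (float d : dists) scores.push_back(-d);
  // Numerically stable: subtract the max score before exponentiating.
  float max_score = *std::max_element(scores.begin(), scores.end());
  float sum = 0.f;
  for (float& s : scores) {
    s = std::exp((s - max_score) / temperature);
    sum += s;
  }
  for (float& s : scores) s /= sum;
  return scores;
}
```

At temperature 1 the closest result dominates; at temperature 100 the distribution is nearly uniform.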
21.6 Index Maintenance
21.6.1 Batched Insertion
Vectors are buffered before batch insertion for efficiency:
Status HNSWIndexProxyUSearch::add(DocID doc_id,
const std::string_view& data,
bool write_to_wal) {
auto vector_dim = static_cast<int32_t>(index_->dimensions());
auto dim = DenseVectorCodec::get_dim(data);
if (dim != vector_dim) {
return Status::Error("Invalid vector size");
}
// Buffer vector
doc_ids_.emplace_back(doc_id);
doc_ids_set_.emplace(doc_id);
auto offset = vectors_.size();
vectors_.resize(offset + vector_dim);
memcpy(vectors_.data() + offset, data.data(),
vector_dim * sizeof(float));
// Write-ahead logging
if (write_to_wal) {
wal_->append(HNSWIndexWALOperationType::kAdd, doc_id,
data.data(), vector_dim * sizeof(float));
}
dirty_ = true;
need_flush_ = true;
return Status::OK();
}
21.6.2 Parallel Flush
Batch insertion uses thread pools for parallel HNSW graph construction:
Status HNSWIndexProxyUSearch::flush(bool force) {
if (!need_flush_) return Status::OK();
auto n = doc_ids_.size();
if (n == 0) {
need_flush_ = false;
return Status::OK();
}
// Minimum training set requirement
auto min_training = options_.training_set_size
.value_or(kMinTrainingSetSize); // Default: 10,000
if (!force && n < min_training) {
return Status::OK();
}
// Reserve capacity
auto dims = index_->dimensions();
auto count = vectors_.size() / dims;
auto new_capacity = index_->size() + count;
if (index_->capacity() < new_capacity) {
index_->reserve(new_capacity + count); // Over-allocate
}
// Parallel insertion (256 vectors per partition)
constexpr auto kPartitionSize = 256uz;
auto partitions = (count + kPartitionSize - 1) / kPartitionSize;
auto& pool = NamedThreadPools::get(ThreadPoolName::kBatchWrite);
auto futures = std::vector<std::future<void>> {};
for (auto i = 0uz; i < partitions; ++i) {
futures.emplace_back(pool.invoke([&, i]() {
for (auto j = 0uz; j < kPartitionSize; ++j) {
auto idx = i * kPartitionSize + j;
if (idx >= count) break;
const auto* vec = vectors_.data() + idx * dims;
auto doc_id = doc_ids_[idx];
index_->add(doc_id, vec,
unum::usearch::index_dense_t::any_thread(),
false);
}
}));
}
for (auto& f : futures) f.get();
vectors_.clear();
doc_ids_.clear();
doc_ids_set_.clear();
need_flush_ = false;
return Status::OK();
}
21.6.3 Write-Ahead Logging
Durability is ensured through WAL before index modification:
enum class HNSWIndexWALOperationType : uint8_t {
kAdd = 0,
kRemove = 1,
};
class HNSWIndexWAL {
public:
Status append(HNSWIndexWALOperationType op,
DocID doc_id,
const char* buffer,
size_t size);
Status apply(const FunctionRef<
Status(HNSWIndexWALOperationType, DocID,
const char*, size_t)>& callback);
Status flush();
};
Recovery replays the WAL to reconstruct buffered state:
Status HNSWIndexProxyUSearch::recover() {
return wal_->apply([this](auto op, auto doc_id,
const char* data, size_t size) {
switch (op) {
case HNSWIndexWALOperationType::kAdd:
return add(doc_id, {data, size}, false);
case HNSWIndexWALOperationType::kRemove:
return remove(doc_id, false);
}
return Status::OK();
});
}
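The WAL record layout is not shown in the excerpt; the following is a hypothetical fixed-header encoding (1-byte op, 8-byte doc ID, then payload) that illustrates the round-trip the recovery path depends on:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical record layout: [op:1][doc_id:8][payload:n].
std::string encode_record(uint8_t op, uint64_t doc_id,
                          const std::string& payload) {
  std::string out;
  out.push_back(static_cast<char>(op));
  out.append(reinterpret_cast<const char*>(&doc_id), sizeof(doc_id));
  out.append(payload);
  return out;
}

bool decode_record(const std::string& record, uint8_t* op,
                   uint64_t* doc_id, std::string* payload) {
  if (record.size() < 1 + sizeof(*doc_id)) return false;  // Truncated
  *op = static_cast<uint8_t>(record[0]);
  std::memcpy(doc_id, record.data() + 1, sizeof(*doc_id));
  *payload = record.substr(1 + sizeof(*doc_id));
  return true;
}
```

A truncated tail record (e.g. from a crash mid-append) fails the length check and is simply dropped during replay.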
21.6.4 Index Persistence
Committed indices are written atomically using temporary files:
Status HNSWIndexProxyUSearch::commit(const std::string_view& field) {
if (!dirty_) return Status::OK();
// Flush remaining vectors
auto status = flush(true);
if (!status.ok()) return status;
// Parallel disk I/O
auto& pool = NamedThreadPools::get(ThreadPoolName::kDiskIO);
auto futures = std::vector<std::future<Status>> {};
for (auto shard_id = 0; shard_id < get_shard_count(); ++shard_id) {
futures.emplace_back(pool.invoke([this, field, shard_id]() {
auto filename = get_index_filename(field, shard_id);
auto tmp = std::filesystem::path(filename)
.replace_extension(".tmp");
// Write to temp file
auto config = unum::usearch::
index_dense_serialization_config_t {};
auto result = index_->save(tmp.c_str(), config);
if (!result) {
return Status::IOError("Failed to save HNSW index");
}
return Status::OK();
}));
}
// Collect results
for (auto& f : futures) {
status = f.get();
if (!status.ok()) return status;
}
// Atomic rename
auto txn = HNSWIndexTransaction{get_filenames(field)};
status = txn.prepare();
if (!status.ok()) return status;
status = txn.commit();
if (!status.ok()) return status;
// Flush WAL after successful commit
status = wal_->flush();
dirty_ = false;
return status;
}
21.6.5 Document Deletion
HNSW supports soft deletion with graph connectivity preservation:
Status HNSWIndexProxyUSearch::remove(DocID doc_id,
bool write_to_wal) {
if (doc_id == kInvalidDocID) {
return Status::OK();
}
// Remove from buffer
remove_doc_from_buffer_(doc_id);
// WAL entry
if (write_to_wal) {
wal_->append(HNSWIndexWALOperationType::kRemove,
doc_id, nullptr, 0);
}
// Remove from index (marks as deleted)
index_->remove(doc_id);
dirty_ = true;
need_flush_ = true;
return Status::OK();
}
Updates are implemented as delete-then-insert:
Status HNSWIndex::update(const DocValue& doc) {
auto status = remove(doc);
if (!status.ok()) return status;
return add(doc);
}
21.7 IVF-HNSW Hybrid Index
21.7.1 Architecture Overview
The IVF-HNSW hybrid combines Inverted File (IVF) partitioning with HNSW for coarse quantization: an HNSW index over the partition centroids replaces the exhaustive centroid scan of standard IVF, keeping centroid assignment fast even with 65,536 inverted lists.
21.7.2 Custom Inverted Lists
FAISS inverted lists are backed by the document database:
class InvertedLists final : public faiss::InvertedLists {
public:
size_t add_entries(size_t list_no, size_t n_entry,
const faiss::idx_t* ids,
const uint8_t* code) override {
auto txn = db_.begin_write_batch();
for (auto i = 0uz; i < n_entry; ++i) {
auto key = HNSWIndexIVFKeyCodecV1::encode(
guid_, field_name_, list_no, ids[i]);
auto value_view = Slice {
reinterpret_cast<const char*>(code + i * code_size),
code_size,
};
auto value = HNSWIndexIVFValueCodecV1::encode(value_view);
txn->put(key, value);
}
return txn->commit().ok() ? n_entry : 0;
}
auto get_iterator(size_t list_no, void* ctx) const
-> faiss::InvertedListsIterator* override {
return new InvertedListsIterator(
&db_, guid_, field_name_, list_no, code_size);
}
};
21.7.3 Search with IVF
IVF search probes multiple partitions, restricting the scan to the union of their inverted lists:

$$S(q) = \bigcup_{c \in C_p(q)} L_c$$

where $C_p(q)$ are the $p$ nearest centroids to $q$ (p = num_probe) and $L_c$ is the inverted list for centroid $c$.
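Probe selection reduces to a k-nearest-centroids problem. A compact sketch using a flat centroid scan (the hybrid accelerates exactly this step with HNSW):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Return the indices of the num_probe centroids closest to the
// query under squared L2 distance.
std::vector<std::size_t> select_probes(
    const std::vector<std::vector<float>>& centroids,
    const std::vector<float>& query, std::size_t num_probe) {
  std::vector<std::pair<float, std::size_t>> scored;
  for (std::size_t c = 0; c < centroids.size(); ++c) {
    float d = 0.f;
    for (std::size_t i = 0; i < query.size(); ++i) {
      float diff = centroids[c][i] - query[i];
      d += diff * diff;
    }
    scored.emplace_back(d, c);
  }
  std::size_t k = std::min(num_probe, scored.size());
  std::partial_sort(scored.begin(), scored.begin() + k, scored.end());
  std::vector<std::size_t> probes;
  for (std::size_t i = 0; i < k; ++i) probes.push_back(scored[i].second);
  return probes;
}
```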
21.8 Performance Characteristics
21.8.1 Complexity Analysis
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Build | $O(n \log n)$ | $O(n \cdot M \cdot \bar{\ell})$ |
| Search | $O(\log n)$ | $O(ef_{\text{search}})$ |
| Insert | $O(\log n)$ | $O(M \cdot \bar{\ell})$ |
| Delete | $O(1)$ | $O(1)$ |

where $\bar{\ell}$ is the average layer count per node.
21.8.2 Memory Footprint
Memory per vector in HNSW is dominated by the raw vector data plus the layer-0 adjacency list (assuming 4-byte neighbor IDs; higher layers add only a small fraction):

$$\text{bytes/vector} \approx d \cdot \text{sizeof(scalar)} + 4 \cdot 2M$$

For 512-dimensional vectors with M=32 and float32 storage:

$$512 \cdot 4 + 4 \cdot 64 = 2304 \text{ bytes}$$

With 8-bit quantization:

$$512 \cdot 1 + 4 \cdot 64 = 768 \text{ bytes}$$
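This sizing arithmetic can be packaged as a small helper; the 4-byte neighbor IDs and the layer-0-only approximation are assumptions of this sketch:

```cpp
#include <cstddef>

// Approximate bytes per vector: raw data plus 2M layer-0 links
// of 4 bytes each (higher layers contribute a small fraction).
std::size_t hnsw_bytes_per_vector(std::size_t dims,
                                  std::size_t bytes_per_scalar,
                                  std::size_t M) {
  return dims * bytes_per_scalar + 4 * 2 * M;
}
```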
21.8.3 Recall vs Latency Tradeoff
The ef_search parameter controls the recall-latency tradeoff:
| ef_search | Recall@10 | Latency (ms) |
|---|---|---|
| 16 | ~85% | 0.1 |
| 64 | ~95% | 0.3 |
| 256 | ~99% | 1.0 |
| 1024 | ~99.9% | 4.0 |
Values are representative for 1M vectors in 512 dimensions.
21.9 Configuration and Best Practices
21.9.1 Parameter Selection Guidelines
Dimensionality (dims)
Match the embedding model output dimension. Common values:
- 384: Sentence transformers (MiniLM)
- 512: Custom models, reduced BERT
- 768: BERT base, RoBERTa
- 1024: BERT large
- 1536: OpenAI text-embedding-3-small
Connectivity (M)
Higher M increases recall at the cost of memory:
- M=16: Memory-constrained, lower recall acceptable
- M=32: Balanced (default)
- M=64: High-recall applications
ef_construction
Higher values improve graph quality:
- ef_construction=40: Fast indexing (default)
- ef_construction=100: Balanced
- ef_construction=200+: Maximum quality
ef_search
Tune per query based on latency budget:
- ef_search=16: Lowest latency (default)
- ef_search=64: Balanced
- ef_search=256+: Maximum recall
21.9.2 Index Configuration Example
{
"vector_field": {
"analyzer": {
"type": "DenseVectorAnalyzer",
"options": {
"dims": 512
}
},
"hnsw": {
"index_type": "HNSW_USEARCH",
"dims": 512,
"M": 32,
"ef_construction": 100,
"ef_search": 64,
"metric_type": "InnerProduct",
"quantizer_type": "k8Bit",
"normalize": true
}
}
}
21.10 Summary
This chapter explored Cognica's vector search infrastructure:
- Theoretical foundations: The curse of dimensionality motivates approximate methods; HNSW provides logarithmic search through hierarchical small-world graphs
- Algorithm parameters: M controls connectivity, ef_construction determines graph quality, ef_search trades recall for latency
- Multi-backend architecture: USearch provides the default implementation; FAISS enables advanced features like IVF-PQ compression
- Quantization strategies: Scalar quantization (4-16 bit) reduces memory; Product Quantization enables billion-scale indices
- Probabilistic scoring: Temperature-scaled softmax converts distances to calibrated probability distributions
- Durability guarantees: WAL-based recovery and atomic commits ensure crash consistency
- Performance characteristics: Sublinear search complexity enables real-time similarity search at scale
The next chapter examines hybrid search, combining the semantic understanding of vector search with the precision of lexical matching through principled score fusion.