Engine Architecture

A layered architecture where every query paradigm compiles to the same algebraic structure. Built from scratch in C++23 with a register-based virtual machine and JIT compiler.

System Architecture

Five layers from client protocol to persistent storage. Each layer is designed for cross-paradigm optimization.

Client: PostgreSQL Wire Protocol, Apache FlightSQL
Query Processing: SQL Parser, Query Planner, Cost-Based Optimizer, CVM Compiler
Execution: CVM Interpreter, Copy-and-Patch JIT, Volcano Iterator, Vectorized Executor, Hybrid Executor
PostingList Operators: AND, OR, NOT, Term, Phrase, Range, KNN, BM25, PageRank, Log-Odds Fusion
Storage & Federation
  Storage: Posting Lists, Clustered Term Index, HNSW Index, Doc Values, LSM-Tree
  Federation: MySQL, PostgreSQL, Apache FlightSQL Server, AWS S3, GCP Cloud Storage, Azure Blob Storage

Unified Query Algebra

Every paradigm compiles to the same algebraic interface. Boolean operations (union, intersection, complement) compose freely across paradigms.

\[ \forall\, op \in \texttt{Operator},\; op.\texttt{execute}() \to \texttt{PostingList} \]
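The closure property above (every operator returns a PostingList, so operators nest freely) can be illustrated with a minimal sketch. The class names `Term`, `And`, `Or`, and `Not`, the dict-based index, and the explicit `universe` argument are all hypothetical and chosen for illustration; they are not the engine's actual C++ types.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class PostingList:
    """Sorted, de-duplicated document IDs (hypothetical minimal form)."""
    doc_ids: tuple

    @staticmethod
    def of(ids):
        return PostingList(tuple(sorted(set(ids))))


class Operator(ABC):
    """Every operator, whatever its paradigm, returns a PostingList."""

    @abstractmethod
    def execute(self) -> PostingList: ...


class Term(Operator):
    def __init__(self, index, term):
        self.index, self.term = index, term

    def execute(self):
        # Look up the term in a toy inverted index (dict: term -> doc IDs).
        return PostingList.of(self.index.get(self.term, ()))


class And(Operator):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def execute(self):
        # Intersection of the two child results.
        return PostingList.of(
            set(self.left.execute().doc_ids) & set(self.right.execute().doc_ids)
        )


class Or(Operator):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def execute(self):
        # Union of the two child results.
        return PostingList.of(
            set(self.left.execute().doc_ids) | set(self.right.execute().doc_ids)
        )


class Not(Operator):
    def __init__(self, child, universe):
        # Complement needs a universe of all document IDs (an assumption here).
        self.child, self.universe = child, set(universe)

    def execute(self):
        return PostingList.of(self.universe - set(self.child.execute().doc_ids))
```

Because each node only depends on `execute()` returning a PostingList, a query such as `And(Term(index, "cat"), Not(Term(index, "dog"), universe))` composes without either side knowing how the other produced its result.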

Relational SQL

Full ACID transactions with joins, aggregations, window functions (ROW_NUMBER, RANK, LAG, LEAD), recursive CTEs, and subquery unnesting. DPccp algorithm for optimal bushy join trees.

SELECT p.name,
       COUNT(*) AS order_count
  FROM products p
  JOIN orders o
    ON p.id = o.product_id
 WHERE o.created_at > NOW() - INTERVAL '30 days'
 GROUP BY p.name
 ORDER BY order_count DESC
 LIMIT 10;

Full-Text Search

BM25 and Bayesian BM25 scoring with WAND/BMW safe pruning. 13 tokenizers including MeCab for CJK. Spell check, autocomplete, and synonym expansion. Custom tokenizer and token filter support.

SELECT title, _score
  FROM documents
 WHERE content @@ 'bayesian & probabilistic & ranking'
 ORDER BY _score DESC
 LIMIT 10;
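For readers unfamiliar with BM25, the scoring behind `_score` can be sketched as follows. This is the textbook BM25 formula with the commonly used idf smoothing, not the engine's Bayesian BM25 variant or its WAND/BMW pruning. The whitespace tokenizer and the defaults `k1=1.2`, `b=0.75` are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each document in a non-empty list against the query (plain BM25)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Smoothed idf; the +1 inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that match more query terms, match rarer terms, or are shorter than average receive higher scores; documents with no matching term score zero.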

Vector Search

HNSW-based ANN search with cosine, inner product, and L2 distance. Probabilistic score calibration using index-derived density ratios for principled hybrid fusion.

SELECT title, _score
  FROM documents
 ORDER BY embedding <=> query_embedding
 LIMIT 10;
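The three distance measures named above can be sketched with an exact brute-force nearest-neighbor scan. This is not HNSW: it compares the query against every vector, which is what an approximate index is designed to avoid. The function names are hypothetical, and the inner product is negated so that smaller always means closer.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def negative_inner_product(a, b):
    # Negated so that a larger inner product sorts first.
    return -dot(a, b)


def knn(vectors, query, k, distance=cosine_distance):
    """Return the indices of the k vectors closest to the query (exact scan)."""
    ranked = sorted(range(len(vectors)), key=lambda i: distance(vectors[i], query))
    return ranked[:k]
```

An ANN index trades a small amount of recall for sub-linear search time; the exact scan here is useful mainly as a correctness baseline.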

Graph Queries

Apache AGE-compatible graph operations as SQL table functions. BFS/DFS traversal, bidirectional shortest path, LRU adjacency cache for multi-hop optimization.

SELECT *
  FROM cypher('social', $$
    MATCH (a:Person)-[:FOLLOWS*1..3]->(b:Person)
    WHERE a.name = 'Alice'
    RETURN b.name, b.role
  $$) AS (name TEXT, role TEXT);
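The bounded traversal in the pattern `[:FOLLOWS*1..3]` can be approximated with a breadth-first search that stops expanding at three hops. This is a simplified sketch: it returns each reachable node once with its shortest hop count, whereas Cypher enumerates paths and may return the same node several times. The adjacency dict and the function name are hypothetical.

```python
from collections import deque


def reachable_within(adjacency, start, min_hops=1, max_hops=3):
    """Return {node: hops} for nodes whose shortest distance is in [min_hops, max_hops]."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:  # first visit is the shortest distance in BFS
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {n: d for n, d in dist.items() if min_hops <= d <= max_hops}
```

With `follows = {"Alice": ["Bob"], "Bob": ["Carol"], "Carol": ["Dave"], "Dave": ["Eve"]}`, calling `reachable_within(follows, "Alice")` yields Bob, Carol, and Dave but not Eve, who is four hops away.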

Multiple Paradigms in One Query

Bayesian text matching, vector similarity, graph centrality, and relational filtering composed through a single SQL statement via log-odds fusion.

SELECT title, year, field, _score
  FROM papers
 WHERE fuse_log_odds(
         bayesian_match(title, 'attention'),
         knn_match(embedding, $1, 10),
         pagerank()
       )
   AND year >= 2019
 ORDER BY _score DESC
 LIMIT 5;
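One way to read log-odds fusion is as follows: each signal is first calibrated to a probability of relevance, the probabilities are converted to log-odds, the log-odds are summed, and the sum is mapped back to a probability. The sketch below assumes exactly that, with independent signals and an even prior; the engine's actual weighting and calibration are not specified here, and the function names are hypothetical.

```python
import math


def logit(p, eps=1e-9):
    # Clamp away from 0 and 1 so the logarithm stays finite.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse_probabilities(probabilities):
    """Combine calibrated relevance probabilities by summing their log-odds."""
    return sigmoid(sum(logit(p) for p in probabilities))
```

Under this assumption, two agreeing signals reinforce each other (0.8 and 0.7 fuse to roughly 0.90), an uninformative signal of 0.5 contributes nothing, and opposing signals such as 0.9 and 0.1 cancel to about 0.5.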

Cognica Virtual Machine

A register-based bytecode interpreter with tiered JIT compilation for high-performance query execution.

Register-Based VM

16 general-purpose and 8 floating-point registers. 256 opcodes across 6 instruction formats. Computed-goto dispatch improves throughput by 10-15%.
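To make the register-based model concrete, here is a toy interpreter in which every instruction names its operand registers directly instead of pushing and popping a stack. The four opcodes and the fixed `(op, a, b, c)` instruction shape are hypothetical and far simpler than the 256-opcode set described above; Python also has no computed goto, so dispatch here is a plain conditional chain.

```python
from enum import IntEnum


class Op(IntEnum):
    LOAD_CONST = 0  # regs[a] = b (b is an immediate value)
    ADD = 1         # regs[a] = regs[b] + regs[c]
    MUL = 2         # regs[a] = regs[b] * regs[c]
    HALT = 3        # stop and return regs[a]


def run(program, num_registers=16):
    """Execute a list of (op, a, b, c) tuples and return the HALT register."""
    regs = [0] * num_registers
    pc = 0
    while True:
        op, a, b, c = program[pc]  # fetch and decode
        pc += 1
        if op == Op.LOAD_CONST:
            regs[a] = b
        elif op == Op.ADD:
            regs[a] = regs[b] + regs[c]
        elif op == Op.MUL:
            regs[a] = regs[b] * regs[c]
        elif op == Op.HALT:
            return regs[a]
        else:
            raise ValueError(f"unknown opcode {op}")


# (2 + 3) * 4, written as register instructions.
PROGRAM = [
    (Op.LOAD_CONST, 0, 2, 0),
    (Op.LOAD_CONST, 1, 3, 0),
    (Op.ADD, 2, 0, 1),
    (Op.LOAD_CONST, 3, 4, 0),
    (Op.MUL, 4, 2, 3),
    (Op.HALT, 4, 0, 0),
]
```

`run(PROGRAM)` returns 20. In a native implementation, computed goto replaces the conditional chain with a direct jump to each opcode's handler, which is where the dispatch savings come from.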

Copy-and-Patch JIT

Stencil-based JIT generating native x86-64 and ARM64 code from pre-compiled templates. Tiered compilation: interpreted at Tier 0, baseline JIT at 1K executions, optimized JIT at 50K.
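The tiering policy can be sketched as a per-function execution counter checked against the two thresholds. The sketch assumes promotion happens on the call that reaches the threshold (the 1,000th and 50,000th execution); whether the engine counts this way is not stated. `HotnessCounter` and `tier_for` are hypothetical names.

```python
BASELINE_THRESHOLD = 1_000    # Tier 1: baseline JIT
OPTIMIZED_THRESHOLD = 50_000  # Tier 2: optimized JIT


def tier_for(execution_count):
    """Map an execution count to a tier: 0 interpreted, 1 baseline, 2 optimized."""
    if execution_count >= OPTIMIZED_THRESHOLD:
        return 2
    if execution_count >= BASELINE_THRESHOLD:
        return 1
    return 0


class HotnessCounter:
    """Track executions per function and report tier promotions."""

    def __init__(self):
        self.counts = {}

    def record(self, fn_id):
        """Count one execution; return the new tier if it changed, else None."""
        before = self.counts.get(fn_id, 0)
        after = before + 1
        self.counts[fn_id] = after
        old_tier, new_tier = tier_for(before), tier_for(after)
        return new_tier if new_tier != old_tier else None
```

A caller would compile the function when `record` returns 1 or 2 and keep running the current code otherwise, so cold queries never pay compilation cost.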

Toolchain

Built-in disassembler and stencil CPU emulator for cross-platform JIT development. Full debugger with breakpoints, step execution, and execution history.

Flexible Deployment

Run Cognica wherever your application needs it.

Server Mode

Deploy as a standalone database server with PostgreSQL wire protocol. Connect from any language or tool that supports PostgreSQL.

Embedded Mode

Embed directly into your application as a library, like SQLite3 or DuckDB. Zero network overhead, single-process simplicity.