Graph Queries in a Unified Database: From Cypher to Posting Lists | Blog

The Polyglot Problem

Modern applications increasingly need to traverse relationships. Social networks need friends-of-friends queries. Fraud detection systems need to follow transaction chains. Recommendation engines need to walk purchase-and-preference graphs.

The conventional solution is straightforward: deploy a graph database alongside your relational database. Neo4j or Amazon Neptune handles the graph workload, while PostgreSQL handles everything else.

This works until it doesn't. The moment you need a query that spans both systems, you face a fundamental architectural problem:

Data duplication: Entity data must exist in both the relational store and the graph store.
Synchronization: Changes in one system must propagate to the other, typically through Kafka pipelines or CDC connectors.
No cross-system transactions: An ACID guarantee that spans a relational INSERT and a graph MERGE does not exist in a polyglot architecture.
Two query languages: Application code must context-switch between SQL and Cypher (or Gremlin), often for what is logically a single question.

We built Cognica to eliminate this boundary entirely.

Graph Operations as Posting Lists

Cognica's engine is built on a single mathematical abstraction: the posting list. Every query paradigm — relational SQL, full-text search, vector similarity, and graph traversal — compiles to operations that produce and compose posting lists through Boolean algebra.

For graph queries, the key insight is that graph traversal results are structurally identical to document retrieval results: both are sets of entity IDs that satisfy some predicate.

Operation	Input	Output
`WHERE age > 30`	Table scan	Posting list of matching row IDs
`WHERE content @@ 'database'`	Inverted index	Posting list of matching doc IDs
`ORDER BY embedding <=> $vec`	HNSW index	Posting list of nearest neighbor IDs
`MATCH (a)-[:FOLLOWS*1..3]->(b)`	Adjacency traversal	Posting list of reachable vertex IDs

Because every operation returns a posting list, they compose through the same Boolean operators: AND (intersection), OR (union), NOT (complement). A query that combines full-text search with graph traversal requires no special fusion logic — it is just two posting lists intersected.

Property Graph on Document Storage

Cognica stores graphs as pairs of document collections. A graph named social produces two collections:

social_nodes stores vertices:

{
  "_id": "social:Person:1001",
  "_label": "Person",
  "_graph": "social",
  "name": "Alice",
  "age": 30,
  "department": "Engineering"
}

social_edges stores edges:

{
  "_id": "social:FOLLOWS:2001",
  "_source": "social:Person:1001",
  "_target": "social:Person:1002",
  "_type": "FOLLOWS",
  "_graph": "social",
  "since": "2024-01-15"
}

This dual-collection model has a critical advantage: it reuses the entire document storage infrastructure. Indexes, ACID transactions, LSM-tree compaction, cursor iteration, and federation — all of it works on graph data without a single line of graph-specific storage code.

Composite indexes on (_source, _type) and (_target, _type) enable efficient adjacency lookups in both directions. An LRU adjacency cache further accelerates multi-hop traversals by keeping pre-computed neighbor relationships in memory.

Cypher Through Parse-Time Rewriting

Cognica supports Apache AGE-compatible Cypher syntax. But rather than building a separate graph query engine, we translate Cypher into SQL at parse time.

SELECT *
  FROM cypher('social', $$
    MATCH (a:Person)-[:FOLLOWS*1..3]->(b:Person)
    WHERE a.name = 'Alice'
    RETURN b.name, b.role
  $$) AS (name TEXT, role TEXT);

When the SQL parser encounters the cypher() function in a FROM clause, it triggers a translation pipeline:

Cypher Lexer — A hand-written scanner tokenizes pattern arrows (-[:TYPE]->) and case-insensitive keywords.
Cypher Parser — Builds a Cypher-specific AST from the token stream.
Cypher-to-SQL Rewriter — Transforms the Cypher AST into standard SQL AST nodes that reference graph table functions.
Standard Optimizer — The rewritten SQL re-enters the same cost-based optimizer used for all other queries.

This design means the SQL optimizer has full visibility into the graph query structure. Predicate pushdown, join reordering, and index selection all work across graph-relational boundaries, because by the time optimization runs, there is no distinction between "graph operations" and "relational operations" — they are all SQL plan nodes.

Composing Paradigms in a Single Query

The real power emerges when graph operations compose with other paradigms. Consider a research paper discovery query that combines text matching, vector similarity, graph centrality, and relational filtering:

SELECT title, year, field, _score
  FROM papers
 WHERE fuse_log_odds(
         bayesian_match(title, 'attention'),
         knn_match(embedding, $1, 10),
         pagerank()
       )
   AND year >= 2019
 ORDER BY _score DESC
 LIMIT 5;

This single SQL statement:

Searches paper titles using Bayesian BM25 probabilistic scoring
Finds semantically similar papers via HNSW vector search
Computes citation graph centrality through PageRank
Fuses all three signals through log-odds conjunction
Filters by publication year
Returns the top 5 results

In a polyglot architecture, this query would require three separate systems, application-level score merging, and no transactional consistency. In Cognica, it is one query, one transaction, one optimizer.

Why This Matters

The unified approach eliminates an entire class of infrastructure:

No sync pipelines between relational and graph stores
No data duplication across systems
No impedance mismatch between query languages
ACID transactions spanning relational and graph operations
Single optimizer that understands the full query structure

Graph queries are not a bolted-on feature. They are a natural consequence of the posting list algebra — one more operation that produces sets of IDs composable through Boolean logic, just like every other paradigm in the engine.

For teams building applications that need relationship-aware queries alongside traditional SQL, full-text search, or vector similarity, this means deploying one system instead of three or four, with no compromise on query expressiveness or transactional guarantees.

Tags:

#Graph #Cypher #SQL #Unified Query Algebra #Cognica #Database Architecture