Chapter 3: Extending the Algebra to Graph Structures
This chapter extends the unified mathematical framework to incorporate graph operations while preserving the algebraic properties established in Chapter 2. We demonstrate that graph posting lists form isomorphic structures to document posting lists, enabling graph traversal, pattern matching, and cross-paradigm queries within the same algebraic framework.
3.1 Motivation for Graph Integration
Graph databases have emerged as essential tools for modeling relationships: social networks, knowledge graphs, fraud detection networks, recommendation systems, and supply chain dependencies. Yet traditional graph databases operate as isolated systems, requiring separate data synchronization and offering no transactional consistency with relational or search workloads.
3.1.1 The Relationship Modeling Challenge
Consider a product recommendation system that must:
- Find products matching user search terms (full-text search)
- Filter by inventory and pricing constraints (relational predicates)
- Rank by embedding similarity to user preferences (vector search)
- Traverse purchase history and social connections (graph traversal)
- Return results with explanation paths (graph + relational join)
In a polyglot architecture, the graph component requires:
- Duplicating entity data from the relational store
- Maintaining referential integrity across systems
- Orchestrating cross-system queries with no shared transaction
- Merging results with incompatible data models
3.1.2 Graph-Posting List Unification
The key insight enabling graph integration is that graph operations also produce sets of identifiers:
Traversal: Starting from vertex , find all vertices reachable in hops
Pattern Match: Find all vertices matching a structural pattern
Path Query: Find vertices connected by paths matching a regular expression
Each operation returns a set - a posting list of vertex identifiers. These posting lists combine with document posting lists through the same Boolean operations, enabling unified optimization across paradigms.
3.2 Graph Type System
We formalize the graph type system as an extension of the document type system from Chapter 2.
3.2.1 Property Graph Model
A property graph is a tuple where:
- is the set of vertices (nodes)
- is the set of directed edges
- assigns properties (key-value pairs)
- assigns labels from label set
- assigns a type to each edge
Vertices and edges both carry properties and labels, making them semi-structured entities similar to documents.
3.2.2 Vertex Space
The vertex space contains all vertices:
Each vertex has:
- A unique identifier:
- Properties:
- Labels:
3.2.3 Edge Space
The edge space contains all edges:
Each edge has:
- A unique identifier:
- Source vertex:
- Target vertex:
- Edge type:
- Properties:
3.2.4 Graph Posting Lists
A graph posting list maps a predicate to a set of vertices or edges:
Vertex posting list:
Edge posting list:
Examples of graph predicates:
Label predicate: Vertices with label
Property predicate: Vertices where property satisfies condition
Edge type predicate: Edges of type
Adjacency predicate: Vertices adjacent to vertex via edge type
3.2.5 Document-Vertex Correspondence
In Cognica, vertices correspond to documents:
This isomorphism maps:
- Vertex ID to document ID
- Vertex properties to document fields
- Vertex labels to document type tags
The correspondence enables a document to participate in both relational queries (as a row) and graph queries (as a vertex) without data duplication.
3.3 Graph-Posting List Isomorphism
We now prove that graph posting lists satisfy the same algebraic properties as document posting lists.
3.3.1 Boolean Algebra Structure
Theorem 3.1 (Graph Posting List Boolean Algebra): The set of graph posting lists with operations , , , , and forms a Boolean algebra isomorphic to the document posting list algebra .
Proof: We verify each axiom:
Closure: For any graph posting lists :
- (intersection of vertex sets is a vertex set)
- (union of vertex sets is a vertex set)
- (complement is a vertex set)
Commutativity: Inherited from set operations.
Associativity: Inherited from set operations.
Distributivity: Inherited from set operations.
Identity: and .
Complementation: and .
The isomorphism is induced by the document-vertex correspondence:
This preserves all Boolean operations:
3.3.2 Preservation Under Graph Operations
Theorem 3.2 (Traversal Preserves Boolean Structure): Graph traversal operations produce posting lists that participate in Boolean algebra.
Proof: Let denote -hop traversal from vertex :
This is a subset of , hence an element of .
For combined traversals:
Conjunction: Vertices reachable from both and :
Disjunction: Vertices reachable from either or :
Both results are elements of , preserving Boolean structure.
3.3.3 Lattice Structure of Graph Queries
Graph queries form a complete lattice ordered by result containment:
Definition: For graph queries , define iff for all graph instances.
Theorem 3.3 (Graph Query Lattice): Graph queries ordered by form a complete lattice.
The lattice operations:
- Meet: returns intersection of results
- Join: returns union of results
- Bottom: Query returning
- Top: Query returning
This lattice structure enables the optimizer to navigate between equivalent graph query formulations.
3.4 Graph Algebra Operations
We define the operators that transform graph posting lists.
3.4.1 Adjacency Operator
The adjacency operator expands a posting list to include adjacent vertices:
For vertex set , edge types , and direction :
Properties of adjacency:
Monotonicity:
Distributivity over union:
Edge type union:
3.4.2 Traversal Operator
The traversal operator performs multi-hop traversal:
Defined recursively:
This computes the -hop neighborhood: all vertices reachable within edge traversals.
Fixed-point characterization:
The infinite traversal is the least fixed point of the adjacency expansion, representing the transitive closure.
3.4.3 Pattern Matching Operator
The pattern matching operator finds subgraph isomorphisms:
where is a pattern graph. The result contains vertices that can serve as the anchor vertex in pattern matches.
Example pattern: Find vertices that are both a "person" and have an outgoing "knows" edge to another "person":
Pattern: (p1:Person)-[:KNOWS]->(p2:Person)
Anchor: p1
Pattern matching reduces to a conjunction of posting list operations:
- Vertex label constraints: label posting lists
- Edge existence constraints: adjacency posting lists
- Property constraints: property posting lists
3.4.4 Regular Path Query Operator
Regular Path Queries (RPQ) find paths matching a regular expression over edge types:
where is the set of regular expressions over edge types.
Grammar:
where is an edge type.
Semantics:
Base case: Edge type
Concatenation: Sequential traversal
Alternation: Union of paths
Kleene star: Zero or more repetitions
Kleene plus: One or more repetitions
Optional: Zero or one occurrence
3.4.5 Shortest Path Operator
The shortest path operator finds vertices connected by minimum-length paths:
This operator returns a path (sequence of vertices) rather than a posting list. However, the reachability check derived from it returns a Boolean:
And path existence produces a posting list:
3.5 Operator Composition and Properties
3.5.1 Composition Monoid
Graph operators compose to form a monoid analogous to document operators:
where:
- is the set of graph operators
- is function composition
- is the identity on
Associativity:
Identity:
3.5.2 Algebraic Properties
Adjacency Idempotence (for undirected edges):
Note: This is containment, not equality - repeated adjacency can reach more vertices.
Traversal Monotonicity:
Traversal Fixed Point:
The fixed point is reached when the traversal saturates (no new vertices discovered).
Filter-Traversal Interaction:
Filtering before traversal restricts starting vertices; filtering after traversal restricts ending vertices. These are semantically different and not interchangeable.
3.5.3 Equivalence-Preserving Transformations
Several transformations preserve query semantics:
Traversal Decomposition:
Adjacency Distribution:
Pattern to Traversal Reduction: Simple chain patterns reduce to traversal:
where is the posting list for vertex pattern .
3.6 Cross-Paradigm Integration
The power of the unified algebra emerges in cross-paradigm queries.
3.6.1 Graph-Relational Integration
ToGraph Operator: Convert relational results to graph vertices
Under the document-vertex correspondence, this is the identity:
FromGraph Operator: Convert graph results to relational rows
Also the identity under correspondence.
Example Query: Find customers who purchased products in the same category as their friends:
SELECT DISTINCT c.name, p.name AS recommended_product
FROM customers c
-- Graph traversal: find friends
JOIN LATERAL (
SELECT friend_id FROM graph_traverse(c.id, 'FRIENDS_WITH', 1)
) f ON true
-- Friend's purchases
JOIN purchases fp ON fp.customer_id = f.friend_id
JOIN products friend_prod ON fp.product_id = friend_prod.id
-- Products in same category
JOIN products p ON p.category = friend_prod.category
-- Exclude already purchased
WHERE NOT EXISTS (
SELECT 1 FROM purchases cp
WHERE cp.customer_id = c.id AND cp.product_id = p.id
);
This query seamlessly combines:
- Relational joins (purchases, products)
- Graph traversal (friends)
- Set operations (exclusion)
The posting list representation enables unified optimization.
3.6.2 Graph-Vector Integration
Vertex Embeddings: Vertices can have vector representations:
This enables:
Graph-aware similarity search:
Find the nearest neighbors that are also reachable via edges of type .
Embedding-based edge prediction:
Predict new edges based on embedding similarity.
3.6.3 Graph-Text Integration
Semantic Graph Search: Combine text relevance with graph structure:
Find documents matching text query that are within hops of vertex via edges of type .
Example: Find research papers mentioning "machine learning" by authors within 2 collaboration hops:
SELECT p.title, bm25_score(p.abstract) AS relevance
FROM papers p
WHERE MATCH(p.abstract) AGAINST ('machine learning')
AND p.author_id IN (
SELECT vertex_id FROM graph_traverse(:current_author, 'COAUTHORED', 2)
)
ORDER BY relevance DESC
LIMIT 10;
3.6.4 Unified Query Plan
The query planner treats graph predicates as another source of posting lists, applying the same optimization strategies (predicate pushdown, intersection ordering by selectivity, etc.) across all paradigms.
3.7 Implementation Architecture
3.7.1 Graph Storage in LSM-Tree
Cognica stores graphs using the same LSM-tree infrastructure as documents:
Vertex Storage: Vertices store as documents with graph metadata:
Key: vertex:{graph_id}:{vertex_id}
Value: {properties, labels, ...}
Edge Storage: Edges store with composite keys for efficient traversal:
Key: edge:out:{graph_id}:{src_id}:{edge_type}:{tgt_id}
Value: {properties, ...}
Key: edge:in:{graph_id}:{tgt_id}:{edge_type}:{src_id}
Value: {} (reference only)
Dual edge storage (out and in) enables efficient traversal in both directions.
3.7.2 Adjacency Index
For each edge type, an adjacency index maps vertices to their neighbors:
Key: adj:out:{graph_id}:{edge_type}:{src_id}
Value: [tgt_id_1, tgt_id_2, ...] (posting list)
Key: adj:in:{graph_id}:{edge_type}:{tgt_id}
Value: [src_id_1, src_id_2, ...] (posting list)
This posting list structure enables:
- Fast single-hop expansion: where is the vertex degree
- Efficient intersection with other posting lists
- Skip pointer acceleration for high-degree vertices
3.7.3 Label Index
Label indexes support vertex filtering:
Key: label:{graph_id}:{label}
Value: [vertex_id_1, vertex_id_2, ...] (posting list)
This enables efficient evaluation of label predicates.
3.7.4 Traversal Execution
Multi-hop traversal executes as iterative adjacency expansion:
Algorithm: BFS_Traversal(start, edge_types, max_depth, direction)
frontier = {start}
visited = {start}
for depth in 1..max_depth:
next_frontier = {}
for v in frontier:
neighbors = adjacency_lookup(v, edge_types, direction)
for u in neighbors:
if u not in visited:
visited.add(u)
next_frontier.add(u)
frontier = next_frontier
if frontier is empty:
break
return visited
Optimization: For intersection with other posting lists, early termination prunes branches:
Algorithm: Filtered_Traversal(start, edge_types, max_depth, direction, filter_posting)
frontier = {start} intersect filter_posting
visited = frontier
for depth in 1..max_depth:
next_frontier = {}
for v in frontier:
neighbors = adjacency_lookup(v, edge_types, direction)
// Early filter application
filtered_neighbors = neighbors intersect filter_posting
for u in filtered_neighbors:
if u not in visited:
visited.add(u)
next_frontier.add(u)
frontier = next_frontier
if frontier is empty:
break
return visited
3.7.5 Pattern Match Execution
Pattern matching compiles to a join tree of posting list operations:
Example Pattern:
(a:Person {age > 30})-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company {name = 'Acme'})
Execution Plan:
- = label_index(Person) property_filter(age > 30)
- = label_index(Company) property_filter(name = 'Acme')
- = adj_in(c, WORKS_AT) label_index(Person)
- = adj_in(b, KNOWS)
- Return
The optimizer reorders these operations based on selectivity estimates.
3.7.6 SQL Graph Query Interface
Cognica exposes graph operations through SQL table functions, enabling graph traversal within standard SQL queries:
graph_traverse() — Multi-hop traversal as a table function:
-- Find all users reachable within 3 hops via FOLLOWS edges
SELECT t.vertex_id, t.depth, t.path
FROM graph_traverse(
'social_graph', -- graph name
'user_123', -- start vertex
'FOLLOWS', -- edge type
3, -- max depth
'outgoing' -- direction
) t;
-- Combine with relational predicates
SELECT u.name, t.depth
FROM graph_traverse('social_graph', 'user_123', 'FOLLOWS', 2, 'outgoing') t
JOIN users u ON u.id = t.vertex_id
WHERE u.active = true;
Recursive CTE Integration — Standard SQL recursive queries can express path-finding:
-- Shortest path via recursive CTE
WITH RECURSIVE paths AS (
SELECT id AS vertex_id, ARRAY[id] AS path, 0 AS depth
FROM vertices WHERE id = 'user_123'
UNION ALL
SELECT e.target_id, p.path || e.target_id, p.depth + 1
FROM paths p
JOIN edges e ON e.source_id = p.vertex_id AND e.type = 'FOLLOWS'
WHERE p.depth < 5 AND NOT e.target_id = ANY(p.path)
)
SELECT * FROM paths WHERE vertex_id = 'user_456'
ORDER BY depth LIMIT 1;
Cypher Query Support — Optional Cypher syntax via the cypher() table function, compatible with Apache AGE:
-- Cypher via table function
SELECT * FROM cypher('social_graph', $$
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
RETURN b.name, b.age
$$) AS (name TEXT, age INTEGER);
Adjacency Cache — An in-memory cache accelerates repeated traversals by caching adjacency lists for frequently accessed vertices, reducing RocksDB lookups during iterative graph algorithms.
3.8 Scored Graph Operations
Graph operations can carry scores for ranking.
3.8.1 Edge Weight Scores
Edges may have weights representing relationship strength:
Weighted adjacency returns scored posting lists:
3.8.2 Path Scores
Paths aggregate edge weights:
Additive path score:
Multiplicative path score (for probability):
Min path score (for bottleneck):
3.8.3 PageRank and Centrality
PageRank assigns importance scores to vertices:
where is the damping factor (typically 0.85) and is the vertex count.
PageRank precomputes as a vertex property, enabling score-aware queries:
SELECT v.name, v.pagerank
FROM vertices v
WHERE v.id IN (
SELECT vertex_id FROM graph_traverse(:start, 'LINKS_TO', 3)
)
ORDER BY v.pagerank DESC
LIMIT 10;
3.8.4 Score Combination with Other Paradigms
Graph scores combine with text and vector scores using the frameworks from Chapter 2:
Example: Hybrid ranking combining BM25 and PageRank:
Or probabilistically:
where scores are calibrated to .
3.9 Query Optimization for Graph Operations
3.9.1 Selectivity Estimation
Graph operation selectivity depends on graph structure:
Adjacency selectivity:
Traversal selectivity (k hops):
Pattern selectivity: Product of component selectivities:
3.9.2 Join Ordering
Pattern matching is essentially a join problem. The optimizer orders pattern components by selectivity:
- Start with most selective constraint (smallest posting list)
- Expand through adjacency with next most selective constraint
- Continue until pattern is matched
Example: For pattern (a:Person)-[:KNOWS]->(b:Influencer)-[:PROMOTES]->(c:Product):
If Influencer is rare (high selectivity), start from b:
- = label_index(Influencer)
- = adj_in(b, KNOWS) label_index(Person)
- = adj_out(b, PROMOTES) label_index(Product)
3.9.3 Traversal Pruning
Early termination strategies for traversal:
Top-k pruning: When seeking top-k results by score, maintain a threshold and prune branches that cannot exceed it.
Filter pushdown: Apply filters at each traversal step rather than at the end.
Bidirectional search: For point-to-point queries, search from both ends and meet in the middle.
3.10 Summary
This chapter extended the unified algebra to incorporate graph structures:
-
Graph type system formalizes vertices, edges, properties, and labels within the same framework as documents, with explicit document-vertex correspondence enabling zero-duplication storage
-
Graph-posting list isomorphism proves that graph operations produce posting lists satisfying the same Boolean algebra as document posting lists, enabling unified optimization
-
Graph algebra operators include adjacency (), traversal (), pattern matching (), regular path queries (RPQ), and shortest paths (), all composing through the same monoid structure as document operators
-
Cross-paradigm integration enables queries combining relational predicates, text search, vector similarity, and graph traversal through unified posting list operations
-
Implementation architecture stores graphs in LSM-trees with dual-direction edge indexes and label indexes, supporting efficient traversal and pattern matching
-
Scored graph operations extend the algebra to handle edge weights, path scores, and centrality measures, combining with text and vector scores through the same frameworks
The following chapter develops query optimization theory, showing how the algebraic properties established in Chapters 2 and 3 enable cost-based optimization across all paradigms.