Chapter 3: Extending the Algebra to Graph Structures

This chapter extends the unified mathematical framework to incorporate graph operations while preserving the algebraic properties established in Chapter 2. We demonstrate that graph posting lists form isomorphic structures to document posting lists, enabling graph traversal, pattern matching, and cross-paradigm queries within the same algebraic framework.

3.1 Motivation for Graph Integration

Graph databases have emerged as essential tools for modeling relationships: social networks, knowledge graphs, fraud detection networks, recommendation systems, and supply chain dependencies. Yet traditional graph databases operate as isolated systems, requiring separate data synchronization and offering no transactional consistency with relational or search workloads.

3.1.1 The Relationship Modeling Challenge

Consider a product recommendation system that must:

  1. Find products matching user search terms (full-text search)
  2. Filter by inventory and pricing constraints (relational predicates)
  3. Rank by embedding similarity to user preferences (vector search)
  4. Traverse purchase history and social connections (graph traversal)
  5. Return results with explanation paths (graph + relational join)

In a polyglot architecture, the graph component requires:

  • Duplicating entity data from the relational store
  • Maintaining referential integrity across systems
  • Orchestrating cross-system queries with no shared transaction
  • Merging results with incompatible data models

3.1.2 Graph-Posting List Unification

The key insight enabling graph integration is that graph operations also produce sets of identifiers:

Traversal: Starting from vertex vv, find all vertices reachable in kk hops

traverse(v,k)={uVdist(v,u)k}\text{traverse}(v, k) = \{u \in V \mid \text{dist}(v, u) \leq k\}

Pattern Match: Find all vertices matching a structural pattern

match(G,P)={vVv participates in pattern P}\text{match}(G, P) = \{v \in V \mid v \text{ participates in pattern } P\}

Path Query: Find vertices connected by paths matching a regular expression

RPQ(v,r)={uV path vru}\text{RPQ}(v, r) = \{u \in V \mid \exists \text{ path } v \xrightarrow{r} u\}

Each operation returns a set - a posting list of vertex identifiers. These posting lists combine with document posting lists through the same Boolean operations, enabling unified optimization across paradigms.

3.2 Graph Type System

We formalize the graph type system as an extension of the document type system from Chapter 2.

3.2.1 Property Graph Model

A property graph is a tuple G=(V,E,ρ,λ,σ)G = (V, E, \rho, \lambda, \sigma) where:

  • VV is the set of vertices (nodes)
  • EV×VE \subseteq V \times V is the set of directed edges
  • ρ:VE2F×V\rho: V \cup E \rightarrow 2^{\mathcal{F} \times \mathcal{V}} assigns properties (key-value pairs)
  • λ:VE2L\lambda: V \cup E \rightarrow 2^L assigns labels from label set LL
  • σ:ET\sigma: E \rightarrow \mathcal{T} assigns a type to each edge

Vertices and edges both carry properties and labels, making them semi-structured entities similar to documents.

3.2.2 Vertex Space

The vertex space VG\mathcal{V}_G contains all vertices:

VG={vvV for some property graph G}\mathcal{V}_G = \{v \mid v \in V \text{ for some property graph } G\}

Each vertex has:

  • A unique identifier: id:VGN\text{id}: \mathcal{V}_G \rightarrow \mathbb{N}
  • Properties: props:VG2F×V\text{props}: \mathcal{V}_G \rightarrow 2^{\mathcal{F} \times \mathcal{V}}
  • Labels: labels:VG2L\text{labels}: \mathcal{V}_G \rightarrow 2^L

3.2.3 Edge Space

The edge space EG\mathcal{E}_G contains all edges:

EG={(u,v,t)(u,v)E,t=σ(u,v)}\mathcal{E}_G = \{(u, v, t) \mid (u, v) \in E, t = \sigma(u, v)\}

Each edge has:

  • A unique identifier: id:EGN\text{id}: \mathcal{E}_G \rightarrow \mathbb{N}
  • Source vertex: src:EGVG\text{src}: \mathcal{E}_G \rightarrow \mathcal{V}_G
  • Target vertex: tgt:EGVG\text{tgt}: \mathcal{E}_G \rightarrow \mathcal{V}_G
  • Edge type: type:EGT\text{type}: \mathcal{E}_G \rightarrow \mathcal{T}
  • Properties: props:EG2F×V\text{props}: \mathcal{E}_G \rightarrow 2^{\mathcal{F} \times \mathcal{V}}

3.2.4 Graph Posting Lists

A graph posting list maps a predicate to a set of vertices or edges:

Vertex posting list:

PV:PredV2VGP_V: \text{Pred}_V \rightarrow 2^{\mathcal{V}_G}

Edge posting list:

PE:PredE2EGP_E: \text{Pred}_E \rightarrow 2^{\mathcal{E}_G}

Examples of graph predicates:

Label predicate: Vertices with label \ell

PV(hasLabel())={vVGlabels(v)}P_V(\text{hasLabel}(\ell)) = \{v \in \mathcal{V}_G \mid \ell \in \text{labels}(v)\}

Property predicate: Vertices where property pp satisfies condition ϕ\phi

PV(prop(p,ϕ))={vVGϕ(props(v)(p))}P_V(\text{prop}(p, \phi)) = \{v \in \mathcal{V}_G \mid \phi(\text{props}(v)(p))\}

Edge type predicate: Edges of type tt

PE(hasType(t))={eEGtype(e)=t}P_E(\text{hasType}(t)) = \{e \in \mathcal{E}_G \mid \text{type}(e) = t\}

Adjacency predicate: Vertices adjacent to vertex vv via edge type tt

PV(adj(v,t))={uVG(v,u,t)EG(u,v,t)EG}P_V(\text{adj}(v, t)) = \{u \in \mathcal{V}_G \mid (v, u, t) \in \mathcal{E}_G \lor (u, v, t) \in \mathcal{E}_G\}

3.2.5 Document-Vertex Correspondence

In Cognica, vertices correspond to documents:

VGD\mathcal{V}_G \cong \mathcal{D}

This isomorphism maps:

  • Vertex ID to document ID
  • Vertex properties to document fields
  • Vertex labels to document type tags

The correspondence enables a document to participate in both relational queries (as a row) and graph queries (as a vertex) without data duplication.

3.3 Graph-Posting List Isomorphism

We now prove that graph posting lists satisfy the same algebraic properties as document posting lists.

3.3.1 Boolean Algebra Structure

Theorem 3.1 (Graph Posting List Boolean Algebra): The set of graph posting lists 2VG2^{\mathcal{V}_G} with operations \cap, \cup, \overline{\cdot}, \emptyset, and VG\mathcal{V}_G forms a Boolean algebra isomorphic to the document posting list algebra 2D2^{\mathcal{D}}.

Proof: We verify each axiom:

Closure: For any graph posting lists P1,P22VGP_1, P_2 \in 2^{\mathcal{V}_G}:

  • P1P22VGP_1 \cap P_2 \in 2^{\mathcal{V}_G} (intersection of vertex sets is a vertex set)
  • P1P22VGP_1 \cup P_2 \in 2^{\mathcal{V}_G} (union of vertex sets is a vertex set)
  • P1=VGP12VG\overline{P_1} = \mathcal{V}_G \setminus P_1 \in 2^{\mathcal{V}_G} (complement is a vertex set)

Commutativity: Inherited from set operations.

Associativity: Inherited from set operations.

Distributivity: Inherited from set operations.

Identity: PVG=PP \cap \mathcal{V}_G = P and P=PP \cup \emptyset = P.

Complementation: PP=P \cap \overline{P} = \emptyset and PP=VGP \cup \overline{P} = \mathcal{V}_G.

The isomorphism ϕ:2D2VG\phi: 2^{\mathcal{D}} \rightarrow 2^{\mathcal{V}_G} is induced by the document-vertex correspondence:

ϕ(PD)={vVGdoc(v)PD}\phi(P_D) = \{v \in \mathcal{V}_G \mid \text{doc}(v) \in P_D\}

This ϕ\phi preserves all Boolean operations:

ϕ(P1P2)=ϕ(P1)ϕ(P2)\phi(P_1 \cap P_2) = \phi(P_1) \cap \phi(P_2) ϕ(P1P2)=ϕ(P1)ϕ(P2)\phi(P_1 \cup P_2) = \phi(P_1) \cup \phi(P_2) ϕ(P)=ϕ(P)\phi(\overline{P}) = \overline{\phi(P)}

\square

3.3.2 Preservation Under Graph Operations

Theorem 3.2 (Traversal Preserves Boolean Structure): Graph traversal operations produce posting lists that participate in Boolean algebra.

Proof: Let traversek(v)\text{traverse}_k(v) denote kk-hop traversal from vertex vv:

traversek(v)={uVGdist(v,u)k}\text{traverse}_k(v) = \{u \in \mathcal{V}_G \mid \text{dist}(v, u) \leq k\}

This is a subset of VG\mathcal{V}_G, hence an element of 2VG2^{\mathcal{V}_G}.

For combined traversals:

Conjunction: Vertices reachable from both v1v_1 and v2v_2:

traversek(v1)traversek(v2)\text{traverse}_k(v_1) \cap \text{traverse}_k(v_2)

Disjunction: Vertices reachable from either v1v_1 or v2v_2:

traversek(v1)traversek(v2)\text{traverse}_k(v_1) \cup \text{traverse}_k(v_2)

Both results are elements of 2VG2^{\mathcal{V}_G}, preserving Boolean structure. \square

3.3.3 Lattice Structure of Graph Queries

Graph queries form a complete lattice ordered by result containment:

Definition: For graph queries Q1,Q2Q_1, Q_2, define Q1Q2Q_1 \sqsubseteq Q_2 iff result(Q1)result(Q2)\text{result}(Q_1) \subseteq \text{result}(Q_2) for all graph instances.

Theorem 3.3 (Graph Query Lattice): Graph queries ordered by \sqsubseteq form a complete lattice.

The lattice operations:

  • Meet: Q1Q2Q_1 \sqcap Q_2 returns intersection of results
  • Join: Q1Q2Q_1 \sqcup Q_2 returns union of results
  • Bottom: Query returning \emptyset
  • Top: Query returning VG\mathcal{V}_G

This lattice structure enables the optimizer to navigate between equivalent graph query formulations.

3.4 Graph Algebra Operations

We define the operators that transform graph posting lists.

3.4.1 Adjacency Operator

The adjacency operator α\alpha expands a posting list to include adjacent vertices:

α:2VG×2T×{in,out,both}2VG\alpha: 2^{\mathcal{V}_G} \times 2^{\mathcal{T}} \times \{in, out, both\} \rightarrow 2^{\mathcal{V}_G}

For vertex set SS, edge types TT, and direction dd:

α(S,T,out)={uvS,tT:(v,u,t)EG}\alpha(S, T, out) = \{u \mid \exists v \in S, t \in T: (v, u, t) \in \mathcal{E}_G\} α(S,T,in)={uvS,tT:(u,v,t)EG}\alpha(S, T, in) = \{u \mid \exists v \in S, t \in T: (u, v, t) \in \mathcal{E}_G\} α(S,T,both)=α(S,T,out)α(S,T,in)\alpha(S, T, both) = \alpha(S, T, out) \cup \alpha(S, T, in)

Properties of adjacency:

Monotonicity: S1S2    α(S1,T,d)α(S2,T,d)S_1 \subseteq S_2 \implies \alpha(S_1, T, d) \subseteq \alpha(S_2, T, d)

Distributivity over union:

α(S1S2,T,d)=α(S1,T,d)α(S2,T,d)\alpha(S_1 \cup S_2, T, d) = \alpha(S_1, T, d) \cup \alpha(S_2, T, d)

Edge type union:

α(S,T1T2,d)=α(S,T1,d)α(S,T2,d)\alpha(S, T_1 \cup T_2, d) = \alpha(S, T_1, d) \cup \alpha(S, T_2, d)

3.4.2 Traversal Operator

The traversal operator τG\tau_G performs multi-hop traversal:

τG:2VG×2T×N×{in,out,both}2VG\tau_G: 2^{\mathcal{V}_G} \times 2^{\mathcal{T}} \times \mathbb{N} \times \{in, out, both\} \rightarrow 2^{\mathcal{V}_G}

Defined recursively:

τG(S,T,0,d)=S\tau_G(S, T, 0, d) = S τG(S,T,k,d)=τG(S,T,k1,d)α(τG(S,T,k1,d),T,d)\tau_G(S, T, k, d) = \tau_G(S, T, k-1, d) \cup \alpha(\tau_G(S, T, k-1, d), T, d)

This computes the kk-hop neighborhood: all vertices reachable within kk edge traversals.

Fixed-point characterization:

τG(S,T,,d)=μX.(Sα(X,T,d))\tau_G(S, T, \infty, d) = \mu X. (S \cup \alpha(X, T, d))

The infinite traversal is the least fixed point of the adjacency expansion, representing the transitive closure.

3.4.3 Pattern Matching Operator

The pattern matching operator μ\mu finds subgraph isomorphisms:

μ:GP×VG2VG\mu: \mathcal{G}_P \times \mathcal{V}_G \rightarrow 2^{\mathcal{V}_G}

where GP\mathcal{G}_P is a pattern graph. The result contains vertices that can serve as the anchor vertex in pattern matches.

Example pattern: Find vertices that are both a "person" and have an outgoing "knows" edge to another "person":

Pattern: (p1:Person)-[:KNOWS]->(p2:Person)
Anchor: p1
μ(P,v)={vVGPersonlabels(v)u:(Personlabels(u)(v,u,KNOWS)EG)}\mu(P, v) = \{v \in \mathcal{V}_G \mid \text{Person} \in \text{labels}(v) \land \exists u: (\text{Person} \in \text{labels}(u) \land (v, u, \text{KNOWS}) \in \mathcal{E}_G)\}

Pattern matching reduces to a conjunction of posting list operations:

  1. Vertex label constraints: label posting lists
  2. Edge existence constraints: adjacency posting lists
  3. Property constraints: property posting lists

3.4.4 Regular Path Query Operator

Regular Path Queries (RPQ) find paths matching a regular expression over edge types:

RPQ:VG×R2VG\text{RPQ}: \mathcal{V}_G \times \mathcal{R} \rightarrow 2^{\mathcal{V}_G}

where R\mathcal{R} is the set of regular expressions over edge types.

Grammar:

r::=tr1r2r1r2rr+r?r ::= t \mid r_1 \cdot r_2 \mid r_1 | r_2 \mid r^* \mid r^+ \mid r?

where tTt \in \mathcal{T} is an edge type.

Semantics:

Base case: Edge type

RPQ(v,t)=α({v},{t},out)\text{RPQ}(v, t) = \alpha(\{v\}, \{t\}, out)

Concatenation: Sequential traversal

RPQ(v,r1r2)=uRPQ(v,r1)RPQ(u,r2)\text{RPQ}(v, r_1 \cdot r_2) = \bigcup_{u \in \text{RPQ}(v, r_1)} \text{RPQ}(u, r_2)

Alternation: Union of paths

RPQ(v,r1r2)=RPQ(v,r1)RPQ(v,r2)\text{RPQ}(v, r_1 | r_2) = \text{RPQ}(v, r_1) \cup \text{RPQ}(v, r_2)

Kleene star: Zero or more repetitions

RPQ(v,r)=μX.({v}RPQ(X,r))\text{RPQ}(v, r^*) = \mu X. (\{v\} \cup \text{RPQ}(X, r))

Kleene plus: One or more repetitions

RPQ(v,r+)=RPQ(v,rr)\text{RPQ}(v, r^+) = \text{RPQ}(v, r \cdot r^*)

Optional: Zero or one occurrence

RPQ(v,r?)={v}RPQ(v,r)\text{RPQ}(v, r?) = \{v\} \cup \text{RPQ}(v, r)

3.4.5 Shortest Path Operator

The shortest path operator πsp\pi_{sp} finds vertices connected by minimum-length paths:

πsp:VG×VG×2TVG{null}\pi_{sp}: \mathcal{V}_G \times \mathcal{V}_G \times 2^{\mathcal{T}} \rightarrow \mathcal{V}_G^* \cup \{\text{null}\} πsp(v,u,T)=argminp:vup where edges in p have types in T\pi_{sp}(v, u, T) = \text{argmin}_{p: v \rightsquigarrow u} |p| \text{ where edges in } p \text{ have types in } T

This operator returns a path (sequence of vertices) rather than a posting list. However, the reachability check derived from it returns a Boolean:

reachable(v,u,T)=(πsp(v,u,T)null)\text{reachable}(v, u, T) = (\pi_{sp}(v, u, T) \neq \text{null})

And path existence produces a posting list:

pathExists(v,T,maxLen)={uVGπsp(v,u,T)maxLen}\text{pathExists}(v, T, \text{maxLen}) = \{u \in \mathcal{V}_G \mid |\pi_{sp}(v, u, T)| \leq \text{maxLen}\}

3.5 Operator Composition and Properties

3.5.1 Composition Monoid

Graph operators compose to form a monoid analogous to document operators:

(OpG,,idG)(\text{Op}_G, \circ, \text{id}_G)

where:

  • OpG\text{Op}_G is the set of graph operators
  • \circ is function composition
  • idG\text{id}_G is the identity on 2VG2^{\mathcal{V}_G}

Associativity: (fg)h=f(gh)(f \circ g) \circ h = f \circ (g \circ h)

Identity: fidG=idGf=ff \circ \text{id}_G = \text{id}_G \circ f = f

3.5.2 Algebraic Properties

Adjacency Idempotence (for undirected edges):

α(α(S,T,both),T,both)α(S,T,both)\alpha(\alpha(S, T, both), T, both) \supseteq \alpha(S, T, both)

Note: This is containment, not equality - repeated adjacency can reach more vertices.

Traversal Monotonicity:

k1k2    τG(S,T,k1,d)τG(S,T,k2,d)k_1 \leq k_2 \implies \tau_G(S, T, k_1, d) \subseteq \tau_G(S, T, k_2, d)

Traversal Fixed Point:

k0:kk0:τG(S,T,k,d)=τG(S,T,k0,d)\exists k_0: \forall k \geq k_0: \tau_G(S, T, k, d) = \tau_G(S, T, k_0, d)

The fixed point is reached when the traversal saturates (no new vertices discovered).

Filter-Traversal Interaction:

σϕ(τG(S,T,k,d))τG(σϕ(S),T,k,d) in general\sigma_\phi(\tau_G(S, T, k, d)) \neq \tau_G(\sigma_\phi(S), T, k, d) \text{ in general}

Filtering before traversal restricts starting vertices; filtering after traversal restricts ending vertices. These are semantically different and not interchangeable.

3.5.3 Equivalence-Preserving Transformations

Several transformations preserve query semantics:

Traversal Decomposition:

τG(S,T,k1+k2,d)=τG(τG(S,T,k1,d),T,k2,d)\tau_G(S, T, k_1 + k_2, d) = \tau_G(\tau_G(S, T, k_1, d), T, k_2, d)

Adjacency Distribution:

α(S1S2,T,d)=α(S1,T,d)α(S2,T,d)\alpha(S_1 \cup S_2, T, d) = \alpha(S_1, T, d) \cup \alpha(S_2, T, d)

Pattern to Traversal Reduction: Simple chain patterns reduce to traversal:

μ((a)[t1]>(b)[t2]>(c),a)=α(α(PV(a),{t1},out),{t2},out)PV(c)\mu((a)-[t_1]->(b)-[t_2]->(c), a) = \alpha(\alpha(P_V(a), \{t_1\}, out), \{t_2\}, out) \cap P_V(c)

where PV(x)P_V(x) is the posting list for vertex pattern xx.

3.6 Cross-Paradigm Integration

The power of the unified algebra emerges in cross-paradigm queries.

3.6.1 Graph-Relational Integration

ToGraph Operator: Convert relational results to graph vertices

ToGraph:2D2VG\text{ToGraph}: 2^{\mathcal{D}} \rightarrow 2^{\mathcal{V}_G}

Under the document-vertex correspondence, this is the identity:

ToGraph(PD)=PD (viewing documents as vertices)\text{ToGraph}(P_D) = P_D \text{ (viewing documents as vertices)}

FromGraph Operator: Convert graph results to relational rows

FromGraph:2VG2D\text{FromGraph}: 2^{\mathcal{V}_G} \rightarrow 2^{\mathcal{D}}

Also the identity under correspondence.

Example Query: Find customers who purchased products in the same category as their friends:

SELECT DISTINCT c.name, p.name AS recommended_product
FROM customers c
-- Graph traversal: find friends
JOIN LATERAL (
  SELECT friend_id FROM graph_traverse(c.id, 'FRIENDS_WITH', 1)
) f ON true
-- Friend's purchases
JOIN purchases fp ON fp.customer_id = f.friend_id
JOIN products friend_prod ON fp.product_id = friend_prod.id
-- Products in same category
JOIN products p ON p.category = friend_prod.category
-- Exclude already purchased
WHERE NOT EXISTS (
  SELECT 1 FROM purchases cp
  WHERE cp.customer_id = c.id AND cp.product_id = p.id
);

This query seamlessly combines:

  • Relational joins (purchases, products)
  • Graph traversal (friends)
  • Set operations (exclusion)

The posting list representation enables unified optimization.

3.6.2 Graph-Vector Integration

Vertex Embeddings: Vertices can have vector representations:

embedV:VG×FVn\text{embed}_V: \mathcal{V}_G \times \mathcal{F} \rightarrow \mathcal{V}_n

This enables:

Graph-aware similarity search:

simSearch(v,k,T)=kNN(embedV(v),k)τG({v},T,,both)\text{simSearch}(v, k, T) = \text{kNN}(\text{embed}_V(v), k) \cap \tau_G(\{v\}, T, \infty, both)

Find the kk nearest neighbors that are also reachable via edges of type TT.

Embedding-based edge prediction:

predictEdge(v,t,ϵ)={usim(embedV(v),embedV(u))>ϵ(v,u,t)EG}\text{predictEdge}(v, t, \epsilon) = \{u \mid \text{sim}(\text{embed}_V(v), \text{embed}_V(u)) > \epsilon \land (v, u, t) \notin \mathcal{E}_G\}

Predict new edges based on embedding similarity.

3.6.3 Graph-Text Integration

Semantic Graph Search: Combine text relevance with graph structure:

semanticGraphSearch(q,v,k,T)=topK(τt(q)τG({v},T,k,both),BM25)\text{semanticGraphSearch}(q, v, k, T) = \text{topK}(\tau_t(q) \cap \tau_G(\{v\}, T, k, both), \text{BM25})

Find documents matching text query qq that are within kk hops of vertex vv via edges of type TT.

Example: Find research papers mentioning "machine learning" by authors within 2 collaboration hops:

SELECT p.title, bm25_score(p.abstract) AS relevance
FROM papers p
WHERE MATCH(p.abstract) AGAINST ('machine learning')
  AND p.author_id IN (
    SELECT vertex_id FROM graph_traverse(:current_author, 'COAUTHORED', 2)
  )
ORDER BY relevance DESC
LIMIT 10;

3.6.4 Unified Query Plan

Loading diagram...

The query planner treats graph predicates as another source of posting lists, applying the same optimization strategies (predicate pushdown, intersection ordering by selectivity, etc.) across all paradigms.

3.7 Implementation Architecture

3.7.1 Graph Storage in LSM-Tree

Cognica stores graphs using the same LSM-tree infrastructure as documents:

Vertex Storage: Vertices store as documents with graph metadata:

Key: vertex:{graph_id}:{vertex_id}
Value: {properties, labels, ...}

Edge Storage: Edges store with composite keys for efficient traversal:

Key: edge:out:{graph_id}:{src_id}:{edge_type}:{tgt_id}
Value: {properties, ...}

Key: edge:in:{graph_id}:{tgt_id}:{edge_type}:{src_id}
Value: {} (reference only)

Dual edge storage (out and in) enables efficient traversal in both directions.

3.7.2 Adjacency Index

For each edge type, an adjacency index maps vertices to their neighbors:

Key: adj:out:{graph_id}:{edge_type}:{src_id}
Value: [tgt_id_1, tgt_id_2, ...] (posting list)

Key: adj:in:{graph_id}:{edge_type}:{tgt_id}
Value: [src_id_1, src_id_2, ...] (posting list)

This posting list structure enables:

  • Fast single-hop expansion: O(d)O(d) where dd is the vertex degree
  • Efficient intersection with other posting lists
  • Skip pointer acceleration for high-degree vertices

3.7.3 Label Index

Label indexes support vertex filtering:

Key: label:{graph_id}:{label}
Value: [vertex_id_1, vertex_id_2, ...] (posting list)

This enables efficient evaluation of label predicates.

3.7.4 Traversal Execution

Multi-hop traversal executes as iterative adjacency expansion:

Algorithm: BFS_Traversal(start, edge_types, max_depth, direction)
  frontier = {start}
  visited = {start}
  for depth in 1..max_depth:
    next_frontier = {}
    for v in frontier:
      neighbors = adjacency_lookup(v, edge_types, direction)
      for u in neighbors:
        if u not in visited:
          visited.add(u)
          next_frontier.add(u)
    frontier = next_frontier
    if frontier is empty:
      break
  return visited

Optimization: For intersection with other posting lists, early termination prunes branches:

Algorithm: Filtered_Traversal(start, edge_types, max_depth, direction, filter_posting)
  frontier = {start} intersect filter_posting
  visited = frontier
  for depth in 1..max_depth:
    next_frontier = {}
    for v in frontier:
      neighbors = adjacency_lookup(v, edge_types, direction)
      // Early filter application
      filtered_neighbors = neighbors intersect filter_posting
      for u in filtered_neighbors:
        if u not in visited:
          visited.add(u)
          next_frontier.add(u)
    frontier = next_frontier
    if frontier is empty:
      break
  return visited

3.7.5 Pattern Match Execution

Pattern matching compiles to a join tree of posting list operations:

Example Pattern:

(a:Person {age > 30})-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company {name = 'Acme'})

Execution Plan:

  1. PaP_a = label_index(Person) \cap property_filter(age > 30)
  2. PcP_c = label_index(Company) \cap property_filter(name = 'Acme')
  3. PbP_b = adj_in(c, WORKS_AT) \cap label_index(Person)
  4. Pa,finalP_{a,final} = adj_in(b, KNOWS) \cap PaP_a
  5. Return Pa,finalP_{a,final}

The optimizer reorders these operations based on selectivity estimates.

3.7.6 SQL Graph Query Interface

Cognica exposes graph operations through SQL table functions, enabling graph traversal within standard SQL queries:

graph_traverse() — Multi-hop traversal as a table function:

-- Find all users reachable within 3 hops via FOLLOWS edges
SELECT t.vertex_id, t.depth, t.path
FROM graph_traverse(
    'social_graph',        -- graph name
    'user_123',            -- start vertex
    'FOLLOWS',             -- edge type
    3,                     -- max depth
    'outgoing'             -- direction
) t;

-- Combine with relational predicates
SELECT u.name, t.depth
FROM graph_traverse('social_graph', 'user_123', 'FOLLOWS', 2, 'outgoing') t
JOIN users u ON u.id = t.vertex_id
WHERE u.active = true;

Recursive CTE Integration — Standard SQL recursive queries can express path-finding:

-- Shortest path via recursive CTE
WITH RECURSIVE paths AS (
    SELECT id AS vertex_id, ARRAY[id] AS path, 0 AS depth
    FROM vertices WHERE id = 'user_123'
  UNION ALL
    SELECT e.target_id, p.path || e.target_id, p.depth + 1
    FROM paths p
    JOIN edges e ON e.source_id = p.vertex_id AND e.type = 'FOLLOWS'
    WHERE p.depth < 5 AND NOT e.target_id = ANY(p.path)
)
SELECT * FROM paths WHERE vertex_id = 'user_456'
ORDER BY depth LIMIT 1;

Cypher Query Support — Optional Cypher syntax via the cypher() table function, compatible with Apache AGE:

-- Cypher via table function
SELECT * FROM cypher('social_graph', $$
    MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
    RETURN b.name, b.age
$$) AS (name TEXT, age INTEGER);

Adjacency Cache — An in-memory cache accelerates repeated traversals by caching adjacency lists for frequently accessed vertices, reducing RocksDB lookups during iterative graph algorithms.

3.8 Scored Graph Operations

Graph operations can carry scores for ranking.

3.8.1 Edge Weight Scores

Edges may have weights representing relationship strength:

w:EGR+w: \mathcal{E}_G \rightarrow \mathbb{R}^+

Weighted adjacency returns scored posting lists:

αw(S,T,d)={(u,w(e))uα(S,T,d),e connects S to u}\alpha_w(S, T, d) = \{(u, w(e)) \mid u \in \alpha(S, T, d), e \text{ connects } S \text{ to } u\}

3.8.2 Path Scores

Paths aggregate edge weights:

Additive path score:

score(p)=epw(e)\text{score}(p) = \sum_{e \in p} w(e)

Multiplicative path score (for probability):

score(p)=epw(e)\text{score}(p) = \prod_{e \in p} w(e)

Min path score (for bottleneck):

score(p)=minepw(e)\text{score}(p) = \min_{e \in p} w(e)

3.8.3 PageRank and Centrality

PageRank assigns importance scores to vertices:

PR(v)=1dN+duin(v)PR(u)out(u)\text{PR}(v) = \frac{1 - d}{N} + d \sum_{u \in \text{in}(v)} \frac{\text{PR}(u)}{|\text{out}(u)|}

where dd is the damping factor (typically 0.85) and NN is the vertex count.

PageRank precomputes as a vertex property, enabling score-aware queries:

SELECT v.name, v.pagerank
FROM vertices v
WHERE v.id IN (
  SELECT vertex_id FROM graph_traverse(:start, 'LINKS_TO', 3)
)
ORDER BY v.pagerank DESC
LIMIT 10;

3.8.4 Score Combination with Other Paradigms

Graph scores combine with text and vector scores using the frameworks from Chapter 2:

Example: Hybrid ranking combining BM25 and PageRank:

score(d)=αnormalize(BM25(d,q))+(1α)PR(d)\text{score}(d) = \alpha \cdot \text{normalize}(\text{BM25}(d, q)) + (1 - \alpha) \cdot \text{PR}(d)

Or probabilistically:

score(d)=PBM25(d)PPR(d)\text{score}(d) = P_{\text{BM25}}(d) \cdot P_{\text{PR}}(d)

where scores are calibrated to [0,1][0, 1].

3.9 Query Optimization for Graph Operations

3.9.1 Selectivity Estimation

Graph operation selectivity depends on graph structure:

Adjacency selectivity:

sel(α(S,T,d))SavgDegree(T,d)/VG\text{sel}(\alpha(S, T, d)) \approx |S| \cdot \text{avgDegree}(T, d) / |\mathcal{V}_G|

Traversal selectivity (k hops):

sel(τG(S,T,k,d))min(1,SavgDegree(T,d)k/VG)\text{sel}(\tau_G(S, T, k, d)) \approx \min(1, |S| \cdot \text{avgDegree}(T, d)^k / |\mathcal{V}_G|)

Pattern selectivity: Product of component selectivities:

sel(μ(P))cconstraints(P)sel(c)\text{sel}(\mu(P)) \approx \prod_{c \in \text{constraints}(P)} \text{sel}(c)

3.9.2 Join Ordering

Pattern matching is essentially a join problem. The optimizer orders pattern components by selectivity:

  1. Start with most selective constraint (smallest posting list)
  2. Expand through adjacency with next most selective constraint
  3. Continue until pattern is matched

Example: For pattern (a:Person)-[:KNOWS]->(b:Influencer)-[:PROMOTES]->(c:Product):

If Influencer is rare (high selectivity), start from b:

  1. PbP_b = label_index(Influencer)
  2. PaP_a = adj_in(b, KNOWS) \cap label_index(Person)
  3. PcP_c = adj_out(b, PROMOTES) \cap label_index(Product)

3.9.3 Traversal Pruning

Early termination strategies for traversal:

Top-k pruning: When seeking top-k results by score, maintain a threshold and prune branches that cannot exceed it.

Filter pushdown: Apply filters at each traversal step rather than at the end.

Bidirectional search: For point-to-point queries, search from both ends and meet in the middle.

3.10 Summary

This chapter extended the unified algebra to incorporate graph structures:

  1. Graph type system formalizes vertices, edges, properties, and labels within the same framework as documents, with explicit document-vertex correspondence enabling zero-duplication storage

  2. Graph-posting list isomorphism proves that graph operations produce posting lists satisfying the same Boolean algebra as document posting lists, enabling unified optimization

  3. Graph algebra operators include adjacency (α\alpha), traversal (τG\tau_G), pattern matching (μ\mu), regular path queries (RPQ), and shortest paths (πsp\pi_{sp}), all composing through the same monoid structure as document operators

  4. Cross-paradigm integration enables queries combining relational predicates, text search, vector similarity, and graph traversal through unified posting list operations

  5. Implementation architecture stores graphs in LSM-trees with dual-direction edge indexes and label indexes, supporting efficient traversal and pattern matching

  6. Scored graph operations extend the algebra to handle edge weights, path scores, and centrality measures, combining with text and vector scores through the same frameworks

The following chapter develops query optimization theory, showing how the algebraic properties established in Chapters 2 and 3 enable cost-based optimization across all paradigms.

Copyright (c) 2023-2026 Cognica, Inc.