Chapter 3: Extending the Algebra to Graph Structures

This chapter extends the unified mathematical framework to incorporate graph operations while preserving the algebraic properties established in Chapter 2. We demonstrate that graph posting lists form structures isomorphic to document posting lists, enabling graph traversal, pattern matching, and cross-paradigm queries within the same algebraic framework.

3.1 Motivation for Graph Integration

Graph databases have emerged as essential tools for modeling relationships: social networks, knowledge graphs, fraud detection networks, recommendation systems, and supply chain dependencies. Yet traditional graph databases operate as isolated systems, requiring separate data synchronization and offering no transactional consistency with relational or search workloads.

3.1.1 The Relationship Modeling Challenge

Consider a product recommendation system that must:

Find products matching user search terms (full-text search)
Filter by inventory and pricing constraints (relational predicates)
Rank by embedding similarity to user preferences (vector search)
Traverse purchase history and social connections (graph traversal)
Return results with explanation paths (graph + relational join)

In a polyglot architecture, the graph component requires:

Duplicating entity data from the relational store
Maintaining referential integrity across systems
Orchestrating cross-system queries with no shared transaction
Merging results with incompatible data models

3.1.2 Graph-Posting List Unification

The key insight enabling graph integration is that graph operations also produce sets of identifiers:

Traversal: Starting from vertex $v$ , find all vertices reachable in $k$ hops

\text{traverse}(v, k) = \{u \in V \mid \text{dist}(v, u) \leq k\}

Pattern Match: Find all vertices matching a structural pattern

\text{match}(G, P) = \{v \in V \mid v \text{ participates in pattern } P\}

Path Query: Find vertices connected by paths matching a regular expression

\text{RPQ}(v, r) = \{u \in V \mid \exists \text{ path } v \xrightarrow{r} u\}

Each operation returns a set, that is, a posting list of vertex identifiers. These posting lists combine with document posting lists through the same Boolean operations, enabling unified optimization across paradigms.
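As a minimal sketch of this idea, Python sets can stand in for posting lists. The toy edge list, the `text_hits` posting list, and the function name `traverse` are illustrative assumptions, not part of Cognica:

```python
from collections import deque

# Hypothetical toy graph: undirected edges between integer vertex ids.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5)]

def traverse(start: int, k: int) -> set[int]:
    """Return all vertices within k hops of start (BFS, undirected)."""
    neighbors: dict[int, set[int]] = {}
    for u, v in EDGES:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # do not expand beyond k hops
        for u in neighbors.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)

# Hypothetical document posting list produced by a text predicate.
text_hits = {2, 4, 5}

# Graph and document posting lists combine with plain set intersection.
combined = traverse(1, 2) & text_hits
```

Here the traversal result and the text result are both plain sets of identifiers, so intersecting them needs no paradigm-specific glue.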

3.2 Graph Type System

We formalize the graph type system as an extension of the document type system from Chapter 2.

3.2.1 Property Graph Model

A property graph is a tuple $G = (V, E, \rho, \lambda, \sigma)$ where:

$V$ is the set of vertices (nodes)
$E \subseteq V \times V$ is the set of directed edges
$\rho: V \cup E \rightarrow 2^{\mathcal{F} \times \mathcal{V}}$ assigns properties (key-value pairs)
$\lambda: V \cup E \rightarrow 2^L$ assigns labels from label set $L$
$\sigma: E \rightarrow \mathcal{T}$ assigns a type to each edge

Vertices and edges both carry properties and labels, making them semi-structured entities similar to documents.

3.2.2 Vertex Space

The vertex space $\mathcal{V}_G$ contains all vertices:

\mathcal{V}_G = \{v \mid v \in V \text{ for some property graph } G\}

Each vertex has:

A unique identifier: $\text{id}: \mathcal{V}_G \rightarrow \mathbb{N}$
Properties: $\text{props}: \mathcal{V}_G \rightarrow 2^{\mathcal{F} \times \mathcal{V}}$
Labels: $\text{labels}: \mathcal{V}_G \rightarrow 2^L$

3.2.3 Edge Space

The edge space $\mathcal{E}_G$ contains all edges:

\mathcal{E}_G = \{(u, v, t) \mid (u, v) \in E, t = \sigma(u, v)\}

Each edge has:

A unique identifier: $\text{id}: \mathcal{E}_G \rightarrow \mathbb{N}$
Source vertex: $\text{src}: \mathcal{E}_G \rightarrow \mathcal{V}_G$
Target vertex: $\text{tgt}: \mathcal{E}_G \rightarrow \mathcal{V}_G$
Edge type: $\text{type}: \mathcal{E}_G \rightarrow \mathcal{T}$
Properties: $\text{props}: \mathcal{E}_G \rightarrow 2^{\mathcal{F} \times \mathcal{V}}$

3.2.4 Graph Posting Lists

A graph posting list maps a predicate to a set of vertices or edges:

Vertex posting list:

P_V: \text{Pred}_V \rightarrow 2^{\mathcal{V}_G}

Edge posting list:

P_E: \text{Pred}_E \rightarrow 2^{\mathcal{E}_G}

Examples of graph predicates:

Label predicate: Vertices with label $\ell$

P_V(\text{hasLabel}(\ell)) = \{v \in \mathcal{V}_G \mid \ell \in \text{labels}(v)\}

Property predicate: Vertices where property $p$ satisfies condition $\phi$

P_V(\text{prop}(p, \phi)) = \{v \in \mathcal{V}_G \mid \phi(\text{props}(v)(p))\}

Edge type predicate: Edges of type $t$

P_E(\text{hasType}(t)) = \{e \in \mathcal{E}_G \mid \text{type}(e) = t\}

Adjacency predicate: Vertices adjacent to vertex $v$ via edge type $t$

P_V(\text{adj}(v, t)) = \{u \in \mathcal{V}_G \mid (v, u, t) \in \mathcal{E}_G \lor (u, v, t) \in \mathcal{E}_G\}
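A minimal sketch of these four predicates over a toy in-memory graph. The dictionaries `VERTICES` and `EDGES` and all function names are hypothetical stand-ins for index lookups, not Cognica's API:

```python
# Hypothetical toy data: vertex id -> (labels, properties); edges as (src, tgt, type).
VERTICES = {
    1: ({"Person"}, {"age": 34}),
    2: ({"Person"}, {"age": 27}),
    3: ({"Company"}, {"name": "Acme"}),
}
EDGES = {(1, 2, "KNOWS"), (2, 3, "WORKS_AT")}

def has_label(label):
    """Vertex posting list for a label predicate."""
    return {v for v, (labels, _) in VERTICES.items() if label in labels}

def prop(key, cond):
    """Vertex posting list for a property predicate.
    Vertices lacking the property are excluded rather than raising."""
    return {v for v, (_, props) in VERTICES.items()
            if key in props and cond(props[key])}

def has_type(t):
    """Edge posting list for an edge type predicate."""
    return {e for e in EDGES if e[2] == t}

def adj(v, t):
    """Vertices adjacent to v via type t, in either direction."""
    outgoing = {b for (a, b, et) in EDGES if a == v and et == t}
    incoming = {a for (a, b, et) in EDGES if b == v and et == t}
    return outgoing | incoming

# Predicates compose with ordinary set operations.
people_over_30 = has_label("Person") & prop("age", lambda a: a > 30)
```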

3.2.5 Document-Vertex Correspondence

In Cognica, vertices correspond to documents:

\mathcal{V}_G \cong \mathcal{D}

This isomorphism maps:

Vertex ID to document ID
Vertex properties to document fields
Vertex labels to document type tags

The correspondence enables a document to participate in both relational queries (as a row) and graph queries (as a vertex) without data duplication.

3.3 Graph-Posting List Isomorphism

We now prove that graph posting lists satisfy the same algebraic properties as document posting lists.

3.3.1 Boolean Algebra Structure

Theorem 3.1 (Graph Posting List Boolean Algebra): The set of graph posting lists $2^{\mathcal{V}_G}$ with operations $\cap$, $\cup$, $\overline{\cdot}$, $\emptyset$, and $\mathcal{V}_G$ forms a Boolean algebra isomorphic to the document posting list algebra $2^{\mathcal{D}}$.

Proof: We verify each axiom:

Closure: For any graph posting lists $P_1, P_2 \in 2^{\mathcal{V}_G}$ :

$P_1 \cap P_2 \in 2^{\mathcal{V}_G}$ (intersection of vertex sets is a vertex set)
$P_1 \cup P_2 \in 2^{\mathcal{V}_G}$ (union of vertex sets is a vertex set)
$\overline{P_1} = \mathcal{V}_G \setminus P_1 \in 2^{\mathcal{V}_G}$ (complement is a vertex set)

Commutativity: Inherited from set operations.

Associativity: Inherited from set operations.

Distributivity: Inherited from set operations.

Identity: $P \cap \mathcal{V}_G = P$ and $P \cup \emptyset = P$ .

Complementation: $P \cap \overline{P} = \emptyset$ and $P \cup \overline{P} = \mathcal{V}_G$ .

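The isomorphism $\phi: 2^{\mathcal{D}} \rightarrow 2^{\mathcal{V}_G}$ is induced by the document-vertex correspondence of Section 3.2.5, written here as the bijection $\text{doc}: \mathcal{V}_G \rightarrow \mathcal{D}$: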

\phi(P_D) = \{v \in \mathcal{V}_G \mid \text{doc}(v) \in P_D\}

This $\phi$ preserves all Boolean operations:

\phi(P_1 \cap P_2) = \phi(P_1) \cap \phi(P_2)

\phi(P_1 \cup P_2) = \phi(P_1) \cup \phi(P_2)

\phi(\overline{P}) = \overline{\phi(P)}

$\square$

3.3.2 Preservation Under Graph Operations

Theorem 3.2 (Traversal Preserves Boolean Structure): Graph traversal operations produce posting lists that participate in Boolean algebra.

Proof: Let $\text{traverse}_k(v)$ denote $k$ -hop traversal from vertex $v$ :

\text{traverse}_k(v) = \{u \in \mathcal{V}_G \mid \text{dist}(v, u) \leq k\}

This is a subset of $\mathcal{V}_G$ , hence an element of $2^{\mathcal{V}_G}$ .

For combined traversals:

Conjunction: Vertices within $k$ hops of both $v_1$ and $v_2$:

\text{traverse}_k(v_1) \cap \text{traverse}_k(v_2)

Disjunction: Vertices within $k$ hops of either $v_1$ or $v_2$:

\text{traverse}_k(v_1) \cup \text{traverse}_k(v_2)

Both results are elements of $2^{\mathcal{V}_G}$ , preserving Boolean structure. $\square$

3.3.3 Lattice Structure of Graph Queries

Graph queries form a complete lattice ordered by result containment:

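Definition: For graph queries $Q_1, Q_2$, define $Q_1 \sqsubseteq Q_2$ iff $\text{result}(Q_1) \subseteq \text{result}(Q_2)$ for all graph instances. Because distinct query expressions can return identical results, $\sqsubseteq$ is a preorder on expressions; we identify queries that agree on every graph instance so that it becomes a partial order.

Theorem 3.3 (Graph Query Lattice): Graph queries, taken up to this equivalence and closed under arbitrary intersections and unions of results, form a complete lattice under $\sqsubseteq$.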

The lattice operations:

Meet: $Q_1 \sqcap Q_2$ returns intersection of results
Join: $Q_1 \sqcup Q_2$ returns union of results
Bottom: Query returning $\emptyset$
Top: Query returning $\mathcal{V}_G$

This lattice structure enables the optimizer to navigate between equivalent graph query formulations.

3.4 Graph Algebra Operations

We define the operators that transform graph posting lists.

3.4.1 Adjacency Operator

The adjacency operator $\alpha$ expands a posting list to include adjacent vertices:

\alpha: 2^{\mathcal{V}_G} \times 2^{\mathcal{T}} \times \{in, out, both\} \rightarrow 2^{\mathcal{V}_G}

For vertex set $S$ , edge types $T$ , and direction $d$ :

\alpha(S, T, out) = \{u \mid \exists v \in S, t \in T: (v, u, t) \in \mathcal{E}_G\}

\alpha(S, T, in) = \{u \mid \exists v \in S, t \in T: (u, v, t) \in \mathcal{E}_G\}

\alpha(S, T, both) = \alpha(S, T, out) \cup \alpha(S, T, in)

Properties of adjacency:

Monotonicity: $S_1 \subseteq S_2 \implies \alpha(S_1, T, d) \subseteq \alpha(S_2, T, d)$

Distributivity over union:

\alpha(S_1 \cup S_2, T, d) = \alpha(S_1, T, d) \cup \alpha(S_2, T, d)

Edge type union:

\alpha(S, T_1 \cup T_2, d) = \alpha(S, T_1, d) \cup \alpha(S, T_2, d)
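The definition and its properties can be checked on a small example. This is a sketch under stated assumptions: the edge set is invented, and `alpha` is a direct set-comprehension transcription of the formulas rather than an index-backed implementation:

```python
# Hypothetical typed, directed edges: (src, tgt, type).
EDGES = {(1, 2, "KNOWS"), (2, 3, "KNOWS"), (1, 4, "FOLLOWS"), (5, 1, "KNOWS")}

def alpha(S, T, d):
    """Adjacency operator: neighbors of S over edge types T in direction d."""
    out = {v for (u, v, t) in EDGES if u in S and t in T}
    inn = {u for (u, v, t) in EDGES if v in S and t in T}
    if d == "out":
        return out
    if d == "in":
        return inn
    if d == "both":
        return out | inn
    raise ValueError(f"unknown direction: {d}")

K = {"KNOWS"}

# Distributivity over union of start sets.
lhs = alpha({1, 2}, K, "out")
rhs = alpha({1}, K, "out") | alpha({2}, K, "out")

# Edge type union.
typed = alpha({1}, {"KNOWS", "FOLLOWS"}, "out")
```

Note that `alpha` returns only the neighbors; the start set `S` is not included unless an edge leads back into it.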

3.4.2 Traversal Operator

The traversal operator $\tau_G$ performs multi-hop traversal:

\tau_G: 2^{\mathcal{V}_G} \times 2^{\mathcal{T}} \times \mathbb{N} \times \{in, out, both\} \rightarrow 2^{\mathcal{V}_G}

Defined recursively:

\tau_G(S, T, 0, d) = S

\tau_G(S, T, k, d) = \tau_G(S, T, k-1, d) \cup \alpha(\tau_G(S, T, k-1, d), T, d)

This computes the $k$ -hop neighborhood: all vertices reachable within $k$ edge traversals.

Fixed-point characterization:

\tau_G(S, T, \infty, d) = \mu X. (S \cup \alpha(X, T, d))

The infinite traversal is the least fixed point of the adjacency expansion, representing the transitive closure.
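A short sketch of both forms, assuming a small invented edge set with a cycle. `tau` unrolls the recursive definition into a loop, and `tau_closure` iterates the expansion until it stabilizes, which on a finite graph yields the least fixed point:

```python
# Hypothetical directed edges (src, tgt, type); vertices 2 -> 3 -> 4 -> 2 form a cycle.
EDGES = {(1, 2, "KNOWS"), (2, 3, "KNOWS"), (3, 4, "KNOWS"), (4, 2, "KNOWS")}

def alpha(S, T, d):
    """One-hop adjacency expansion in direction d."""
    out = {v for (u, v, t) in EDGES if u in S and t in T}
    inn = {u for (u, v, t) in EDGES if v in S and t in T}
    return {"out": out, "in": inn, "both": out | inn}[d]

def tau(S, T, k, d):
    """k-hop traversal: tau(S, T, 0, d) = S; each step adds one adjacency expansion."""
    reached = set(S)
    for _ in range(k):
        reached = reached | alpha(reached, T, d)
    return reached

def tau_closure(S, T, d):
    """Least fixed point of X -> S | alpha(X, T, d), found by iterating until stable."""
    reached = set(S)
    while True:
        expanded = reached | alpha(reached, T, d)
        if expanded == reached:
            return reached
        reached = expanded

K = {"KNOWS"}
two_hops = tau({1}, K, 2, "out")
closure = tau_closure({1}, K, "out")
```

The cycle does not cause non-termination because the loop stops as soon as an expansion adds no new vertex.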

3.4.3 Pattern Matching Operator

The pattern matching operator $\mu$ finds subgraph isomorphisms. (The symbol is overloaded: in fixed-point expressions of the form $\mu X.(\cdots)$, $\mu$ denotes the least fixed point.)

\mu: \mathcal{G}_P \times V_P \rightarrow 2^{\mathcal{V}_G}

where $\mathcal{G}_P$ is the set of pattern graphs and $V_P$ is the set of pattern variables. Given a pattern $P$ and an anchor variable of $P$, the result contains the graph vertices that can be bound to the anchor in some match of $P$.

Example pattern: Find "Person" vertices that have an outgoing "KNOWS" edge to another "Person" vertex:

Pattern: (p1:Person)-[:KNOWS]->(p2:Person)
Anchor: p1

\mu(P, p_1) = \{v \in \mathcal{V}_G \mid \text{Person} \in \text{labels}(v) \land \exists u: (\text{Person} \in \text{labels}(u) \land (v, u, \text{KNOWS}) \in \mathcal{E}_G)\}

Pattern matching reduces to a conjunction of posting list operations:

Vertex label constraints: label posting lists
Edge existence constraints: adjacency posting lists
Property constraints: property posting lists
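The Person-KNOWS-Person example can be evaluated this way. The sketch below assumes toy `LABELS` and `EDGES` tables and hypothetical helper names; it follows the set-builder formula literally and so does not require the two matched vertices to be distinct:

```python
# Hypothetical toy data: vertex id -> labels, and typed directed edges.
LABELS = {1: {"Person"}, 2: {"Person"}, 3: {"Company"}, 4: {"Person"}}
EDGES = {(1, 2, "KNOWS"), (1, 3, "KNOWS"), (4, 3, "KNOWS")}

def label_posting(label):
    """Label posting list."""
    return {v for v, labels in LABELS.items() if label in labels}

def adj_in(S, t):
    """Sources of edges of type t whose target lies in S."""
    return {u for (u, v, et) in EDGES if v in S and et == t}

# Anchor p1: Person vertices with an outgoing KNOWS edge to some Person.
persons = label_posting("Person")
matches = persons & adj_in(persons, "KNOWS")
```

Vertex 4 is a Person with a KNOWS edge, but its only target is a Company, so the intersection drops it.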

3.4.4 Regular Path Query Operator

Regular Path Queries (RPQ) find paths matching a regular expression over edge types:

\text{RPQ}: \mathcal{V}_G \times \mathcal{R} \rightarrow 2^{\mathcal{V}_G}

where $\mathcal{R}$ is the set of regular expressions over edge types.

Grammar:

r ::= t \mid r_1 \cdot r_2 \mid r_1 | r_2 \mid r^* \mid r^+ \mid r?

where $t \in \mathcal{T}$ is an edge type.

Semantics:

Base case: Edge type

\text{RPQ}(v, t) = \alpha(\{v\}, \{t\}, out)

Concatenation: Sequential traversal

\text{RPQ}(v, r_1 \cdot r_2) = \bigcup_{u \in \text{RPQ}(v, r_1)} \text{RPQ}(u, r_2)

Alternation: Union of paths

\text{RPQ}(v, r_1 | r_2) = \text{RPQ}(v, r_1) \cup \text{RPQ}(v, r_2)

Kleene star: Zero or more repetitions

\text{RPQ}(v, r^*) = \mu X. \left(\{v\} \cup \bigcup_{x \in X} \text{RPQ}(x, r)\right)

Kleene plus: One or more repetitions

\text{RPQ}(v, r^+) = \text{RPQ}(v, r \cdot r^*)

Optional: Zero or one occurrence

\text{RPQ}(v, r?) = \{v\} \cup \text{RPQ}(v, r)
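These semantics translate into a small recursive evaluator. The sketch below is an assumption-laden illustration: expressions are encoded as nested tuples (a hypothetical encoding), and the evaluator works on sets of start vertices, so `rpq(S, r)` is the union of the per-vertex results for each vertex in `S`:

```python
# Hypothetical directed edges (src, tgt, type).
EDGES = {(1, 2, "a"), (2, 3, "a"), (3, 4, "b")}

def step(S, t):
    """One outgoing hop over edges of type t from any vertex in S."""
    return {v for (u, v, et) in EDGES if u in S and et == t}

def rpq(S, r):
    """Evaluate regular path expression r from the start set S.

    r is a nested tuple: ("type", t), ("cat", r1, r2), ("alt", r1, r2),
    ("star", r1), ("plus", r1), or ("opt", r1).
    """
    op = r[0]
    if op == "type":
        return step(S, r[1])
    if op == "cat":
        return rpq(rpq(S, r[1]), r[2])
    if op == "alt":
        return rpq(S, r[1]) | rpq(S, r[2])
    if op == "star":
        reached = set(S)
        while True:  # iterate to the least fixed point
            expanded = reached | rpq(reached, r[1])
            if expanded == reached:
                return reached
            reached = expanded
    if op == "plus":
        return rpq(rpq(S, r[1]), ("star", r[1]))
    if op == "opt":
        return set(S) | rpq(S, r[1])
    raise ValueError(f"unknown operator: {op}")

A, B = ("type", "a"), ("type", "b")
```

For instance, `rpq({1}, ("cat", ("star", A), B))` follows any number of `a` edges and then one `b` edge.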

3.4.5 Shortest Path Operator

The shortest path operator $\pi_{sp}$ finds vertices connected by minimum-length paths:

\pi_{sp}: \mathcal{V}_G \times \mathcal{V}_G \times 2^{\mathcal{T}} \rightarrow \mathcal{V}_G^* \cup \{\text{null}\}

\pi_{sp}(v, u, T) = \text{argmin}_{p: v \rightsquigarrow u} |p| \text{ where edges in } p \text{ have types in } T

This operator returns a path (sequence of vertices) rather than a posting list. However, the reachability check derived from it returns a Boolean:

\text{reachable}(v, u, T) = (\pi_{sp}(v, u, T) \neq \text{null})

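And path existence produces a posting list (with $|p|$ counting the edges of path $p$, and unreachable vertices excluded):

\text{pathExists}(v, T, \text{maxLen}) = \{u \in \mathcal{V}_G \mid \pi_{sp}(v, u, T) \neq \text{null} \land |\pi_{sp}(v, u, T)| \leq \text{maxLen}\}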
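A sketch of both functions using breadth-first search, which finds minimum-hop paths on unweighted edges. The edge set and function names are hypothetical, and `None` plays the role of null:

```python
from collections import deque

# Hypothetical directed edges (src, tgt, type).
EDGES = {(1, 2, "ROAD"), (2, 3, "ROAD"), (1, 3, "RAIL"), (3, 4, "ROAD")}

def shortest_path(v, u, T):
    """BFS over edges with types in T; returns a vertex list or None."""
    parent = {v: None}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if x == u:
            path = []
            while x is not None:  # walk parent links back to the start
                path.append(x)
                x = parent[x]
            return path[::-1]
        for (a, b, t) in sorted(EDGES):
            if a == x and t in T and b not in parent:
                parent[b] = x
                queue.append(b)
    return None

def path_exists(v, T, max_len):
    """Vertices whose shortest path from v uses at most max_len edges."""
    vertices = {a for (a, _, _) in EDGES} | {b for (_, b, _) in EDGES}
    result = set()
    for u in vertices:
        p = shortest_path(v, u, T)
        if p is not None and len(p) - 1 <= max_len:
            result.add(u)
    return result
```

Running one BFS per candidate target is wasteful; a single BFS from `v` bounded at `max_len` would produce the same posting list, and the per-target form is used here only to mirror the definition.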

3.5 Operator Composition and Properties

3.5.1 Composition Monoid

Graph operators compose to form a monoid analogous to document operators:

(\text{Op}_G, \circ, \text{id}_G)

where:

$\text{Op}_G$ is the set of graph operators
$\circ$ is function composition
$\text{id}_G$ is the identity on $2^{\mathcal{V}_G}$

Associativity: $(f \circ g) \circ h = f \circ (g \circ h)$

Identity: $f \circ \text{id}_G = \text{id}_G \circ f = f$

3.5.2 Algebraic Properties

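Adjacency Non-Idempotence (for undirected traversal): Applying $\alpha$ twice neither returns nor, in general, contains the one-hop result. What does hold is that two hops lead back to every starting vertex that has at least one neighbor:

\alpha(\alpha(S, T, both), T, both) \supseteq \{v \in S \mid \alpha(\{v\}, T, both) \neq \emptyset\}

Note: The containment $\alpha(\alpha(S, T, both), T, both) \supseteq \alpha(S, T, both)$ fails in general. On a graph with a single edge between $a$ and $b$ and $S = \{a\}$, one hop yields $\{b\}$ and two hops yield $\{a\}$. The cumulative operator $\tau_G$, not $\alpha$, is the one that grows monotonically under repeated expansion.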

Traversal Monotonicity:

k_1 \leq k_2 \implies \tau_G(S, T, k_1, d) \subseteq \tau_G(S, T, k_2, d)

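Traversal Fixed Point (for finite graphs):

\exists k_0: \forall k \geq k_0: \tau_G(S, T, k, d) = \tau_G(S, T, k_0, d)

The fixed point is reached when the traversal saturates (no new vertices discovered). Since every step before saturation adds at least one vertex, $k_0 \leq |\mathcal{V}_G| - |S|$ for nonempty $S$.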

Filter-Traversal Interaction:

\sigma_\phi(\tau_G(S, T, k, d)) \neq \tau_G(\sigma_\phi(S), T, k, d) \text{ in general}

Filtering before traversal restricts starting vertices; filtering after traversal restricts ending vertices. These are semantically different and not interchangeable.
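A three-vertex counterexample makes the difference concrete. The edge set, the `ACTIVE` filter posting list, and the simplified untyped `tau` below are illustrative assumptions:

```python
# Hypothetical untyped directed edges and a filter posting list.
EDGES = {(1, 2), (2, 3)}
ACTIVE = {2, 3}

def tau(S, k):
    """Cumulative k-hop outgoing traversal from the start set S."""
    reached = set(S)
    for _ in range(k):
        reached = reached | {v for (u, v) in EDGES if u in reached}
    return reached

# Filter applied to the traversal result: restricts ending vertices.
filter_after = tau({1}, 2) & ACTIVE

# Filter applied to the start set: vertex 1 is removed, so nothing is traversed.
filter_before = tau({1} & ACTIVE, 2)
```

Here `filter_after` is `{2, 3}` while `filter_before` is empty, so an optimizer must not push a vertex filter through a traversal as if the two commuted.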

3.5.3 Equivalence-Preserving Transformations

Several transformations preserve query semantics:

Traversal Decomposition:

\tau_G(S, T, k_1 + k_2, d) = \tau_G(\tau_G(S, T, k_1, d), T, k_2, d)

Adjacency Distribution:

\alpha(S_1 \cup S_2, T, d) = \alpha(S_1, T, d) \cup \alpha(S_2, T, d)

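Pattern to Traversal Reduction: Simple chain patterns reduce to chained adjacency, with the label constraint of each intermediate vertex applied at its step. Anchoring at the end of the chain expands forward:

\mu((a)-[t_1]->(b)-[t_2]->(c), c) = \alpha(\alpha(P_V(a), \{t_1\}, out) \cap P_V(b), \{t_2\}, out) \cap P_V(c)

Anchoring at the start of the chain expands backward:

\mu((a)-[t_1]->(b)-[t_2]->(c), a) = P_V(a) \cap \alpha(P_V(b) \cap \alpha(P_V(c), \{t_2\}, in), \{t_1\}, in)

where $P_V(x)$ is the posting list for vertex pattern $x$.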

3.6 Cross-Paradigm Integration

The power of the unified algebra emerges in cross-paradigm queries.

3.6.1 Graph-Relational Integration

ToGraph Operator: Convert relational results to graph vertices

\text{ToGraph}: 2^{\mathcal{D}} \rightarrow 2^{\mathcal{V}_G}

Under the document-vertex correspondence, this is the identity:

\text{ToGraph}(P_D) = P_D \text{ (viewing documents as vertices)}

FromGraph Operator: Convert graph results to relational rows

\text{FromGraph}: 2^{\mathcal{V}_G} \rightarrow 2^{\mathcal{D}}

Also the identity under correspondence.

Example Query: Recommend to each customer products from the categories their friends have purchased in, excluding products the customer has already bought:

SELECT DISTINCT c.name, p.name AS recommended_product
FROM customers c
-- Graph traversal: find friends
JOIN LATERAL (
  SELECT vertex_id AS friend_id FROM graph_traverse(c.id, 'FRIENDS_WITH', 1)
) f ON true
-- Friend's purchases
JOIN purchases fp ON fp.customer_id = f.friend_id
JOIN products friend_prod ON fp.product_id = friend_prod.id
-- Products in same category
JOIN products p ON p.category = friend_prod.category
-- Exclude already purchased
WHERE NOT EXISTS (
  SELECT 1 FROM purchases cp
  WHERE cp.customer_id = c.id AND cp.product_id = p.id
);

This query seamlessly combines:

Relational joins (purchases, products)
Graph traversal (friends)
Set operations (exclusion)

The posting list representation enables unified optimization.

3.6.2 Graph-Vector Integration

Vertex Embeddings: Vertices can have vector representations:

\text{embed}_V: \mathcal{V}_G \times \mathcal{F} \rightarrow \mathcal{V}_n

This enables:

Graph-aware similarity search:

\text{simSearch}(v, k, T) = \text{kNN}(\text{embed}_V(v), k) \cap \tau_G(\{v\}, T, \infty, both)

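Return those of the $k$ nearest neighbors of $v$ that are also reachable from $v$ via edges of type $T$. Because the intersection is applied after the top-$k$ cut, the result may contain fewer than $k$ vertices.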

Embedding-based edge prediction:

\text{predictEdge}(v, t, \epsilon) = \{u \mid \text{sim}(\text{embed}_V(v), \text{embed}_V(u)) > \epsilon \land (v, u, t) \notin \mathcal{E}_G\}

Predict new edges based on embedding similarity.

3.6.3 Graph-Text Integration

Semantic Graph Search: Combine text relevance with graph structure:

\text{semanticGraphSearch}(q, v, k, T) = \text{topK}(\tau_t(q) \cap \tau_G(\{v\}, T, k, both), \text{BM25})

Find documents matching text query $q$ that are within $k$ hops of vertex $v$ via edges of type $T$ .

Example: Find research papers mentioning "machine learning" by authors within 2 collaboration hops:

SELECT p.title, bm25_score(p.abstract) AS relevance
FROM papers p
WHERE MATCH(p.abstract) AGAINST ('machine learning')
  AND p.author_id IN (
    SELECT vertex_id FROM graph_traverse(:current_author, 'COAUTHORED', 2)
  )
ORDER BY relevance DESC
LIMIT 10;

3.6.4 Unified Query Plan

The query planner treats graph predicates as another source of posting lists, applying the same optimization strategies (predicate pushdown, intersection ordering by selectivity, etc.) across all paradigms.

3.7 Implementation Architecture

3.7.1 Graph Storage in LSM-Tree

Cognica stores graphs using the same LSM-tree infrastructure as documents:

Vertex Storage: Vertices are stored as documents with graph metadata:

Key: vertex:{graph_id}:{vertex_id}
Value: {properties, labels, ...}

Edge Storage: Edges are stored with composite keys for efficient traversal:

Key: edge:out:{graph_id}:{src_id}:{edge_type}:{tgt_id}
Value: {properties, ...}

Key: edge:in:{graph_id}:{tgt_id}:{edge_type}:{src_id}
Value: {} (reference only)

Dual edge storage (out and in) enables efficient traversal in both directions.

3.7.2 Adjacency Index

For each edge type, an adjacency index maps vertices to their neighbors:

Key: adj:out:{graph_id}:{edge_type}:{src_id}
Value: [tgt_id_1, tgt_id_2, ...] (posting list)

Key: adj:in:{graph_id}:{edge_type}:{tgt_id}
Value: [src_id_1, src_id_2, ...] (posting list)

This posting list structure enables:

Fast single-hop expansion: $O(d)$ where $d$ is the vertex degree
Efficient intersection with other posting lists
Skip pointer acceleration for high-degree vertices
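The intersection step can be sketched as follows. Binary search via `bisect_left` stands in for skip pointers here; this is an assumption about how skipping might be realized, not a description of Cognica's index format. Both inputs are assumed to be sorted and free of duplicates:

```python
from bisect import bisect_left

def intersect_sorted(a: list[int], b: list[int]) -> list[int]:
    """Intersect two sorted id lists, skipping ahead in the longer one."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter list
    result = []
    lo = 0
    for x in a:
        lo = bisect_left(b, x, lo)  # jump over ids smaller than x
        if lo == len(b):
            break  # the longer list is exhausted
        if b[lo] == x:
            result.append(x)
            lo += 1
    return result

# Hypothetical adjacency posting list intersected with a label posting list.
neighbors_of_v = [2, 5, 9]
persons = [1, 2, 3, 5, 8, 9, 10]
person_neighbors = intersect_sorted(neighbors_of_v, persons)
```

Driving the loop from the shorter list keeps the cost close to the size of the low-degree adjacency list even when the other posting list is large.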

3.7.3 Label Index

Label indexes support vertex filtering:

Key: label:{graph_id}:{label}
Value: [vertex_id_1, vertex_id_2, ...] (posting list)

This enables efficient evaluation of label predicates.

3.7.4 Traversal Execution

Multi-hop traversal executes as iterative adjacency expansion:

Algorithm: BFS_Traversal(start, edge_types, max_depth, direction)
  frontier = {start}
  visited = {start}
  for depth in 1..max_depth:
    next_frontier = {}
    for v in frontier:
      neighbors = adjacency_lookup(v, edge_types, direction)
      for u in neighbors:
        if u not in visited:
          visited.add(u)
          next_frontier.add(u)
    frontier = next_frontier
    if frontier is empty:
      break
  return visited

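Optimization: When the filter must hold for every vertex on the path (for example, traversing only through active accounts), applying it during expansion prunes branches early. This is not equivalent to intersecting the final traversal result with the filter: a vertex that passes the filter but is reachable only through filtered-out vertices is not returned (compare the filter-traversal interaction in Section 3.5.2).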

Algorithm: Filtered_Traversal(start, edge_types, max_depth, direction, filter_posting)
  frontier = {start} intersect filter_posting
  visited = frontier
  for depth in 1..max_depth:
    next_frontier = {}
    for v in frontier:
      neighbors = adjacency_lookup(v, edge_types, direction)
      // Early filter application
      filtered_neighbors = neighbors intersect filter_posting
      for u in filtered_neighbors:
        if u not in visited:
          visited.add(u)
          next_frontier.add(u)
    frontier = next_frontier
    if frontier is empty:
      break
  return visited
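Both algorithms can be written as runnable Python for comparison. This is a sketch assuming a hypothetical in-memory `ADJ` dictionary in place of `adjacency_lookup`, with edge types and direction omitted for brevity:

```python
# Hypothetical out-adjacency index: vertex id -> list of neighbor ids.
ADJ = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}

def bfs_traversal(start, max_depth):
    """Unfiltered breadth-first expansion up to max_depth hops."""
    frontier, visited = {start}, {start}
    for _ in range(max_depth):
        next_frontier = set()
        for v in frontier:
            for u in ADJ.get(v, ()):
                if u not in visited:
                    visited.add(u)
                    next_frontier.add(u)
        frontier = next_frontier
        if not frontier:
            break
    return visited

def filtered_traversal(start, max_depth, filter_posting):
    """Expansion that only visits, and only expands from, vertices in filter_posting."""
    frontier = {start} & filter_posting
    visited = set(frontier)  # copy, so visited and frontier do not alias
    for _ in range(max_depth):
        next_frontier = set()
        for v in frontier:
            for u in set(ADJ.get(v, ())) & filter_posting:
                if u not in visited:
                    visited.add(u)
                    next_frontier.add(u)
        frontier = next_frontier
        if not frontier:
            break
    return visited

allowed = {1, 2, 4, 5}
post_filtered = bfs_traversal(1, 2) & allowed
path_filtered = filtered_traversal(1, 2, allowed)
```

Vertex 5 appears in `post_filtered` but not in `path_filtered`, because its only path from vertex 1 passes through vertex 3, which the filter excludes.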

3.7.5 Pattern Match Execution

Pattern matching compiles to a join tree of posting list operations:

Example Pattern:

(a:Person {age > 30})-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company {name = 'Acme'})

Execution Plan:

$P_a$ = label_index(Person) $\cap$ property_filter(age > 30)
$P_c$ = label_index(Company) $\cap$ property_filter(name = 'Acme')
$P_b$ = adj_in($P_c$, WORKS_AT) $\cap$ label_index(Person)
$P_{a,final}$ = adj_in($P_b$, KNOWS) $\cap$ $P_a$
Return $P_{a,final}$

The optimizer reorders these operations based on selectivity estimates.

3.7.6 SQL Graph Query Interface

Cognica exposes graph operations through SQL table functions, enabling graph traversal within standard SQL queries:

graph_traverse() — Multi-hop traversal as a table function:

-- Find all users reachable within 3 hops via FOLLOWS edges
SELECT t.vertex_id, t.depth, t.path
FROM graph_traverse(
    'social_graph',        -- graph name
    'user_123',            -- start vertex
    'FOLLOWS',             -- edge type
    3,                     -- max depth
    'outgoing'             -- direction
) t;

-- Combine with relational predicates
SELECT u.name, t.depth
FROM graph_traverse('social_graph', 'user_123', 'FOLLOWS', 2, 'outgoing') t
JOIN users u ON u.id = t.vertex_id
WHERE u.active = true;

Recursive CTE Integration — Standard SQL recursive queries can express path-finding:

-- Shortest path via recursive CTE
WITH RECURSIVE paths AS (
    SELECT id AS vertex_id, ARRAY[id] AS path, 0 AS depth
    FROM vertices WHERE id = 'user_123'
  UNION ALL
    SELECT e.target_id, p.path || e.target_id, p.depth + 1
    FROM paths p
    JOIN edges e ON e.source_id = p.vertex_id AND e.type = 'FOLLOWS'
    WHERE p.depth < 5 AND NOT e.target_id = ANY(p.path)
)
SELECT * FROM paths WHERE vertex_id = 'user_456'
ORDER BY depth LIMIT 1;

Cypher Query Support — Optional Cypher syntax via the cypher() table function, compatible with Apache AGE:

-- Cypher via table function
SELECT * FROM cypher('social_graph', $$
    MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
    RETURN b.name, b.age
$$) AS (name TEXT, age INTEGER);

Adjacency Cache — An in-memory cache accelerates repeated traversals by caching adjacency lists for frequently accessed vertices, reducing RocksDB lookups during iterative graph algorithms.

3.8 Scored Graph Operations

Graph operations can carry scores for ranking.

3.8.1 Edge Weight Scores

Edges may have weights representing relationship strength:

w: \mathcal{E}_G \rightarrow \mathbb{R}^+

Weighted adjacency returns scored posting lists:

\alpha_w(S, T, d) = \{(u, w(e)) \mid u \in \alpha(S, T, d), e \text{ connects } S \text{ to } u\}

3.8.2 Path Scores

Paths aggregate edge weights:

Additive path score:

\text{score}(p) = \sum_{e \in p} w(e)

Multiplicative path score (for probability):

\text{score}(p) = \prod_{e \in p} w(e)

Min path score (for bottleneck):

\text{score}(p) = \min_{e \in p} w(e)
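The three aggregations differ only in the reduction applied to the edge weights of a path. A small sketch, with an invented weight list and a hypothetical `path_score` helper:

```python
import math

def path_score(weights: list[float], mode: str) -> float:
    """Aggregate the edge weights along one path."""
    if mode == "additive":
        return sum(weights)
    if mode == "multiplicative":
        return math.prod(weights)
    if mode == "min":
        return min(weights)
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical edge weights along a three-edge path.
weights = [0.9, 0.5, 0.8]
additive = path_score(weights, "additive")
multiplicative = path_score(weights, "multiplicative")
bottleneck = path_score(weights, "min")
```

With these weights the additive score is about 2.2, the multiplicative score about 0.36, and the bottleneck score 0.5, which shows how strongly the choice of aggregation affects ranking.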

3.8.3 PageRank and Centrality

PageRank assigns importance scores to vertices:

\text{PR}(v) = \frac{1 - d}{N} + d \sum_{u \in \text{in}(v)} \frac{\text{PR}(u)}{|\text{out}(u)|}

where $d$ is the damping factor (typically 0.85) and $N$ is the vertex count.
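The recurrence can be solved by power iteration. The sketch below is a minimal illustration on an invented three-vertex graph; it assumes every vertex has at least one outgoing link (dangling vertices need extra handling) and uses a fixed iteration count rather than a convergence test:

```python
def pagerank(out_links: dict[int, list[int]], d: float = 0.85,
             iters: int = 50) -> dict[int, float]:
    """Power iteration for PR(v) = (1 - d)/N + d * sum over in-neighbors u of PR(u)/|out(u)|."""
    n = len(out_links)
    pr = {v: 1.0 / n for v in out_links}  # uniform starting distribution
    for _ in range(iters):
        nxt = {v: (1.0 - d) / n for v in out_links}
        for u, targets in out_links.items():
            share = pr[u] / len(targets)  # u splits its rank among its out-links
            for v in targets:
                nxt[v] += d * share
        pr = nxt
    return pr

# Hypothetical graph: 1 -> 2, 2 -> 3, 3 -> 1 and 3 -> 2.
ranks = pagerank({1: [2], 2: [3], 3: [1, 2]})
```

Vertex 2 receives rank from both other vertices and ends up with the highest score, while the scores continue to sum to 1.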

PageRank is precomputed and stored as a vertex property, enabling score-aware queries:

SELECT v.name, v.pagerank
FROM vertices v
WHERE v.id IN (
  SELECT vertex_id FROM graph_traverse(:start, 'LINKS_TO', 3)
)
ORDER BY v.pagerank DESC
LIMIT 10;

3.8.4 Score Combination with Other Paradigms

Graph scores combine with text and vector scores using the frameworks from Chapter 2:

Example: Hybrid ranking combining BM25 and PageRank:

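\text{score}(d) = \beta \cdot \text{normalize}(\text{BM25}(d, q)) + (1 - \beta) \cdot \text{normalize}(\text{PR}(d))

where $\beta \in [0, 1]$ is the mixing weight and both scores are normalized to a common range.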

Or probabilistically:

\text{score}(d) = P_{\text{BM25}}(d) \cdot P_{\text{PR}}(d)

where scores are calibrated to $[0, 1]$ .
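A sketch of the linear combination, assuming min-max scaling as the normalization (one possible choice; the calibration frameworks of Chapter 2 may differ). The score tables and the names `min_max` and `hybrid` are hypothetical:

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores linearly into [0, 1]; constant inputs map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (s - lo) / (hi - lo) for k, s in scores.items()}

def hybrid(bm25: dict[str, float], pr: dict[str, float],
           weight: float) -> dict[str, float]:
    """Weighted sum of normalized text and graph scores; weight is the text share."""
    b, p = min_max(bm25), min_max(pr)
    return {doc: weight * b[doc] + (1 - weight) * p[doc] for doc in bm25}

# Hypothetical raw scores for three documents.
bm25 = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
pr = {"d1": 0.1, "d2": 0.5, "d3": 0.3}

scores = hybrid(bm25, pr, 0.5)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Document `d2` ranks first even though `d1` has the best text score, because its PageRank is highest and the two signals are weighted equally.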

3.9 Query Optimization for Graph Operations

3.9.1 Selectivity Estimation

Graph operation selectivity depends on graph structure:

Adjacency selectivity:

\text{sel}(\alpha(S, T, d)) \approx \min(1, |S| \cdot \text{avgDegree}(T, d) / |\mathcal{V}_G|)

Traversal selectivity (k hops):

\text{sel}(\tau_G(S, T, k, d)) \approx \min(1, |S| \cdot \text{avgDegree}(T, d)^k / |\mathcal{V}_G|)

Pattern selectivity: Product of component selectivities:

\text{sel}(\mu(P)) \approx \prod_{c \in \text{constraints}(P)} \text{sel}(c)
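A worked instance of the traversal and pattern estimates, using invented statistics. The function names are hypothetical, and the pattern estimate assumes independent constraints, as the product form implies:

```python
import math

def traversal_selectivity(start_size: int, avg_degree: float,
                          k: int, num_vertices: int) -> float:
    """Estimated fraction of vertices reached by a k-hop traversal, capped at 1."""
    return min(1.0, start_size * avg_degree ** k / num_vertices)

def pattern_selectivity(constraint_selectivities: list[float]) -> float:
    """Product of per-constraint selectivities (independence assumption)."""
    return math.prod(constraint_selectivities)

# Hypothetical statistics: 10 start vertices, average degree 5, one million vertices.
two_hop = traversal_selectivity(10, 5, 2, 1_000_000)

# A dense graph saturates quickly, so the estimate is capped at 1.
saturated = traversal_selectivity(10, 50, 4, 1_000_000)

# Three constraints with selectivities 0.1, 0.5, and 0.02.
pattern = pattern_selectivity([0.1, 0.5, 0.02])
```

The exponential growth in `k` is why the cap matters: without it the estimate for the dense case would exceed 1 by a wide margin.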

3.9.2 Join Ordering

Pattern matching is essentially a join problem. The optimizer orders pattern components by selectivity:

Start with most selective constraint (smallest posting list)
Expand through adjacency with next most selective constraint
Continue until pattern is matched

Example: For pattern (a:Person)-[:KNOWS]->(b:Influencer)-[:PROMOTES]->(c:Product):

If Influencer is rare (a highly selective label), start from b:

$P_b$ = label_index(Influencer)
$P_a$ = adj_in($P_b$, KNOWS) $\cap$ label_index(Person)
$P_c$ = adj_out($P_b$, PROMOTES) $\cap$ label_index(Product)

3.9.3 Traversal Pruning

Early termination strategies for traversal:

Top-k pruning: When seeking top-k results by score, maintain a threshold and prune branches that cannot exceed it.

Filter pushdown: Apply filters at each traversal step rather than at the end.

Bidirectional search: For point-to-point queries, search from both ends and meet in the middle.
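Bidirectional search can be sketched as a reachability check that alternates between a forward frontier and a backward frontier. The adjacency dictionaries and the function name below are hypothetical; `max_depth` bounds the total number of hops across both directions:

```python
def bidirectional_reachable(out_adj, in_adj, src, dst, max_depth):
    """Return True if dst is reachable from src within max_depth hops."""
    if src == dst:
        return True
    fwd_frontier, bwd_frontier = {src}, {dst}
    fwd_seen, bwd_seen = {src}, {dst}
    for _ in range(max_depth):
        # Expand the smaller frontier to keep the explored region small.
        if len(fwd_frontier) <= len(bwd_frontier):
            fwd_frontier = {u for v in fwd_frontier
                            for u in out_adj.get(v, ())} - fwd_seen
            fwd_seen |= fwd_frontier
        else:
            bwd_frontier = {u for v in bwd_frontier
                            for u in in_adj.get(v, ())} - bwd_seen
            bwd_seen |= bwd_frontier
        if fwd_seen & bwd_seen:
            return True  # the two searches met
        if not fwd_frontier or not bwd_frontier:
            return False  # one side is exhausted without meeting the other
    return False

# Hypothetical chain 1 -> 2 -> 3 -> 4 with matching out- and in-adjacency indexes.
OUT = {1: [2], 2: [3], 3: [4], 4: []}
IN = {2: [1], 3: [2], 4: [3]}
```

The backward direction relies on the `in` adjacency index, which is one reason dual-direction edge storage pays off for point-to-point queries.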

3.10 Summary

This chapter extended the unified algebra to incorporate graph structures:

Graph type system formalizes vertices, edges, properties, and labels within the same framework as documents, with explicit document-vertex correspondence enabling zero-duplication storage
Graph-posting list isomorphism proves that graph operations produce posting lists satisfying the same Boolean algebra as document posting lists, enabling unified optimization
Graph algebra operators include adjacency ( $\alpha$ ), traversal ( $\tau_G$ ), pattern matching ( $\mu$ ), regular path queries (RPQ), and shortest paths ( $\pi_{sp}$ ), all composing through the same monoid structure as document operators
Cross-paradigm integration enables queries combining relational predicates, text search, vector similarity, and graph traversal through unified posting list operations
Implementation architecture stores graphs in LSM-trees with dual-direction edge indexes and label indexes, supporting efficient traversal and pattern matching
Scored graph operations extend the algebra to handle edge weights, path scores, and centrality measures, combining with text and vector scores through the same frameworks

The following chapter develops query optimization theory, showing how the algebraic properties established in Chapters 2 and 3 enable cost-based optimization across all paradigms.