Chapter 12: Cypher Query Language

Graph databases have long suffered from a fragmentation problem: users must choose between the expressiveness of graph query languages and the ubiquity of SQL. Cognica resolves this tension through an approach inspired by Apache AGE — Cypher queries are rewritten into SQL at parse time, allowing the full power of a declarative graph pattern language to flow through an existing SQL optimizer and executor. This chapter details the lexer, parser, abstract syntax tree, and the Cypher-to-SQL rewriting engine that makes this possible.

12.1 Architecture Overview

12.1.1 Why Cypher in a SQL Database

Graph traversal queries in SQL are notoriously verbose. Consider finding friends-of-friends who work at the same company as a given person. The pure SQL approach using graph table functions requires multiple CROSS JOIN LATERAL invocations, explicit endpoint tracking, and careful column aliasing:

SELECT DISTINCT f2_props->>'name' AS friend_name,
                c_props->>'name' AS company
FROM graph_neighbors('social', 'social:Person:1001',
                     'KNOWS', 'outgoing', 1) f1
CROSS JOIN LATERAL graph_neighbors('social', f1.id,
                                   'KNOWS', 'outgoing', 1) f2
CROSS JOIN LATERAL graph_edges('social', f2.id,
                               'WORKS_AT', 'outgoing') we
CROSS JOIN LATERAL graph_get_node('social', we.end_id) c
WHERE f2.id != 'social:Person:1001';

The same query in Cypher reads like a diagram of the graph pattern it describes:

MATCH (a:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof:Person),
      (fof)-[:WORKS_AT]->(c:Company)
WHERE fof <> a
RETURN DISTINCT fof.name AS friend_name, c.name AS company

The ASCII-art syntax of Cypher — parentheses for nodes, square brackets for relationships, arrows for direction — maps directly to the visual intuition that graph users carry. Variable-length paths, optional patterns, and mutating operations all benefit from this visual clarity. Rather than building an entirely separate graph query engine, Cognica translates Cypher into its existing SQL infrastructure at parse time, inheriting decades of query optimization research for free.

12.1.2 Parse-Time Rewriting vs Runtime Interpretation

Two fundamentally different strategies exist for integrating a second query language into a SQL engine:

Runtime interpretation treats the graph language as a black box. A table function receives the query string, executes it against the storage layer, and returns results. The SQL optimizer has no visibility into the internal structure of the graph query and cannot push predicates, reorder joins, or select indexes.

Parse-time rewriting translates the graph query into SQL AST nodes before any planning occurs. The SQL optimizer sees a standard subquery with joins, filters, and projections — it can apply predicate pushdown, join reordering, index selection, and all other optimization passes without modification.

Cognica adopts the parse-time rewriting approach, following the architecture established by Apache AGE (A Graph Extension for PostgreSQL). The cypher() function that appears in SQL is not a real function that executes at runtime. It is a sentinel that the AST builder intercepts during parse tree construction:

SELECT * FROM cypher('social', $$
    MATCH (a:Person)-[:FOLLOWS*1..3]->(b:Person)
    WHERE a.name = 'Alice'
    RETURN b.name, b.role
$$) AS (name TEXT, role TEXT);

When the AST builder encounters cypher() in a FROM clause, it extracts the graph name and Cypher string, invokes the Cypher parser and translator, and replaces the function reference with the generated SQL subquery. From that point forward, every downstream component — plan builder, optimizer, physical planner, CVM compiler — operates on standard SQL AST nodes.

12.1.3 The cypher() Sentinel Function

The cypher() function follows Apache AGE conventions:

  • Graph name: identifies the graph namespace (e.g., 'social').
  • Cypher query: dollar-quoted Cypher text (e.g., $$ MATCH ... RETURN ... $$).
  • Parameters (optional): a JSON object for $param binding (e.g., '{"name": "Alice"}'::jsonb).

The AS (column TYPE, ...) clause following the function call defines the output schema. The number of columns in the AS clause must match the number of RETURN items in the Cypher query. This requirement is validated during rewriting and produces a clear error message on mismatch.

12.1.4 Overall Translation Flow


The flow consists of three major phases. First, the outer SQL is parsed by libpg_query, which treats cypher() as an ordinary function call. Second, the AST builder intercepts this function call and invokes the Cypher pipeline: lexical analysis, parsing into a Cypher AST, and rewriting into SQL AST nodes. Third, the generated SQL AST re-enters the standard planning pipeline, where it is indistinguishable from hand-written SQL.

12.2 Lexical Analysis

The Cypher lexer (CypherLexer) transforms raw query text into a stream of typed tokens. It is a hand-written scanner that handles Cypher's unique lexical requirements — pattern arrows, case-insensitive keywords, dollar-quoted parameters, and escaped identifiers.

12.2.1 Token Types

The lexer produces tokens of type TokenType, an enumeration with 32 variants:

enum class TokenType {
  // Delimiters
  kLeftParen, kRightParen,       // ( )
  kLeftBracket, kRightBracket,   // [ ]
  kLeftBrace, kRightBrace,       // { }
  kComma, kColon, kDot, kDotDot, // , : . ..
  kDollar, kPipe,                // $ |

  // Comparison and assignment
  kEquals, kNotEquals,                     // = <>
  kLessThan, kLessThanOrEqual,             // < <=
  kGreaterThan, kGreaterThanOrEqual,       // > >=
  kRegexMatch,                             // =~

  // Arithmetic
  kPlus, kMinus, kStar, kSlash, kPercent,  // + - * / %

  // Pattern arrows
  kArrowLeft, kArrowRight,       // <- ->

  // Atoms
  kIdentifier, kInteger, kFloat, kString,

  // Control
  kEof, kError,
};

Each Token carries its type, string value, and source location (line and column), enabling precise error reporting throughout the parsing and rewriting phases:

struct Token {
  TokenType type {TokenType::kError};
  std::string value;
  int32_t line {1};
  int32_t column {1};
};

12.2.2 Multi-Character Operators and Pattern Arrows

Several Cypher operators consist of two characters. The lexer resolves these using ordered lookahead before falling through to single-character cases. The ordering matters: <- must be checked before < alone, and .. before a single .:

// cypher_lexer.cpp:103-137
if (c == '<' && peek_char_(1) == '-') {
  advance_char_(); advance_char_();
  return Token {TokenType::kArrowLeft, {}, start_line, start_column};
}
if (c == '-' && peek_char_(1) == '>') {
  advance_char_(); advance_char_();
  return Token {TokenType::kArrowRight, {}, start_line, start_column};
}
if (c == '.' && peek_char_(1) == '.') {
  advance_char_(); advance_char_();
  return Token {TokenType::kDotDot, {}, start_line, start_column};
}

The <- and -> tokens are central to Cypher's visual pattern syntax. A relationship pattern like (a)-[:KNOWS]->(b) produces the token sequence: (, a, ), -, [, :, KNOWS, ], ->, (, b, ). The -> is a single token, not two separate tokens, which simplifies the parser's relationship direction logic.

The .. token (range operator) appears exclusively in variable-length path specifications like [:KNOWS*1..3], distinguishing it from the single . used for property access.
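The ordered-lookahead rule can be exercised in isolation. The sketch below is not the production lexer; scan_operator and the reduced OpToken enumeration are illustrative stand-ins that mirror the ordering constraint described above (two-character operators tried before their single-character prefixes):

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Illustrative token kinds; the real TokenType has many more variants.
enum class OpToken { ArrowLeft, ArrowRight, DotDot, LessThan, Minus, Dot, Other };

// Sketch of ordered lookahead: two-character operators are checked
// before falling through to the single-character cases.
OpToken scan_operator(std::string_view input, std::size_t& pos) {
  char c = input[pos];
  auto peek = [&](std::size_t n) -> char {
    return pos + n < input.size() ? input[pos + n] : '\0';
  };
  if (c == '<' && peek(1) == '-') { pos += 2; return OpToken::ArrowLeft; }
  if (c == '-' && peek(1) == '>') { pos += 2; return OpToken::ArrowRight; }
  if (c == '.' && peek(1) == '.') { pos += 2; return OpToken::DotDot; }
  pos += 1;
  if (c == '<') return OpToken::LessThan;
  if (c == '-') return OpToken::Minus;
  if (c == '.') return OpToken::Dot;
  return OpToken::Other;
}
```

Swapping the order of the checks would make `<-` lex as `<` followed by `-`, which is why the two-character cases must come first.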

12.2.3 Case-Insensitive Keyword Recognition

Unlike SQL, which has a fixed set of reserved words handled by the grammar, Cypher keywords are recognized at the parser level through case-insensitive string comparison. The lexer emits all alphabetic sequences as kIdentifier tokens, and the parser checks whether an identifier matches a keyword using a helper function:

// cypher_parser.cpp:19-31
auto iequals(std::string_view lhs, std::string_view rhs) -> bool {
  if (lhs.size() != rhs.size()) {
    return false;
  }
  for (size_t i = 0; i < lhs.size(); ++i) {
    auto a = static_cast<unsigned char>(lhs[i]);
    auto b = static_cast<unsigned char>(rhs[i]);
    if (std::tolower(a) != std::tolower(b)) {
      return false;
    }
  }
  return true;
}

This design means MATCH, match, and Match are all valid. It also means that keywords like MATCH and RETURN are not reserved — they can be used as variable names in contexts where the parser does not expect a keyword. This permissiveness follows the openCypher specification's approach to context-sensitive keywords.

12.2.4 Multi-Word Keywords

Several Cypher constructs use multi-word keywords: ORDER BY, STARTS WITH, ENDS WITH, IS NULL, IS NOT NULL, and OPTIONAL MATCH. These are not tokenized as single tokens. Instead, the parser consumes them as sequences of identifiers. For example, ORDER BY is parsed as the keyword ORDER followed by the keyword BY:

// cypher_parser.cpp:486-492
if (match_keyword_("ORDER")) {
  expect_keyword_("BY");
  body.order_by.push_back(parse_sort_item_());
  while (match_(TokenType::kComma)) {
    body.order_by.push_back(parse_sort_item_());
  }
}

12.2.5 String Literals and Escape Sequences

Cypher strings can be delimited by either single or double quotes. The lexer supports standard escape sequences within strings: \n (newline), \r (carriage return), \t (tab), \\ (backslash), \' (single quote), and \" (double quote). Unterminated strings produce an error token rather than silently consuming the rest of the input (cypher_lexer.cpp:342-391).

12.2.6 Escaped Identifiers and Parameters

Backtick-delimited identifiers (`my variable`) allow arbitrary characters in variable names. Doubled backticks within the delimiters represent a literal backtick. Parameters use the $ prefix followed by an identifier ($name, $personId), tokenized as a kDollar token followed by a kIdentifier.

12.2.7 Position Tracking

The lexer maintains line and column counters that are updated on every character advance. Newline characters increment the line counter and reset the column to 1. Each token records its starting position, providing the foundation for error messages that include precise source locations (cypher_lexer.cpp:426-433).

12.3 Parsing

The Cypher parser (CypherParser) is a hand-written recursive descent parser that consumes the token stream and produces a typed abstract syntax tree. Recursive descent was chosen over parser generators (Bison/Flex, ANTLR) to avoid external dependencies and to enable precise error messages that reference Cypher-specific concepts.

12.3.1 Recursive Descent Design

The parser maintains a current token and a lexer reference. It advances through the token stream using a small set of primitives:

  • current_(): returns the current token without consuming it.
  • peek_(): returns the next token without consuming either.
  • advance_(): consumes the current token and loads the next.
  • expect_(type): consumes the current token if it matches; throws on mismatch.
  • match_(type): consumes the current token and returns true if it matches; returns false otherwise.
  • check_keyword_(kw): tests whether the current token is the given keyword (case-insensitive), without consuming it.

These primitives compose cleanly: match_keyword_("OPTIONAL") tests and consumes OPTIONAL in one call, while check_keyword_("MATCH") tests without consuming.
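The composition of these primitives can be sketched over a plain string stream. TokenCursor below is an illustrative model, not the CypherParser itself: tokens are strings rather than typed Tokens, but the match/expect contract (consume on success, throw on expect-mismatch) is the one described above:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy cursor over a pre-lexed token stream.
class TokenCursor {
 public:
  explicit TokenCursor(std::vector<std::string> tokens)
      : tokens_(std::move(tokens)) {}

  const std::string& current() const {
    static const std::string eof = "<eof>";
    return pos_ < tokens_.size() ? tokens_[pos_] : eof;
  }
  void advance() { if (pos_ < tokens_.size()) ++pos_; }

  // Consume and report success only if the current token matches.
  bool match(const std::string& expected) {
    if (current() != expected) return false;
    advance();
    return true;
  }
  // Consume on match; throw on mismatch, like expect_().
  void expect(const std::string& expected) {
    if (!match(expected)) {
      throw std::runtime_error("expected '" + expected + "', got '" +
                               current() + "'");
    }
  }

 private:
  std::vector<std::string> tokens_;
  std::size_t pos_ = 0;
};
```

Parsing ORDER BY with this interface reads exactly like the parser excerpt shown later: `if (cursor.match("ORDER")) cursor.expect("BY");`.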

12.3.2 Grammar Structure

The parser implements the following grammar hierarchy:

Query -> ClauseSequence (UNION [ALL] ClauseSequence)*

ClauseSequence -> Clause+

Clause -> MatchClause
        | CreateClause
        | MergeClause
        | DeleteClause
        | SetClause
        | RemoveClause
        | ReturnClause
        | WithClause
        | UnwindClause

MatchClause -> [OPTIONAL] MATCH PatternPart (, PatternPart)*
               [WHERE Expression]

PatternPart -> [variable =] PatternElement

PatternElement -> NodePattern (RelationshipChain)*

NodePattern -> ( [variable] [:Label]* [{properties}] )

RelationshipChain -> RelationshipPattern NodePattern

RelationshipPattern -> <-[details]- | -[details]-> | -[details]-

ProjectionBody -> [DISTINCT] ProjectionItem (, ProjectionItem)*
                  [ORDER BY SortItem (, SortItem)*]
                  [SKIP expr] [LIMIT expr]

12.3.3 Clause Parsing Dispatch

The parse_clause_() method inspects the current token and dispatches to the appropriate clause parser. It uses keyword checks rather than a token-type switch because Cypher keywords are context-sensitive identifiers:

// cypher_parser.cpp:177-209
auto CypherParser::parse_clause_() -> ast::Clause {
  if (check_keyword_("OPTIONAL") || check_keyword_("MATCH")) {
    return ast::Clause {parse_match_clause_()};
  }
  if (check_keyword_("CREATE")) {
    return ast::Clause {parse_create_clause_()};
  }
  if (check_keyword_("MERGE")) {
    return ast::Clause {parse_merge_clause_()};
  }
  if (check_keyword_("DETACH") || check_keyword_("DELETE")) {
    return ast::Clause {parse_delete_clause_()};
  }
  // ... remaining clauses ...
  error_("Expected Cypher clause");
}

OPTIONAL must be checked alongside MATCH because the two-word keyword OPTIONAL MATCH begins with OPTIONAL. Similarly, DETACH DELETE begins with DETACH.

12.3.4 Pattern Parsing

Graph patterns are the syntactic heart of Cypher. A pattern consists of a chain of node-relationship-node segments:

(a:Person {name: 'Alice'})-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)

The parser handles this as a PatternElement: an initial NodePattern followed by zero or more RelationshipChain entries. Each chain contains a RelationshipPattern and a NodePattern.

Relationship direction detection works in two phases. First, the parser checks whether the relationship begins with <- (left arrow, indicating an incoming edge). If not, it expects a - (dash). After the optional bracket-enclosed details, the parser checks the trailing token: -> sets direction to kRight, another - sets direction to kBoth (undirected):

// cypher_parser.cpp:407-451
auto CypherParser::parse_relationship_pattern_()
    -> ast::RelationshipPattern {
  // ...
  if (match_(TokenType::kArrowLeft)) {
    pattern.direction = ast::RelationshipDirection::kLeft;
  } else {
    expect_(TokenType::kMinus);
  }
  // ... parse bracket contents ...
  if (pattern.direction == ast::RelationshipDirection::kLeft) {
    expect_(TokenType::kMinus);
  } else if (match_(TokenType::kArrowRight)) {
    pattern.direction = ast::RelationshipDirection::kRight;
  } else {
    expect_(TokenType::kMinus);
    pattern.direction = ast::RelationshipDirection::kBoth;
  }
  return pattern;
}

Variable-length path specifications (*1..3, *..5, *3) are parsed when a kStar token appears inside the relationship brackets. The range parser handles four cases: both bounds present (*1..3), only minimum (*3 is both min and max), only maximum (*..5), and unbounded (*):

// cypher_parser.cpp:453-472
auto CypherParser::parse_relationship_range_()
    -> ast::RelationshipRange {
  auto range = ast::RelationshipRange {};
  if (check_(TokenType::kInteger)) {
    range.min_hops = std::stoll(advance_().value);
  }
  if (match_(TokenType::kDotDot)) {
    if (check_(TokenType::kInteger)) {
      range.max_hops = std::stoll(advance_().value);
    }
    return range;
  }
  if (range.min_hops.has_value()) {
    range.max_hops = range.min_hops;  // *3 means exactly 3 hops
  }
  return range;
}

12.3.5 Expression Parsing with Precedence Levels

Cypher expressions follow a precedence hierarchy implemented through a chain of parsing methods. Each level calls the next-higher-precedence level, and loops to handle left-associative operators at its own level:

  • Level 1 (lowest): OR, in parse_or_expression_()
  • Level 2: AND, in parse_and_expression_()
  • Level 3: NOT, in parse_not_expression_()
  • Level 4: = <> < <= > >= IN STARTS WITH ENDS WITH CONTAINS IS NULL IS NOT NULL =~, in parse_comparison_expression_()
  • Level 5: + -, in parse_additive_expression_()
  • Level 6: * / %, in parse_multiplicative_expression_()
  • Level 7: unary + -, in parse_unary_expression_()
  • Level 8: property lookup, subscript, function call, in parse_postfix_expression_()
  • Level 9 (highest): literals, variables, parameters, CASE, parenthesized expressions, in parse_primary_expression_()

Negated predicates (NOT IN, NOT STARTS WITH, NOT ENDS WITH, NOT CONTAINS) are handled by lookahead in the comparison parser. When NOT is followed by one of these predicate keywords, the parser consumes NOT, parses the predicate, and wraps the result in a unary NOT node (cypher_parser.cpp:600-643).

12.3.6 Error Handling

Parse errors are reported through the ParseError structure, which carries a message and the source location (line and column). The parse() entry point wraps the parsing logic in a try-catch block and returns std::expected<ast::SingleQuery, ParseError>:

// cypher_parser.cpp:127-139
auto CypherParser::parse()
    -> std::expected<ast::SingleQuery, ParseError> {
  try {
    if (current_token_.type == TokenType::kError) {
      error_(current_token_, current_token_.value);
    }
    return parse_query_();
  } catch (const std::exception& e) {
    if (error_info_.message.empty()) {
      error_info_.message = e.what();
    }
    return std::unexpected(error_info_);
  }
}

The error_() methods record the error location before throwing, ensuring that the caller receives both the message and the position in the Cypher source where parsing failed.

12.4 Abstract Syntax Tree

The Cypher AST (cognica::sql::graph::cypher::ast) represents the parsed structure of a Cypher query as a tree of typed nodes. It uses std::variant for sum types and std::unique_ptr for ownership, following the same conventions as the SQL AST elsewhere in the codebase.

12.4.1 Expression Nodes

Expressions are represented as a struct containing a std::variant of 19 expression types:

using ExpressionData
    = std::variant<Wildcard, NullLiteral, BooleanLiteral,
                   IntegerLiteral, FloatLiteral, StringLiteral,
                   Variable, Parameter, MapLiteral, ListLiteral,
                   PropertyLookup, SubscriptExpression,
                   SliceExpression, ListComprehensionExpression,
                   ExistsPatternExpression, FunctionCall,
                   UnaryExpression, BinaryExpression,
                   CaseExpression>;

struct Expression {
  ExpressionData data;

  template<typename T>
  auto is() const -> bool {
    return std::holds_alternative<T>(data);
  }

  template<typename T>
  auto as() -> T& { return std::get<T>(data); }

  template<typename T>
  auto as() const -> const T& { return std::get<T>(data); }
};

The is<T>() and as<T>() template methods provide a clean dispatch interface. The pattern if (expr.is<PropertyLookup>()) { const auto& lookup = expr.as<PropertyLookup>(); ... } appears throughout the translator.

Key expression types include:

  • Variable: bound variable reference (n, r).
  • PropertyLookup: property access (n.name).
  • FunctionCall: function invocation (id(n), count(*)).
  • Parameter: query parameter ($name).
  • BinaryExpression: binary operators (a.age > 21).
  • UnaryExpression: unary operators (NOT x, n IS NULL).
  • CaseExpression: CASE/WHEN/THEN/ELSE (CASE x WHEN 1 THEN 'a' END).
  • ListComprehensionExpression: list comprehension ([x IN list WHERE x > 0 | x * 2]).
  • ExistsPatternExpression: pattern existence test (exists((a)-->(b))).

12.4.2 Clause Nodes

Clauses form the top-level structure of a Cypher query. Like expressions, they use a variant-based design:

using ClauseData
    = std::variant<MatchClause, CreateClause, MergeClause,
                   DeleteClause, SetClause, RemoveClause,
                   ReturnClause, WithClause, UnwindClause>;

Each clause type carries the data specific to its syntax:

  • MatchClause: An optional flag, a list of patterns, and an optional WHERE expression.
  • CreateClause: A list of patterns describing nodes and edges to create.
  • MergeClause: A single pattern and a list of MergeAction entries (ON MATCH SET, ON CREATE SET).
  • DeleteClause: A detach flag and a list of variable names to delete.
  • SetClause: A list of property assignment items (target expression = value expression).
  • RemoveClause: A list of property removal items.
  • ReturnClause and WithClause: A ProjectionBody containing projection items, ORDER BY, SKIP, and LIMIT.
  • UnwindClause: An expression and a variable name for the unwound elements.

12.4.3 Pattern Nodes

Patterns represent the graph structure to match or create:

struct NodePattern {
  std::string variable;           // Optional binding name
  std::vector<std::string> labels; // Zero or more labels
  std::optional<MapLiteral> properties; // Inline property constraints
};

struct RelationshipPattern {
  std::string variable;
  std::vector<std::string> types;
  std::optional<RelationshipRange> range;
  std::optional<MapLiteral> properties;
  RelationshipDirection direction; // kLeft, kRight, kBoth
};

struct RelationshipChain {
  RelationshipPattern relationship;
  NodePattern node;
};

struct PatternElement {
  NodePattern node;              // Starting node
  std::vector<RelationshipChain> chains; // Relationship-node pairs
};

struct PatternPart {
  std::string variable;          // Optional path variable
  PatternElement element;
};

A PatternElement models the chain structure of Cypher patterns. The pattern (a)-[r]->(b)-[s]->(c) produces a PatternElement with starting node a and two chains: (r, b) and (s, c). This representation directly mirrors the visual syntax and simplifies the translator's iteration over relationship hops.

12.4.4 Query Structure

At the top level, a SingleQuery contains a list of clauses and optional UNION parts:

struct SingleQuery {
  std::vector<Clause> clauses;
  std::vector<UnionQueryPart> unions;
};

struct UnionQueryPart {
  bool all {false};
  std::vector<Clause> clauses;
};

This structure supports queries with UNION: MATCH ... RETURN ... UNION ALL MATCH ... RETURN ....

12.5 Cypher-to-SQL Rewriting

The rewriter (Translator class in cypher_rewriter.cpp) is the most substantial component of the Cypher subsystem at over 5,000 lines. It transforms a Cypher AST into SQL AST nodes that the standard planner can process. The design follows the Apache AGE architecture: each clause contributes to a growing SQL SelectStmt, and scope boundaries (WITH clauses, MERGE) introduce subquery nesting.

12.5.1 Subquery Pipeline Architecture

The core insight of the Cypher-to-SQL translation is that each Cypher clause modifies or wraps the current SQL SELECT statement. The translate_clause_sequence_() method processes clauses in order:

// cypher_rewriter.cpp:276-391
auto translate_clause_sequence_(
    const std::vector<cyast::Clause>& clauses)
    -> std::expected<sqlast::StmtPtr, std::string> {
  auto select = sqlast::make_select_stmt();

  for (size_t i = 0; i < clauses.size(); ++i) {
    const auto& clause = clauses[i];

    if (clause.is<cyast::MatchClause>()) {
      // Adds FROM/JOIN/WHERE to current select
      translate_match_clause_(..., *select);
    }
    if (clause.is<cyast::WithClause>()) {
      // Wraps current select in a subquery, creates new outer select
      select = materialize_with_clause_(std::move(select), ...);
    }
    if (clause.is<cyast::ReturnClause>()) {
      // Adds SELECT list, ORDER BY, LIMIT to current select
      translate_projection_(..., *select, ...);
    }
    // ... other clauses ...
  }
  return sqlast::StmtPtr {std::move(select)};
}

MATCH clauses add FROM entries (for node/edge table scans) and JOIN clauses (for connecting nodes to edges) to the current SELECT. Multiple MATCH clauses in sequence add to the same SELECT, building up a multi-table join.

WITH clauses create a scope boundary. The current SELECT is wrapped inside a subquery, and a fresh outer SELECT is created. Variables projected by WITH become the only visible bindings in the new scope. This directly models Cypher's scoping semantics.

RETURN adds the SELECT list (target entries), ORDER BY, SKIP, and LIMIT to the final SELECT.

12.5.2 Variable Binding Tracking

The translator maintains a binding table (std::unordered_map<std::string, Binding>) that maps Cypher variable names to their SQL representations. Each binding records:

struct Binding {
  BindingKind kind;          // kNode, kEdge, or kScalar
  std::string qualifier;     // SQL table alias
  std::string id_column;     // Column containing _id
  std::string label_column;  // Column containing _label
  std::string type_column;   // Column containing _type (edges)
  std::string source_column; // Column containing _source (edges)
  std::string target_column; // Column containing _target (edges)
  bool direct_properties;    // Can access properties as direct columns?
  bool is_path;              // Is this a path variable?
};

When the translator encounters a variable reference like n.name, it looks up n in the binding table, determines whether n is a node or edge, and generates the appropriate SQL column reference. For nodes with direct property access, n.name becomes t_n.name (a direct column reference). After a scope boundary (WITH clause), properties may be carried through as explicit projected columns, and the binding's property_columns map tracks which properties are available.

12.5.3 Pattern Translation

Node Patterns

Each node variable in a MATCH pattern becomes a scan of the {graph}_nodes collection. For a pattern like (n:Person {name: 'Alice'}), the translator:

  1. Acquires a binding for variable n with a fresh SQL alias (e.g., __cy_n0).
  2. Adds a FROM or JOIN clause: FROM social_nodes AS __cy_n0.
  3. Generates label constraints: array_position(string_to_array(__cy_n0._label, ':'), 'Person') IS NOT NULL.
  4. Generates property constraints: __cy_n0.name = 'Alice'.

The label matching uses string_to_array and array_position rather than simple equality because a node can carry multiple labels stored as a colon-separated string (e.g., Person:Employee).

Edge Patterns

Relationships are translated to JOINs against the {graph}_edges collection. The join condition depends on the relationship direction:

  • Outgoing (->): edge entry condition node._id = edge._source; node entry condition edge._target = next_node._id.
  • Incoming (<-): edge entry condition node._id = edge._target; node entry condition edge._source = next_node._id.
  • Undirected (-): edge entry condition node._id = edge._source OR node._id = edge._target; the node entry requires a bidirectional join.

For undirected edges, the translator generates an OR condition that accepts traversal in either direction:

// cypher_rewriter.cpp:3988-4011
auto make_edge_entry_condition_(
    const Binding& node_binding, const Binding& edge_binding,
    cyast::RelationshipDirection direction)
    -> std::expected<sqlast::ExprPtr, std::string> {
  if (direction == cyast::RelationshipDirection::kRight) {
    return sqlast::make_binary_expr(
        sqlast::BinaryOpType::kEqual, id_ref_(node_binding),
        source_ref_(edge_binding));
  }
  if (direction == cyast::RelationshipDirection::kLeft) {
    return sqlast::make_binary_expr(
        sqlast::BinaryOpType::kEqual, id_ref_(node_binding),
        target_ref_(edge_binding));
  }
  // Undirected: OR of both directions
  auto left = sqlast::make_binary_expr(
      sqlast::BinaryOpType::kEqual, id_ref_(node_binding),
      source_ref_(edge_binding));
  auto right = sqlast::make_binary_expr(
      sqlast::BinaryOpType::kEqual, id_ref_(node_binding),
      target_ref_(edge_binding));
  return sqlast::make_binary_expr(
      sqlast::BinaryOpType::kOr, std::move(left), std::move(right));
}

Multiple Relationship Types

When a relationship pattern specifies multiple types ([:KNOWS|WORKS_WITH]), each type becomes an equality predicate, and the predicates are combined with OR:

// cypher_rewriter.cpp:3070-3078
if (!relationship.types.empty()) {
  auto type_predicates = std::vector<sqlast::ExprPtr> {};
  for (const auto& type : relationship.types) {
    type_predicates.push_back(sqlast::make_binary_expr(
        sqlast::BinaryOpType::kEqual, type_ref_(binding),
        sqlast::make_constant_string(type)));
  }
  predicates.push_back(combine_with_or_(type_predicates));
}

This generates SQL like: __cy_r0._type = 'KNOWS' OR __cy_r0._type = 'WORKS_WITH'.

12.5.4 Variable-Length Paths

Variable-length path patterns (e.g., [:KNOWS*1..3]) require recursive evaluation. The translator generates a WITH RECURSIVE common table expression (CTE) that iterates from the start node, following edges up to the maximum depth while tracking visited edges to prevent cycles.

Recursive CTE Structure

The generated CTE has five columns:

  • start_id: ID of the path's starting node.
  • end_id: ID of the current endpoint.
  • depth: number of hops traversed so far.
  • visited_edge_ids: array of edge IDs on the current path (cycle detection).
  • visited_node_ids: array of node IDs on the current path.

The CTE consists of a base case (single-hop traversal from start nodes) and a recursive case (extending paths by one hop):


The cycle detection mechanism uses array_position to check whether the current edge has already been visited:

// cypher_rewriter.cpp:1998-2004
auto visited_lookup_args = make_binary_expr_list(
    sqlast::make_column_ref(previous_alias, "visited_edge_ids"),
    id_ref_(recursive_edge_scan));
recursive_conditions.push_back(sqlast::make_unary_expr(
    sqlast::UnaryOpType::kIsNull,
    sqlast::make_function_call("array_position",
                               std::move(visited_lookup_args))));

This translates to: array_position(prev.visited_edge_ids, edge._id) IS NULL, which is true only when the edge has not been visited.

Zero-Hop Paths

When the minimum hop count is 0 (e.g., *0..3), the translator generates a separate zero-hop SELECT that returns the start node as both the start and end of a zero-length path. This SELECT is UNION ALL'd with the recursive CTE.

Segmented Multi-Hop Patterns

When a pattern contains multiple relationship segments and at least one is variable-length, the translator splits the pattern into individual segments. Each segment is translated independently — fixed-length segments become standard JOINs, and variable-length segments become recursive CTEs. The segments are connected by shared node variables at their endpoints (cypher_rewriter.cpp:686-774).

12.5.5 Clause Translation

MATCH to FROM/JOIN/WHERE

A MATCH clause adds table scans and joins to the current SELECT. The translator iterates over the pattern's node-relationship chains, acquiring bindings for each variable and generating join conditions:

MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.name = 'Alice'

Translates to:

SELECT ...
FROM social_nodes AS __cy_n0
JOIN social_edges AS __cy_r0
  ON __cy_n0._id = __cy_r0._source
JOIN social_nodes AS __cy_n1
  ON __cy_r0._target = __cy_n1._id
WHERE array_position(string_to_array(__cy_n0._label, ':'),
                     'Person') IS NOT NULL
  AND __cy_n0.name = 'Alice'
  AND array_position(string_to_array(__cy_n1._label, ':'),
                     'Person') IS NOT NULL

OPTIONAL MATCH to LEFT JOIN

OPTIONAL MATCH uses a LATERAL LEFT JOIN to ensure that the outer query's rows are preserved even when the pattern has no matches. The translator creates a sub-translator with a copy of the current bindings, translates the pattern in a subquery, and joins the subquery with a LEFT JOIN:

// cypher_rewriter.cpp:2280-2289
auto join = std::make_unique<sqlast::JoinExpr>();
join->join_type = sqlast::JoinType::kLeft;
join->join_condition = sqlast::make_constant_bool(true);
join->right_subquery = std::make_unique<sqlast::RangeSubselect>();
join->right_subquery->subquery = sqlast::StmtPtr {std::move(subselect)};
join->right_subquery->alias = "__cy_opt" + std::to_string(optional_index_++);
join->right_subquery->lateral = true;

The lateral = true flag ensures the subquery can reference columns from the outer query, which is necessary when the OPTIONAL MATCH pattern references variables bound by a preceding MATCH.

WITH as Scope Boundary

The WITH clause creates a subquery boundary. The current SELECT is materialized into a subquery, and the translator creates a fresh outer SELECT. Only the variables projected by WITH are visible in the subsequent scope:

MATCH (a:Person)
WITH a.name AS name, count(*) AS cnt
WHERE cnt > 5
MATCH (b:Person {name: name})
RETURN b

The WITH clause produces a subquery (SELECT a.name AS name, count(*) AS cnt FROM ...) AS __cy_scope0, and the subsequent MATCH clause operates within a new SELECT that reads from __cy_scope0.

UNWIND to Table Function

UNWIND transforms a list into rows. The translator generates an unnest() table function call, joined laterally to the current query:

UNWIND [1, 2, 3] AS x
RETURN x * 2

Becomes:

SELECT x * 2
FROM unnest(ARRAY[1, 2, 3]) AS __cy_unwind0(x)

When there is an existing FROM clause, the unnest is joined with a CROSS JOIN LATERAL to ensure it can reference outer columns (cypher_rewriter.cpp:2173-2211).

RETURN to SELECT

The RETURN clause translates to the SELECT list, with optional DISTINCT, ORDER BY, SKIP (OFFSET), and LIMIT. Graph variable references in RETURN are handled specially: bare node or edge variables are projected as JSON objects containing id, label, and properties, matching the Apache AGE output format (cypher_rewriter.cpp:2625-2679).

12.5.6 Expression Translation

Property Access

Property access on graph variables (n.name) is the most common expression in Cypher queries. The translator resolves it based on the binding's storage model:

  • Direct properties: When the binding scans the node/edge collection directly, n.name becomes __cy_n0.name — a direct column reference on the collection.
  • Carried properties: After a scope boundary, the property may be available as a projected column tracked in the binding's property_columns map.
  • Scalar bindings: For variables bound by UNWIND or WITH to scalar values, property access becomes a JSON field access.

The dispatch is implemented in translate_property_lookup_:

// cypher_rewriter.cpp:3558-3574
auto translate_property_lookup_(
    const cyast::PropertyLookup& lookup,
    bool allow_identity_variables)
    -> std::expected<sqlast::ExprPtr, std::string> {
  if (lookup.object->is<cyast::Variable>()) {
    const auto& variable = lookup.object->as<cyast::Variable>();
    auto binding = find_binding_(variable.name);
    // ... path variable handling ...
    return property_ref_(*binding, lookup.property);
  }
  // ... fallthrough to general expression translation ...
}

Function Mapping

Cypher functions are translated to their SQL equivalents. The translator recognizes function names case-insensitively and supports the ag_catalog. schema prefix for Apache AGE compatibility:

Cypher Function      SQL Translation
id(n)                __cy_n0._id
labels(n)            string_to_array(__cy_n0._label, ':')
type(r)              __cy_r0._type
start_id(r)          __cy_r0._source
end_id(r)            __cy_r0._target
startNode(r)         Lateral join to graph_get_node with _source
endNode(r)           Lateral join to graph_get_node with _target
properties(n)        Lateral join to graph_get_node, returns properties
keys(n)              jsonb_object_keys_array(properties)
exists(n.prop)       jsonb_exists(properties, 'prop')
count(*)             count(*)
collect(x)           Passed through as aggregate
vertex_stats(n)      Complex subquery with in/out degree counts

The count(*) case requires special handling: the parser produces a Wildcard inside a FunctionCall, and the translator sets the is_star flag on the SQL function call node rather than translating arguments (cypher_rewriter.cpp:3821-3826).

General functions not in the mapping table are passed through with their name lowercased, allowing standard SQL aggregate and scalar functions to work transparently within Cypher queries.
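
The lookup order can be sketched as below. The helper name and table entries are illustrative stand-ins, since the real mapping produces SQL AST nodes rather than strings:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Sketch: strip an optional "ag_catalog." prefix, lowercase for
// case-insensitive matching, consult the mapping table, and otherwise
// pass the lowercased name through unchanged.
std::string map_function_name(std::string name) {
  const std::string kPrefix = "ag_catalog.";
  if (name.rfind(kPrefix, 0) == 0) {
    name.erase(0, kPrefix.size());
  }
  std::transform(name.begin(), name.end(), name.begin(),
                 [](unsigned char c) { return std::tolower(c); });
  static const std::unordered_map<std::string, std::string> table = {
      {"id", "_id column reference"},      // illustrative entries only
      {"type", "_type column reference"},
  };
  auto it = table.find(name);
  return it != table.end() ? it->second : name;  // passthrough lowercased
}
```

The passthrough branch is what lets unmapped SQL functions such as upper() or round() work inside Cypher without explicit support.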

String Predicates

Cypher's string predicates translate to SQL function calls:

Cypher                SQL
a STARTS WITH 'x'     starts_with(a, 'x')
a ENDS WITH 'x'       ends_with(a, 'x')
a CONTAINS 'x'        strpos(a, 'x') > 0
a =~ 'pattern'        a ~ 'pattern' (regex match)
a IN list             array_position(list, a) IS NOT NULL

The IN operator uses array_position rather than SQL's IN keyword because Cypher's IN operates on array values, not SQL row sets (cypher_rewriter.cpp:3349-3365).
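
The runtime semantics of the generated predicate amount to membership in an array value, as this sketch illustrates (hypothetical helper, integer elements for brevity):

```cpp
#include <algorithm>
#include <vector>

// Mirrors array_position(list, a) IS NOT NULL: true when the value
// appears anywhere in the array, which is Cypher's IN semantics
// (unlike SQL IN, which tests against a row set).
bool cypher_in(int value, const std::vector<int>& list) {
  return std::find(list.begin(), list.end(), value) != list.end();
}
```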

NULL Handling

Cypher's IS NULL and IS NOT NULL map directly to their SQL counterparts. The IS keyword is parsed as part of the comparison expression handler, which checks for the NOT modifier between IS and NULL (cypher_parser.cpp:588-598).

CASE Expressions

Cypher CASE expressions support both simple form (CASE expr WHEN val THEN ...) and searched form (CASE WHEN condition THEN ...). Both translate directly to SQL CASE expressions with the same structure.

List Operations

List literals [1, 2, 3] translate to SQL ARRAY[1, 2, 3]. List subscript list[0] translates to array subscript with zero-based to one-based index conversion (Cypher uses 0-based indexing, SQL uses 1-based). Negative indices in Cypher index from the end, so the translator generates a CASE expression:

// cypher_rewriter.cpp:3499-3511
// CASE WHEN index >= 0 THEN index + 1 ELSE index END
auto sql_case = std::make_unique<sqlast::CaseExpr>();
sqlast::CaseWhen when_clause;
when_clause.condition = sqlast::make_binary_expr(
    sqlast::BinaryOpType::kGreaterThanOrEqual,
    clone(index), sqlast::make_constant_int(0));
when_clause.result = sqlast::make_binary_expr(
    sqlast::BinaryOpType::kAdd, clone(index),
    sqlast::make_constant_int(1));
sql_case->when_clauses.push_back(std::move(when_clause));
sql_case->else_result = std::move(*index);
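
At runtime, the generated CASE expression behaves like this sketch (a hypothetical helper; the assumption here is that the underlying SQL subscript operator resolves a negative index from the end of the array):

```cpp
// Mirrors the generated CASE: non-negative Cypher indices shift from
// 0-based to 1-based; negative indices pass through unchanged for the
// underlying subscript operator to resolve from the end.
int sql_subscript_index(int cypher_index) {
  return cypher_index >= 0 ? cypher_index + 1 : cypher_index;
}
```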

List comprehensions [x IN list WHERE x > 0 | x * 2] translate to ARRAY(SELECT x * 2 FROM unnest(list) AS t(x) WHERE x > 0), using a subquery with unnest and the array aggregate subquery type (cypher_rewriter.cpp:3882-3968).

12.5.7 Write Operation Handling

CREATE

The CREATE clause generates calls to graph_create_node and graph_create_edge table functions. These are not SQL INSERT statements — they invoke Cognica's graph storage API directly, which handles ID assignment, label indexing, and adjacency cache updates.

For each new node:

// cypher_rewriter.cpp:986-1000
auto create_func = std::make_unique<sqlast::RangeTableFunc>();
create_func->function_name = "graph_create_node";
create_func->arguments.push_back(
    sqlast::make_constant_string(graph_name_));
create_func->arguments.push_back(
    sqlast::make_constant_string(join_labels(node.labels)));
// ... properties as jsonb_build_object(...) ...
create_func->alias = create_alias;
create_func->output_columns.push_back("id");

After creation, the translator immediately looks up the created node to establish a binding with full metadata (id, label, properties), enabling subsequent clauses to reference the newly created entity.

For edges, the translator determines the actual source and target based on the relationship direction — a left-directed relationship (a)<-[:KNOWS]-(b) makes b the source and a the target (cypher_rewriter.cpp:1066-1073).
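
The direction handling can be sketched as follows; the enum and struct are illustrative, not the translator's actual types:

```cpp
#include <string>

enum class Direction { kLeftToRight, kRightToLeft };

struct Endpoints {
  std::string source;
  std::string target;
};

// For a left-pointing relationship, the node written on the right of
// the arrow is the edge's source.
Endpoints resolve_endpoints(const std::string& left_node,
                            const std::string& right_node,
                            Direction direction) {
  if (direction == Direction::kRightToLeft) {
    return {right_node, left_node};  // (a)<-[:KNOWS]-(b): b is the source
  }
  return {left_node, right_node};    // (a)-[:KNOWS]->(b): a is the source
}
```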

MERGE

MERGE implements find-or-create semantics. The translator builds two branches — a "matched" branch and a "created" branch — and combines them with UNION ALL.

The matched branch performs a LATERAL INNER JOIN against a subquery that attempts to match the pattern. If the pattern exists, the ON MATCH SET actions execute. The created branch performs a LATERAL LEFT JOIN against the same pattern subquery; rows where the probe returns NULL (pattern not found) pass through to the CREATE logic, and the ON CREATE SET actions execute.

This two-branch approach ensures that each input row produces exactly one output row, regardless of whether the pattern was matched or created.

DELETE and DETACH DELETE

The DELETE clause generates calls to graph_delete_node or graph_delete_edge. The translator validates that the target variable is bound to a node or edge (not a scalar), and orders deletions to process edges before nodes. DETACH DELETE passes a cascade flag to graph_delete_node, which removes all connected edges before removing the node:

// cypher_rewriter.cpp:1170-1183
delete_func->function_name = target.binding.kind == BindingKind::kEdge
                                 ? "graph_delete_edge"
                                 : "graph_delete_node";
delete_func->arguments.push_back(
    sqlast::make_constant_string(graph_name_));
delete_func->arguments.push_back(id_ref_(target.binding));
if (target.binding.kind == BindingKind::kNode) {
  delete_func->arguments.push_back(
      sqlast::make_constant_bool(detach_nodes));
}

SET and REMOVE

The SET clause translates property assignments into graph_update_node or graph_update_edge calls. Multiple assignments to the same variable are grouped into a single update call with a jsonb_build_object argument containing all key-value pairs. Setting a property to NULL removes it. The REMOVE clause uses the same update mechanism, passing a $delete marker key with the property names to remove.
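
The grouping step can be sketched with hypothetical types: all assignments to one variable collapse into a single payload, so one graph_update_node or graph_update_edge call carries every key-value pair. Here std::nullopt stands in for SQL NULL, which removes the property.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

struct Assignment {
  std::string variable;
  std::string property;
  std::optional<std::string> value;  // nullopt models SET n.prop = NULL
};

using Payload = std::map<std::string, std::optional<std::string>>;

// Group assignments by variable: one update payload per graph entity.
std::map<std::string, Payload> group_assignments(
    const std::vector<Assignment>& assignments) {
  std::map<std::string, Payload> grouped;
  for (const auto& a : assignments) {
    grouped[a.variable][a.property] = a.value;
  }
  return grouped;
}
```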

After each mutation (SET, REMOVE, CREATE), the translator re-establishes the affected binding by looking up the entity with its new state, ensuring that subsequent clauses see the updated values.

12.6 Supported Syntax Reference

The following table summarizes the Cypher syntax supported by Cognica's implementation:

Syntax                                                Supported

Reading
  MATCH (n:Label)                                     Yes
  MATCH (n:Label {prop: value})                       Yes
  MATCH (n:Label1:Label2) — multi-label               Yes
  MATCH (a)-[r:TYPE]->(b)                             Yes
  MATCH (a)<-[r:TYPE]-(b)                             Yes
  MATCH (a)-[r:TYPE]-(b) — undirected                 Yes
  MATCH (a)-[r:T1|T2]->(b) — multiple types           Yes
  MATCH (a)-[*1..3]->(b) — variable-length            Yes
  MATCH (a)-[*]->(b) — unbounded                      Yes
  OPTIONAL MATCH                                      Yes
  WHERE                                               Yes

Projection
  RETURN expr AS alias                                Yes
  RETURN DISTINCT                                     Yes
  RETURN *                                            Yes
  ORDER BY expr [ASC|DESC]                            Yes
  SKIP n                                              Yes
  LIMIT n                                             Yes

Composition
  WITH expr AS alias                                  Yes
  WITH ... WHERE                                      Yes
  UNWIND expr AS variable                             Yes
  UNION [ALL]                                         Yes

Writing
  CREATE (n:Label {props})                            Yes
  CREATE (a)-[:TYPE]->(b)                             Yes
  MERGE (n:Label {props})                             Yes
  MERGE ... ON MATCH SET                              Yes
  MERGE ... ON CREATE SET                             Yes
  SET n.prop = value                                  Yes
  SET n.prop = NULL (property removal)                Yes
  DELETE n                                            Yes
  DETACH DELETE n                                     Yes
  REMOVE n.prop                                       Yes

Expressions
  Arithmetic: +, -, *, /, %                           Yes
  Comparison: =, <>, <, <=, >, >=                     Yes
  Boolean: AND, OR, NOT                               Yes
  String: STARTS WITH, ENDS WITH, CONTAINS            Yes
  Regex: =~                                           Yes
  IS NULL, IS NOT NULL                                Yes
  IN (list membership)                                Yes
  NOT IN, NOT STARTS WITH, etc.                       Yes
  CASE/WHEN/THEN/ELSE/END                             Yes
  List literals: [1, 2, 3]                            Yes
  Map literals: {key: value}                          Yes
  List comprehension: [x IN list WHERE p | e]         Yes
  Parameters: $name                                   Yes

Functions
  id(n), labels(n), type(r)                           Yes
  startNode(r), endNode(r)                            Yes
  start_id(r), end_id(r)                              Yes
  properties(n), keys(n), exists(n.p)                 Yes
  count(*), count(expr)                               Yes
  collect(expr)                                       Yes
  sum, avg, min, max                                  Yes
  length(path), nodes(path), relationships(path)      Yes
  substring, left, right                              Yes
  exists((a)-->(b)) — pattern exists                  Yes
  vertex_stats(n)                                     Yes

12.7 Translation Examples

This section presents complete translations from Cypher to SQL, illustrating the key rewriting patterns.

12.7.1 Simple MATCH with Property Filter

Cypher:

MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person)
RETURN b.name AS friend_name

Generated SQL (conceptual):

SELECT __cy_n1.name AS friend_name
FROM social_nodes AS __cy_n0
JOIN social_edges AS __cy_r0
  ON __cy_n0._id = __cy_r0._source
JOIN social_nodes AS __cy_n1
  ON __cy_r0._target = __cy_n1._id
WHERE array_position(
        string_to_array(__cy_n0._label, ':'), 'Person'
      ) IS NOT NULL
  AND __cy_n0.name = 'Alice'
  AND __cy_r0._type = 'KNOWS'
  AND array_position(
        string_to_array(__cy_n1._label, ':'), 'Person'
      ) IS NOT NULL

The translation creates three table scans (two node collections, one edge collection) connected by join conditions that enforce the relationship direction. Label checks use string_to_array and array_position to support multi-label nodes. The property filter {name: 'Alice'} becomes a WHERE clause predicate.
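
The runtime effect of the label predicate can be illustrated with a small sketch: _label stores every label of a node joined with ':', and array_position(string_to_array(_label, ':'), 'Person') IS NOT NULL is a membership test over that list. The helper below is hypothetical, mirroring that semantics:

```cpp
#include <sstream>
#include <string>

// Split the stored _label column on ':' and test membership, as the
// generated string_to_array / array_position predicate does.
bool has_label(const std::string& label_column, const std::string& label) {
  std::istringstream stream(label_column);
  std::string part;
  while (std::getline(stream, part, ':')) {
    if (part == label) {
      return true;
    }
  }
  return false;
}
```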

12.7.2 Multi-Hop Traversal with JOIN Chain

Cypher:

MATCH (a:Person)-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)
WHERE a.name = 'Alice'
RETURN b.name, c.name AS company

Generated SQL (conceptual):

SELECT __cy_n1.name, __cy_n2.name AS company
FROM social_nodes AS __cy_n0
JOIN social_edges AS __cy_r0
  ON __cy_n0._id = __cy_r0._source
JOIN social_nodes AS __cy_n1
  ON __cy_r0._target = __cy_n1._id
JOIN social_edges AS __cy_r1
  ON __cy_n1._id = __cy_r1._source
JOIN social_nodes AS __cy_n2
  ON __cy_r1._target = __cy_n2._id
WHERE array_position(
        string_to_array(__cy_n0._label, ':'), 'Person'
      ) IS NOT NULL
  AND __cy_n0.name = 'Alice'
  AND __cy_r0._type = 'KNOWS'
  AND array_position(
        string_to_array(__cy_n1._label, ':'), 'Person'
      ) IS NOT NULL
  AND __cy_r1._type = 'WORKS_AT'
  AND array_position(
        string_to_array(__cy_n2._label, ':'), 'Company'
      ) IS NOT NULL

Each hop in the chain adds one edge scan and one node scan, connected by join conditions. The SQL optimizer can reorder these joins based on selectivity estimates, potentially starting from the most selective predicate (a.name = 'Alice') and expanding outward.

12.7.3 Variable-Length Path with Recursive CTE

Cypher:

MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
RETURN b.name, b.age

Generated SQL (conceptual):

WITH RECURSIVE __cy_path0(start_id, end_id, depth,
                           visited_edge_ids, visited_node_ids) AS (
  -- Base case: one hop
  SELECT n0._id, n1._id, 1,
         ARRAY[e0._id], ARRAY[n0._id, n1._id]
  FROM social_nodes AS n0
  JOIN social_edges AS e0 ON n0._id = e0._source
  JOIN social_nodes AS n1 ON e0._target = n1._id
  WHERE array_position(
          string_to_array(n0._label, ':'), 'Person'
        ) IS NOT NULL
    AND n0.name = 'Alice'
    AND e0._type = 'KNOWS'

  UNION ALL

  -- Recursive case: extend by one hop
  SELECT prev.start_id, n1._id, prev.depth + 1,
         array_append(prev.visited_edge_ids, e1._id),
         array_append(prev.visited_node_ids, n1._id)
  FROM __cy_path0 AS prev
  JOIN social_edges AS e1 ON prev.end_id = e1._source
  JOIN social_nodes AS n1 ON e1._target = n1._id
  WHERE prev.depth < 3
    AND array_position(prev.visited_edge_ids, e1._id) IS NULL
    AND e1._type = 'KNOWS'
)
SELECT __cy_n1.name, __cy_n1.age
FROM __cy_path0 AS __cy_path0
JOIN social_nodes AS __cy_n0
  ON __cy_path0.start_id = __cy_n0._id
JOIN social_nodes AS __cy_n1
  ON __cy_path0.end_id = __cy_n1._id
WHERE __cy_path0.depth >= 1
  AND array_position(
        string_to_array(__cy_n1._label, ':'), 'Person'
      ) IS NOT NULL

The recursive CTE maintains visited edge arrays for cycle detection, and the outer query applies the minimum depth and target label constraints.
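
The guard in the recursive branch can be sketched as follows (hypothetical types mirroring the CTE's columns): a path is extended by an edge only while under the depth bound and only if that edge has not already been traversed.

```cpp
#include <algorithm>
#include <vector>

struct PathState {
  std::vector<long> visited_edge_ids;  // mirrors the CTE's visited array
  int depth = 0;
};

// True when the recursive step may extend this path with edge_id.
bool can_extend(const PathState& path, long edge_id, int max_depth) {
  return path.depth < max_depth &&
         std::find(path.visited_edge_ids.begin(),
                   path.visited_edge_ids.end(),
                   edge_id) == path.visited_edge_ids.end();
}
```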

12.7.4 WITH Aggregation and Scope Boundary

Cypher:

MATCH (a:Person)-[:KNOWS]->(b:Person)
WITH a.name AS person, count(b) AS friend_count
WHERE friend_count > 5
RETURN person, friend_count
ORDER BY friend_count DESC

Generated SQL (conceptual):

SELECT __cy_scope0.person, __cy_scope0.friend_count
FROM (
  SELECT __cy_n0.name AS person, count(*) AS friend_count
  FROM social_nodes AS __cy_n0
  JOIN social_edges AS __cy_r0
    ON __cy_n0._id = __cy_r0._source
  JOIN social_nodes AS __cy_n1
    ON __cy_r0._target = __cy_n1._id
  WHERE array_position(
          string_to_array(__cy_n0._label, ':'), 'Person'
        ) IS NOT NULL
    AND __cy_r0._type = 'KNOWS'
    AND array_position(
          string_to_array(__cy_n1._label, ':'), 'Person'
        ) IS NOT NULL
  GROUP BY __cy_n0.name
) AS __cy_scope0
WHERE __cy_scope0.friend_count > 5
ORDER BY __cy_scope0.friend_count DESC

The WITH clause creates a subquery boundary. The inner query performs the join and aggregation, and the outer query applies the post-aggregation filter and final ordering.

12.7.5 MERGE Find-or-Create

Cypher:

MATCH (a:Person {name: 'Alice'})
MERGE (a)-[:KNOWS]->(b:Person {name: 'Bob'})
ON CREATE SET b.created_at = '2024-01-01'
ON MATCH SET b.last_seen = '2024-06-15'
RETURN b.name

This generates a two-branch UNION ALL structure. The scope CTE captures the current state (the matched a variable). The matched branch performs a LATERAL INNER JOIN to find existing (a)-[:KNOWS]->(b:Person {name: 'Bob'}) patterns and applies the ON MATCH SET. The created branch performs a LATERAL LEFT JOIN to the same pattern; where the probe returns NULL, it invokes graph_create_node and graph_create_edge and applies the ON CREATE SET. The UNION ALL of both branches ensures exactly one row per input, with the correct variable binding regardless of whether the pattern was matched or created.

12.8 Optimization Considerations

12.8.1 Predicate Pushdown Through Subquery Layers

Because the Cypher-to-SQL translation produces standard SQL AST nodes, the existing optimizer passes apply without modification. Predicate pushdown through subquery layers is handled by the plan optimizer's pushdown_predicates_() pass, which recognizes that filters on the outer query can often be pushed into subqueries.

For Cypher queries, this means that a WHERE clause in an outer SQL context wrapping a cypher() call can potentially be pushed down into the generated subquery:

SELECT * FROM cypher('social', $$
    MATCH (n:Person) RETURN n.name, n.age
$$) AS (name TEXT, age INT)
WHERE age > 21;

The age > 21 filter can be pushed below the subquery boundary, reducing the number of rows that flow through the pipeline.

12.8.2 Join Reordering for Graph Patterns

Multi-hop traversal patterns produce a chain of JOINs. The SQL optimizer's join reordering pass (described in Chapter 9) can evaluate different join orderings based on selectivity estimates. For a pattern like:

MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b)-[:WORKS_AT]->(c:Company)

The optimizer might choose to start with the most selective scan (a.name = 'Alice'), then join edges, then join the next node — or it might start from a different point if statistics suggest a different ordering would be more efficient.

12.8.3 Index Utilization

The _label and _type columns on the node and edge collections are common filter targets. Indexes on these columns enable fast filtering, and the optimizer's index selection logic applies to the WHERE predicates generated by label and type constraints. Property constraints (e.g., {name: 'Alice'}) translate to equality predicates on collection columns, which benefit from column-level indexes.

12.8.4 Relationship Uniqueness

The openCypher specification requires that within a single MATCH clause, each relationship is traversed at most once per result row. Cognica's flat storage model and join-based translation naturally enforce this for fixed-length patterns, since each edge table scan produces distinct rows. For variable-length paths, the recursive CTE's visited_edge_ids array explicitly prevents revisiting edges, implementing cycle detection and relationship uniqueness simultaneously.

12.9 Summary

Cognica's Cypher implementation demonstrates that a graph query language need not require a separate execution engine. By rewriting Cypher into SQL at parse time, the system leverages the full power of an existing query optimizer and executor.

The architecture consists of three tightly integrated components:

  1. A hand-written lexer and parser that transforms Cypher text into a typed abstract syntax tree with 19 expression variants and 9 clause types, supporting the core openCypher grammar including variable-length paths, list comprehensions, and pattern existence tests.

  2. A binding-aware translator that converts the Cypher AST into SQL AST nodes through a clause-by-clause pipeline. Node patterns become table scans, relationship patterns become JOINs, variable-length paths become recursive CTEs, and write operations become graph storage function calls. The translator tracks variable bindings across scope boundaries, ensuring that property access, function calls, and aggregation all resolve correctly.

  3. Transparent integration with the SQL pipeline, where the generated subquery passes through plan building, optimization (predicate pushdown, join reordering, index selection), physical planning, and CVM compilation without any Cypher-specific processing. The optimizer sees standard SQL constructs and applies its full repertoire of transformations.

The cypher() sentinel function — inspired by Apache AGE — provides a clean interface between the SQL and Cypher worlds. Users write graph patterns in Cypher's visual syntax, and the database executes them with the same efficiency as hand-written SQL joins. Chapter 3 provides the algebraic foundations that this translation preserves, and Chapter 11 describes the graph storage layer that the generated queries ultimately access.

Copyright (c) 2023-2026 Cognica, Inc.