Chapter 12: Cypher Query Language
Graph databases have long suffered from a fragmentation problem: users must choose between the expressiveness of graph query languages and the ubiquity of SQL. Cognica resolves this tension through an approach inspired by Apache AGE — Cypher queries are rewritten into SQL at parse time, allowing the full power of a declarative graph pattern language to flow through an existing SQL optimizer and executor. This chapter details the lexer, parser, abstract syntax tree, and the Cypher-to-SQL rewriting engine that makes this possible.
12.1 Architecture Overview
12.1.1 Why Cypher in a SQL Database
Graph traversal queries in SQL are notoriously verbose. Consider finding a given person's friends-of-friends along with the companies those friends-of-friends work for. The pure SQL approach using graph table functions requires multiple CROSS JOIN LATERAL invocations, explicit endpoint tracking, and careful column aliasing:
SELECT DISTINCT f2.props->>'name' AS friend_name,
                c.props->>'name' AS company
FROM graph_neighbors('social', 'social:Person:1001',
                     'KNOWS', 'outgoing', 1) f1
CROSS JOIN LATERAL graph_neighbors('social', f1.id,
                                   'KNOWS', 'outgoing', 1) f2
CROSS JOIN LATERAL graph_edges('social', f2.id,
                               'WORKS_AT', 'outgoing') we
CROSS JOIN LATERAL graph_get_node('social', we.end_id) c
WHERE f2.id != 'social:Person:1001';
The same query in Cypher reads like a diagram of the graph pattern it describes:
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof:Person),
      (fof)-[:WORKS_AT]->(c:Company)
WHERE fof <> a
RETURN DISTINCT fof.name AS friend_name, c.name AS company
The ASCII-art syntax of Cypher — parentheses for nodes, square brackets for relationships, arrows for direction — maps directly to the visual intuition that graph users carry. Variable-length paths, optional patterns, and mutating operations all benefit from this visual clarity. Rather than building an entirely separate graph query engine, Cognica translates Cypher into its existing SQL infrastructure at parse time, inheriting decades of query optimization research for free.
12.1.2 Parse-Time Rewriting vs Runtime Interpretation
Two fundamentally different strategies exist for integrating a second query language into a SQL engine:
Runtime interpretation treats the graph language as a black box. A table function receives the query string, executes it against the storage layer, and returns results. The SQL optimizer has no visibility into the internal structure of the graph query and cannot push predicates, reorder joins, or select indexes.
Parse-time rewriting translates the graph query into SQL AST nodes before any planning occurs. The SQL optimizer sees a standard subquery with joins, filters, and projections — it can apply predicate pushdown, join reordering, index selection, and all other optimization passes without modification.
Cognica adopts the parse-time rewriting approach, following the architecture established by Apache AGE (A Graph Extension for PostgreSQL). The cypher() function that appears in SQL is not a real function that executes at runtime. It is a sentinel that the AST builder intercepts during parse tree construction:
SELECT * FROM cypher('social', $$
  MATCH (a:Person)-[:FOLLOWS*1..3]->(b:Person)
  WHERE a.name = 'Alice'
  RETURN b.name, b.role
$$) AS (name TEXT, role TEXT);
When the AST builder encounters cypher() in a FROM clause, it extracts the graph name and Cypher string, invokes the Cypher parser and translator, and replaces the function reference with the generated SQL subquery. From that point forward, every downstream component — plan builder, optimizer, physical planner, CVM compiler — operates on standard SQL AST nodes.
12.1.3 The cypher() Sentinel Function
The cypher() function follows Apache AGE conventions:
| Argument | Purpose | Example |
|---|---|---|
| Graph name | Identifies the graph namespace | 'social' |
| Cypher query | Dollar-quoted Cypher text | $$ MATCH ... RETURN ... $$ |
| Parameters (optional) | JSON object for $param binding | '{"name": "Alice"}'::jsonb |
The AS (column TYPE, ...) clause following the function call defines the output schema. The number of columns in the AS clause must match the number of RETURN items in the Cypher query. This requirement is validated during rewriting and produces a clear error message on mismatch.
12.1.4 Overall Translation Flow
The flow consists of three major phases. First, the outer SQL is parsed by libpg_query, which treats cypher() as an ordinary function call. Second, the AST builder intercepts this function call and invokes the Cypher pipeline: lexical analysis, parsing into a Cypher AST, and rewriting into SQL AST nodes. Third, the generated SQL AST re-enters the standard planning pipeline, where it is indistinguishable from hand-written SQL.
12.2 Lexical Analysis
The Cypher lexer (CypherLexer) transforms raw query text into a stream of typed tokens. It is a hand-written scanner that handles Cypher's unique lexical requirements — pattern arrows, case-insensitive keywords, dollar-quoted parameters, and escaped identifiers.
12.2.1 Token Types
The lexer produces tokens of type TokenType, an enumeration with 32 variants:
enum class TokenType {
  // Delimiters
  kLeftParen, kRightParen,            // ( )
  kLeftBracket, kRightBracket,        // [ ]
  kLeftBrace, kRightBrace,            // { }
  kComma, kColon, kDot, kDotDot,      // , : . ..
  kDollar, kPipe,                     // $ |
  // Comparison and assignment
  kEquals, kNotEquals,                // = <>
  kLessThan, kLessThanOrEqual,        // < <=
  kGreaterThan, kGreaterThanOrEqual,  // > >=
  kRegexMatch,                        // =~
  // Arithmetic
  kPlus, kMinus, kStar, kSlash, kPercent,  // + - * / %
  // Pattern arrows
  kArrowLeft, kArrowRight,            // <- ->
  // Atoms
  kIdentifier, kInteger, kFloat, kString,
  // Control
  kEof, kError,
};
Each Token carries its type, string value, and source location (line and column), enabling precise error reporting throughout the parsing and rewriting phases:
struct Token {
  TokenType type {TokenType::kError};
  std::string value;
  int32_t line {1};
  int32_t column {1};
};
12.2.2 Multi-Character Operators and Pattern Arrows
Several Cypher operators consist of two characters. The lexer resolves these using ordered lookahead before falling through to single-character cases. The ordering matters: <- must be checked before < alone, and .. before a single .:
// cypher_lexer.cpp:103-137
if (c == '<' && peek_char_(1) == '-') {
  advance_char_(); advance_char_();
  return Token {TokenType::kArrowLeft, {}, start_line, start_column};
}
if (c == '-' && peek_char_(1) == '>') {
  advance_char_(); advance_char_();
  return Token {TokenType::kArrowRight, {}, start_line, start_column};
}
if (c == '.' && peek_char_(1) == '.') {
  advance_char_(); advance_char_();
  return Token {TokenType::kDotDot, {}, start_line, start_column};
}
The <- and -> tokens are central to Cypher's visual pattern syntax. A relationship pattern like (a)-[:KNOWS]->(b) produces the token sequence: (, a, ), -, [, :, KNOWS, ], ->, (, b, ). The -> is a single token, not two separate tokens, which simplifies the parser's relationship direction logic.
The .. token (range operator) appears exclusively in variable-length path specifications like [:KNOWS*1..3], distinguishing it from the single . used for property access.
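This check order can be reproduced in a few lines. The sketch below is not the actual CypherLexer; the Tag enumeration and the scan_operator/scan_all helpers are invented for illustration, but the lookahead discipline is the same: try each two-character operator before falling back to its one-character prefix.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Simplified token tags for the operators discussed above.
enum class Tag { ArrowLeft, ArrowRight, DotDot, Less, Minus, Dot, Other };

// Scan one token starting at `pos`, checking two-character operators
// before their single-character prefixes ("<-" before "<", ".." before ".").
auto scan_operator(std::string_view src, size_t& pos) -> Tag {
  auto peek = [&](size_t offset) -> char {
    return pos + offset < src.size() ? src[pos + offset] : '\0';
  };
  char c = src[pos];
  if (c == '<' && peek(1) == '-') { pos += 2; return Tag::ArrowLeft; }
  if (c == '-' && peek(1) == '>') { pos += 2; return Tag::ArrowRight; }
  if (c == '.' && peek(1) == '.') { pos += 2; return Tag::DotDot; }
  pos += 1;
  if (c == '<') return Tag::Less;
  if (c == '-') return Tag::Minus;
  if (c == '.') return Tag::Dot;
  return Tag::Other;
}

auto scan_all(std::string_view src) -> std::vector<Tag> {
  auto tags = std::vector<Tag> {};
  for (size_t pos = 0; pos < src.size();) {
    tags.push_back(scan_operator(src, pos));
  }
  return tags;
}
```

Reversing the check order (testing `<` before `<-`) would split every left arrow into two tokens, which is why the ordering constraint matters.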
12.2.3 Case-Insensitive Keyword Recognition
Unlike SQL, which has a fixed set of reserved words handled by the grammar, Cypher keywords are recognized at the parser level through case-insensitive string comparison. The lexer emits all alphabetic sequences as kIdentifier tokens, and the parser checks whether an identifier matches a keyword using a helper function:
// cypher_parser.cpp:19-31
auto iequals(std::string_view lhs, std::string_view rhs) -> bool {
  if (lhs.size() != rhs.size()) {
    return false;
  }
  for (size_t i = 0; i < lhs.size(); ++i) {
    auto a = static_cast<unsigned char>(lhs[i]);
    auto b = static_cast<unsigned char>(rhs[i]);
    if (std::tolower(a) != std::tolower(b)) {
      return false;
    }
  }
  return true;
}
This design means MATCH, match, and Match are all valid. It also means that keywords like MATCH and RETURN are not reserved — they can be used as variable names in contexts where the parser does not expect a keyword. This permissiveness follows the openCypher specification's approach to context-sensitive keywords.
12.2.4 Multi-Word Keywords
Several Cypher constructs use multi-word keywords: ORDER BY, STARTS WITH, ENDS WITH, IS NULL, IS NOT NULL, and OPTIONAL MATCH. These are not tokenized as single tokens. Instead, the parser consumes them as sequences of identifiers. For example, ORDER BY is parsed as the keyword ORDER followed by the keyword BY:
// cypher_parser.cpp:486-492
if (match_keyword_("ORDER")) {
  expect_keyword_("BY");
  body.order_by.push_back(parse_sort_item_());
  while (match_(TokenType::kComma)) {
    body.order_by.push_back(parse_sort_item_());
  }
}
12.2.5 String Literals and Escape Sequences
Cypher strings can be delimited by either single or double quotes. The lexer supports standard escape sequences within strings: \n (newline), \r (carriage return), \t (tab), \\ (backslash), \' (single quote), and \" (double quote). Unterminated strings produce an error token rather than silently consuming the rest of the input (cypher_lexer.cpp:342-391).
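The decoding side of this behavior can be sketched in isolation. The decode_escapes helper below is hypothetical (the real scanner decodes character-by-character inside the string-literal rule), and the std::nullopt return on a dangling trailing backslash stands in for the error-token policy described above; it covers exactly the escape sequences listed.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Decode the escape sequences listed above (\n \r \t \\ \' \") in the
// body of an already-delimited string literal. Returns std::nullopt on a
// lone trailing backslash, mirroring the error-token policy rather than
// silently consuming malformed input.
auto decode_escapes(std::string_view body) -> std::optional<std::string> {
  auto out = std::string {};
  for (size_t i = 0; i < body.size(); ++i) {
    if (body[i] != '\\') {
      out.push_back(body[i]);
      continue;
    }
    if (i + 1 >= body.size()) {
      return std::nullopt;  // dangling backslash at end of literal body
    }
    switch (body[++i]) {
    case 'n': out.push_back('\n'); break;
    case 'r': out.push_back('\r'); break;
    case 't': out.push_back('\t'); break;
    case '\\': out.push_back('\\'); break;
    case '\'': out.push_back('\''); break;
    case '"': out.push_back('"'); break;
    default: out.push_back(body[i]); break;  // pass unknown escapes through
    }
  }
  return out;
}
```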
12.2.6 Escaped Identifiers and Parameters
Backtick-delimited identifiers (`my variable`) allow arbitrary characters in variable names. Doubled backticks within the delimiters represent a literal backtick. Parameters use the $ prefix followed by an identifier ($name, $personId), tokenized as a kDollar token followed by a kIdentifier.
12.2.7 Position Tracking
The lexer maintains line and column counters that are updated on every character advance. Newline characters increment the line counter and reset the column to 1. Each token records its starting position, providing the foundation for error messages that include precise source locations (cypher_lexer.cpp:426-433).
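The bookkeeping is small enough to show in full. This standalone sketch (the Position struct and advance_over helper are illustrative, not the lexer's actual members) applies the two rules above: an ordinary character advances the column; a newline increments the line and resets the column to 1.

```cpp
#include <cstdint>
#include <string_view>

// Line/column position, both 1-based as in the Token struct above.
struct Position {
  int32_t line {1};
  int32_t column {1};
};

// Advance `pos` over every character of `text`, as the lexer would while
// consuming input: newlines bump the line and reset the column.
auto advance_over(Position pos, std::string_view text) -> Position {
  for (char c : text) {
    if (c == '\n') {
      ++pos.line;
      pos.column = 1;
    } else {
      ++pos.column;
    }
  }
  return pos;
}
```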
12.3 Parsing
The Cypher parser (CypherParser) is a hand-written recursive descent parser that consumes the token stream and produces a typed abstract syntax tree. Recursive descent was chosen over parser generators (Bison/Flex, ANTLR) to avoid external dependencies and to enable precise error messages that reference Cypher-specific concepts.
12.3.1 Recursive Descent Design
The parser maintains a current token and a lexer reference. It advances through the token stream using a small set of primitives:
| Method | Behavior |
|---|---|
| current_() | Returns the current token without consuming it |
| peek_() | Returns the next token without consuming either |
| advance_() | Consumes the current token and loads the next |
| expect_(type) | Consumes the current token if it matches; throws on mismatch |
| match_(type) | Consumes and returns true if the current token matches; false otherwise |
| check_keyword_(kw) | Tests if the current token is the given keyword (case-insensitive) |
These primitives compose cleanly: match_keyword_("OPTIONAL") tests and consumes OPTIONAL in one call, while check_keyword_("MATCH") tests without consuming.
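The primitive set can be mocked on a plain string stream to show how the combinators compose. Everything below is a toy: the real parser works on typed Token values, and the method names merely mirror the table above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Toy cursor over a pre-tokenized stream, mimicking the primitive set in
// the table above. Tokens are plain strings here for brevity.
struct Cursor {
  std::vector<std::string> tokens;
  size_t index {0};

  // Returns the current token without consuming it.
  auto current() const -> std::string_view {
    return index < tokens.size() ? std::string_view {tokens[index]}
                                 : std::string_view {"<eof>"};
  }
  // Consumes and returns true when the current token matches.
  auto match(std::string_view expected) -> bool {
    if (current() == expected) {
      ++index;
      return true;
    }
    return false;
  }
  // Consumes on a match; throws on mismatch.
  auto expect(std::string_view expected) -> void {
    if (!match(expected)) {
      throw std::runtime_error {"expected " + std::string {expected}};
    }
  }
};
```

Parsing ORDER BY with these primitives reads exactly like the fragment in section 12.2.4: `if (c.match("ORDER")) { c.expect("BY"); ... }`.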
12.3.2 Grammar Structure
The parser implements the following grammar hierarchy:
Query               -> ClauseSequence (UNION [ALL] ClauseSequence)*
ClauseSequence      -> Clause+
Clause              -> MatchClause
                     | CreateClause
                     | MergeClause
                     | DeleteClause
                     | SetClause
                     | RemoveClause
                     | ReturnClause
                     | WithClause
                     | UnwindClause
MatchClause         -> [OPTIONAL] MATCH PatternPart (, PatternPart)*
                       [WHERE Expression]
PatternPart         -> [variable =] PatternElement
PatternElement      -> NodePattern (RelationshipChain)*
NodePattern         -> ( [variable] [:Label]* [{properties}] )
RelationshipChain   -> RelationshipPattern NodePattern
RelationshipPattern -> <-[details]- | -[details]-> | -[details]-
ProjectionBody      -> [DISTINCT] ProjectionItem (, ProjectionItem)*
                       [ORDER BY SortItem (, SortItem)*]
                       [SKIP expr] [LIMIT expr]
12.3.3 Clause Parsing Dispatch
The parse_clause_() method inspects the current token and dispatches to the appropriate clause parser. It uses keyword checks rather than a token-type switch because Cypher keywords are context-sensitive identifiers:
// cypher_parser.cpp:177-209
auto CypherParser::parse_clause_() -> ast::Clause {
  if (check_keyword_("OPTIONAL") || check_keyword_("MATCH")) {
    return ast::Clause {parse_match_clause_()};
  }
  if (check_keyword_("CREATE")) {
    return ast::Clause {parse_create_clause_()};
  }
  if (check_keyword_("MERGE")) {
    return ast::Clause {parse_merge_clause_()};
  }
  if (check_keyword_("DETACH") || check_keyword_("DELETE")) {
    return ast::Clause {parse_delete_clause_()};
  }
  // ... remaining clauses ...
  error_("Expected Cypher clause");
}
OPTIONAL must be checked alongside MATCH because the two-word keyword OPTIONAL MATCH begins with OPTIONAL. Similarly, DETACH DELETE begins with DETACH.
12.3.4 Pattern Parsing
Graph patterns are the syntactic heart of Cypher. A pattern consists of a chain of node-relationship-node segments:
(a:Person {name: 'Alice'})-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)
The parser handles this as a PatternElement: an initial NodePattern followed by zero or more RelationshipChain entries. Each chain contains a RelationshipPattern and a NodePattern.
Relationship direction detection works in two phases. First, the parser checks whether the relationship begins with <- (left arrow, indicating an incoming edge). If not, it expects a - (dash). After the optional bracket-enclosed details, the parser checks the trailing token: -> sets direction to kRight, another - sets direction to kBoth (undirected):
// cypher_parser.cpp:407-451
auto CypherParser::parse_relationship_pattern_()
    -> ast::RelationshipPattern {
  // ...
  if (match_(TokenType::kArrowLeft)) {
    pattern.direction = ast::RelationshipDirection::kLeft;
  } else {
    expect_(TokenType::kMinus);
  }
  // ... parse bracket contents ...
  if (pattern.direction == ast::RelationshipDirection::kLeft) {
    expect_(TokenType::kMinus);
  } else if (match_(TokenType::kArrowRight)) {
    pattern.direction = ast::RelationshipDirection::kRight;
  } else {
    expect_(TokenType::kMinus);
    pattern.direction = ast::RelationshipDirection::kBoth;
  }
  return pattern;
}
Variable-length path specifications (*1..3, *..5, *3) are parsed when a kStar token appears inside the relationship brackets. The range parser handles four cases: both bounds present (*1..3), only minimum (*3 is both min and max), only maximum (*..5), and unbounded (*):
// cypher_parser.cpp:453-472
auto CypherParser::parse_relationship_range_()
    -> ast::RelationshipRange {
  auto range = ast::RelationshipRange {};
  if (check_(TokenType::kInteger)) {
    range.min_hops = std::stoll(advance_().value);
  }
  if (match_(TokenType::kDotDot)) {
    if (check_(TokenType::kInteger)) {
      range.max_hops = std::stoll(advance_().value);
    }
    return range;
  }
  if (range.min_hops.has_value()) {
    range.max_hops = range.min_hops;  // *3 means exactly 3 hops
  }
  return range;
}
12.3.5 Expression Parsing with Precedence Levels
Cypher expressions follow a precedence hierarchy implemented through a chain of parsing methods. Each level calls the next-higher-precedence level, and loops to handle left-associative operators at its own level:
| Precedence | Operators | Parser Method |
|---|---|---|
| 1 (lowest) | OR | parse_or_expression_() |
| 2 | AND | parse_and_expression_() |
| 3 | NOT | parse_not_expression_() |
| 4 | = <> < <= > >= IN STARTS WITH ENDS WITH CONTAINS IS NULL IS NOT NULL =~ | parse_comparison_expression_() |
| 5 | + - | parse_additive_expression_() |
| 6 | * / % | parse_multiplicative_expression_() |
| 7 | Unary + - | parse_unary_expression_() |
| 8 | Property lookup, subscript, function call | parse_postfix_expression_() |
| 9 (highest) | Literals, variables, parameters, CASE, parenthesized | parse_primary_expression_() |
Negated predicates (NOT IN, NOT STARTS WITH, NOT ENDS WITH, NOT CONTAINS) are handled by lookahead in the comparison parser. When NOT is followed by one of these predicate keywords, the parser consumes NOT, parses the predicate, and wraps the result in a unary NOT node (cypher_parser.cpp:600-643).
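The chain structure is easiest to see with a stripped-down version. The sketch below implements just two levels of the hierarchy (additive over multiplicative over integer literals) as a hypothetical MiniExprParser; each level parses its left operand by calling the next level up, then loops over its own left-associative operators, exactly as the table describes.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Two levels of the precedence chain from the table above, specialized to
// integer arithmetic over an operator-and-digit string (no whitespace).
struct MiniExprParser {
  std::string_view src;
  size_t pos {0};

  auto parse_additive() -> long {  // level 5: + -
    long value = parse_multiplicative();
    while (pos < src.size() && (src[pos] == '+' || src[pos] == '-')) {
      char op = src[pos++];
      long rhs = parse_multiplicative();
      value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
  }
  auto parse_multiplicative() -> long {  // level 6: * /
    long value = parse_primary();
    while (pos < src.size() && (src[pos] == '*' || src[pos] == '/')) {
      char op = src[pos++];
      long rhs = parse_primary();
      value = (op == '*') ? value * rhs : value / rhs;
    }
    return value;
  }
  auto parse_primary() -> long {  // level 9: integer literals
    long value = 0;
    while (pos < src.size()
           && std::isdigit(static_cast<unsigned char>(src[pos]))) {
      value = value * 10 + (src[pos++] - '0');
    }
    return value;
  }
};

auto evaluate(std::string_view src) -> long {
  return MiniExprParser {src}.parse_additive();
}
```

Because multiplication binds in the inner level, `1+2*3` groups as `1+(2*3)`, and because each level loops rather than recurses on itself, `10-4-3` groups left-associatively as `(10-4)-3`.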
12.3.6 Error Handling
Parse errors are reported through the ParseError structure, which carries a message and the source location (line and column). The parse() entry point wraps the parsing logic in a try-catch block and returns std::expected<ast::SingleQuery, ParseError>:
// cypher_parser.cpp:127-139
auto CypherParser::parse()
    -> std::expected<ast::SingleQuery, ParseError> {
  try {
    if (current_token_.type == TokenType::kError) {
      error_(current_token_, current_token_.value);
    }
    return parse_query_();
  } catch (const std::exception& e) {
    if (error_info_.message.empty()) {
      error_info_.message = e.what();
    }
    return std::unexpected(error_info_);
  }
}
The error_() methods record the error location before throwing, ensuring that the caller receives both the message and the position in the Cypher source where parsing failed.
12.4 Abstract Syntax Tree
The Cypher AST (cognica::sql::graph::cypher::ast) represents the parsed structure of a Cypher query as a tree of typed nodes. It uses std::variant for sum types and std::unique_ptr for ownership, following the same conventions as the SQL AST elsewhere in the codebase.
12.4.1 Expression Nodes
Expressions are represented as a struct containing a std::variant of 19 expression types:
using ExpressionData
    = std::variant<Wildcard, NullLiteral, BooleanLiteral,
                   IntegerLiteral, FloatLiteral, StringLiteral,
                   Variable, Parameter, MapLiteral, ListLiteral,
                   PropertyLookup, SubscriptExpression,
                   SliceExpression, ListComprehensionExpression,
                   ExistsPatternExpression, FunctionCall,
                   UnaryExpression, BinaryExpression,
                   CaseExpression>;

struct Expression {
  ExpressionData data;

  template <typename T>
  auto is() const -> bool {
    return std::holds_alternative<T>(data);
  }

  template <typename T>
  auto as() -> T& { return std::get<T>(data); }

  template <typename T>
  auto as() const -> const T& { return std::get<T>(data); }
};
The is<T>() and as<T>() template methods provide a clean dispatch interface. The pattern if (expr.is<PropertyLookup>()) { const auto& lookup = expr.as<PropertyLookup>(); ... } appears throughout the translator.
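A cut-down reproduction of this wrapper makes the dispatch pattern concrete. The two-alternative Expr below is illustrative only (the real ExpressionData has 19 alternatives), but the is/as interface and the test-then-downcast idiom are the same.

```cpp
#include <string>
#include <variant>

// Two alternatives are enough to demonstrate the dispatch interface.
struct Variable {
  std::string name;
};
struct PropertyLookup {
  std::string object;
  std::string property;
};

struct Expr {
  std::variant<Variable, PropertyLookup> data;

  template <typename T>
  auto is() const -> bool { return std::holds_alternative<T>(data); }
  template <typename T>
  auto as() const -> const T& { return std::get<T>(data); }
};

// Dispatch in the translator's style: test with is<T>(), downcast with as<T>().
auto describe(const Expr& expr) -> std::string {
  if (expr.is<PropertyLookup>()) {
    const auto& lookup = expr.as<PropertyLookup>();
    return lookup.object + "." + lookup.property;
  }
  return expr.as<Variable>().name;
}
```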
Key expression types include:
| Type | Purpose | Example |
|---|---|---|
| Variable | Bound variable reference | n, r |
| PropertyLookup | Property access | n.name |
| FunctionCall | Function invocation | id(n), count(*) |
| Parameter | Query parameter | $name |
| BinaryExpression | Binary operators | a.age > 21 |
| UnaryExpression | Unary operators | NOT x, n IS NULL |
| CaseExpression | CASE/WHEN/THEN/ELSE | CASE x WHEN 1 THEN 'a' END |
| ListComprehensionExpression | List comprehension | [x IN list WHERE x > 0 \| x * 2] |
| ExistsPatternExpression | Pattern existence test | exists((a)-->(b)) |
12.4.2 Clause Nodes
Clauses form the top-level structure of a Cypher query. Like expressions, they use a variant-based design:
using ClauseData
    = std::variant<MatchClause, CreateClause, MergeClause,
                   DeleteClause, SetClause, RemoveClause,
                   ReturnClause, WithClause, UnwindClause>;
Each clause type carries the data specific to its syntax:
- MatchClause: An optional flag, a list of patterns, and an optional WHERE expression.
- CreateClause: A list of patterns describing nodes and edges to create.
- MergeClause: A single pattern and a list of MergeAction entries (ON MATCH SET, ON CREATE SET).
- DeleteClause: A detach flag and a list of variable names to delete.
- SetClause: A list of property assignment items (target expression = value expression).
- RemoveClause: A list of property removal items.
- ReturnClause and WithClause: A ProjectionBody containing projection items, ORDER BY, SKIP, and LIMIT.
- UnwindClause: An expression and a variable name for the unwound elements.
12.4.3 Pattern Nodes
Patterns represent the graph structure to match or create:
struct NodePattern {
  std::string variable;                   // Optional binding name
  std::vector<std::string> labels;        // Zero or more labels
  std::optional<MapLiteral> properties;   // Inline property constraints
};

struct RelationshipPattern {
  std::string variable;
  std::vector<std::string> types;
  std::optional<RelationshipRange> range;
  std::optional<MapLiteral> properties;
  RelationshipDirection direction;        // kLeft, kRight, kBoth
};

struct RelationshipChain {
  RelationshipPattern relationship;
  NodePattern node;
};

struct PatternElement {
  NodePattern node;                       // Starting node
  std::vector<RelationshipChain> chains;  // Relationship-node pairs
};

struct PatternPart {
  std::string variable;                   // Optional path variable
  PatternElement element;
};
A PatternElement models the chain structure of Cypher patterns. The pattern (a)-[r]->(b)-[s]->(c) produces a PatternElement with starting node a and two chains: (r, b) and (s, c). This representation directly mirrors the visual syntax and simplifies the translator's iteration over relationship hops.
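The decomposition can be checked with trimmed-down versions of these structs (labels, properties, ranges, and directions omitted): building the AST for (a)-[r]->(b)-[s]->(c) by hand yields a start node plus two chains.

```cpp
#include <string>
#include <vector>

// Reduced pattern structs, keeping only the variable names, to show the
// start-node-plus-chains shape described above.
struct NodePattern {
  std::string variable;
};
struct RelationshipPattern {
  std::string variable;
};
struct RelationshipChain {
  RelationshipPattern relationship;
  NodePattern node;
};
struct PatternElement {
  NodePattern node;                       // Starting node
  std::vector<RelationshipChain> chains;  // Relationship-node pairs
};

// Build the AST shape for (a)-[r]->(b)-[s]->(c): start node `a`,
// then chains (r, b) and (s, c).
auto two_hop_pattern() -> PatternElement {
  return PatternElement {
      NodePattern {"a"},
      {
          RelationshipChain {RelationshipPattern {"r"}, NodePattern {"b"}},
          RelationshipChain {RelationshipPattern {"s"}, NodePattern {"c"}},
      },
  };
}
```

Iterating over `chains` visits the hops in left-to-right order, which is what makes the translator's per-hop join generation a simple loop.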
12.4.4 Query Structure
At the top level, a SingleQuery contains a list of clauses and optional UNION parts:
struct SingleQuery {
  std::vector<Clause> clauses;
  std::vector<UnionQueryPart> unions;
};

struct UnionQueryPart {
  bool all {false};
  std::vector<Clause> clauses;
};
This structure supports queries with UNION: MATCH ... RETURN ... UNION ALL MATCH ... RETURN ....
12.5 Cypher-to-SQL Rewriting
The rewriter (Translator class in cypher_rewriter.cpp) is the most substantial component of the Cypher subsystem at over 5,000 lines. It transforms a Cypher AST into SQL AST nodes that the standard planner can process. The design follows the Apache AGE architecture: each clause contributes to a growing SQL SelectStmt, and scope boundaries (WITH clauses, MERGE) introduce subquery nesting.
12.5.1 Subquery Pipeline Architecture
The core insight of the Cypher-to-SQL translation is that each Cypher clause modifies or wraps the current SQL SELECT statement. The translate_clause_sequence_() method processes clauses in order:
// cypher_rewriter.cpp:276-391
auto translate_clause_sequence_(
    const std::vector<cyast::Clause>& clauses)
    -> std::expected<sqlast::StmtPtr, std::string> {
  auto select = sqlast::make_select_stmt();
  for (size_t i = 0; i < clauses.size(); ++i) {
    const auto& clause = clauses[i];
    if (clause.is<cyast::MatchClause>()) {
      // Adds FROM/JOIN/WHERE to current select
      translate_match_clause_(..., *select);
    }
    if (clause.is<cyast::WithClause>()) {
      // Wraps current select in a subquery, creates new outer select
      select = materialize_with_clause_(std::move(select), ...);
    }
    if (clause.is<cyast::ReturnClause>()) {
      // Adds SELECT list, ORDER BY, LIMIT to current select
      translate_projection_(..., *select, ...);
    }
    // ... other clauses ...
  }
  return sqlast::StmtPtr {std::move(select)};
}
MATCH clauses add FROM entries (for node/edge table scans) and JOIN clauses (for connecting nodes to edges) to the current SELECT. Multiple MATCH clauses in sequence add to the same SELECT, building up a multi-table join.
WITH clauses create a scope boundary. The current SELECT is wrapped inside a subquery, and a fresh outer SELECT is created. Variables projected by WITH become the only visible bindings in the new scope. This directly models Cypher's scoping semantics.
RETURN adds the SELECT list (target entries), ORDER BY, SKIP, and LIMIT to the final SELECT.
12.5.2 Variable Binding Tracking
The translator maintains a binding table (std::unordered_map<std::string, Binding>) that maps Cypher variable names to their SQL representations. Each binding records:
struct Binding {
  BindingKind kind;           // kNode, kEdge, or kScalar
  std::string qualifier;      // SQL table alias
  std::string id_column;      // Column containing _id
  std::string label_column;   // Column containing _label
  std::string type_column;    // Column containing _type (edges)
  std::string source_column;  // Column containing _source (edges)
  std::string target_column;  // Column containing _target (edges)
  std::unordered_map<std::string, std::string>
      property_columns;       // Properties carried across scope boundaries
  bool direct_properties;     // Can access properties as direct columns?
  bool is_path;               // Is this a path variable?
};
When the translator encounters a variable reference like n.name, it looks up n in the binding table, determines whether n is a node or edge, and generates the appropriate SQL column reference. For nodes with direct property access, n.name becomes t_n.name (a direct column reference). After a scope boundary (WITH clause), properties may be carried through as explicit projected columns, and the binding's property_columns map tracks which properties are available.
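The lookup step reduces to a map access plus name qualification. The sketch below shrinks Binding to its qualifier field and uses a hypothetical resolve_property helper; the real translator builds a typed SQL column-reference node rather than a string, but the binding-table mechanics are the same.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Reduced binding record: just enough to turn `n.name` into `__cy_n0.name`.
struct Binding {
  std::string qualifier;  // SQL table alias, e.g. "__cy_n0"
};

using BindingTable = std::unordered_map<std::string, Binding>;

// Resolve a Cypher property access `variable.property` into a qualified
// SQL column reference, or nullopt when the variable is unbound.
auto resolve_property(const BindingTable& bindings,
                      const std::string& variable,
                      const std::string& property)
    -> std::optional<std::string> {
  auto it = bindings.find(variable);
  if (it == bindings.end()) {
    return std::nullopt;  // unbound variable: a translation error upstream
  }
  return it->second.qualifier + "." + property;
}
```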
12.5.3 Pattern Translation
Node Patterns
Each node variable in a MATCH pattern becomes a scan of the {graph}_nodes collection. For a pattern like (n:Person {name: 'Alice'}), the translator:
- Acquires a binding for variable n with a fresh SQL alias (e.g., __cy_n0).
- Adds a FROM or JOIN clause: FROM social_nodes AS __cy_n0.
- Generates label constraints: array_position(string_to_array(__cy_n0._label, ':'), 'Person') IS NOT NULL.
- Generates property constraints: __cy_n0.name = 'Alice'.
The label matching uses string_to_array and array_position rather than simple equality because a node can carry multiple labels stored as a colon-separated string (e.g., Person:Employee).
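The SQL-side semantics of string_to_array plus array_position can be mirrored in a few lines of C++, which also makes the multi-label behavior easy to verify. The split_labels and has_label helpers are illustrative only, not part of the translator.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Split a colon-separated label string ("Person:Employee") the way the
// generated SQL does with string_to_array(_label, ':').
auto split_labels(std::string_view stored) -> std::vector<std::string> {
  auto labels = std::vector<std::string> {};
  size_t start = 0;
  while (start <= stored.size()) {
    size_t end = stored.find(':', start);
    if (end == std::string_view::npos) {
      end = stored.size();
    }
    labels.emplace_back(stored.substr(start, end - start));
    start = end + 1;
  }
  return labels;
}

// Membership test mirroring `array_position(...) IS NOT NULL`: true when
// the requested label appears anywhere in the stored label list.
auto has_label(std::string_view stored, std::string_view label) -> bool {
  for (const auto& candidate : split_labels(stored)) {
    if (candidate == label) {
      return true;
    }
  }
  return false;
}
```

Plain equality on the stored string would match only single-labeled nodes; the membership test is what lets `(n:Person)` also match a node labeled `Person:Employee`.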
Edge Patterns
Relationships are translated to JOINs against the {graph}_edges collection. The join condition depends on the relationship direction:
| Direction | Edge Entry Condition | Node Entry Condition |
|---|---|---|
| Outgoing (->) | node._id = edge._source | edge._target = next_node._id |
| Incoming (<-) | node._id = edge._target | edge._source = next_node._id |
| Undirected (-) | node._id = edge._source OR node._id = edge._target | Complex bidirectional join |
For undirected edges, the translator generates an OR condition that accepts traversal in either direction:
// cypher_rewriter.cpp:3988-4011
auto make_edge_entry_condition_(
    const Binding& node_binding, const Binding& edge_binding,
    cyast::RelationshipDirection direction)
    -> std::expected<sqlast::ExprPtr, std::string> {
  if (direction == cyast::RelationshipDirection::kRight) {
    return sqlast::make_binary_expr(
        sqlast::BinaryOpType::kEqual, id_ref_(node_binding),
        source_ref_(edge_binding));
  }
  if (direction == cyast::RelationshipDirection::kLeft) {
    return sqlast::make_binary_expr(
        sqlast::BinaryOpType::kEqual, id_ref_(node_binding),
        target_ref_(edge_binding));
  }
  // Undirected: OR of both directions
  auto left = sqlast::make_binary_expr(
      sqlast::BinaryOpType::kEqual, id_ref_(node_binding),
      source_ref_(edge_binding));
  auto right = sqlast::make_binary_expr(
      sqlast::BinaryOpType::kEqual, id_ref_(node_binding),
      target_ref_(edge_binding));
  return sqlast::make_binary_expr(
      sqlast::BinaryOpType::kOr, std::move(left), std::move(right));
}
Multiple Relationship Types
When a relationship pattern specifies multiple types ([:KNOWS|WORKS_WITH]), each type becomes an equality predicate, and the predicates are combined with OR:
// cypher_rewriter.cpp:3070-3078
if (!relationship.types.empty()) {
  auto type_predicates = std::vector<sqlast::ExprPtr> {};
  for (const auto& type : relationship.types) {
    type_predicates.push_back(sqlast::make_binary_expr(
        sqlast::BinaryOpType::kEqual, type_ref_(binding),
        sqlast::make_constant_string(type)));
  }
  predicates.push_back(combine_with_or_(type_predicates));
}
This generates SQL like: __cy_r0._type = 'KNOWS' OR __cy_r0._type = 'WORKS_WITH'.
12.5.4 Variable-Length Paths
Variable-length path patterns (e.g., [:KNOWS*1..3]) require recursive evaluation. The translator generates a WITH RECURSIVE common table expression (CTE) that iterates from the start node, following edges up to the maximum depth while tracking visited edges to prevent cycles.
Recursive CTE Structure
The generated CTE has five columns:
| Column | Purpose |
|---|---|
| start_id | ID of the path's starting node |
| end_id | ID of the current endpoint |
| depth | Number of hops traversed so far |
| visited_edge_ids | Array of edge IDs on the current path (cycle detection) |
| visited_node_ids | Array of node IDs on the current path |
The CTE consists of a base case (single-hop traversal from start nodes) and a recursive case (extending paths by one hop), combined with UNION ALL.
The cycle detection mechanism uses array_position to check whether the current edge has already been visited:
// cypher_rewriter.cpp:1998-2004
auto visited_lookup_args = make_binary_expr_list(
    sqlast::make_column_ref(previous_alias, "visited_edge_ids"),
    id_ref_(recursive_edge_scan));
recursive_conditions.push_back(sqlast::make_unary_expr(
    sqlast::UnaryOpType::kIsNull,
    sqlast::make_function_call("array_position",
                               std::move(visited_lookup_args))));
This translates to: array_position(prev.visited_edge_ids, edge._id) IS NULL, which is true only when the edge has not been visited.
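The visited-array check behaves like the following sketch, which models one CTE row with a hypothetical PathRow struct: a path is extended only when the candidate edge's ID is absent from the visited list, so no edge is ever traversed twice on the same path.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One path row of the recursive CTE, reduced to the cycle-detection fields.
struct PathRow {
  std::string end_id;
  std::vector<std::string> visited_edge_ids;
};

// Mirror of `array_position(prev.visited_edge_ids, edge._id) IS NULL`:
// extend the path by one hop only when the edge is not yet on the path.
auto try_extend(const PathRow& row, const std::string& edge_id,
                const std::string& next_node_id, PathRow& out) -> bool {
  const auto& visited = row.visited_edge_ids;
  if (std::find(visited.begin(), visited.end(), edge_id) != visited.end()) {
    return false;  // edge already on this path; extending would cycle
  }
  out = PathRow {next_node_id, visited};
  out.visited_edge_ids.push_back(edge_id);
  return true;
}
```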
Zero-Hop Paths
When the minimum hop count is 0 (e.g., *0..3), the translator generates a separate zero-hop SELECT that returns the start node as both the start and end of a zero-length path. This SELECT is UNION ALL'd with the recursive CTE.
Segmented Multi-Hop Patterns
When a pattern contains multiple relationship segments and at least one is variable-length, the translator splits the pattern into individual segments. Each segment is translated independently — fixed-length segments become standard JOINs, and variable-length segments become recursive CTEs. The segments are connected by shared node variables at their endpoints (cypher_rewriter.cpp:686-774).
12.5.5 Clause Translation
MATCH to FROM/JOIN/WHERE
A MATCH clause adds table scans and joins to the current SELECT. The translator iterates over the pattern's node-relationship chains, acquiring bindings for each variable and generating join conditions:
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.name = 'Alice'
Translates to:
SELECT ...
FROM social_nodes AS __cy_n0
JOIN social_edges AS __cy_r0
  ON __cy_n0._id = __cy_r0._source
JOIN social_nodes AS __cy_n1
  ON __cy_r0._target = __cy_n1._id
WHERE array_position(string_to_array(__cy_n0._label, ':'),
                     'Person') IS NOT NULL
  AND __cy_n0.name = 'Alice'
  AND array_position(string_to_array(__cy_n1._label, ':'),
                     'Person') IS NOT NULL
OPTIONAL MATCH to LEFT JOIN
OPTIONAL MATCH uses a LATERAL LEFT JOIN to ensure that the outer query's rows are preserved even when the pattern has no matches. The translator creates a sub-translator with a copy of the current bindings, translates the pattern in a subquery, and joins the subquery with a LEFT JOIN:
// cypher_rewriter.cpp:2280-2289
auto join = std::make_unique<sqlast::JoinExpr>();
join->join_type = sqlast::JoinType::kLeft;
join->join_condition = sqlast::make_constant_bool(true);
join->right_subquery = std::make_unique<sqlast::RangeSubselect>();
join->right_subquery->subquery = sqlast::StmtPtr {std::move(subselect)};
join->right_subquery->alias = "__cy_opt" + std::to_string(optional_index_++);
join->right_subquery->lateral = true;
The lateral = true flag ensures the subquery can reference columns from the outer query, which is necessary when the OPTIONAL MATCH pattern references variables bound by a preceding MATCH.
WITH as Scope Boundary
The WITH clause creates a subquery boundary. The current SELECT is materialized into a subquery, and the translator creates a fresh outer SELECT. Only the variables projected by WITH are visible in the subsequent scope:
MATCH (a:Person)
WITH a.name AS name, count(*) AS cnt
WHERE cnt > 5
MATCH (b:Person {name: name})
RETURN b
The WITH clause produces a subquery (SELECT a.name AS name, count(*) AS cnt FROM ...) AS __cy_scope0, and the subsequent MATCH clause operates within a new SELECT that reads from __cy_scope0.
UNWIND to Table Function
UNWIND transforms a list into rows. The translator generates an unnest() table function call, joined laterally to the current query:
UNWIND [1, 2, 3] AS x
RETURN x * 2
Becomes:
SELECT x * 2
FROM unnest(ARRAY[1, 2, 3]) AS __cy_unwind0(x)
When there is an existing FROM clause, the unnest is joined with a CROSS JOIN LATERAL to ensure it can reference outer columns (cypher_rewriter.cpp:2173-2211).
RETURN to SELECT
The RETURN clause translates to the SELECT list, with optional DISTINCT, ORDER BY, SKIP (OFFSET), and LIMIT. Graph variable references in RETURN are handled specially: bare node or edge variables are projected as JSON objects containing id, label, and properties, matching the Apache AGE output format (cypher_rewriter.cpp:2625-2679).
12.5.6 Expression Translation
Property Access
Property access on graph variables (n.name) is the most common expression in Cypher queries. The translator resolves it based on the binding's storage model:
- Direct properties: When the binding scans the node/edge collection directly, n.name becomes __cy_n0.name, a direct column reference on the collection.
- Carried properties: After a scope boundary, the property may be available as a projected column tracked in the binding's property_columns map.
- Scalar bindings: For variables bound by UNWIND or WITH to scalar values, property access becomes a JSON field access.
// cypher_rewriter.cpp:3558-3574
auto translate_property_lookup_(
const cyast::PropertyLookup& lookup,
bool allow_identity_variables)
-> std::expected<sqlast::ExprPtr, std::string> {
if (lookup.object->is<cyast::Variable>()) {
const auto& variable = lookup.object->as<cyast::Variable>();
auto binding = find_binding_(variable.name);
// ... path variable handling ...
return property_ref_(*binding, lookup.property);
}
// ... fallthrough to general expression translation ...
}
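The three storage models produce visibly different SQL for the same surface expression. As a sketch (the JSON field-access operator is illustrative):

```sql
-- Direct:  n bound to a collection scan
--   n.name  ->  __cy_n0.name
-- Carried: n crossed a WITH boundary that projected the property
--   n.name  ->  __cy_scope0.name
-- Scalar:  p bound by UNWIND to a JSON value
--   p.name  ->  __cy_unwind0.p ->> 'name'
--
-- Cypher:
--   UNWIND $people AS p
--   RETURN p.name
SELECT __cy_unwind0.p ->> 'name'
FROM unnest($1) AS __cy_unwind0(p);
```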
Function Mapping
Cypher functions are translated to their SQL equivalents. The translator recognizes function names case-insensitively and supports the ag_catalog. schema prefix for Apache AGE compatibility:
| Cypher Function | SQL Translation |
|---|---|
| id(n) | __cy_n0._id |
| labels(n) | string_to_array(__cy_n0._label, ':') |
| type(r) | __cy_r0._type |
| start_id(r) | __cy_r0._source |
| end_id(r) | __cy_r0._target |
| startNode(r) | Lateral join to graph_get_node with _source |
| endNode(r) | Lateral join to graph_get_node with _target |
| properties(n) | Lateral join to graph_get_node, returns properties |
| keys(n) | jsonb_object_keys_array(properties) |
| exists(n.prop) | jsonb_exists(properties, 'prop') |
| count(*) | count(*) |
| collect(x) | Passed through as aggregate |
| vertex_stats(n) | Complex subquery with in/out degree counts |
The count(*) case requires special handling: the parser produces a Wildcard inside a FunctionCall, and the translator sets the is_star flag on the SQL function call node rather than translating arguments (cypher_rewriter.cpp:3821-3826).
General functions not in the mapping table are passed through with their name lowercased, allowing standard SQL aggregate and scalar functions to work transparently within Cypher queries.
String Predicates
Cypher's string predicates translate to SQL function calls:
| Cypher | SQL |
|---|---|
| a STARTS WITH 'x' | starts_with(a, 'x') |
| a ENDS WITH 'x' | ends_with(a, 'x') |
| a CONTAINS 'x' | strpos(a, 'x') > 0 |
| a =~ 'pattern' | a ~ 'pattern' (regex match) |
| a IN list | array_position(list, a) IS NOT NULL |
The IN operator uses array_position rather than SQL's IN keyword because Cypher's IN operates on array values, not SQL row sets (cypher_rewriter.cpp:3349-3365).
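Combined in a WHERE clause, these predicates compose with the usual label checks. A conceptual sketch:

```sql
-- Cypher:
--   MATCH (n:Person)
--   WHERE n.name STARTS WITH 'Al'
--     AND n.city IN ['Seoul', 'Tokyo']
--   RETURN n.name
SELECT __cy_n0.name
FROM social_nodes AS __cy_n0
WHERE array_position(
        string_to_array(__cy_n0._label, ':'), 'Person') IS NOT NULL
  AND starts_with(__cy_n0.name, 'Al')
  AND array_position(ARRAY['Seoul', 'Tokyo'], __cy_n0.city) IS NOT NULL;
```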
NULL Handling
Cypher's IS NULL and IS NOT NULL map directly to their SQL counterparts. The IS keyword is parsed as part of the comparison expression handler, which checks for the NOT modifier between IS and NULL (cypher_parser.cpp:588-598).
CASE Expressions
Cypher CASE expressions support both simple form (CASE expr WHEN val THEN ...) and searched form (CASE WHEN condition THEN ...). Both translate directly to SQL CASE expressions with the same structure.
List Operations
List literals [1, 2, 3] translate to SQL ARRAY[1, 2, 3]. List subscript list[0] translates to array subscript with zero-based to one-based index conversion (Cypher uses 0-based indexing, SQL uses 1-based). Negative indices in Cypher index from the end, so the translator generates a CASE expression:
// cypher_rewriter.cpp:3499-3511
// CASE WHEN index >= 0 THEN index + 1 ELSE index END
auto sql_case = std::make_unique<sqlast::CaseExpr>();
sqlast::CaseWhen when_clause;
when_clause.condition = sqlast::make_binary_expr(
sqlast::BinaryOpType::kGreaterThanOrEqual,
clone(index), sqlast::make_constant_int(0));
when_clause.result = sqlast::make_binary_expr(
sqlast::BinaryOpType::kAdd, clone(index),
sqlast::make_constant_int(1));
sql_case->when_clauses.push_back(std::move(when_clause));
sql_case->else_result = std::move(*index);
List comprehensions [x IN list WHERE x > 0 | x * 2] translate to ARRAY(SELECT x * 2 FROM unnest(list) AS t(x) WHERE x > 0), using a subquery with unnest and the array aggregate subquery type (cypher_rewriter.cpp:3882-3968).
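A worked instance of the comprehension translation (the result comment assumes standard unnest semantics):

```sql
-- Cypher:  RETURN [x IN [1, 2, 3, 4] WHERE x > 2 | x * 10] AS big
SELECT ARRAY(
    SELECT x * 10
    FROM unnest(ARRAY[1, 2, 3, 4]) AS t(x)
    WHERE x > 2
) AS big;
-- conceptually yields {30, 40}
```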
12.5.7 Write Operation Handling
CREATE
The CREATE clause generates calls to graph_create_node and graph_create_edge table functions. These are not SQL INSERT statements — they invoke Cognica's graph storage API directly, which handles ID assignment, label indexing, and adjacency cache updates.
For each new node:
// cypher_rewriter.cpp:986-1000
auto create_func = std::make_unique<sqlast::RangeTableFunc>();
create_func->function_name = "graph_create_node";
create_func->arguments.push_back(
sqlast::make_constant_string(graph_name_));
create_func->arguments.push_back(
sqlast::make_constant_string(join_labels(node.labels)));
// ... properties as jsonb_build_object(...) ...
create_func->alias = create_alias;
create_func->output_columns.push_back("id");
After creation, the translator immediately looks up the created node to establish a binding with full metadata (id, label, properties), enabling subsequent clauses to reference the newly created entity.
For edges, the translator determines the actual source and target based on the relationship direction — a left-directed relationship (a)<-[:KNOWS]-(b) makes b the source and a the target (cypher_rewriter.cpp:1066-1073).
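Conceptually, a single-node CREATE produces a table-function invocation followed by the rebinding lookup. In this sketch, the argument order of graph_create_node and the output columns of graph_get_node are illustrative:

```sql
-- Cypher:  CREATE (n:Person {name: 'Carol'}) RETURN n.name
SELECT __cy_n0.name
FROM graph_create_node(
         'social', 'Person',
         jsonb_build_object('name', 'Carol')) AS __cy_create0(id)
CROSS JOIN LATERAL graph_get_node(
         'social', __cy_create0.id) AS __cy_n0;  -- rebinds n with full metadata
```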
MERGE
MERGE implements find-or-create semantics. The translator builds two branches — a "matched" branch and a "created" branch — and combines them with UNION ALL:
The matched branch performs a LATERAL INNER JOIN against a subquery that attempts to match the pattern. If the pattern exists, the ON MATCH SET actions execute. The created branch performs a LATERAL LEFT JOIN against the same pattern subquery; rows where the probe returns NULL (pattern not found) pass through to the CREATE logic, and the ON CREATE SET actions execute.
This two-branch approach ensures that each input row produces exactly one output row, regardless of whether the pattern was matched or created.
DELETE and DETACH DELETE
The DELETE clause generates calls to graph_delete_node or graph_delete_edge. The translator validates that the target variable is bound to a node or edge (not a scalar), and orders deletions to process edges before nodes. DETACH DELETE passes a cascade flag to graph_delete_node, which removes all connected edges before removing the node:
// cypher_rewriter.cpp:1170-1183
delete_func->function_name = target.binding.kind == BindingKind::kEdge
? "graph_delete_edge"
: "graph_delete_node";
delete_func->arguments.push_back(
sqlast::make_constant_string(graph_name_));
delete_func->arguments.push_back(id_ref_(target.binding));
if (target.binding.kind == BindingKind::kNode) {
delete_func->arguments.push_back(
sqlast::make_constant_bool(detach_nodes));
}
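A conceptual sketch of a DETACH DELETE translation (result columns and alias names are illustrative):

```sql
-- Cypher:  MATCH (n:Person {name: 'Bob'}) DETACH DELETE n
SELECT __cy_del0.*
FROM social_nodes AS __cy_n0
CROSS JOIN LATERAL graph_delete_node(
         'social', __cy_n0._id,
         true) AS __cy_del0   -- true: cascade connected edges (DETACH)
WHERE array_position(
        string_to_array(__cy_n0._label, ':'), 'Person') IS NOT NULL
  AND __cy_n0.name = 'Bob';
```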
SET and REMOVE
The SET clause translates property assignments into graph_update_node or graph_update_edge calls. Multiple assignments to the same variable are grouped into a single update call with a jsonb_build_object argument containing all key-value pairs. Setting a property to NULL removes it. The REMOVE clause uses the same update mechanism, passing a $delete marker key with the property names to remove.
After each mutation (SET, REMOVE, CREATE), the translator re-establishes the affected binding by looking up the entity with its new state, ensuring that subsequent clauses see the updated values.
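For instance, two assignments to the same variable fold into one update call. This sketch treats the exact signature of graph_update_node as illustrative:

```sql
-- Cypher:  MATCH (n:Person {name: 'Bob'})
--          SET n.age = 31, n.city = 'Seoul'
SELECT __cy_upd0.*
FROM social_nodes AS __cy_n0
CROSS JOIN LATERAL graph_update_node(
         'social', __cy_n0._id,
         jsonb_build_object('age', 31, 'city', 'Seoul')) AS __cy_upd0
WHERE array_position(
        string_to_array(__cy_n0._label, ':'), 'Person') IS NOT NULL
  AND __cy_n0.name = 'Bob';
-- Setting a property to NULL inside the object would remove it.
```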
12.6 Supported Syntax Reference
The following table summarizes the Cypher syntax supported by Cognica's implementation:
| Category | Syntax | Supported |
|---|---|---|
| Reading | MATCH (n:Label) | Yes |
| | MATCH (n:Label {prop: value}) | Yes |
| | MATCH (n:Label1:Label2) — multi-label | Yes |
| | MATCH (a)-[r:TYPE]->(b) | Yes |
| | MATCH (a)<-[r:TYPE]-(b) | Yes |
| | MATCH (a)-[r:TYPE]-(b) — undirected | Yes |
| | MATCH (a)-[r:T1\|T2]->(b) — multiple types | Yes |
| | MATCH (a)-[*1..3]->(b) — variable-length | Yes |
| | MATCH (a)-[*]->(b) — unbounded | Yes |
| | OPTIONAL MATCH | Yes |
| | WHERE | Yes |
| Projection | RETURN expr AS alias | Yes |
| | RETURN DISTINCT | Yes |
| | RETURN * | Yes |
| | ORDER BY expr [ASC\|DESC] | Yes |
| | SKIP n | Yes |
| | LIMIT n | Yes |
| Composition | WITH expr AS alias | Yes |
| | WITH ... WHERE | Yes |
| | UNWIND expr AS variable | Yes |
| | UNION [ALL] | Yes |
| Writing | CREATE (n:Label {props}) | Yes |
| | CREATE (a)-[:TYPE]->(b) | Yes |
| | MERGE (n:Label {props}) | Yes |
| | MERGE ... ON MATCH SET | Yes |
| | MERGE ... ON CREATE SET | Yes |
| | SET n.prop = value | Yes |
| | SET n.prop = NULL (property removal) | Yes |
| | DELETE n | Yes |
| | DETACH DELETE n | Yes |
| | REMOVE n.prop | Yes |
| Expressions | Arithmetic: +, -, *, /, % | Yes |
| | Comparison: =, <>, <, <=, >, >= | Yes |
| | Boolean: AND, OR, NOT | Yes |
| | String: STARTS WITH, ENDS WITH, CONTAINS | Yes |
| | Regex: =~ | Yes |
| | IS NULL, IS NOT NULL | Yes |
| | IN (list membership) | Yes |
| | NOT IN, NOT STARTS WITH, etc. | Yes |
| | CASE/WHEN/THEN/ELSE/END | Yes |
| | List literals: [1, 2, 3] | Yes |
| | Map literals: {key: value} | Yes |
| | List comprehension: [x IN list WHERE p \| e] | Yes |
| | Parameters: $name | Yes |
| Functions | id(n), labels(n), type(r) | Yes |
| | startNode(r), endNode(r) | Yes |
| | start_id(r), end_id(r) | Yes |
| | properties(n), keys(n), exists(n.p) | Yes |
| | count(*), count(expr) | Yes |
| | collect(expr) | Yes |
| | sum, avg, min, max | Yes |
| | length(path), nodes(path), relationships(path) | Yes |
| | substring, left, right | Yes |
| | exists((a)-->(b)) — pattern exists | Yes |
| | vertex_stats(n) | Yes |
12.7 Translation Examples
This section presents complete translations from Cypher to SQL, illustrating the key rewriting patterns.
12.7.1 Simple MATCH with Property Filter
Cypher:
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person)
RETURN b.name AS friend_name
Generated SQL (conceptual):
SELECT __cy_n1.name AS friend_name
FROM social_nodes AS __cy_n0
JOIN social_edges AS __cy_r0
ON __cy_n0._id = __cy_r0._source
JOIN social_nodes AS __cy_n1
ON __cy_r0._target = __cy_n1._id
WHERE array_position(
string_to_array(__cy_n0._label, ':'), 'Person'
) IS NOT NULL
AND __cy_n0.name = 'Alice'
AND __cy_r0._type = 'KNOWS'
AND array_position(
string_to_array(__cy_n1._label, ':'), 'Person'
) IS NOT NULL
The translation creates three table scans (two node collections, one edge collection) connected by join conditions that enforce the relationship direction. Label checks use string_to_array and array_position to support multi-label nodes. The property filter {name: 'Alice'} becomes a WHERE clause predicate.
12.7.2 Multi-Hop Traversal with JOIN Chain
Cypher:
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:WORKS_AT]->(c:Company)
WHERE a.name = 'Alice'
RETURN b.name, c.name AS company
Generated SQL (conceptual):
SELECT __cy_n1.name, __cy_n2.name AS company
FROM social_nodes AS __cy_n0
JOIN social_edges AS __cy_r0
ON __cy_n0._id = __cy_r0._source
JOIN social_nodes AS __cy_n1
ON __cy_r0._target = __cy_n1._id
JOIN social_edges AS __cy_r1
ON __cy_n1._id = __cy_r1._source
JOIN social_nodes AS __cy_n2
ON __cy_r1._target = __cy_n2._id
WHERE array_position(
string_to_array(__cy_n0._label, ':'), 'Person'
) IS NOT NULL
AND __cy_n0.name = 'Alice'
AND __cy_r0._type = 'KNOWS'
AND array_position(
string_to_array(__cy_n1._label, ':'), 'Person'
) IS NOT NULL
AND __cy_r1._type = 'WORKS_AT'
AND array_position(
string_to_array(__cy_n2._label, ':'), 'Company'
) IS NOT NULL
Each hop in the chain adds one edge scan and one node scan, connected by join conditions. The SQL optimizer can reorder these joins based on selectivity estimates, potentially starting from the most selective predicate (a.name = 'Alice') and expanding outward.
12.7.3 Variable-Length Path with Recursive CTE
Cypher:
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(b:Person)
RETURN b.name, b.age
Generated SQL (conceptual):
WITH RECURSIVE __cy_path0(start_id, end_id, depth,
visited_edge_ids, visited_node_ids) AS (
-- Base case: one hop
SELECT n0._id, n1._id, 1,
ARRAY[e0._id], ARRAY[n0._id, n1._id]
FROM social_nodes AS n0
JOIN social_edges AS e0 ON n0._id = e0._source
JOIN social_nodes AS n1 ON e0._target = n1._id
WHERE array_position(
string_to_array(n0._label, ':'), 'Person'
) IS NOT NULL
AND n0.name = 'Alice'
AND e0._type = 'KNOWS'
UNION ALL
-- Recursive case: extend by one hop
SELECT prev.start_id, n1._id, prev.depth + 1,
array_append(prev.visited_edge_ids, e1._id),
array_append(prev.visited_node_ids, n1._id)
FROM __cy_path0 AS prev
JOIN social_edges AS e1 ON prev.end_id = e1._source
JOIN social_nodes AS n1 ON e1._target = n1._id
WHERE prev.depth < 3
AND array_position(prev.visited_edge_ids, e1._id) IS NULL
AND e1._type = 'KNOWS'
)
SELECT __cy_n1.name, __cy_n1.age
FROM __cy_path0
JOIN social_nodes AS __cy_n0
ON __cy_path0.start_id = __cy_n0._id
JOIN social_nodes AS __cy_n1
ON __cy_path0.end_id = __cy_n1._id
WHERE __cy_path0.depth >= 1
AND array_position(
string_to_array(__cy_n1._label, ':'), 'Person'
) IS NOT NULL
The recursive CTE maintains visited edge arrays for cycle detection, and the outer query applies the minimum depth and target label constraints.
12.7.4 WITH Aggregation and Scope Boundary
Cypher:
MATCH (a:Person)-[:KNOWS]->(b:Person)
WITH a.name AS person, count(b) AS friend_count
WHERE friend_count > 5
RETURN person, friend_count
ORDER BY friend_count DESC
Generated SQL (conceptual):
SELECT __cy_scope0.person, __cy_scope0.friend_count
FROM (
SELECT __cy_n0.name AS person, count(*) AS friend_count
FROM social_nodes AS __cy_n0
JOIN social_edges AS __cy_r0
ON __cy_n0._id = __cy_r0._source
JOIN social_nodes AS __cy_n1
ON __cy_r0._target = __cy_n1._id
WHERE array_position(
string_to_array(__cy_n0._label, ':'), 'Person'
) IS NOT NULL
AND __cy_r0._type = 'KNOWS'
AND array_position(
string_to_array(__cy_n1._label, ':'), 'Person'
) IS NOT NULL
GROUP BY __cy_n0.name
) AS __cy_scope0
WHERE __cy_scope0.friend_count > 5
ORDER BY __cy_scope0.friend_count DESC
The WITH clause creates a subquery boundary. The inner query performs the join and aggregation, and the outer query applies the post-aggregation filter and final ordering.
12.7.5 MERGE Find-or-Create
Cypher:
MATCH (a:Person {name: 'Alice'})
MERGE (a)-[:KNOWS]->(b:Person {name: 'Bob'})
ON CREATE SET b.created_at = '2024-01-01'
ON MATCH SET b.last_seen = '2024-06-15'
RETURN b.name
This generates a two-branch UNION ALL structure. The scope CTE captures the current state (the matched a variable). The matched branch performs a LATERAL INNER JOIN to find existing (a)-[:KNOWS]->(b:Person {name: 'Bob'}) patterns and applies the ON MATCH SET. The created branch performs a LATERAL LEFT JOIN to the same pattern; where the probe returns NULL, it invokes graph_create_node and graph_create_edge and applies the ON CREATE SET. The UNION ALL of both branches ensures exactly one row per input, with the correct variable binding regardless of whether the pattern was matched or created.
12.8 Optimization Considerations
12.8.1 Predicate Pushdown Through Subquery Layers
Because the Cypher-to-SQL translation produces standard SQL AST nodes, the existing optimizer passes apply without modification. Predicate pushdown through subquery layers is handled by the plan optimizer's pushdown_predicates_() pass, which recognizes that filters on the outer query can often be pushed into subqueries.
For Cypher queries, this means that a WHERE clause in an outer SQL context wrapping a cypher() call can potentially be pushed down into the generated subquery:
SELECT * FROM cypher('social', $$
MATCH (n:Person) RETURN n.name, n.age
$$) AS (name TEXT, age INT)
WHERE age > 21;
The age > 21 filter can be pushed below the subquery boundary, reducing the number of rows that flow through the pipeline.
12.8.2 Join Reordering for Graph Patterns
Multi-hop traversal patterns produce a chain of JOINs. The SQL optimizer's join reordering pass (described in Chapter 9) can evaluate different join orderings based on selectivity estimates. For a pattern like:
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b)-[:WORKS_AT]->(c:Company)
The optimizer might choose to start with the most selective scan (a.name = 'Alice'), then join edges, then join the next node — or it might start from a different point if statistics suggest a different ordering would be more efficient.
12.8.3 Index Utilization
The _label and _type columns on the node and edge collections are common filter targets. Indexes on these columns enable fast filtering, and the optimizer's index selection logic applies to the WHERE predicates generated by label and type constraints. Property constraints (e.g., {name: 'Alice'}) translate to equality predicates on collection columns, which benefit from column-level indexes.
12.8.4 Relationship Uniqueness
The openCypher specification requires that within a single MATCH clause, each relationship is traversed at most once per result row. Cognica's flat storage model and join-based translation naturally enforce this for fixed-length patterns, since each edge table scan produces distinct rows. For variable-length paths, the recursive CTE's visited_edge_ids array explicitly prevents revisiting edges, implementing cycle detection and relationship uniqueness simultaneously.
12.9 Summary
Cognica's Cypher implementation demonstrates that a graph query language need not require a separate execution engine. By rewriting Cypher into SQL at parse time, the system leverages the full power of an existing query optimizer and executor.
The architecture consists of three tightly integrated components:
- A hand-written lexer and parser that transforms Cypher text into a typed abstract syntax tree with 19 expression variants and 9 clause types, supporting the core openCypher grammar including variable-length paths, list comprehensions, and pattern existence tests.
- A binding-aware translator that converts the Cypher AST into SQL AST nodes through a clause-by-clause pipeline. Node patterns become table scans, relationship patterns become JOINs, variable-length paths become recursive CTEs, and write operations become graph storage function calls. The translator tracks variable bindings across scope boundaries, ensuring that property access, function calls, and aggregation all resolve correctly.
- Transparent integration with the SQL pipeline, where the generated subquery passes through plan building, optimization (predicate pushdown, join reordering, index selection), physical planning, and CVM compilation without any Cypher-specific processing. The optimizer sees standard SQL constructs and applies its full repertoire of transformations.
The cypher() sentinel function — inspired by Apache AGE — provides a clean interface between the SQL and Cypher worlds. Users write graph patterns in Cypher's visual syntax, and the database executes them with the same efficiency as hand-written SQL joins. Chapter 3 provides the algebraic foundations that this translation preserves, and Chapter 11 describes the graph storage layer that the generated queries ultimately access.