Chapter 9: Logical Planning and Optimization
9.1 Introduction to Query Optimization
Query optimization represents one of the most intellectually challenging problems in database systems. Given a declarative SQL query, the optimizer must find an efficient execution strategy from an exponentially large search space of possible plans. The quality of this decision directly impacts query performance by orders of magnitude.
Cognica implements a sophisticated multi-phase optimization pipeline that combines rule-based transformations with cost-based decisions. This chapter explores the logical planning phase, where queries are transformed into an algebraic representation and optimized through a series of rewrite rules before being converted to physical execution plans.
9.1.1 The Optimization Problem
Consider a simple three-way join query:
SELECT o.order_id, c.name, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE c.country = 'US' AND o.total > 100
Even for this modest query, the optimizer must decide:
- Join Order: Should we join orders-customers first, then products? Or orders-products first? With n tables, there are (2n-2)! / (n-1)! possible bushy join trees.
- Join Algorithm: Hash join, merge join, nested loop, or index nested loop for each join?
- Access Paths: Full table scan or index scan for each table? Which index if multiple are available?
- Filter Placement: Apply c.country = 'US' before or after the join?
- Sort Strategy: In-memory sort, external sort, or leverage index ordering?
The number of possible plans grows combinatorially. For a 10-way join, there are over 17 billion possible join orderings alone. The optimizer must navigate this space efficiently to find a good plan quickly.
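The growth rate can be checked directly. The count of bushy join trees over n relations is (2n-2)! / (n-1)! (binary trees with n labeled leaves, counting both operand orders at every join); a minimal sketch:

```cpp
#include <cstdint>

// (2n-2)! / (n-1)! computed as the product n * (n+1) * ... * (2n-2).
// Overflows std::uint64_t for large n; fine for the sizes discussed here.
std::uint64_t join_tree_count(std::uint64_t n) {
    std::uint64_t count = 1;
    for (std::uint64_t i = n; i <= 2 * n - 2; ++i) {
        count *= i;
    }
    return count;
}
```

For n = 10 this yields 17,643,225,600, matching the figure above.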
9.1.2 Optimization Pipeline Architecture
Cognica's query optimizer implements a five-phase pipeline. Each phase serves a distinct purpose:
| Phase | Component | Purpose |
|---|---|---|
| 1 | LogicalPlanBuilder | Convert AST to relational algebra |
| 2 | ASTOptimizer | Apply transformation rules |
| 3 | CardinalityEstimator | Estimate result sizes |
| 4 | PhysicalPlanBuilder | Select algorithms and access paths |
| 5 | MemoryBudgetAllocator | Distribute memory, predict spills |
9.2 Logical Plan Representation
The logical plan represents a query as a tree of relational algebra operators. Unlike the physical plan (which specifies concrete algorithms), the logical plan describes what operations to perform without specifying how.
9.2.1 Operator Types
Cognica defines fourteen logical operator types. Each corresponds to a relational algebra operation; the most common are shown below:
| Operator | Algebra | SQL Clause |
|---|---|---|
| ScanOp | R | FROM table |
| FilterOp | σ (selection) | WHERE condition |
| ProjectOp | π (projection) | SELECT columns |
| SortOp | τ (sort) | ORDER BY |
| LimitOp | - | LIMIT n / LIMIT expr |
| GroupOp | γ (grouping) | GROUP BY |
| JoinOp | ⋈ (join) | JOIN ... ON |
| UnionOp | ∪ (union) | UNION |
| SubqueryOp | - | FROM (SELECT ...) AS t |
| TableFuncOp | - | FROM func(...) |
| UnnestOp | - | FROM unnest(array) |
| SearchOp | - | Full-text search |
Expression-Based LIMIT/OFFSET: The LimitOp supports both constant values and arbitrary expressions for LIMIT and OFFSET. When the limit or offset is a constant integer literal, the planner stores it directly in the limit / offset optional fields. When the value comes from a parameter, a subquery, or an arithmetic expression, the planner stores the unevaluated AST expression in limit_expr / offset_expr and defers evaluation to execution time. This dual representation lets the optimizer reason about constant limits during planning (for TopK pushdown and cardinality estimation) while still supporting dynamic limits such as LIMIT $1.
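A minimal sketch of this dual representation (field names are assumed here, and the real LimitOp stores AST expression pointers rather than strings):

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical sketch of the dual limit representation described above.
struct LimitOpSketch {
    std::optional<std::int64_t> limit;       // set for LIMIT <literal>
    std::optional<std::int64_t> offset;      // set for OFFSET <literal>
    std::optional<std::string>  limit_expr;  // deferred, e.g. "LIMIT $1"
    std::optional<std::string>  offset_expr;

    // TopK pushdown and cardinality estimation apply only in this case:
    bool has_constant_limit() const { return limit.has_value(); }
};
```

A plan for LIMIT 10 sets limit = 10 and leaves limit_expr empty; a plan for LIMIT $1 does the reverse, and the optimizer treats its cardinality as unknown.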
9.2.2 Plan Tree Structure
A logical plan forms a tree where:
- Leaf nodes are ScanOp operators (data sources)
- Internal nodes are transformation operators
- The root produces the final query result
For the query:
SELECT name, total FROM orders WHERE status = 'shipped' ORDER BY total DESC LIMIT 10
The logical plan tree is:
Limit(10)
Project(name, total)
Sort(total DESC)
Filter(status = 'shipped')
Scan(orders)
9.2.3 Operator Properties
Each logical operator maintains properties that guide optimization:
Cardinality Estimate: The estimated_rows_ field stores the predicted output row count. This is populated during cardinality estimation and used for cost calculations.
Schema Propagation: Operators track their output schema, enabling the optimizer to verify that downstream operators reference valid columns.
Ordering Properties: Some operators (Sort, certain scans) produce ordered output. Tracking this enables sort elimination optimizations.
9.3 Rule-Based Optimization
The first optimization phase applies a set of transformation rules that improve the plan regardless of data statistics. These rules implement algebraic equivalences that always (or almost always) improve performance.
9.3.1 Optimization Rule Framework
Cognica implements an extensible rule framework:
class OptimizationRule {
public:
virtual ~OptimizationRule() = default;
virtual auto apply(LogicalOpPtr root) -> std::pair<LogicalOpPtr, bool> = 0;
virtual auto name() const -> std::string = 0;
};
Each rule's apply() method returns:
- The transformed plan (possibly unchanged)
- A boolean indicating whether any transformation occurred
The ASTOptimizer manages rule application:
class ASTOptimizer {
public:
auto optimize(LogicalPlan plan) -> std::pair<LogicalPlan, OptimizationStats>;
private:
std::vector<std::unique_ptr<OptimizationRule>> rules_;
uint32_t max_passes_ = 10;
};
Rules are applied iteratively until a fixed point (no rule makes changes) or the maximum iteration count is reached. This fixed-point iteration ensures that rules can trigger each other—for example, filter merging might enable additional pushdown opportunities.
9.3.2 MergeFiltersRule
The first rule merges consecutive filter operators into a single filter with a conjunctive predicate.
Transformation:
Filter(p1, Filter(p2, X)) => Filter(p1 AND p2, X)
Example:
Before:
Filter(price > 100)
Filter(category = 'electronics')
Scan(products)
After:
Filter(price > 100 AND category = 'electronics')
Scan(products)
Benefits:
- Reduces operator overhead (fewer virtual function calls)
- Enables better predicate evaluation order
- Improves index matching (composite predicates may match composite indexes)
The implementation recursively processes the tree bottom-up:
auto MergeFiltersRule::merge_recursive_(LogicalOpPtr node) -> LogicalOpPtr {
// First, recursively process children
for (size_t i = 0; i < node->children().size(); ++i) {
node->set_child(i, merge_recursive_(node->children()[i]));
}
// If this is a filter with a filter child, merge them
if (node->type() == LogicalOpType::kFilter) {
auto* filter = static_cast<FilterOp*>(node.get());
if (!filter->children().empty() &&
filter->children()[0]->type() == LogicalOpType::kFilter) {
auto* child_filter = static_cast<FilterOp*>(filter->children()[0].get());
// Create AND of both predicates
auto merged = ast::make_and(filter->predicate(), child_filter->predicate());
filter->set_predicate(std::move(merged));
// Skip the child filter
filter->set_child(0, child_filter->children()[0]);
}
}
return node;
}
9.3.3 PredicatePushdownRule
Predicate pushdown moves filter operations closer to data sources, reducing the number of rows that flow through the plan.
Transformation:
Filter(Parent(X)) => Parent(Filter(X)), when the parent operator permits it
Pushdown Compatibility Matrix:
| Parent Operator | Can Push Through? | Condition |
|---|---|---|
| Project | Yes | If predicate columns are in projection |
| Sort | Yes | Always |
| Limit | Yes | Always |
| Skip | Yes | Always |
| Group | No | Predicate references aggregates |
| Join | Partial | Only predicates on single table |
| Search | No | Changes semantics |
Example:
Before:
Project(name, price)
Sort(price DESC)
Filter(category = 'books')
Scan(products)
After:
Project(name, price)
Sort(price DESC)
Filter(category = 'books')
Scan(products)
In this case, the filter is already at the optimal position. But consider:
Before:
Filter(category = 'books')
Project(name, price, category)
Sort(price DESC)
Scan(products)
After:
Project(name, price, category)
Sort(price DESC)
Filter(category = 'books')
Scan(products)
Benefits:
- Reduces data volume early in the pipeline
- Enables index utilization (filter at scan level can use indexes)
- Reduces memory pressure for blocking operators
9.3.4 PredicateOrderingRule
Within a compound predicate, the evaluation order affects performance. For short-circuit evaluation:
- In AND clauses: evaluate most selective predicates first
- In OR clauses: evaluate least selective predicates first
Selectivity Heuristics:
The rule uses hardcoded selectivity estimates when statistics are unavailable:
| Predicate Type | Estimated Selectivity |
|---|---|
| Equality (=) | 0.01 (1%) |
| Inequality (!=) | 0.99 (99%) |
| Range (<, >, <=, >=) | 0.33 (33%) |
| IN clause | 0.01 per value, max 0.30 |
| Pattern match (LIKE) | 0.15 (15%) |
| Existence check | 0.90 (90%) |
Compound Selectivity:
For AND: sel(p1 AND p2) = sel(p1) * sel(p2)
For OR: sel(p1 OR p2) = sel(p1) + sel(p2) - sel(p1) * sel(p2)
For NOT: sel(NOT p) = 1 - sel(p)
Example:
Before:
WHERE status != 'deleted' AND id = 12345
After:
WHERE id = 12345 AND status != 'deleted'
The equality predicate (1% selectivity) is evaluated first because it will short-circuit more often than the inequality (99% selectivity).
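The ordering step itself is a simple sort of the conjuncts by estimated selectivity; a sketch using the heuristic values from the table above (type and function names are illustrative):

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// A predicate paired with its heuristic selectivity (e.g. equality 0.01,
// range 0.33, inequality 0.99).
using Pred = std::pair<std::string, double>;

// For an AND chain, evaluate the most selective predicate first so that
// short-circuit evaluation skips the remaining conjuncts as often as
// possible. (For OR chains the comparison would be reversed.)
std::vector<Pred> order_conjuncts(std::vector<Pred> preds) {
    std::stable_sort(preds.begin(), preds.end(),
                     [](const Pred& a, const Pred& b) {
                         return a.second < b.second;
                     });
    return preds;
}
```

Running this on the example above moves id = 12345 (0.01) ahead of status != 'deleted' (0.99).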
9.3.5 TopKPushdownRule
When a query has both ORDER BY and LIMIT, the optimizer can avoid sorting all rows by using a heap-based Top-K algorithm.
Transformation:
Limit(k, Sort(keys, X)) => TopKSort(k, keys, X)
Example:
Before:
Limit(10)
Sort(score DESC)
Scan(articles) -- 10 million rows
After (conceptually):
TopKSort(10, score DESC)
Scan(articles)
Algorithm Comparison:
| Algorithm | Time Complexity | Space Complexity |
|---|---|---|
| Full Sort | O(n log n) | O(n) |
| Top-K Heap | O(n log k) | O(k) |
For n = 10,000,000 and k = 10:
- Full sort: ~233 million comparisons
- Top-K heap: ~33 million comparisons
The rule annotates the Sort operator with estimated_rows = k, signaling to the physical planner to use a heap-based strategy.
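The heap-based strategy can be sketched with a size-k min-heap over the sort key: each incoming row compares against the current k-th best, so per-row work is O(log k) rather than O(log n). An illustrative version over integer keys:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Top-K for ORDER BY key DESC LIMIT k: keep a size-k min-heap of the best
// keys seen so far.
std::vector<int> top_k_desc(const std::vector<int>& keys, std::size_t k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int key : keys) {
        if (heap.size() < k) {
            heap.push(key);
        } else if (!heap.empty() && key > heap.top()) {
            // Beats the current k-th best: replace it.
            heap.pop();
            heap.push(key);
        }
    }
    std::vector<int> result;
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    std::reverse(result.begin(), result.end());  // emit in descending order
    return result;
}
```

Only k values are ever resident, which is also why the memory estimate for a TopK sort is proportional to k rather than n.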
9.3.6 RemoveRedundantSkipRule
A Skip(0) operator has no effect and can be removed:
Skip(0, X) => X
This rule handles edge cases where query builders generate trivial offsets.
9.3.7 Fixed-Point Iteration
Rules are applied iteratively because one transformation may enable another:
auto ASTOptimizer::optimize(LogicalPlan plan) -> std::pair<LogicalPlan, OptimizationStats> {
auto stats = OptimizationStats {};
auto root = plan.take_root();
for (uint32_t pass = 0; pass < max_passes_; ++pass) {
auto any_changed = false;
for (auto& rule : rules_) {
auto [new_root, changed] = rule->apply(std::move(root));
root = std::move(new_root);
if (changed) {
any_changed = true;
++stats.rules_applied;
}
}
if (!any_changed) {
break; // Fixed point reached
}
}
plan.set_root(std::move(root));
return {std::move(plan), stats};
}
Typical queries converge within 2-3 passes. The maximum of 10 passes handles pathological cases.
9.3.8 Logical Plan Optimization Pipeline
After the AST-level rule-based optimizer produces an initial logical plan, the PlanOptimizer applies a second round of transformations that operate directly on the logical plan tree. These passes require access to the full relational algebra structure and cannot be expressed as simple AST rewrite rules.
The PlanOptimizer executes a fixed sequence of passes:
1. simplify_predicates_ -- Boolean algebra simplification
2. estimate_selectivity_ -- Annotate filters and joins with selectivity
3. pushdown_predicates_ -- Push filters below joins and projections
4. unnest_subqueries_ -- Convert correlated subqueries to joins
5. eliminate_common_subexpressions_ -- Factor out repeated expressions
6. expand_join_or_to_union_ -- Rewrite OR-linked join predicates
7. reorder_joins_ -- Cost-based multi-way join enumeration
8. prune_columns_ -- Remove unused columns from scans
Unlike the fixed-point iteration of the AST optimizer, the plan optimizer runs each pass exactly once in the order shown above. The ordering is significant: selectivity estimates must be available before predicate pushdown can make cost-aware decisions, and subquery unnesting must complete before join reordering can consider the newly introduced join nodes.
9.3.9 Subquery Unnesting
Subquery unnesting (also called subquery decorrelation) converts correlated scalar subqueries and EXISTS / IN / NOT EXISTS subqueries into equivalent join operations. This transformation is important because a correlated subquery would otherwise require re-execution for every outer row, producing O(|R| * |S|) work, whereas a join can be executed in O(|R| + |S|) with a hash join.
EXISTS Subquery to Semi-Join:
A filter of the form EXISTS (SELECT 1 FROM S WHERE S.fk = R.pk) is converted to a semi-join:
SemiJoin(R, S ON S.fk = R.pk)
IN Subquery to Semi-Join:
A predicate R.col IN (SELECT S.col FROM S WHERE ...) is rewritten similarly:
SemiJoin(R, S ON R.col = S.col)
NOT EXISTS to Anti-Join:
A NOT EXISTS subquery becomes an anti-join, which emits only the outer rows with no matching inner row:
AntiJoin(R, S ON S.fk = R.pk)
Source Resolution: The unnesting pass must determine the data source for the right side of the generated join. Earlier versions only supported plain table references in the subquery's FROM clause. The current implementation uses a unified source resolution strategy that handles three kinds of FROM sources:
- Table references — the subquery selects from a named table, producing a LogicalScan node.
- Derived tables — the subquery's FROM clause is itself a subquery (FROM (SELECT ...) AS t), producing a LogicalSubquery node that wraps a recursively built logical plan.
- Table-valued functions — the subquery references a table function (FROM generate_series(1, 10)), producing a LogicalTableFunc node.
This generality allows unnesting to work uniformly regardless of the subquery's internal structure. The join condition is then constructed from the correlation predicate, and any remaining WHERE-clause predicates from the subquery are placed as a filter above the join's right child.
9.3.10 Common Subexpression Elimination
Common subexpression elimination (CSE) identifies expression subtrees that appear more than once in a query's projection list and factors them into a single computation. The result is referenced by subsequent operators through a generated column name, avoiding redundant evaluation.
Example:
SELECT price * quantity AS line_total,
price * quantity * tax_rate AS tax_amount
FROM orders
The expression price * quantity appears twice. CSE factors it into a pre-computation step, and the outer projection references the computed column instead of re-evaluating the multiplication.
Exclusions: Not all repeated expressions are candidates for CSE. The optimizer skips:
- Aggregate and window functions — these have special evaluation semantics that cannot be hoisted into a pre-projection layer.
- SELECT * queries — wildcard projections are incompatible with the intermediate projection node that CSE inserts, because the column set is not fully known at optimization time.
- Field access and array slicing expressions — FieldAccess and ArraySlice nodes over lateral or correlated values are context-sensitive; hoisting them into a CSE layer can change evaluation semantics when the outer row varies.
9.3.11 Column Pruning
The final optimization pass removes columns from scan operators that are not referenced by any downstream operator. This reduces I/O and memory consumption, which is significant for wide tables where only a few columns are needed.
The pass walks the plan tree top-down, collecting the set of columns referenced by each operator's expressions (projection lists, filter predicates, join conditions, sort keys, group-by keys, and aggregate arguments). Scan operators are then annotated with this minimal column set so that the physical planner can push the projection into the storage layer.
9.4 Cardinality Estimation
Accurate cardinality estimates are crucial for cost-based optimization. A wrong estimate can lead to catastrophically bad plan choices.
9.4.1 The Cardinality Estimation Problem
Consider estimating the result size of:
SELECT * FROM orders WHERE status = 'pending' AND total > 1000
We need to estimate:
- How many rows satisfy status = 'pending'?
- How many rows satisfy total > 1000?
- How many rows satisfy both?
The naive approach assumes independence:
sel(p1 AND p2) = sel(p1) * sel(p2)
But predicates are often correlated. If status = 'pending' implies recent orders, and recent orders tend to be smaller, the predicates are negatively correlated.
9.4.2 CardinalityEstimate Structure
Cognica represents cardinality estimates with:
struct CardinalityEstimate {
double rows; // Estimated row count
double row_width = 512.0; // Average bytes per row
auto memory_estimate() const -> double {
return rows * row_width;
}
auto apply_selectivity(double selectivity) -> CardinalityEstimate& {
rows *= selectivity;
return *this;
}
};
The row_width field enables memory estimation for buffer sizing and spill prediction.
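A runnable mirror of the struct shows how estimates flow through a filter chain: a 1,000,000-row scan behind a 1% equality filter is estimated at 10,000 rows, or about 5 MB of intermediate state at the default row width.

```cpp
// Mirror of the CardinalityEstimate struct above, runnable standalone.
struct CardinalityEstimate {
    double rows;
    double row_width = 512.0;  // average bytes per row

    double memory_estimate() const { return rows * row_width; }

    // Returns *this so selectivities can be chained, one per predicate.
    CardinalityEstimate& apply_selectivity(double selectivity) {
        rows *= selectivity;
        return *this;
    }
};
```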
9.4.3 Operator Cardinality Propagation
Each operator type has specific propagation rules:
Scan:
rows_out = |table|
The base table cardinality comes from statistics.
Filter:
rows_out = rows_in * selectivity(predicate)
Sort, Project:
Rows unchanged (though row_width may change for Project).
Limit:
rows_out = min(rows_in, limit)
When the limit is an expression rather than a constant, cardinality estimation falls back to the child estimate because the actual value is unknown until execution time.
Offset:
rows_out = max(0, rows_in - offset)
Group:
rows_out = d
Where d is the number of distinct values in the grouping columns.
Join:
rows_out = rows_left * rows_right * join_selectivity
Union:
rows_out = rows_left + rows_right
9.4.4 Selectivity Estimation
The CardinalityEstimator class provides selectivity estimation for various predicate types:
Equality Selectivity:
With statistics:
sel = 1 / distinct_values(column)
Without statistics:
sel = kDefaultEqualitySelectivity (0.01)
Range Selectivity:
With histogram:
sel = (rows in buckets covered by the range, interpolating within partial buckets) / total_rows
Without histogram:
sel = kDefaultRangeSelectivity (0.30)
IN Selectivity:
sel = min(n_values * sel_equality, 1.0)
Pattern Selectivity:
A LIKE pattern with a literal prefix is treated as a range over that prefix; patterns without a usable prefix fall back to a fixed default.
9.4.5 Correlation Handling
The naive independence assumption often produces severe underestimates. Cognica uses a damping factor for conjunctive predicates:
sel_damped = sqrt(sel_independent * sel_correlated)
This geometric mean provides a middle ground between:
- Full independence: sel_independent = sel(p1) * sel(p2) (often too low)
- Full correlation: sel_correlated = min(sel(p1), sel(p2)) (often too high)
Example:
For status = 'active' AND country = 'US', suppose sel(status = 'active') = 0.1 and sel(country = 'US') = 0.2:
- Independence: 0.1 * 0.2 = 0.02
- Damped: sqrt(0.02 * 0.1) ≈ 0.045
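A runnable sketch of this damping; the exact formula is reconstructed here as the geometric mean of the two bounds, which is an assumption rather than a confirmed detail of Cognica's estimator:

```cpp
#include <algorithm>
#include <cmath>

// Assumed formulation: geometric mean of the independence estimate
// (often too low) and the full-correlation estimate (often too high).
double damped_and_selectivity(double s1, double s2) {
    double independent = s1 * s2;           // s1 * s2
    double correlated  = std::min(s1, s2);  // upper bound for AND
    return std::sqrt(independent * correlated);
}
```

By construction the result always lies between the two estimates it blends.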
9.4.6 Default Constants
When statistics are unavailable, the estimator uses conservative defaults:
static constexpr double kDefaultEqualitySelectivity = 0.01;
static constexpr double kDefaultRangeSelectivity = 0.30;
static constexpr double kDefaultInSelectivity = 0.05;
static constexpr double kDefaultJoinSelectivity = 0.1;
static constexpr double kDefaultGroupReduction = 0.1;
These defaults are calibrated to avoid catastrophic underestimates while remaining reasonably selective.
9.5 Statistics and Histograms
Accurate cardinality estimation requires statistics about data distribution. Cognica maintains comprehensive statistics including histograms, distinct value counts, and correlation information.
9.5.1 Histogram Structure
Cognica uses equi-depth (equi-height) histograms:
struct HistogramBucket {
double lower_bound; // Inclusive
double upper_bound; // Inclusive
int64_t count; // Values in bucket
int64_t distinct; // Distinct values in bucket
};
Equi-Depth Property: Each bucket contains approximately the same number of values:
count_i ≈ N / B
Where N is the total number of values and B is the number of buckets (default: 100).
Advantages over Equi-Width:
- Better handles skewed distributions
- Uniform error bounds across the value range
- Naturally adapts to data density
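Construction from a sorted column sample can be sketched as follows; each bucket receives ceil(N/B) values, so boundaries land wherever the data is dense (the real collector also tracks per-bucket distinct counts):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bucket {
    double lower;        // inclusive
    double upper;        // inclusive
    std::int64_t count;  // values in bucket
};

// Build an equi-depth histogram from an already-sorted sample.
std::vector<Bucket> build_equi_depth(const std::vector<double>& sorted_values,
                                     std::size_t num_buckets) {
    std::vector<Bucket> buckets;
    const std::size_t n = sorted_values.size();
    if (n == 0 || num_buckets == 0) return buckets;
    const std::size_t per_bucket = (n + num_buckets - 1) / num_buckets;  // ceil(N/B)
    for (std::size_t i = 0; i < n; i += per_bucket) {
        const std::size_t end = std::min(i + per_bucket, n);
        buckets.push_back({sorted_values[i], sorted_values[end - 1],
                           static_cast<std::int64_t>(end - i)});
    }
    return buckets;
}
```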
9.5.2 Histogram-Based Selectivity
Equality Selectivity:
auto Histogram::estimate_equality_selectivity(double value) const -> double {
auto bucket_idx = find_bucket_(value);
if (bucket_idx >= buckets_.size()) {
return 0.0; // Value outside range
}
const auto& bucket = buckets_[bucket_idx];
// Assume uniform distribution within bucket
return static_cast<double>(bucket.count) /
static_cast<double>(bucket.distinct * total_count_);
}
Range Selectivity:
sel(lower <= x <= upper) = (counts of fully covered buckets + interpolated fractions of boundary buckets) / total_count
With linear interpolation for partial buckets:
auto Histogram::interpolate_bucket_fraction_(
size_t bucket_idx, double lower, double upper) const -> double {
const auto& b = buckets_[bucket_idx];
auto bucket_width = b.upper_bound - b.lower_bound;
if (bucket_width <= 0) {
return 1.0; // Single-value bucket
}
auto effective_lower = std::max(lower, b.lower_bound);
auto effective_upper = std::min(upper, b.upper_bound);
return (effective_upper - effective_lower) / bucket_width;
}
9.5.3 Multi-Column Statistics
For correlated columns, single-column statistics are insufficient. Cognica supports several multi-column statistics:
N-Distinct Statistics:
Tracks distinct value counts for column combinations:
// Key: "col1,col2,col3" (sorted, comma-separated)
// Value: distinct count for the combination
std::unordered_map<std::string, int64_t> multi_column_ndistinct;
Functional Dependencies:
Tracks when one column determines another:
// Key: "A->B" (A determines B)
// Value: dependency strength [0.0, 1.0]
std::unordered_map<std::string, double> functional_dependencies;
A value of 1.0 indicates perfect functional dependency; lower values indicate partial dependency.
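For illustration, one common way to exploit dependency strength when combining equality selectivities is to blend between independence and full dependency; this scheme is an assumption here, not necessarily Cognica's exact formula:

```cpp
// Assumed blending scheme: with a functional dependency A -> B of
// strength f in [0, 1], sel(A = x AND B = y) slides from the independence
// estimate (f = 0) to sel(A = x) alone (f = 1: B is fully determined by
// A, so its predicate adds no extra filtering).
double fd_adjusted_selectivity(double sel_a, double sel_b, double strength) {
    return sel_a * (strength + (1.0 - strength) * sel_b);
}
```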
2D Histograms:
For pairs of numeric columns with significant correlation:
struct Histogram2D {
std::vector<double> x_boundaries;
std::vector<double> y_boundaries;
std::vector<std::vector<int64_t>> grid; // Row-major counts
};
9.5.4 Statistics Collection
Statistics are collected by IndexStatisticsCollector:
struct Statistics {
// Basic cardinality
int64_t total_keys;
int64_t distinct_values;
HyperLogLog hll; // Approximate distinct count
// Range information
std::unordered_map<std::string, double> min_values;
std::unordered_map<std::string, double> max_values;
std::unordered_map<std::string, std::string> min_string_values;
std::unordered_map<std::string, std::string> max_string_values;
// Null counts
std::unordered_map<std::string, int64_t> null_counts;
// Histograms
std::unordered_map<std::string, Histogram> histograms;
// Multi-column statistics
std::unordered_map<std::string, int64_t> multi_column_ndistinct;
std::unordered_map<std::string, double> functional_dependencies;
std::unordered_map<std::string, Histogram2D> histograms_2d;
// Staleness tracking
TimePoint last_updated;
bool is_stale;
int64_t last_analyzed_write_count;
};
9.6 Cost Model
The cost model estimates the resource consumption of execution plans, enabling the optimizer to compare alternatives.
9.6.1 Cost Structure
Cognica uses a multi-dimensional cost model:
struct Cost {
double cpu_cost; // CPU cycles estimate
double io_cost; // I/O operations
double memory_cost; // Memory pressure
double spill_cost; // Disk spill overhead
auto total() const -> double {
return cpu_cost +
io_cost * kIOCostWeight +
memory_cost * kMemoryCostWeight +
spill_cost * kSpillCostWeight;
}
};
Cost Weights:
static constexpr double kIOCostWeight = 10.0;
static constexpr double kMemoryCostWeight = 0.1;
static constexpr double kSpillCostWeight = 5.0;
The weights reflect modern hardware characteristics:
- I/O is heavily weighted because disk access is orders of magnitude slower than CPU
- Memory cost is lightly weighted because it represents pressure, not direct time
- Spill cost represents the penalty of exceeding memory budgets
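Plugging the weights into the structure from Section 9.6.1, a plan with 100 units of CPU, 10 I/O operations, and 1,000 units of memory pressure totals 100 + 100 + 100 = 300:

```cpp
// Weights from above: I/O dominates, memory is pressure, spill is a
// heavy penalty.
constexpr double kIOCostWeight     = 10.0;
constexpr double kMemoryCostWeight = 0.1;
constexpr double kSpillCostWeight  = 5.0;

// Mirror of the Cost struct, runnable standalone.
struct Cost {
    double cpu_cost    = 0.0;
    double io_cost     = 0.0;
    double memory_cost = 0.0;
    double spill_cost  = 0.0;

    double total() const {
        return cpu_cost +
               io_cost * kIOCostWeight +
               memory_cost * kMemoryCostWeight +
               spill_cost * kSpillCostWeight;
    }
};
```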
9.6.2 Cost Constants
The cost model uses calibrated constants:
// CPU costs
static constexpr double kCPUTupleProcessing = 0.01;
static constexpr double kCPUComparison = 0.0001;
static constexpr double kCPUHashOp = 0.0005;
// I/O costs
static constexpr double kHeapAccessCost = 0.5;
static constexpr double kSpillIOCost = 0.01; // Per byte
// Memory constants
static constexpr double kCompressionFactor = 0.5; // LZ4 typical
static constexpr double kHashTableOverhead = 1.5;
static constexpr size_t kMergeWidth = 16;
static constexpr size_t kDefaultBlockSize = 8192;
static constexpr size_t kAggStateSize = 256;
9.6.3 Operator Cost Formulas
Sequential Scan:
cpu_cost = rows * kCPUTupleProcessing
io_cost = table_bytes / kDefaultBlockSize (sequential block reads)
Index Scan:
cpu_cost = matched_rows * kCPUTupleProcessing
io_cost = matched_rows * kHeapAccessCost (random heap fetches)
The random I/O cost reflects the non-sequential access pattern.
Sort:
In-memory:
cpu_cost = n * log2(n) * kCPUComparison
External:
cpu_cost = n * log2(n) * kCPUComparison
spill_cost = 2 * data_bytes * kSpillIOCost * ceil(log_F(R))
Where F is the merge width (kMergeWidth = 16) and R is the number of initial sorted runs.
Hash Join:
cost = |build| * c_build + |probe| * c_probe
Where c_build = kHashBuildCostPerRow and c_probe = kHashProbeCostPerRow (probe is slightly more expensive due to collision handling).
Nested Loop Join:
cost = |outer| * |inner| * kNestLoopCostPerRow
Where kNestLoopCostPerRow reflects the high cost of repeated inner scans.
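The linear-versus-quadratic gap between the join formulas is what drives strategy selection; a sketch using the per-row constants from Section 9.7.4:

```cpp
// Per-row constants from the cost model (Section 9.7.4).
constexpr double kHashBuildCostPerRow = 1.0;
constexpr double kHashProbeCostPerRow = 1.2;
constexpr double kNestLoopCostPerRow  = 10.0;

// Hash join cost is linear in both inputs...
double hash_join_cost(double build_rows, double probe_rows) {
    return build_rows * kHashBuildCostPerRow +
           probe_rows * kHashProbeCostPerRow;
}

// ...while nested loop cost is the product of the input sizes.
double nested_loop_cost(double outer_rows, double inner_rows) {
    return outer_rows * inner_rows * kNestLoopCostPerRow;
}
```

For a 1,000-row build side and a 10,000-row probe side, the hash join costs 13,000 units versus 100,000,000 for the nested loop, which is why the nested loop fallback should be rare.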
9.6.4 Spill Prediction
The cost model predicts when operations will exceed memory budgets:
auto CostEstimator::will_spill(double data_size, size_t memory_budget) const -> bool {
return data_size > static_cast<double>(memory_budget);
}
When spill is predicted, the cost model adds spill overhead:
spill_cost = 2 * data_bytes * kSpillIOCost
(the spilled data is written out once and read back once)
9.7 Join Ordering
Join ordering is often the most impactful optimization decision. The order of joins can change query execution time by orders of magnitude.
9.7.1 The Join Ordering Problem
For n tables, the number of possible bushy join trees is (2n-2)! / (n-1)!:
| Tables | Join Trees |
|---|---|
| 2 | 1 |
| 3 | 12 |
| 4 | 120 |
| 5 | 1,680 |
| 10 | 17,643,225,600 |
Exhaustive enumeration is infeasible for large queries. Dynamic programming reduces the complexity to O(3^n) subset pairs in the worst case while guaranteeing optimality.
9.7.2 Join Graph Representation
Cognica represents join relationships as a graph:
struct JoinVertex {
std::string identifier; // Table alias
LogicalPlanPtr plan; // Subplan for this table
double cardinality; // Estimated rows
};
struct JoinGraphEdge {
size_t left_idx; // Left table index
size_t right_idx; // Right table index
ast::Expression* condition; // Join predicate
ast::JoinType type; // Inner, left, right, full
double selectivity; // Join selectivity
};
Example:
For the query:
SELECT * FROM A
JOIN B ON A.x = B.x
JOIN C ON B.y = C.y
JOIN D ON A.z = D.z
The join graph has four vertices (A, B, C, D) and three edges: A-B on A.x = B.x, B-C on B.y = C.y, and A-D on A.z = D.z.
9.7.3 DPccp Algorithm
Cognica implements the DPccp (Dynamic Programming with Connected-Complement-Pairs) algorithm from Moerkotte and Neumann. This algorithm generates optimal bushy join trees without cross products.
Key Insight: A valid join plan connects a subset of tables with its complement only if there exists a join predicate between them.
Algorithm Structure:
auto JoinOrderEnumerator::enumerate(const JoinGraph& graph) -> LogicalPlanPtr {
// Initialize DP table with single-table plans
for (size_t i = 0; i < graph.vertex_count(); ++i) {
auto singleton = 1ULL << i;
dp_table_[singleton] = {
.subset = singleton,
.plan = graph.vertices()[i].plan,
.cost = 0.0,
.cardinality = graph.vertices()[i].cardinality
};
}
// Enumerate connected subgraph-complement pairs
enumerate_csg_cmp_pairs_(graph);
// Return plan for all tables
auto all_tables = (1ULL << graph.vertex_count()) - 1;
return dp_table_[all_tables].plan;
}
CSG-CMP Enumeration:
The algorithm enumerates pairs (S1, S2) where:
- S1 and S2 are both connected subgraphs
- There exists an edge between S1 and S2
void JoinOrderEnumerator::enumerate_csg_cmp_pairs_(const JoinGraph& graph) {
auto n = graph.vertex_count();
// For each starting vertex
for (size_t i = 0; i < n; ++i) {
auto start = 1ULL << i;
// Enumerate connected subgraphs containing vertex i
enumerate_csg_rec_(graph, start, start, i);
}
}
void JoinOrderEnumerator::enumerate_csg_rec_(
const JoinGraph& graph,
uint64_t current,
uint64_t excluded,
size_t min_vertex) {
// For current subgraph, enumerate complements
enumerate_cmp_(graph, current);
// Extend subgraph with neighbors
auto neighbors = get_neighborhood_(graph, current, excluded, min_vertex);
for (auto neighbor : neighbors) {
auto extended = current | neighbor;
enumerate_csg_rec_(graph, extended, excluded | neighbor, min_vertex);
}
}
Complement Enumeration:
void JoinOrderEnumerator::enumerate_cmp_(const JoinGraph& graph, uint64_t s1) {
// Find edges crossing from s1 to complement
auto crossing = graph.get_crossing_edges(s1);
if (crossing.empty()) {
return; // No valid complement
}
// Build complement subgraphs
auto complement_vertices = get_complement_vertices_(graph, s1, crossing);
for (auto s2 : enumerate_connected_subsets_(graph, complement_vertices)) {
emit_csg_cmp_(graph, s1, s2);
}
}
Plan Emission:
void JoinOrderEnumerator::emit_csg_cmp_(
const JoinGraph& graph, uint64_t s1, uint64_t s2) {
// Skip if either subset doesn't have a plan yet
if (dp_table_.find(s1) == dp_table_.end() ||
dp_table_.find(s2) == dp_table_.end()) {
return;
}
auto& plan1 = dp_table_[s1];
auto& plan2 = dp_table_[s2];
// Create join plan and estimate cost
auto join_plan = create_join_plan_(graph, s1, s2, plan1, plan2);
auto cost = estimate_join_cost_(plan1, plan2);
auto cardinality = estimate_join_cardinality_(graph, s1, s2, plan1, plan2);
// Store if better than existing plan
auto combined = s1 | s2;
store_plan_(combined, std::move(join_plan), cost, cardinality);
}
9.7.4 Join Cost Estimation
The join cost depends on the algorithm chosen:
static constexpr double kHashBuildCostPerRow = 1.0;
static constexpr double kHashProbeCostPerRow = 1.2;
static constexpr double kNestLoopCostPerRow = 10.0;
static constexpr double kSortMergeCostPerRow = 2.0;
Hash Join Cost:
cost = |build| * kHashBuildCostPerRow + |probe| * kHashProbeCostPerRow
The smaller relation is typically chosen as the build side.
Join Cardinality:
|R join S| = |R| * |S| * selectivity
Join selectivity is estimated from:
- Key cardinalities (for equi-joins on keys)
- Default selectivity (0.1) when statistics are unavailable
9.7.5 Bushy vs Left-Deep Plans
DPccp generates bushy plans, which can be more efficient than left-deep plans:
Left-Deep Plan:
((A ⋈ B) ⋈ C) ⋈ D
Bushy Plan:
(A ⋈ B) ⋈ (C ⋈ D)
Bushy plans enable parallelism and can reduce intermediate result sizes.
9.7.6 Join Reordering Safety Guard
The multi-way join enumeration algorithm (DPccp) assumes that the inputs to each join are independent, side-effect-free relations whose evaluation order does not affect semantics. This assumption holds for plain table scans but breaks down for several classes of join inputs:
- Lateral joins — the right side of a LATERAL join references columns from the left side, creating a data dependency that the enumerator's bitmask representation cannot express.
- Table-valued functions — function calls like generate_series() or unnest() may depend on correlated outer values and must remain in their original position.
- Subquery inputs — derived tables (FROM (SELECT ...) AS t) may contain correlated references or side effects.
The optimizer guards against incorrect reordering by inspecting the join chain before invoking DPccp. It walks the left-spine of nested inner joins and checks two conditions:
- All joins in the chain must be INNER joins (outer and semi-joins have non-commutative semantics).
- Every right-side input of each join must be a plain LogicalScan node.
If either condition is violated, the optimizer skips multi-way enumeration and falls back to pairwise join optimization, which preserves the original tree shape and its associated evaluation order.
9.8 Access Path Selection
Access path selection determines how to read data from tables—full scan, index scan, or index-only scan.
9.8.1 Access Path Types
enum class AccessPathType {
kSeqScan, // Full table scan
kIndexScan, // Index scan with heap fetch
kIndexOnlyScan, // Covering index scan
kPrimaryKeyLookup // Direct point lookup
};
9.8.2 Access Path Enumeration
The AccessPathEnumerator generates all viable access paths:
auto AccessPathEnumerator::enumerate(const ast::Expression* filter)
-> std::vector<AccessPath> {
auto paths = std::vector<AccessPath> {};
// Sequential scan is always available
paths.push_back(create_seq_scan_());
// Check for primary key equality
if (is_pk_equality_(filter)) {
paths.push_back(create_pk_lookup_(filter));
}
// Check each available index
for (const auto& index : indexes_) {
if (auto bounds = extract_index_bounds_(filter, index)) {
paths.push_back(create_index_scan_(index, *bounds));
// Check for covering index
if (is_covering_index_(index)) {
paths.push_back(create_index_only_scan_(index, *bounds));
}
}
}
// Sort by cost
std::sort(paths.begin(), paths.end(),
[](const auto& a, const auto& b) {
return a.cost.total() < b.cost.total();
});
return paths;
}
9.8.3 Cost Comparison
Sequential Scan:
- Cost: Proportional to table size
- Best when: High selectivity (reading most of table) or no suitable index
Index Scan:
- Cost: Index traversal + heap fetches
- Best when: Low selectivity (reading small fraction of table)
Index-Only Scan:
- Cost: Index traversal only
- Best when: Index contains all required columns (covering index)
Primary Key Lookup:
- Cost: Single point lookup
- Best when: Equality predicate on primary key
Break-Even Analysis:
For an index scan to beat a sequential scan:
selectivity * N * cost_random < N * cost_seq
Solving for selectivity:
selectivity < cost_seq / cost_random
With typical constants (random I/O roughly 4x the cost of sequential I/O):
selectivity < 0.25
Index scans typically win when selecting less than ~25% of rows.
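The threshold is just the ratio of per-row costs; a sketch with illustrative constants (the 4x random-to-sequential ratio is an assumption, not Cognica's calibrated value):

```cpp
// Illustrative constants: a random heap fetch costs ~4x a sequential read.
constexpr double kSeqCostPerRow    = 1.0;
constexpr double kRandomCostPerRow = 4.0;

// Index scan wins when selectivity * N * kRandomCostPerRow is below
// N * kSeqCostPerRow, i.e. below this selectivity threshold.
constexpr double break_even_selectivity() {
    return kSeqCostPerRow / kRandomCostPerRow;
}
```

With these values the break-even point is 0.25, matching the ~25% rule of thumb above.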
9.9 Strategy Selection
After determining access paths, the optimizer selects algorithms for each operator.
9.9.1 Sort Strategy Selection
enum class SortStrategy {
kNoSort, // Input already sorted
kTopKHeap, // Heap-based for small limits
kInMemorySort, // QuickSort in memory
kExternalSort, // External merge sort
kIndexScan // Use index ordering
};
Selection Logic:
auto SortStrategySelector::select(const SortConfig& config) -> SortStrategy {
// Check if input is already sorted
if (is_sorted_by_(config.input_ordering, config.sort_keys)) {
return SortStrategy::kNoSort;
}
// Check for index providing order
for (const auto& idx : config.available_indexes) {
if (provides_ordering_(idx, config.sort_keys)) {
return SortStrategy::kIndexScan;
}
}
// Check for TopK optimization
if (config.limit_hint && *config.limit_hint <= kTopKThreshold) {
return SortStrategy::kTopKHeap;
}
// Choose between in-memory and external
auto data_size = config.input_rows * config.row_width;
if (data_size <= config.memory_budget) {
return SortStrategy::kInMemorySort;
}
return SortStrategy::kExternalSort;
}
9.9.2 Join Strategy Selection
enum class JoinStrategy {
kHashJoin, // Hash build/probe
kIndexNestedLoop, // Index lookup on inner
kMergeJoin, // Sort-merge
kNestedLoop // Simple nested loop
};
Selection Logic:
auto JoinStrategySelector::select(const JoinConfig& config) -> JoinStrategy {
// Check for index nested loop (small outer, index on inner)
if (config.left_rows <= kINLOuterThreshold) {
for (const auto& idx : config.right_indexes) {
if (matches_join_keys_(idx, config.join_keys)) {
return JoinStrategy::kIndexNestedLoop;
}
}
}
// Check if both sides are sorted on join keys
if (is_sorted_on_(config.left_ordering, config.join_keys) &&
is_sorted_on_(config.right_ordering, config.join_keys)) {
return JoinStrategy::kMergeJoin;
}
// Hash join when the build side fits in memory
auto smaller_side = std::min(config.left_rows, config.right_rows);
auto hash_size = smaller_side * config.row_width * kHashTableOverhead;
if (hash_size <= config.memory_budget) {
return JoinStrategy::kHashJoin;
}
// Fallback to nested loop (should be rare)
return JoinStrategy::kNestedLoop;
}
9.9.3 Aggregate Strategy Selection
enum class AggregateStrategy {
kStreamAggregate, // Streaming on sorted input
kHashAggregate, // Hash-based grouping
kSortAggregate // Sort then stream
};
Selection Logic:
auto AggregateStrategySelector::select(const AggregateConfig& config)
-> AggregateStrategy {
// Check if input is sorted on group keys
if (is_sorted_on_(config.input_ordering, config.group_keys)) {
return AggregateStrategy::kStreamAggregate;
}
// Estimate hash table size
auto groups = config.estimated_groups.value_or(
config.input_rows * kDefaultGroupReduction);
auto hash_size = groups * kBytesPerGroupState;
if (hash_size <= config.memory_budget) {
return AggregateStrategy::kHashAggregate;
}
// Sort then stream
return AggregateStrategy::kSortAggregate;
}
9.10 Memory Budget Allocation
The final optimization phase allocates memory budgets to operators and predicts spill behavior.
9.10.1 Budget Distribution
Memory is distributed proportionally to operator data requirements:
void MemoryBudgetAllocator::allocate(PhysicalPlan& plan) {
// Identify memory-consuming operators
auto blocking_ops = find_blocking_operators_(plan);
if (blocking_ops.empty()) {
return; // nothing to allocate, and avoids dividing by zero below
}
// Calculate total data volume
auto total_data = 0.0;
for (const auto& op : blocking_ops) {
total_data += op->properties().cardinality.memory_estimate();
}
// Distribute budget proportionally
for (auto& op : blocking_ops) {
auto data_size = op->properties().cardinality.memory_estimate();
auto weight = data_size / total_data;
auto budget = static_cast<size_t>(weight * total_budget_);
// Ensure minimum budget
budget = std::max(budget, kMinBudget);
op->properties().memory_budget = budget;
op->properties().will_spill = (data_size > budget);
}
}
9.10.2 Spill Prediction
When an operator's estimated data exceeds its budget, spill is predicted:
struct OperatorProperties {
CardinalityEstimate cardinality;
std::vector<PhysicalOrdering> ordering;
size_t memory_budget;
bool will_spill;
std::optional<SpillOptions> spill_options;
bool preserves_ordering;
};
Spill options configure the spill behavior:
struct SpillOptions {
std::string spill_directory;
size_t spill_block_size = 64 * 1024;
CompressionType compression = CompressionType::kLZ4;
};
9.10.3 Minimum Budget Guarantee
Each operator receives at least 16 MB:
static constexpr size_t kMinBudget = 16 * 1024 * 1024;
This guarantees that every operator can make progress even under memory pressure; below this threshold, the fixed overhead of spilling outweighs any benefit from the reduced memory footprint.
9.11 Optimizer Configuration
The optimizer's behavior is controlled by configuration options:
struct OptimizerConfig {
// Memory configuration
size_t memory_budget = 256 * 1024 * 1024; // 256 MB default
std::string spill_directory;
// Optimization passes
uint32_t max_optimization_passes = 10;
// Feature flags
bool enable_filter_pushdown = true;
bool enable_topk_optimization = true;
bool enable_index_selection = true;
bool enable_cost_based_join = true;
// Strategy forcing (for testing)
std::optional<SortStrategy> force_sort_strategy;
std::optional<JoinStrategy> force_join_strategy;
std::optional<AggregateStrategy> force_aggregate_strategy;
};
Feature Flags: Individual optimizations can be disabled for debugging or when they cause regressions.
Strategy Forcing: For testing and debugging, specific strategies can be forced regardless of cost estimates.
9.12 Optimization Statistics
The optimizer collects detailed statistics about its operation:
struct OptimizerStats {
// Timing (microseconds)
int64_t parse_time_us;
int64_t logical_build_time_us;
int64_t rule_optimization_time_us;
int64_t cardinality_estimation_time_us;
int64_t physical_build_time_us;
int64_t total_time_us;
// Rule application
uint32_t rules_applied;
uint32_t filters_pushed_down;
uint32_t topk_optimizations;
// Plan characteristics
double estimated_rows;
double estimated_cost;
std::string selected_access_path;
std::string selected_sort_strategy;
std::string selected_join_strategy;
std::string selected_aggregate_strategy;
// Memory
size_t total_memory_allocated;
size_t operators_will_spill;
};
These statistics enable:
- Performance monitoring of the optimizer itself
- Plan explanation for users
- Regression detection in optimizer changes
- Debugging of plan quality issues
9.13 Summary
Cognica's logical planning and optimization system implements a sophisticated multi-phase pipeline:
- Logical Plan Construction: Converts SQL AST to relational algebra tree with support for subqueries, table-valued functions, UNNEST, and expression-based LIMIT/OFFSET.
- Rule-Based Optimization: Applies algebraic transformations (filter merging, pushdown, predicate ordering, TopK optimization).
- Plan Optimization: An eight-pass pipeline that simplifies predicates, estimates selectivity, pushes down predicates, unnests correlated subqueries into joins, eliminates common subexpressions, expands OR-linked join predicates, reorders multi-way joins, and prunes unused columns.
- Cardinality Estimation: Estimates result sizes using histograms and statistics.
- Physical Planning: Selects access paths and execution algorithms.
- Memory Allocation: Distributes memory budgets and predicts spills.
Key innovations include:
- Damped Selectivity: Uses geometric mean for correlated predicates
- DPccp Join Ordering: Optimal bushy tree generation without cross products, with a safety guard that preserves lateral, table function, and subquery inputs
- Subquery Unnesting: Converts correlated EXISTS/IN/NOT EXISTS subqueries into semi-joins and anti-joins, supporting table references, derived tables, and table-valued functions as subquery sources
- Common Subexpression Elimination: Factors out repeated expression subtrees to avoid redundant computation
- Multi-Dimensional Cost Model: Balances CPU, I/O, memory, and spill costs
- Integrated Spill Prediction: Memory-aware planning from the start
The optimizer balances plan quality against optimization time, using heuristics where exact solutions are intractable while guaranteeing optimality for critical decisions like join ordering.