Chapter 9: Logical Planning and Optimization

9.1 Introduction to Query Optimization

Query optimization represents one of the most intellectually challenging problems in database systems. Given a declarative SQL query, the optimizer must find an efficient execution strategy from an exponentially large search space of possible plans. The quality of this decision directly impacts query performance by orders of magnitude.

Cognica implements a sophisticated multi-phase optimization pipeline that combines rule-based transformations with cost-based decisions. This chapter explores the logical planning phase, where queries are transformed into an algebraic representation and optimized through a series of rewrite rules before being converted to physical execution plans.

9.1.1 The Optimization Problem

Consider a simple three-way join query:

SELECT o.order_id, c.name, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE c.country = 'US' AND o.total > 100

Even for this modest query, the optimizer must decide:

  1. Join Order: Should we join orders-customers first, then products? Or orders-products first? With n tables, there are (2n-2)!/(n-1)! possible bushy join trees.

  2. Join Algorithm: Hash join, merge join, nested loop, or index nested loop for each join?

  3. Access Paths: Full table scan or index scan for each table? Which index if multiple are available?

  4. Filter Placement: Apply c.country = 'US' before or after the join?

  5. Sort Strategy: In-memory sort, external sort, or leverage index ordering?

The number of possible plans grows combinatorially. For a 10-way join, there are over 17 billion possible join orderings alone. The optimizer must navigate this space efficiently to find a good plan quickly.
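To get a feel for this growth, the closed-form count can be evaluated directly. The helper below is illustrative only (not part of Cognica); it computes (2n-2)!/(n-1)! as the product n · (n+1) · ... · (2n-2):

```cpp
#include <cstdint>

// Number of bushy join trees for n relations: (2n-2)! / (n-1)!.
// Computed as the product n * (n+1) * ... * (2n-2) to avoid
// evaluating the full factorials.
uint64_t join_tree_count(uint64_t n) {
  uint64_t result = 1;
  for (uint64_t k = n; k <= 2 * n - 2; ++k) {
    result *= k;
  }
  return result;
}
```

For n = 10 this yields 17,643,225,600, the figure quoted above; 64-bit arithmetic overflows soon after, which is another reason exhaustive enumeration is impractical.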

9.1.2 Optimization Pipeline Architecture

Cognica's query optimizer implements a five-phase pipeline:

LogicalPlanBuilder → ASTOptimizer → CardinalityEstimator → PhysicalPlanBuilder → MemoryBudgetAllocator

Each phase serves a distinct purpose:

Phase | Component             | Purpose
1     | LogicalPlanBuilder    | Convert AST to relational algebra
2     | ASTOptimizer          | Apply transformation rules
3     | CardinalityEstimator  | Estimate result sizes
4     | PhysicalPlanBuilder   | Select algorithms and access paths
5     | MemoryBudgetAllocator | Distribute memory, predict spills

9.2 Logical Plan Representation

The logical plan represents a query as a tree of relational algebra operators. Unlike the physical plan (which specifies concrete algorithms), the logical plan describes what operations to perform without specifying how.

9.2.1 Operator Types

Cognica defines fourteen logical operator types. Each corresponds to a relational algebra operation:

Operator    | Algebra                | SQL Clause
ScanOp      | R                      | FROM table
FilterOp    | \sigma_\theta(R)       | WHERE condition
ProjectOp   | \pi_{a_1,...,a_n}(R)   | SELECT columns
SortOp      | \tau_{k_1,...,k_n}(R)  | ORDER BY
LimitOp     | \lambda_n(R)           | LIMIT n / LIMIT expr
GroupOp     | \gamma_{G,F}(R)        | GROUP BY
JoinOp      | R \bowtie_\theta S     | JOIN ... ON
UnionOp     | R \cup S               | UNION
SubqueryOp  | \rho_{alias}(Q)        | FROM (SELECT ...) AS t
TableFuncOp | f(args)                | FROM func(...)
UnnestOp    | \mu(e)                 | FROM unnest(array)
SearchOp    | \mathcal{S}_q(R)       | Full-text search

Expression-Based LIMIT/OFFSET: The LimitOp supports both constant values and arbitrary expressions for LIMIT and OFFSET. When the limit or offset is a constant integer literal, the planner stores it directly in the limit / offset optional fields. When the value comes from a parameter, a subquery, or an arithmetic expression, the planner stores the unevaluated AST expression in limit_expr / offset_expr and defers evaluation to execution time. This dual representation lets the optimizer reason about constant limits during planning (for TopK pushdown and cardinality estimation) while still supporting dynamic limits such as LIMIT $1.
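A minimal sketch of this dual representation, with hypothetical names (LimitSpec, Expression); the real LimitOp carries more state than shown here:

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <string>

// Hypothetical stand-in for an unevaluated AST expression.
struct Expression {
  std::string text;
};

// Sketch of the constant/expression duality described for LimitOp.
struct LimitSpec {
  std::optional<int64_t> limit;            // set when LIMIT is a literal
  std::shared_ptr<Expression> limit_expr;  // set when LIMIT is dynamic

  // The optimizer can only reason about the limit when it is constant.
  bool known_at_plan_time() const { return limit.has_value(); }
};

LimitSpec constant_limit(int64_t k) { return {k, nullptr}; }

LimitSpec dynamic_limit(std::string expr) {
  auto e = std::make_shared<Expression>();
  e->text = std::move(expr);
  return {std::nullopt, e};
}
```

A constant LIMIT 10 participates in TopK pushdown and cardinality estimation; a dynamic LIMIT $1 is carried opaquely until execution.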

9.2.2 Plan Tree Structure

A logical plan forms a tree where:

  • Leaf nodes are ScanOp operators (data sources)
  • Internal nodes are transformation operators
  • The root produces the final query result

For the query:

SELECT name, total FROM orders WHERE status = 'shipped' ORDER BY total DESC LIMIT 10

The logical plan tree is:

Limit(10)
  Sort(total DESC)
    Project(name, total)
      Filter(status = 'shipped')
        Scan(orders)

9.2.3 Operator Properties

Each logical operator maintains properties that guide optimization:

Cardinality Estimate: The estimated_rows_ field stores the predicted output row count. This is populated during cardinality estimation and used for cost calculations.

Schema Propagation: Operators track their output schema, enabling the optimizer to verify that downstream operators reference valid columns.

Ordering Properties: Some operators (Sort, certain scans) produce ordered output. Tracking this enables sort elimination optimizations.

9.3 Rule-Based Optimization

The first optimization phase applies a set of transformation rules that improve the plan regardless of data statistics. These rules implement algebraic equivalences that always (or almost always) improve performance.

9.3.1 Optimization Rule Framework

Cognica implements an extensible rule framework:

class OptimizationRule {
public:
  virtual ~OptimizationRule() = default;
  virtual auto apply(LogicalOpPtr root) -> std::pair<LogicalOpPtr, bool> = 0;
  virtual auto name() const -> std::string = 0;
};

Each rule's apply() method returns:

  1. The transformed plan (possibly unchanged)
  2. A boolean indicating whether any transformation occurred

The ASTOptimizer manages rule application:

class ASTOptimizer {
public:
  auto optimize(LogicalPlan plan) -> std::pair<LogicalPlan, OptimizationStats>;

private:
  std::vector<std::unique_ptr<OptimizationRule>> rules_;
  uint32_t max_passes_ = 10;
};

Rules are applied iteratively until a fixed point (no rule makes changes) or the maximum iteration count is reached. This fixed-point iteration ensures that rules can trigger each other—for example, filter merging might enable additional pushdown opportunities.

9.3.2 MergeFiltersRule

The first rule merges consecutive filter operators into a single filter with a conjunctive predicate.

Transformation:

\sigma_{\theta_1}(\sigma_{\theta_2}(R)) \Rightarrow \sigma_{\theta_1 \land \theta_2}(R)

Example:

Before:

Filter(price > 100)
  Filter(category = 'electronics')
    Scan(products)

After:

Filter(price > 100 AND category = 'electronics')
  Scan(products)

Benefits:

  1. Reduces operator overhead (fewer virtual function calls)
  2. Enables better predicate evaluation order
  3. Improves index matching (composite predicates may match composite indexes)

The implementation recursively processes the tree bottom-up:

auto MergeFiltersRule::merge_recursive_(LogicalOpPtr node) -> LogicalOpPtr {
  // First, recursively process children
  for (size_t i = 0; i < node->children().size(); ++i) {
    node->set_child(i, merge_recursive_(node->children()[i]));
  }

  // If this is a filter with a filter child, merge them
  if (node->type() == LogicalOpType::kFilter) {
    auto* filter = static_cast<FilterOp*>(node.get());
    if (!filter->children().empty() &&
        filter->children()[0]->type() == LogicalOpType::kFilter) {
      auto* child_filter = static_cast<FilterOp*>(filter->children()[0].get());

      // Create AND of both predicates
      auto merged = ast::make_and(filter->predicate(), child_filter->predicate());
      filter->set_predicate(std::move(merged));

      // Skip the child filter
      filter->set_child(0, child_filter->children()[0]);
    }
  }
  return node;
}

9.3.3 PredicatePushdownRule

Predicate pushdown moves filter operations closer to data sources, reducing the number of rows that flow through the plan.

Transformation:

\pi_L(\sigma_\theta(R)) \Rightarrow \pi_L(\sigma_\theta(R)) \text{ (no change)}

\sigma_\theta(\pi_L(R)) \Rightarrow \pi_L(\sigma_\theta(R)) \text{ (if } \theta \text{ only references } L\text{)}

Pushdown Compatibility Matrix:

Parent Operator | Can Push Through? | Condition
Project         | Yes               | If predicate columns are in projection
Sort            | Yes               | Always
Limit           | Yes               | Always
Skip            | Yes               | Always
Group           | No                | Predicate references aggregates
Join            | Partial           | Only predicates on single table
Search          | No                | Changes semantics

Example:

Before:

Project(name, price)
  Sort(price DESC)
    Filter(category = 'books')
      Scan(products)

After:

Project(name, price)
  Sort(price DESC)
    Filter(category = 'books')
      Scan(products)

In this case, the filter is already at the optimal position. But consider:

Before:

Filter(category = 'books')
  Project(name, price, category)
    Sort(price DESC)
      Scan(products)

After:

Project(name, price, category)
  Sort(price DESC)
    Filter(category = 'books')
      Scan(products)

Benefits:

  1. Reduces data volume early in the pipeline
  2. Enables index utilization (filter at scan level can use indexes)
  3. Reduces memory pressure for blocking operators
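The Project row of the compatibility matrix reduces to a column-containment check: a filter may move below a projection only if every column the predicate references survives the projection. A minimal sketch (names are illustrative, not Cognica's API):

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// A filter can be pushed below a projection only if every column the
// predicate references is produced by that projection.
bool can_push_below_project(const std::set<std::string>& predicate_cols,
                            const std::vector<std::string>& projected_cols) {
  return std::all_of(predicate_cols.begin(), predicate_cols.end(),
                     [&](const std::string& col) {
                       return std::find(projected_cols.begin(),
                                        projected_cols.end(),
                                        col) != projected_cols.end();
                     });
}
```

In the second example above, category is in the projection list, so Filter(category = 'books') can move below the Project and Sort.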

9.3.4 PredicateOrderingRule

Within a compound predicate, the evaluation order affects performance. For short-circuit evaluation:

  • In AND clauses: evaluate most selective predicates first
  • In OR clauses: evaluate least selective predicates first

Selectivity Heuristics:

The rule uses hardcoded selectivity estimates when statistics are unavailable:

Predicate Type       | Estimated Selectivity
Equality (=)         | 0.01 (1%)
Inequality (!=)      | 0.99 (99%)
Range (<, >, <=, >=) | 0.33 (33%)
IN clause            | 0.01 per value, max 0.30
Pattern match (LIKE) | 0.15 (15%)
Existence check      | 0.90 (90%)

Compound Selectivity:

For AND:

S(\theta_1 \land \theta_2) = S(\theta_1) \times S(\theta_2)

For OR:

S(\theta_1 \lor \theta_2) = 1 - (1 - S(\theta_1))(1 - S(\theta_2))

For NOT:

S(\lnot \theta) = 1 - S(\theta)

Example:

Before:

WHERE status != 'deleted' AND id = 12345

After:

WHERE id = 12345 AND status != 'deleted'

The equality predicate (1% selectivity) is evaluated first because it will short-circuit more often than the inequality (99% selectivity).
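The combinators and the reordering step can be sketched as plain functions (an illustrative sketch, not the rule's actual implementation):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Heuristic selectivity combinators from this section.
double and_selectivity(double s1, double s2) { return s1 * s2; }

double or_selectivity(double s1, double s2) {
  return 1.0 - (1.0 - s1) * (1.0 - s2);
}

double not_selectivity(double s) { return 1.0 - s; }

// Order AND-connected predicates so the most selective one runs first,
// maximizing short-circuit opportunities.
std::vector<double> order_conjuncts(std::vector<double> selectivities) {
  std::sort(selectivities.begin(), selectivities.end());
  return selectivities;
}
```

For the example above, the conjunct selectivities {0.99, 0.01} reorder to {0.01, 0.99}, placing id = 12345 first.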

9.3.5 TopKPushdownRule

When a query has both ORDER BY and LIMIT, the optimizer can avoid sorting all rows by using a heap-based Top-K algorithm.

Transformation:

\lambda_k(\tau_{keys}(R)) \Rightarrow \text{TopK}_k^{keys}(R) \text{ (when } k \leq 1000\text{)}

Example:

Before:

Limit(10)
  Sort(score DESC)
    Scan(articles)  -- 10 million rows

After (conceptually):

TopKSort(10, score DESC)
  Scan(articles)

Algorithm Comparison:

Algorithm  | Time Complexity | Space Complexity
Full Sort  | O(n \log n)     | O(n)
Top-K Heap | O(n \log k)     | O(k)

For n = 10^7 and k = 10:

  • Full sort: ~233 million comparisons
  • Top-K heap: ~33 million comparisons

The rule annotates the Sort operator with estimated_rows = k, signaling to the physical planner to use a heap-based strategy.
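The heap-based strategy can be sketched with std::priority_queue. This illustrative version keeps the k largest scores in a size-k min-heap, doing O(n log k) work instead of sorting all n rows:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Return the k largest scores in descending order. A min-heap of size k
// lets each of the n input rows be processed in O(log k).
std::vector<int> top_k(const std::vector<int>& scores, size_t k) {
  std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
  for (int s : scores) {
    if (heap.size() < k) {
      heap.push(s);
    } else if (s > heap.top()) {
      heap.pop();   // evict the smallest of the current top-k
      heap.push(s);
    }
  }
  std::vector<int> result;
  while (!heap.empty()) {
    result.push_back(heap.top());
    heap.pop();
  }
  std::reverse(result.begin(), result.end());  // ascending pops -> descending
  return result;
}
```

Only k values are ever resident, which is also why the Top-K strategy needs O(k) memory rather than O(n).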

9.3.6 RemoveRedundantSkipRule

A Skip(0) operator has no effect and can be removed:

\delta_0(R) \Rightarrow R

This rule handles edge cases where query builders generate trivial offsets.

9.3.7 Fixed-Point Iteration

Rules are applied iteratively because one transformation may enable another:

auto ASTOptimizer::optimize(LogicalPlan plan) -> std::pair<LogicalPlan, OptimizationStats> {
  auto stats = OptimizationStats {};
  auto root = plan.take_root();

  for (uint32_t pass = 0; pass < max_passes_; ++pass) {
    auto any_changed = false;

    for (auto& rule : rules_) {
      auto [new_root, changed] = rule->apply(std::move(root));
      root = std::move(new_root);

      if (changed) {
        any_changed = true;
        ++stats.rules_applied;
      }
    }

    if (!any_changed) {
      break;  // Fixed point reached
    }
  }

  plan.set_root(std::move(root));
  return {std::move(plan), stats};
}

Typical queries converge within 2-3 passes. The maximum of 10 passes handles pathological cases.

9.3.8 Logical Plan Optimization Pipeline

After the AST-level rule-based optimizer produces an initial logical plan, the PlanOptimizer applies a second round of transformations that operate directly on the logical plan tree. These passes require access to the full relational algebra structure and cannot be expressed as simple AST rewrite rules.

The PlanOptimizer executes a fixed sequence of passes:

1. simplify_predicates_       -- Boolean algebra simplification
2. estimate_selectivity_      -- Annotate filters and joins with selectivity
3. pushdown_predicates_       -- Push filters below joins and projections
4. unnest_subqueries_         -- Convert correlated subqueries to joins
5. eliminate_common_subexpressions_ -- Factor out repeated expressions
6. expand_join_or_to_union_   -- Rewrite OR-linked join predicates
7. reorder_joins_             -- Cost-based multi-way join enumeration
8. prune_columns_             -- Remove unused columns from scans

Unlike the fixed-point iteration of the AST optimizer, the plan optimizer runs each pass exactly once in the order shown above. The ordering is significant: selectivity estimates must be available before predicate pushdown can make cost-aware decisions, and subquery unnesting must complete before join reordering can consider the newly introduced join nodes.

9.3.9 Subquery Unnesting

Subquery unnesting (also called subquery decorrelation) converts correlated scalar subqueries and EXISTS / IN / NOT EXISTS subqueries into equivalent join operations. This transformation is important because a correlated subquery would otherwise require re-execution for every outer row, producing O(n \times m) work, whereas a join can be executed in O(n + m) with a hash join.

EXISTS Subquery to Semi-Join:

A filter of the form EXISTS (SELECT 1 FROM S WHERE S.fk = R.pk) is converted to a semi-join:

\sigma_{\text{EXISTS}(Q)}(R) \Rightarrow R \ltimes_{\theta} S

IN Subquery to Semi-Join:

A predicate R.col IN (SELECT S.col FROM S WHERE ...) is rewritten similarly:

\sigma_{col \in Q}(R) \Rightarrow R \ltimes_{R.col = S.col} S

NOT EXISTS to Anti-Join:

\sigma_{\lnot\text{EXISTS}(Q)}(R) \Rightarrow R \rhd_{\theta} S

Source Resolution: The unnesting pass must determine the data source for the right side of the generated join. Earlier versions only supported plain table references in the subquery's FROM clause. The current implementation uses a unified source resolution strategy that handles three kinds of FROM sources:

  1. Table references — the subquery selects from a named table, producing a LogicalScan node.
  2. Derived tables — the subquery's FROM clause is itself a subquery (FROM (SELECT ...) AS t), producing a LogicalSubquery node that wraps a recursively built logical plan.
  3. Table-valued functions — the subquery references a table function (FROM generate_series(1, 10)), producing a LogicalTableFunc node.

This generality allows unnesting to work uniformly regardless of the subquery's internal structure. The join condition is then constructed from the correlation predicate, and any remaining WHERE-clause predicates from the subquery are placed as a filter above the join's right child.

9.3.10 Common Subexpression Elimination

Common subexpression elimination (CSE) identifies expression subtrees that appear more than once in a query's projection list and factors them into a single computation. The result is referenced by subsequent operators through a generated column name, avoiding redundant evaluation.

Example:

SELECT price * quantity AS line_total,
       price * quantity * tax_rate AS tax_amount
FROM orders

The expression price * quantity appears twice. CSE factors it into a pre-computation step, and the outer projection references the computed column instead of re-evaluating the multiplication.

Exclusions: Not all repeated expressions are candidates for CSE. The optimizer skips:

  • Aggregate and window functions — these have special evaluation semantics that cannot be hoisted into a pre-projection layer.
  • SELECT * queries — wildcard projections are incompatible with the intermediate projection node that CSE inserts, because the column set is not fully known at optimization time.
  • Field access and array slicing expressions: FieldAccess and ArraySlice nodes over lateral or correlated values are context-sensitive; hoisting them into a CSE layer can change evaluation semantics when the outer row varies.
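One common way to detect candidates is to serialize each expression subtree to a canonical key and count occurrences. The sketch below works on pre-serialized keys rather than real AST nodes, so the key format is an assumption:

```cpp
#include <map>
#include <string>
#include <vector>

// Each projection item contributes the serialized keys of its compound
// subtrees; any key seen more than once across the projection list is a
// CSE candidate. Returned in key order.
std::vector<std::string> cse_candidates(
    const std::vector<std::vector<std::string>>& projections) {
  std::map<std::string, int> counts;
  for (const auto& exprs : projections) {
    for (const auto& key : exprs) {
      ++counts[key];
    }
  }
  std::vector<std::string> candidates;
  for (const auto& [key, n] : counts) {
    if (n > 1) candidates.push_back(key);
  }
  return candidates;
}
```

For the orders example, mul(price,quantity) appears under both line_total and tax_amount and would be factored out.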

9.3.11 Column Pruning

The final optimization pass removes columns from scan operators that are not referenced by any downstream operator. This reduces I/O and memory consumption, which is significant for wide tables where only a few columns are needed.

The pass walks the plan tree top-down, collecting the set of columns referenced by each operator's expressions (projection lists, filter predicates, join conditions, sort keys, group-by keys, and aggregate arguments). Scan operators are then annotated with this minimal column set so that the physical planner can push the projection into the storage layer.

9.4 Cardinality Estimation

Accurate cardinality estimates are crucial for cost-based optimization. A wrong estimate can lead to catastrophically bad plan choices.

9.4.1 The Cardinality Estimation Problem

Consider estimating the result size of:

SELECT * FROM orders WHERE status = 'pending' AND total > 1000

We need to estimate:

  1. How many rows satisfy status = 'pending'?
  2. How many rows satisfy total > 1000?
  3. How many rows satisfy both?

The naive approach assumes independence:

|R_{\theta_1 \land \theta_2}| = |R| \times S(\theta_1) \times S(\theta_2)

But predicates are often correlated. If status = 'pending' implies recent orders, and recent orders tend to be smaller, the predicates are negatively correlated.

9.4.2 CardinalityEstimate Structure

Cognica represents cardinality estimates with:

struct CardinalityEstimate {
  double rows;                    // Estimated row count
  double row_width = 512.0;       // Average bytes per row

  auto memory_estimate() const -> double {
    return rows * row_width;
  }

  auto apply_selectivity(double selectivity) -> CardinalityEstimate& {
    rows *= selectivity;
    return *this;
  }
};

The row_width field enables memory estimation for buffer sizing and spill prediction.

9.4.3 Operator Cardinality Propagation

Each operator type has specific propagation rules:

Scan:

|Scan(R)| = |R|

The base table cardinality comes from statistics.

Filter:

|Filter_\theta(R)| = |R| \times S(\theta)

Sort, Project:

|Sort(R)| = |Project(R)| = |R|

Rows unchanged (though row_width may change for Project).

Limit:

|Limit_k(R)| = \min(|R|, k)

When the limit is an expression rather than a constant, cardinality estimation falls back to the child estimate because the actual value is unknown until execution time.

Offset:

|Offset_k(R)| = \max(0, |R| - k)

Group:

|Group_{G,F}(R)| = NDV(G)

Where NDV(G) is the number of distinct values in the grouping columns.

Join:

|R \bowtie_\theta S| = |R| \times |S| \times S_{join}(\theta)

Union:

|R \cup S| = |R| + |S|
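The propagation rules in this section are simple enough to state as functions. An illustrative sketch, not Cognica's estimator:

```cpp
#include <algorithm>
#include <cmath>

// Cardinality propagation rules, one per operator type.
double filter_card(double rows, double selectivity) { return rows * selectivity; }
double limit_card(double rows, double k) { return std::min(rows, k); }
double offset_card(double rows, double k) { return std::max(0.0, rows - k); }
double join_card(double r, double s, double join_sel) { return r * s * join_sel; }
double union_card(double r, double s) { return r + s; }
```

Estimates flow bottom-up: each operator applies its rule to its children's estimates, so an error at a scan compounds through every operator above it.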

9.4.4 Selectivity Estimation

The CardinalityEstimator class provides selectivity estimation for various predicate types:

Equality Selectivity:

With statistics:

S(col = v) = \frac{1}{NDV(col)}

Without statistics:

S(col = v) = 0.01 \text{ (default)}

Range Selectivity:

With histogram:

S(col < v) = \frac{\sum_{b: upper_b < v} count_b + \text{interpolate}(b_v)}{|R|}

Without histogram:

S(col < v) = 0.30 \text{ (default)}

IN Selectivity:

S(col \in \{v_1, ..., v_k\}) = \min(k \times S(col = v_i), 0.30)

Pattern Selectivity:

S(col \sim pattern) = 0.15 \text{ (default)}

9.4.5 Correlation Handling

The naive independence assumption often produces severe underestimates. Cognica uses a damping factor for conjunctive predicates:

S(\theta_1 \land \theta_2) = \sqrt{S(\theta_1) \times S(\theta_2)}

This geometric mean provides a middle ground between:

  • Full independence: S(\theta_1) \times S(\theta_2) (often too low)
  • Full correlation: \min(S(\theta_1), S(\theta_2)) (often too high)

Example:

For status = 'active' AND country = 'US':

  • S(status = 'active') = 0.3
  • S(country = 'US') = 0.2
  • Independence: 0.3 \times 0.2 = 0.06
  • Damped: \sqrt{0.3 \times 0.2} \approx 0.245
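The damping formula is a one-liner, sketched here for completeness:

```cpp
#include <cmath>

// Geometric-mean damping for conjunctive predicates: a compromise
// between the independence and full-correlation assumptions.
double damped_and_selectivity(double s1, double s2) {
  return std::sqrt(s1 * s2);
}
```

Note that the damped estimate (0.245) exceeds both the independence product (0.06) and the smaller input would bound it from above under full correlation (0.2); the geometric mean deliberately errs toward overestimation, which tends to produce safer plans than underestimation.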

9.4.6 Default Constants

When statistics are unavailable, the estimator uses conservative defaults:

static constexpr double kDefaultEqualitySelectivity = 0.01;
static constexpr double kDefaultRangeSelectivity = 0.30;
static constexpr double kDefaultInSelectivity = 0.05;
static constexpr double kDefaultJoinSelectivity = 0.1;
static constexpr double kDefaultGroupReduction = 0.1;

These defaults are calibrated to avoid catastrophic underestimates while remaining reasonably selective.

9.5 Statistics and Histograms

Accurate cardinality estimation requires statistics about data distribution. Cognica maintains comprehensive statistics including histograms, distinct value counts, and correlation information.

9.5.1 Histogram Structure

Cognica uses equi-depth (equi-height) histograms:

struct HistogramBucket {
  double lower_bound;   // Inclusive
  double upper_bound;   // Inclusive
  int64_t count;        // Values in bucket
  int64_t distinct;     // Distinct values in bucket
};

Equi-Depth Property: Each bucket contains approximately the same number of values:

count_i \approx \frac{|R|}{B}

Where B is the number of buckets (default: 100).

Advantages over Equi-Width:

  • Better handles skewed distributions
  • Uniform error bounds across the value range
  • Naturally adapts to data density
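Equi-depth construction itself is straightforward on a sorted sample. A simplified sketch (no distinct counts, no streaming, bucket fields reduced to the essentials):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Bucket {
  double lower_bound;  // inclusive
  double upper_bound;  // inclusive
  int64_t count;       // values in bucket
};

// Build an equi-depth histogram: sort the sample, then cut it into
// num_buckets slices of (nearly) equal row count.
std::vector<Bucket> build_equi_depth(std::vector<double> values,
                                     size_t num_buckets) {
  std::sort(values.begin(), values.end());
  std::vector<Bucket> buckets;
  size_t n = values.size();
  size_t per_bucket = (n + num_buckets - 1) / num_buckets;  // ceiling
  for (size_t start = 0; start < n; start += per_bucket) {
    size_t end = std::min(start + per_bucket, n);
    buckets.push_back({values[start], values[end - 1],
                       static_cast<int64_t>(end - start)});
  }
  return buckets;
}
```

Because boundaries follow the data, a dense region of the domain gets many narrow buckets while sparse regions get a few wide ones.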

9.5.2 Histogram-Based Selectivity

Equality Selectivity:

auto Histogram::estimate_equality_selectivity(double value) const -> double {
  auto bucket_idx = find_bucket_(value);
  if (bucket_idx >= buckets_.size()) {
    return 0.0;  // Value outside range
  }

  const auto& bucket = buckets_[bucket_idx];
  // Assume uniform distribution within bucket
  return static_cast<double>(bucket.count) /
         static_cast<double>(bucket.distinct * total_count_);
}

Range Selectivity:

S(lower \leq col < upper) = \frac{\sum_{b \in range} count_b}{|R|}

With linear interpolation for partial buckets:

auto Histogram::interpolate_bucket_fraction_(
    size_t bucket_idx, double lower, double upper) const -> double {
  const auto& b = buckets_[bucket_idx];
  auto bucket_width = b.upper_bound - b.lower_bound;

  if (bucket_width <= 0) {
    return 1.0;  // Single-value bucket
  }

  auto effective_lower = std::max(lower, b.lower_bound);
  auto effective_upper = std::min(upper, b.upper_bound);

  return (effective_upper - effective_lower) / bucket_width;
}

9.5.3 Multi-Column Statistics

For correlated columns, single-column statistics are insufficient. Cognica supports several multi-column statistics:

N-Distinct Statistics:

Tracks distinct value counts for column combinations:

// Key: "col1,col2,col3" (sorted, comma-separated)
// Value: distinct count for the combination
std::unordered_map<std::string, int64_t> multi_column_ndistinct;

Functional Dependencies:

Tracks when one column determines another:

// Key: "A->B" (A determines B)
// Value: dependency strength [0.0, 1.0]
std::unordered_map<std::string, double> functional_dependencies;

A value of 1.0 indicates perfect functional dependency; lower values indicate partial dependency.

2D Histograms:

For pairs of numeric columns with significant correlation:

struct Histogram2D {
  std::vector<double> x_boundaries;
  std::vector<double> y_boundaries;
  std::vector<std::vector<int64_t>> grid;  // Row-major counts
};

9.5.4 Statistics Collection

Statistics are collected by IndexStatisticsCollector:

struct Statistics {
  // Basic cardinality
  int64_t total_keys;
  int64_t distinct_values;
  HyperLogLog hll;  // Approximate distinct count

  // Range information
  std::unordered_map<std::string, double> min_values;
  std::unordered_map<std::string, double> max_values;
  std::unordered_map<std::string, std::string> min_string_values;
  std::unordered_map<std::string, std::string> max_string_values;

  // Null counts
  std::unordered_map<std::string, int64_t> null_counts;

  // Histograms
  std::unordered_map<std::string, Histogram> histograms;

  // Multi-column statistics
  std::unordered_map<std::string, int64_t> multi_column_ndistinct;
  std::unordered_map<std::string, double> functional_dependencies;
  std::unordered_map<std::string, Histogram2D> histograms_2d;

  // Staleness tracking
  TimePoint last_updated;
  bool is_stale;
  int64_t last_analyzed_write_count;
};

9.6 Cost Model

The cost model estimates the resource consumption of execution plans, enabling the optimizer to compare alternatives.

9.6.1 Cost Structure

Cognica uses a multi-dimensional cost model:

struct Cost {
  double cpu_cost;      // CPU cycles estimate
  double io_cost;       // I/O operations
  double memory_cost;   // Memory pressure
  double spill_cost;    // Disk spill overhead

  auto total() const -> double {
    return cpu_cost +
           io_cost * kIOCostWeight +
           memory_cost * kMemoryCostWeight +
           spill_cost * kSpillCostWeight;
  }
};

Cost Weights:

static constexpr double kIOCostWeight = 10.0;
static constexpr double kMemoryCostWeight = 0.1;
static constexpr double kSpillCostWeight = 5.0;

The weights reflect modern hardware characteristics:

  • I/O is heavily weighted because disk access is orders of magnitude slower than CPU
  • Memory cost is lightly weighted because it represents pressure, not direct time
  • Spill cost represents the penalty of exceeding memory budgets

9.6.2 Cost Constants

The cost model uses calibrated constants:

// CPU costs
static constexpr double kCPUTupleProcessing = 0.01;
static constexpr double kCPUComparison = 0.0001;
static constexpr double kCPUHashOp = 0.0005;

// I/O costs
static constexpr double kHeapAccessCost = 0.5;
static constexpr double kSpillIOCost = 0.01;  // Per byte

// Memory constants
static constexpr double kCompressionFactor = 0.5;  // LZ4 typical
static constexpr double kHashTableOverhead = 1.5;
static constexpr size_t kMergeWidth = 16;
static constexpr size_t kDefaultBlockSize = 8192;
static constexpr size_t kAggStateSize = 256;

9.6.3 Operator Cost Formulas

Sequential Scan:

C_{seq}(R) = |R| \times (cpu_{tuple} + io_{page} \times \frac{1}{tuples_{per\_page}})

Index Scan:

C_{idx}(R, sel) = |R| \times sel \times (cpu_{tuple} + io_{random} + io_{heap})

The random I/O cost reflects the non-sequential access pattern.

Sort:

In-memory:

C_{sort}(R) = |R| \times \log_2(|R|) \times cpu_{cmp}

External:

C_{ext\_sort}(R) = C_{sort}(R) + \frac{2 \times |R| \times width}{block} \times (io_{write} + io_{read}) \times \lceil \log_M(runs) \rceil

Where M is the merge width and runs is the number of initial sorted runs.

Hash Join:

C_{hash}(R, S) = |R| \times cpu_{hash} \times k_{build} + |S| \times cpu_{hash} \times k_{probe}

Where k_{build} = 1.0 and k_{probe} = 1.2 (probe is slightly more expensive due to collision handling).

Nested Loop Join:

C_{nl}(R, S) = |R| \times |S| \times cpu_{tuple} \times k_{nl}

Where k_{nl} = 10.0 reflects the high cost of repeated inner scans.
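Plugging in the constants shows why hash join dominates once inputs grow. A sketch using the constants from this chapter (the factor names kBuildFactor, kProbeFactor, and kNestLoopFactor are illustrative):

```cpp
// Cost constants quoted in Sections 9.6.2 and 9.6.3.
constexpr double kCPUTupleProcessing = 0.01;
constexpr double kCPUHashOp = 0.0005;
constexpr double kBuildFactor = 1.0;
constexpr double kProbeFactor = 1.2;
constexpr double kNestLoopFactor = 10.0;

// C_hash = |R| * cpu_hash * k_build + |S| * cpu_hash * k_probe
double hash_join_cost(double build_rows, double probe_rows) {
  return build_rows * kCPUHashOp * kBuildFactor +
         probe_rows * kCPUHashOp * kProbeFactor;
}

// C_nl = |R| * |S| * cpu_tuple * k_nl
double nested_loop_cost(double outer_rows, double inner_rows) {
  return outer_rows * inner_rows * kCPUTupleProcessing * kNestLoopFactor;
}
```

Hash join cost is linear in the input sizes while nested loop cost is their product, so for a 1,000 by 10,000 join the model charges roughly 6.5 units versus a million.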

9.6.4 Spill Prediction

The cost model predicts when operations will exceed memory budgets:

auto CostEstimator::will_spill(double data_size, size_t memory_budget) const -> bool {
  return data_size > static_cast<double>(memory_budget);
}

When spill is predicted, the cost model adds spill overhead:

C_{spill} = \frac{data\_size}{block\_size} \times io_{spill} \times (1 + \frac{1}{compression})
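The spill formula translates directly to code. An illustrative sketch:

```cpp
#include <cmath>

// Spill penalty: one write pass plus a compressed read-back, charged
// per block of spilled data.
double spill_cost(double data_size, double block_size,
                  double io_spill, double compression) {
  return data_size / block_size * io_spill * (1.0 + 1.0 / compression);
}
```

With the chapter's constants (io_spill = 0.01, compression = 0.5), spilling 100 blocks of 8 KiB costs 3.0 cost units before the kSpillCostWeight multiplier is applied.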

9.7 Join Ordering

Join ordering is often the most impactful optimization decision. The order of joins can change query execution time by orders of magnitude.

9.7.1 The Join Ordering Problem

For n tables, the number of possible join trees is:

T(n) = \frac{(2n-2)!}{(n-1)!}

Tables | Join Trees
2      | 2
3      | 12
4      | 120
5      | 1,680
10     | 17,643,225,600

Exhaustive enumeration is infeasible for large queries. Dynamic programming reduces the complexity to O(3^n) while guaranteeing optimality.
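The classic subset dynamic program can be sketched with bitmasks. This simplified version allows cross products and costs a plan by the total size of its intermediate results; DPccp (Section 9.7.3) improves on it by enumerating only connected subgraph-complement pairs:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Simplified bushy-join DP over bitmask subsets. Assumes every pair of
// subplans joins with the same selectivity, so the output cardinality of
// a subset is independent of how it was split.
double cheapest_join_cost(const std::vector<double>& card, double selectivity) {
  size_t n = card.size();
  size_t full = (size_t{1} << n) - 1;
  std::vector<double> rows(full + 1, 0.0);
  std::vector<double> cost(full + 1, 0.0);
  for (size_t i = 0; i < n; ++i) rows[size_t{1} << i] = card[i];

  for (size_t s = 1; s <= full; ++s) {
    if ((s & (s - 1)) == 0) continue;  // singletons: base relations, cost 0
    cost[s] = std::numeric_limits<double>::infinity();
    // Enumerate proper sub-splits of s: the classic O(3^n) submask loop.
    for (size_t lhs = (s - 1) & s; lhs; lhs = (lhs - 1) & s) {
      size_t rhs = s & ~lhs;
      double out = rows[lhs] * rows[rhs] * selectivity;
      double c = cost[lhs] + cost[rhs] + out;  // charge intermediate size
      if (c < cost[s]) {
        cost[s] = c;
        rows[s] = out;
      }
    }
  }
  return cost[full];
}
```

For cardinalities {100, 10, 1000} and selectivity 0.01, the DP joins the two small relations first (10 intermediate rows) before touching the large one, for a total cost of 110 instead of 200 or 1,100 for the other orders.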

9.7.2 Join Graph Representation

Cognica represents join relationships as a graph:

struct JoinVertex {
  std::string identifier;      // Table alias
  LogicalPlanPtr plan;         // Subplan for this table
  double cardinality;          // Estimated rows
};

struct JoinGraphEdge {
  size_t left_idx;             // Left table index
  size_t right_idx;            // Right table index
  ast::Expression* condition;  // Join predicate
  ast::JoinType type;          // Inner, left, right, full
  double selectivity;          // Join selectivity
};

Example:

For the query:

SELECT * FROM A
JOIN B ON A.x = B.x
JOIN C ON B.y = C.y
JOIN D ON A.z = D.z

The join graph has vertices {A, B, C, D} and edges (A, B) on A.x = B.x, (B, C) on B.y = C.y, and (A, D) on A.z = D.z.

9.7.3 DPccp Algorithm

Cognica implements the DPccp (Dynamic Programming with Connected-Complement-Pairs) algorithm from Moerkotte and Neumann. This algorithm generates optimal bushy join trees without cross products.

Key Insight: A valid join plan connects a subset S of tables with its complement \bar{S} only if there exists a join predicate between them.

Algorithm Structure:

auto JoinOrderEnumerator::enumerate(const JoinGraph& graph) -> LogicalPlanPtr {
  // Initialize DP table with single-table plans
  for (size_t i = 0; i < graph.vertex_count(); ++i) {
    auto singleton = 1ULL << i;
    dp_table_[singleton] = {
      .subset = singleton,
      .plan = graph.vertices()[i].plan,
      .cost = 0.0,
      .cardinality = graph.vertices()[i].cardinality
    };
  }

  // Enumerate connected subgraph-complement pairs
  enumerate_csg_cmp_pairs_(graph);

  // Return plan for all tables
  auto all_tables = (1ULL << graph.vertex_count()) - 1;
  return dp_table_[all_tables].plan;
}

CSG-CMP Enumeration:

The algorithm enumerates pairs (S_1, S_2) where:

  1. S_1 and S_2 are both connected subgraphs
  2. S_1 \cap S_2 = \emptyset
  3. There exists an edge between S_1 and S_2

void JoinOrderEnumerator::enumerate_csg_cmp_pairs_(const JoinGraph& graph) {
  auto n = graph.vertex_count();

  // For each starting vertex
  for (size_t i = 0; i < n; ++i) {
    auto start = 1ULL << i;

    // Enumerate connected subgraphs containing vertex i
    enumerate_csg_rec_(graph, start, start, i);
  }
}

void JoinOrderEnumerator::enumerate_csg_rec_(
    const JoinGraph& graph,
    uint64_t current,
    uint64_t excluded,
    size_t min_vertex) {

  // For current subgraph, enumerate complements
  enumerate_cmp_(graph, current);

  // Extend subgraph with neighbors
  auto neighbors = get_neighborhood_(graph, current, excluded, min_vertex);

  for (auto neighbor : neighbors) {
    auto extended = current | neighbor;
    enumerate_csg_rec_(graph, extended, excluded | neighbor, min_vertex);
  }
}

Complement Enumeration:

void JoinOrderEnumerator::enumerate_cmp_(const JoinGraph& graph, uint64_t s1) {
  // Find edges crossing from s1 to complement
  auto crossing = graph.get_crossing_edges(s1);

  if (crossing.empty()) {
    return;  // No valid complement
  }

  // Build complement subgraphs
  auto complement_vertices = get_complement_vertices_(graph, s1, crossing);

  for (auto s2 : enumerate_connected_subsets_(graph, complement_vertices)) {
    emit_csg_cmp_(graph, s1, s2);
  }
}

Plan Emission:

void JoinOrderEnumerator::emit_csg_cmp_(
    const JoinGraph& graph, uint64_t s1, uint64_t s2) {

  // Skip if either subset doesn't have a plan yet
  if (dp_table_.find(s1) == dp_table_.end() ||
      dp_table_.find(s2) == dp_table_.end()) {
    return;
  }

  auto& plan1 = dp_table_[s1];
  auto& plan2 = dp_table_[s2];

  // Create join plan and estimate cost
  auto join_plan = create_join_plan_(graph, s1, s2, plan1, plan2);
  auto cost = estimate_join_cost_(plan1, plan2);
  auto cardinality = estimate_join_cardinality_(graph, s1, s2, plan1, plan2);

  // Store if better than existing plan
  auto combined = s1 | s2;
  store_plan_(combined, std::move(join_plan), cost, cardinality);
}

9.7.4 Join Cost Estimation

The join cost depends on the algorithm chosen:

static constexpr double kHashBuildCostPerRow = 1.0;
static constexpr double kHashProbeCostPerRow = 1.2;
static constexpr double kNestLoopCostPerRow = 10.0;
static constexpr double kSortMergeCostPerRow = 2.0;

Hash Join Cost:

C_{hash}(R, S) = |R| \times k_{build} + |S| \times k_{probe}

The smaller relation is typically chosen as the build side, keeping the in-memory hash table as small as possible.

Join Cardinality:

|R \bowtie S| = |R| \times |S| \times S_{join}

Join selectivity is estimated from:

  1. Key cardinalities (for equi-joins on keys)
  2. Default selectivity (0.1) when statistics are unavailable

9.7.5 Bushy vs Left-Deep Plans

DPccp generates bushy plans, which can be more efficient than left-deep plans:

Left-Deep Plan:

        ⋈
       / \
      ⋈   D
     / \
    ⋈   C
   / \
  A   B

Bushy Plan:

       ⋈
     /   \
    ⋈     ⋈
   / \   / \
  A   B C   D

Bushy plans enable parallelism and can reduce intermediate result sizes.

9.7.6 Join Reordering Safety Guard

The multi-way join enumeration algorithm (DPccp) assumes that the inputs to each join are independent, side-effect-free relations whose evaluation order does not affect semantics. This assumption holds for plain table scans but breaks down for several classes of join inputs:

  • Lateral joins — the right side of a LATERAL join references columns from the left side, creating a data dependency that the enumerator's bitmask representation cannot express.
  • Table-valued functions — function calls like generate_series() or unnest() may depend on correlated outer values and must remain in their original position.
  • Subquery inputs — derived tables (FROM (SELECT ...) AS t) may contain correlated references or side effects.

The optimizer guards against incorrect reordering by inspecting the join chain before invoking DPccp. It walks the left-spine of nested inner joins and checks two conditions:

  1. All joins in the chain must be INNER joins (outer and semi-joins have non-commutative semantics).
  2. Every right-side input of each join must be a plain LogicalScan node.

If either condition is violated, the optimizer skips multi-way enumeration and falls back to pairwise join optimization, which preserves the original tree shape and its associated evaluation order.

9.8 Access Path Selection

Access path selection determines how to read data from tables—full scan, index scan, or index-only scan.

9.8.1 Access Path Types

enum class AccessPathType {
  kSeqScan,          // Full table scan
  kIndexScan,        // Index scan with heap fetch
  kIndexOnlyScan,    // Covering index scan
  kPrimaryKeyLookup  // Direct point lookup
};

9.8.2 Access Path Enumeration

The AccessPathEnumerator generates all viable access paths:

auto AccessPathEnumerator::enumerate(const ast::Expression* filter)
    -> std::vector<AccessPath> {

  auto paths = std::vector<AccessPath> {};

  // Sequential scan is always available
  paths.push_back(create_seq_scan_());

  // Check for primary key equality
  if (is_pk_equality_(filter)) {
    paths.push_back(create_pk_lookup_(filter));
  }

  // Check each available index
  for (const auto& index : indexes_) {
    if (auto bounds = extract_index_bounds_(filter, index)) {
      paths.push_back(create_index_scan_(index, *bounds));

      // Check for covering index
      if (is_covering_index_(index)) {
        paths.push_back(create_index_only_scan_(index, *bounds));
      }
    }
  }

  // Sort by cost
  std::sort(paths.begin(), paths.end(),
            [](const auto& a, const auto& b) {
              return a.cost.total() < b.cost.total();
            });

  return paths;
}

9.8.3 Cost Comparison

Sequential Scan:

  • Cost: Proportional to table size
  • Best when: High selectivity (reading most of table) or no suitable index

Index Scan:

  • Cost: Index traversal + heap fetches
  • Best when: Low selectivity (reading small fraction of table)

Index-Only Scan:

  • Cost: Index traversal only
  • Best when: Index contains all required columns (covering index)

Primary Key Lookup:

  • Cost: Single point lookup
  • Best when: Equality predicate on primary key

Break-Even Analysis:

For an index scan to beat a sequential scan:

|R| \times sel \times (C_{idx} + C_{heap}) < |R| \times C_{seq}

Solving for selectivity:

sel < \frac{C_{seq}}{C_{idx} + C_{heap}}

With typical constants (C_{seq} = 1.0, C_{idx} = 0.1, C_{heap} = 4.0):

sel < \frac{1.0}{4.1} \approx 24\%

Index scans typically win when selecting less than ~25% of rows.

9.9 Strategy Selection

After determining access paths, the optimizer selects algorithms for each operator.

9.9.1 Sort Strategy Selection

enum class SortStrategy {
  kNoSort,        // Input already sorted
  kTopKHeap,      // Heap-based for small limits
  kInMemorySort,  // QuickSort in memory
  kExternalSort,  // External merge sort
  kIndexScan      // Use index ordering
};

Selection Logic:

auto SortStrategySelector::select(const SortConfig& config) -> SortStrategy {
  // Check if input is already sorted
  if (is_sorted_by_(config.input_ordering, config.sort_keys)) {
    return SortStrategy::kNoSort;
  }

  // Check for index providing order
  for (const auto& idx : config.available_indexes) {
    if (provides_ordering_(idx, config.sort_keys)) {
      return SortStrategy::kIndexScan;
    }
  }

  // Check for TopK optimization
  if (config.limit_hint && *config.limit_hint <= kTopKThreshold) {
    return SortStrategy::kTopKHeap;
  }

  // Choose between in-memory and external
  auto data_size = config.input_rows * config.row_width;
  if (data_size <= config.memory_budget) {
    return SortStrategy::kInMemorySort;
  }

  return SortStrategy::kExternalSort;
}

9.9.2 Join Strategy Selection

enum class JoinStrategy {
  kHashJoin,         // Hash build/probe
  kIndexNestedLoop,  // Index lookup on inner
  kMergeJoin,        // Sort-merge
  kNestedLoop        // Simple nested loop
};

Selection Logic:

auto JoinStrategySelector::select(const JoinConfig& config) -> JoinStrategy {
  // Check for index nested loop (small outer, index on inner)
  if (config.left_rows <= kINLOuterThreshold) {
    for (const auto& idx : config.right_indexes) {
      if (matches_join_keys_(idx, config.join_keys)) {
        return JoinStrategy::kIndexNestedLoop;
      }
    }
  }

  // Check if both sides are sorted on join keys
  if (is_sorted_on_(config.left_ordering, config.join_keys) &&
      is_sorted_on_(config.right_ordering, config.join_keys)) {
    return JoinStrategy::kMergeJoin;
  }

  // Default to hash join
  auto smaller_side = std::min(config.left_rows, config.right_rows);
  auto hash_size = smaller_side * config.row_width * kHashTableOverhead;

  if (hash_size <= config.memory_budget) {
    return JoinStrategy::kHashJoin;
  }

  // Fallback to nested loop (should be rare)
  return JoinStrategy::kNestedLoop;
}

9.9.3 Aggregate Strategy Selection

enum class AggregateStrategy {
  kStreamAggregate,  // Streaming on sorted input
  kHashAggregate,    // Hash-based grouping
  kSortAggregate     // Sort then stream
};

Selection Logic:

auto AggregateStrategySelector::select(const AggregateConfig& config)
    -> AggregateStrategy {

  // Check if input is sorted on group keys
  if (is_sorted_on_(config.input_ordering, config.group_keys)) {
    return AggregateStrategy::kStreamAggregate;
  }

  // Estimate hash table size
  auto groups = config.estimated_groups.value_or(
      config.input_rows * kDefaultGroupReduction);
  auto hash_size = groups * kBytesPerGroupState;

  if (hash_size <= config.memory_budget) {
    return AggregateStrategy::kHashAggregate;
  }

  // Sort then stream
  return AggregateStrategy::kSortAggregate;
}

9.10 Memory Budget Allocation

The final optimization phase allocates memory budgets to operators and predicts spill behavior.

9.10.1 Budget Distribution

Memory is distributed proportionally to operator data requirements:

void MemoryBudgetAllocator::allocate(PhysicalPlan& plan) {
  // Identify memory-consuming operators
  auto blocking_ops = find_blocking_operators_(plan);
  if (blocking_ops.empty()) {
    return;  // Nothing to allocate; also avoids dividing by zero below
  }

  // Calculate total data volume
  auto total_data = 0.0;
  for (const auto& op : blocking_ops) {
    total_data += op->properties().cardinality.memory_estimate();
  }

  // Distribute budget proportionally
  for (auto& op : blocking_ops) {
    auto data_size = op->properties().cardinality.memory_estimate();
    auto weight = data_size / total_data;
    auto budget = static_cast<size_t>(weight * total_budget_);

    // Ensure minimum budget
    budget = std::max(budget, kMinBudget);

    op->properties().memory_budget = budget;
    op->properties().will_spill = (data_size > budget);
  }
}

9.10.2 Spill Prediction

When an operator's estimated data exceeds its budget, spill is predicted:

struct OperatorProperties {
  CardinalityEstimate cardinality;
  std::vector<PhysicalOrdering> ordering;
  size_t memory_budget;
  bool will_spill;
  std::optional<SpillOptions> spill_options;
  bool preserves_ordering;
};

Spill options configure the spill behavior:

struct SpillOptions {
  std::string spill_directory;
  size_t spill_block_size = 64 * 1024;
  CompressionType compression = CompressionType::kLZ4;
};

9.10.3 Minimum Budget Guarantee

Each operator receives at least 16 MB:

static constexpr size_t kMinBudget = 16 * 1024 * 1024;

This ensures operators can make progress even with limited memory. Below this threshold, the constant overhead of spilling exceeds any benefit from reduced memory usage.

9.11 Optimizer Configuration

The optimizer behavior is controlled by configuration options:

struct OptimizerConfig {
  // Memory configuration
  size_t memory_budget = 256 * 1024 * 1024;  // 256 MB default
  std::string spill_directory;

  // Optimization passes
  uint32_t max_optimization_passes = 10;

  // Feature flags
  bool enable_filter_pushdown = true;
  bool enable_topk_optimization = true;
  bool enable_index_selection = true;
  bool enable_cost_based_join = true;

  // Strategy forcing (for testing)
  std::optional<SortStrategy> force_sort_strategy;
  std::optional<JoinStrategy> force_join_strategy;
  std::optional<AggregateStrategy> force_aggregate_strategy;
};

Feature Flags: Individual optimizations can be disabled for debugging or when they cause regressions.

Strategy Forcing: For testing and debugging, specific strategies can be forced regardless of cost estimates.

9.12 Optimization Statistics

The optimizer collects detailed statistics about its operation:

struct OptimizerStats {
  // Timing (microseconds)
  int64_t parse_time_us;
  int64_t logical_build_time_us;
  int64_t rule_optimization_time_us;
  int64_t cardinality_estimation_time_us;
  int64_t physical_build_time_us;
  int64_t total_time_us;

  // Rule application
  uint32_t rules_applied;
  uint32_t filters_pushed_down;
  uint32_t topk_optimizations;

  // Plan characteristics
  double estimated_rows;
  double estimated_cost;
  std::string selected_access_path;
  std::string selected_sort_strategy;
  std::string selected_join_strategy;
  std::string selected_aggregate_strategy;

  // Memory
  size_t total_memory_allocated;
  size_t operators_will_spill;
};

These statistics enable:

  1. Performance monitoring of the optimizer itself
  2. Plan explanation for users
  3. Regression detection in optimizer changes
  4. Debugging of plan quality issues

9.13 Summary

Cognica's logical planning and optimization system implements a sophisticated multi-phase pipeline:

  1. Logical Plan Construction: Converts SQL AST to relational algebra tree with support for subqueries, table-valued functions, UNNEST, and expression-based LIMIT/OFFSET.
  2. Rule-Based Optimization: Applies algebraic transformations (filter merging, pushdown, predicate ordering, TopK optimization).
  3. Plan Optimization: An eight-pass pipeline that simplifies predicates, estimates selectivity, pushes down predicates, unnests correlated subqueries into joins, eliminates common subexpressions, expands OR-linked join predicates, reorders multi-way joins, and prunes unused columns.
  4. Cardinality Estimation: Estimates result sizes using histograms and statistics.
  5. Physical Planning: Selects access paths and execution algorithms.
  6. Memory Allocation: Distributes memory budgets and predicts spills.

Key innovations include:

  • Damped Selectivity: Uses geometric mean for correlated predicates
  • DPccp Join Ordering: Optimal bushy tree generation without cross products, with a safety guard that preserves lateral, table function, and subquery inputs
  • Subquery Unnesting: Converts correlated EXISTS/IN/NOT EXISTS subqueries into semi-joins and anti-joins, supporting table references, derived tables, and table-valued functions as subquery sources
  • Common Subexpression Elimination: Factors out repeated expression subtrees to avoid redundant computation
  • Multi-Dimensional Cost Model: Balances CPU, I/O, memory, and spill costs
  • Integrated Spill Prediction: Memory-aware planning from the start

The optimizer balances plan quality against optimization time, using heuristics where exact solutions are intractable while guaranteeing optimality for critical decisions like join ordering.

Copyright (c) 2023-2026 Cognica, Inc.