Chapter 14: CVM Compilation Pipeline

14.1 Introduction to Query Compilation

The CVM compilation pipeline transforms SQL expressions into efficient bytecode through a series of well-defined stages. This chapter examines each stage in detail, from the initial lowering of SQL Abstract Syntax Trees to the final bytecode generation.

Query compilation plays a critical role in database performance. While interpretation offers flexibility, compilation enables optimizations that are impossible in an interpreted setting: constant folding, common subexpression elimination, and register allocation all occur at compile time, reducing runtime overhead.

14.1.1 Compilation Pipeline Overview

The compilation process follows a classic three-stage compiler architecture:

SQL AST --(lowering)--> IR --(optimization)--> optimized IR --(code generation)--> bytecode

14.1.2 Design Philosophy

The compilation pipeline embodies several key principles:

  1. Separation of Concerns: Lowering, optimization, and code generation are independent phases with clean interfaces.

  2. Progressive Refinement: Each stage transforms the program into a form closer to executable bytecode.

  3. Optimization Composability: Passes can be added, removed, or reordered without affecting correctness.

  4. Debug Transparency: Source location information flows through all stages for error reporting.

14.2 Intermediate Representation

The IR (Intermediate Representation) serves as the central data structure connecting frontend and backend. CVM uses a DAG-based (Directed Acyclic Graph) IR that naturally represents expression trees with sharing.

14.2.1 IR Node Types

All IR nodes inherit from the IRNode base class:

enum class IRNodeKind : uint8_t {
  kConstant,         // Literal constant value
  kParameter,        // Input parameter (column, bound value)
  kBinaryOperation,  // Binary operation (+, -, ==, <, &&)
  kUnaryOperation,   // Unary operation (-, NOT, ABS)
  kCall,             // Function call (builtin, scalar, aggregate)
  kConditional,      // IF-THEN-ELSE / CASE expression
  kCast,             // Type cast
  kFieldAccess,      // Document field access
  kArrayAccess,      // Array element access
  kArrayConstruct,   // Array literal construction
  kCoalesce,         // COALESCE expression
  kNullTest          // IS NULL / IS NOT NULL
};

14.2.2 Node Properties

Each IR node carries metadata essential for optimization and code generation:

class IRNode {
protected:
  IRNodeKind kind_;           // Node type discriminant
  CVMType result_type_;       // Result type (kInt64, kDouble, etc.)
  bool is_nullable_;          // Can result be NULL?
  uint32_t node_id_;          // Unique ID for CSE
  std::vector<IRNodePtr> children_;  // Child nodes
};

Type Information: The result_type_ field enables type-specific code generation. When the type is known at compile time, the generator emits type-specialized opcodes (e.g., kAddInt64 vs kAddFloat64).

Nullability Tracking: The is_nullable_ flag propagates through operations, enabling NULL-check elimination when values are guaranteed non-NULL.
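The propagation rule can be sketched as follows. This is a minimal illustration (the function name and signature are not the actual IRNode API): most operations can produce NULL only if some operand can, while IS NULL / IS NOT NULL always yields a non-NULL boolean.

```cpp
#include <algorithm>
#include <vector>

// Hedged sketch of nullability propagation. For an ordinary operation the
// result can be NULL only if at least one child can be NULL; a kNullTest
// node always produces a non-NULL boolean regardless of its operand.
bool propagate_nullability(bool is_null_test,
                           const std::vector<bool>& child_nullable) {
  if (is_null_test) {
    return false;  // IS NULL / IS NOT NULL never returns NULL
  }
  return std::any_of(child_nullable.begin(), child_nullable.end(),
                     [](bool n) { return n; });
}
```

With this rule, an expression such as `id + 1` over the non-nullable `id` column is marked non-nullable, letting the generator skip the runtime NULL check.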

14.2.3 Constant Nodes

Constants represent literal values known at compile time:

class IRConstant : public IRNode {
  std::variant<
    std::monostate,    // NULL
    bool,              // Boolean
    int64_t,           // Integer
    double,            // Floating point
    std::string        // String
  > value_;
};

The IRBuilder deduplicates constants—multiple references to the same constant value share a single IRConstant node.
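One way the builder can achieve this sharing is to key a cache on the constant's value; the names below are illustrative, not the actual IRBuilder interface:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <variant>

// Value type mirroring the IRConstant alternatives shown above.
using ConstValue =
    std::variant<std::monostate, bool, int64_t, double, std::string>;

struct ConstNode {
  ConstValue value;
};

// Minimal sketch of constant deduplication: repeated requests for the same
// value return the same shared node.
class ConstCache {
 public:
  std::shared_ptr<ConstNode> make_constant(const ConstValue& v) {
    auto it = cache_.find(v);
    if (it != cache_.end()) return it->second;  // reuse existing node
    auto node = std::make_shared<ConstNode>(ConstNode{v});
    cache_.emplace(v, node);
    return node;
  }

 private:
  // std::variant provides ordering when all alternatives do, so it can
  // serve directly as a std::map key.
  std::map<ConstValue, std::shared_ptr<ConstNode>> cache_;
};
```

Deduplication keeps the IR a true DAG even before CSE runs, and shrinks the constant pool emitted during code generation.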

14.2.4 Parameter Nodes

Parameters represent runtime inputs:

class IRParameter : public IRNode {
  ParameterSource source_;    // Column, bound value, input doc
  uint32_t index_;            // Parameter index
  std::string name_;          // Optional name for debugging
};

enum class ParameterSource {
  kColumnReference,   // Column from input tuple
  kBoundValue,        // Prepared statement parameter
  kInputDocument,     // Root input document
  kCurrentRow         // Current row in iteration
};

14.2.5 Binary Operation Nodes

Binary operations cover arithmetic, comparison, logical, and string operations:

class IRBinaryOperation : public IRNode {
  BinaryOperationType op_;    // Operation type
  IRNodePtr left_;            // Left operand
  IRNodePtr right_;           // Right operand
};

enum class BinaryOperationType {
  // Arithmetic
  kAdd, kSubtract, kMultiply, kDivide, kModulo,

  // Comparison
  kEqual, kNotEqual, kLessThan, kLessEqual,
  kGreaterThan, kGreaterEqual,

  // Logical
  kLogicalAnd, kLogicalOr,

  // String
  kStringConcat, kLike, kILike, kRegexMatch,

  // Bitwise
  kBitwiseAnd, kBitwiseOr, kBitwiseXor,
  kShiftLeft, kShiftRight,

  // Array
  kArrayContains, kArrayConcat
};

14.2.6 Function Call Nodes

Function calls support multiple calling conventions:

class IRCall : public IRNode {
  CallType call_type_;              // Builtin, scalar, aggregate, etc.
  std::string function_name_;       // Function identifier
  std::vector<IRNodePtr> arguments_;
  bool is_distinct_;                // DISTINCT modifier
  IRNodePtr filter_;                // FILTER clause
  bool is_volatile_;                // Prevents CSE
};

enum class CallType {
  kBuiltin,     // Built-in function (UPPER, LENGTH)
  kScalar,      // User-defined scalar function
  kAggregate,   // Aggregate function (SUM, COUNT)
  kWindow,      // Window function (ROW_NUMBER, LAG)
  kSubquery,    // Scalar subquery
  kExternal     // External function call
};

The is_volatile_ flag marks non-deterministic functions (like RANDOM() or NOW()) to prevent common subexpression elimination.

14.2.7 IR Module Structure

An IR module packages the compiled expression with metadata:

class IRModule {
  IRNodePtr root_;                  // Root expression node
  std::vector<IRNodePtr> nodes_;    // All nodes (for traversal)
  std::string name_;                // Module name
  size_t parameter_count_;          // Input parameter count
};

14.3 SQL Lowering

The lowering phase transforms SQL AST nodes into CVM IR nodes. This translation handles SQL-specific constructs and prepares the expression for optimization.

14.3.1 Lowering Configuration

struct SQLLoweringOptions {
  bool fold_constants = true;       // Fold during lowering
  bool enable_cse = true;           // Enable CSE later
  bool preserve_names = true;       // Keep column names
  FunctionVolatilityChecker volatility_checker;
};

14.3.2 Column Binding

Column bindings map SQL column references to their runtime positions:

struct ColumnBinding {
  uint32_t index;          // Position in input tuple
  std::string name;        // Column name
  CVMType type;            // CVM type
  bool is_nullable = true; // Nullability
};

For a query like SELECT * FROM users WHERE age > 21, the column bindings might be:

  Index  Name   Type     Nullable
  0      id     kInt64   false
  1      name   kString  true
  2      age    kInt64   true
  3      email  kString  true
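The users-table bindings above can be built and searched as a lowering pass might do it; the types here are illustrative stand-ins for the chapter's definitions:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Illustrative stand-ins for the chapter's types.
enum class CVMType { kInt64, kString };

struct ColumnBinding {
  uint32_t index;
  std::string name;
  CVMType type;
  bool is_nullable = true;
};

// Bindings for the users table from the example above.
std::vector<ColumnBinding> users_bindings() {
  return {{0, "id", CVMType::kInt64, false},
          {1, "name", CVMType::kString, true},
          {2, "age", CVMType::kInt64, true},
          {3, "email", CVMType::kString, true}};
}

// Name lookup as the lowering phase might perform it; a linear scan is
// adequate for the handful of columns in a tuple.
std::optional<ColumnBinding> find_binding(
    const std::vector<ColumnBinding>& bindings, const std::string& name) {
  for (const auto& b : bindings) {
    if (b.name == name) return b;
  }
  return std::nullopt;
}
```

Lowering `age > 21` would resolve `age` to index 2 with type kInt64, which is exactly what the IRParameter node records.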

14.3.3 Expression Lowering

The SQLLowering class implements a visitor pattern over SQL AST nodes:

class SQLLowering {
public:
  auto lower(const Expr& expr) -> std::unique_ptr<IRModule>;
  auto lower(const Expr& expr, const std::vector<ColumnBinding>& columns)
      -> std::unique_ptr<IRModule>;

private:
  auto lower_expr_(const Expr& expr) -> IRNodePtr;
  auto lower_column_ref_(const ColumnRef& ref) -> IRNodePtr;
  auto lower_constant_(const Constant& c) -> IRNodePtr;
  auto lower_binary_expr_(const BinaryExpr& expr) -> IRNodePtr;
  auto lower_unary_expr_(const UnaryExpr& expr) -> IRNodePtr;
  auto lower_function_call_(const FunctionCall& call) -> IRNodePtr;
  auto lower_case_expr_(const CaseExpr& expr) -> IRNodePtr;
  auto lower_cast_expr_(const CastExpr& expr) -> IRNodePtr;
  auto lower_subquery_expr_(const SubqueryExpr& expr) -> IRNodePtr;
};

14.3.4 Lowering Examples

Column Reference:

SQL: age

auto lower_column_ref_(const ColumnRef& ref) -> IRNodePtr {
  auto binding = find_binding_(ref.column_name);
  return builder_.make_parameter(
      ParameterSource::kColumnReference,
      binding.index,
      binding.type,
      binding.name);
}

Binary Expression:

SQL: age >= 18

auto lower_binary_expr_(const BinaryExpr& expr) -> IRNodePtr {
  auto left = lower_expr_(expr.left);
  auto right = lower_expr_(expr.right);
  auto op = map_binary_op_(expr.op);  // SQL op -> IR op
  return builder_.make_binary(op, left, right);
}

CASE Expression:

SQL: CASE WHEN age >= 18 THEN 'adult' ELSE 'minor' END

auto lower_case_expr_(const CaseExpr& expr) -> IRNodePtr {
  // CASE expressions become nested conditionals; a single WHEN arm is
  // shown here for brevity
  auto condition = lower_expr_(expr.when_clause);
  auto then_branch = lower_expr_(expr.then_result);
  auto else_branch = lower_expr_(expr.else_result);
  return builder_.make_conditional(condition, then_branch, else_branch);
}
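A CASE with several WHEN arms lowers to the same conditional node by folding the arms right to left. The sketch below uses an illustrative node type rather than the real IR classes:

```cpp
#include <memory>
#include <utility>
#include <vector>

// Illustrative conditional node: (cond ? then_branch : else_branch).
struct Node;
using NodePtr = std::shared_ptr<Node>;
struct Node {
  NodePtr cond, then_branch, else_branch;  // all null for leaves
};

// Fold WHEN arms right-to-left:
//   CASE WHEN c1 THEN t1 WHEN c2 THEN t2 ELSE e END
// becomes (c1 ? t1 : (c2 ? t2 : e)).
NodePtr lower_case(std::vector<std::pair<NodePtr, NodePtr>> arms,
                   NodePtr else_branch) {
  NodePtr result = std::move(else_branch);
  for (auto it = arms.rbegin(); it != arms.rend(); ++it) {
    result = std::make_shared<Node>(
        Node{it->first, it->second, std::move(result)});
  }
  return result;
}
```

The innermost else-branch is the CASE's ELSE expression (or a NULL constant when ELSE is omitted, per SQL semantics).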

14.3.5 Operator Mapping

SQL operators map to IR operations:

SQL Operator    IR Operation
+               kAdd
-               kSubtract
*               kMultiply
/               kDivide
%               kModulo
=               kEqual
<>, !=          kNotEqual
<               kLessThan
<=              kLessEqual
>               kGreaterThan
>=              kGreaterEqual
AND             kLogicalAnd
OR              kLogicalOr
||              kStringConcat
LIKE            kLike
ILIKE           kILike

14.3.6 Subquery Registration

Scalar subqueries are registered for runtime execution:

auto lower_subquery_expr_(const SubqueryExpr& expr) -> IRNodePtr {
  // Register subquery for runtime execution
  auto subquery_index = register_subquery_(expr);

  // Create call node referencing the registered subquery
  return builder_.make_call(
      CallType::kSubquery,
      "subquery_" + std::to_string(subquery_index),
      {}  // Arguments bound at runtime
  );
}

14.4 IR Optimization

The optimization phase transforms the IR to improve execution efficiency. Optimizations are implemented as composable passes.

14.4.1 Optimization Configuration

struct OptimizerConfig {
  uint8_t optimization_level = 1;      // 0-3
  bool enable_constant_folding = true;
  bool enable_dce = true;
  bool enable_cse = true;
  bool enable_strength_reduction = true;
  size_t max_iterations = 10;
};

enum class OptimizationLevel : uint8_t {
  kNone = 0,       // O0: No optimization
  kBasic = 1,      // O1: Constant folding + DCE
  kStandard = 2,   // O2: O1 + CSE + strength reduction
  kAggressive = 3  // O3: O2 + advanced transforms
};

14.4.2 Pass Framework

All optimization passes inherit from a common base:

class OptimizationPass {
public:
  virtual auto run(std::shared_ptr<IRModule> module)
      -> OptimizationResult = 0;
  virtual auto name() const -> std::string = 0;
  virtual auto enabled_at_level(uint8_t level) const -> bool = 0;
};

struct OptimizationResult {
  bool changed;              // IR was modified
  size_t nodes_removed;      // Nodes eliminated
  size_t nodes_added;        // New nodes created
  std::string pass_name;     // Pass identifier
};

14.4.3 Constant Folding

Purpose: Evaluate constant expressions at compile time.

Algorithm:

  1. Post-order traversal of IR DAG
  2. Recursively fold children first
  3. When all operands are constants, evaluate operation
  4. Replace subtree with IRConstant result

Example:

Before:

BinaryOp(kMultiply)
  BinaryOp(kAdd)
    Constant(3)
    Constant(4)
  Constant(2)

After:

Constant(14)

Implementation:

auto ConstantFoldingPass::fold_(IRNodePtr node) -> IRNodePtr {
  // Fold children first
  for (auto& child : node->children()) {
    child = fold_(child);
  }

  // Check if all children are constants
  if (!all_constants_(node->children())) {
    return node;
  }

  // Skip volatile functions
  if (is_volatile_(node)) {
    return node;
  }

  // Evaluate at compile time
  auto result = evaluate_(node);
  return builder_.make_constant(result);
}

Folded Operations:

  • Arithmetic: 3 + 4 -> 7
  • Comparison: 5 > 3 -> true
  • String: 'Hello' || ' World' -> 'Hello World'
  • Boolean: true AND false -> false

14.4.4 Dead Code Elimination

Purpose: Remove unreachable or unused code.

Algorithm:

  1. Mark live nodes reachable from root
  2. Count uses of each node
  3. Eliminate branches with dead successors
  4. Simplify conditionals with constant conditions

Example:

Before:

Conditional
  Constant(false)
  Call(expensive_function)  // then branch
  Parameter(x)              // else branch

After:

Parameter(x)

Implementation:

auto DeadCodeEliminationPass::eliminate_(IRNodePtr node) -> IRNodePtr {
  if (node->kind() == IRNodeKind::kConditional) {
    auto* cond = static_cast<IRConditional*>(node.get());

    // Check if condition is constant
    if (auto* const_cond = as_constant_(cond->condition())) {
      if (const_cond->as_bool()) {
        return cond->then_branch();  // Condition always true
      } else {
        return cond->else_branch();  // Condition always false
      }
    }
  }

  return node;
}

14.4.5 Common Subexpression Elimination

Purpose: Deduplicate equivalent expressions.

Algorithm:

  1. Hash each node based on operation and operand IDs
  2. Find nodes with identical hash
  3. Perform structural equality check
  4. Replace duplicates with first occurrence
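Step 1 can be sketched as a structural hash built from a node's operation tag and the IDs of its already-numbered children. The mixing constant follows the familiar boost-style hash_combine; the function names are illustrative:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Boost-style hash mixing: fold a new value into an accumulated seed.
inline void hash_combine(std::size_t& seed, std::size_t v) {
  seed ^= v + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
}

// Hash a node from its kind, operation, and child node IDs. Child order
// matters, so non-commutative operations with swapped operands hash
// differently, as they must.
std::size_t structural_hash(uint8_t kind, uint8_t op,
                            const std::vector<uint32_t>& child_ids) {
  std::size_t seed = kind;
  hash_combine(seed, op);
  for (auto id : child_ids) hash_combine(seed, id);
  return seed;
}
```

Because hashing can collide, step 3's structural equality check remains mandatory before any replacement.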

Example:

Before:

BinaryOp(kAdd)
  BinaryOp(kMultiply)
    Parameter(a)
    Parameter(b)
  BinaryOp(kMultiply)   // Duplicate!
    Parameter(a)
    Parameter(b)

After:

BinaryOp(kAdd)
  t1 = BinaryOp(kMultiply)
    Parameter(a)
    Parameter(b)
  t1                     // Reuse!

Safety Considerations:

CSE must respect function volatility:

auto CSEPass::can_eliminate_(const IRNode* node) const -> bool {
  if (node->kind() == IRNodeKind::kCall) {
    auto* call = static_cast<const IRCall*>(node);
    // Don't eliminate volatile functions
    return !call->is_volatile();
  }
  return true;
}

Volatile functions include:

  • RANDOM() - Non-deterministic
  • NOW() - Time-dependent
  • NEXTVAL() - Sequence-dependent

14.4.6 Strength Reduction

Purpose: Replace expensive operations with cheaper equivalents.

Transformations:

Original    Optimized   Condition
x * 2       x + x       Always
x * 2^n     x << n      n is constant
x / 2^n     x >> n      n is constant, x unsigned
x % 2       x & 1       Always
x * 0       0           Always
x * 1       x           Always
x + 0       x           Always
x - 0       x           Always

Implementation:

auto StrengthReductionPass::reduce_(IRNodePtr node) -> IRNodePtr {
  if (node->kind() != IRNodeKind::kBinaryOperation) {
    return node;
  }

  auto* binop = static_cast<IRBinaryOperation*>(node.get());

  // x * 2 -> x + x
  if (binop->op() == BinaryOperationType::kMultiply) {
    if (is_constant_int_(binop->right(), 2)) {
      return builder_.make_binary(
          BinaryOperationType::kAdd,
          binop->left(),
          binop->left());
    }
  }

  // x * 2^n -> x << n
  if (binop->op() == BinaryOperationType::kMultiply) {
    if (auto power = get_power_of_two_(binop->right())) {
      return builder_.make_binary(
          BinaryOperationType::kShiftLeft,
          binop->left(),
          builder_.make_constant(*power));
    }
  }

  return node;
}
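The pass above relies on a get_power_of_two_ helper that is not shown. One possible implementation returns the shift amount when the operand is an exact power of two:

```cpp
#include <cstdint>
#include <optional>

// Possible implementation of the power-of-two test used by the strength
// reduction pass: returns n when v == 2^n for positive v, nullopt otherwise.
std::optional<int64_t> get_power_of_two(int64_t v) {
  if (v <= 0) return std::nullopt;
  uint64_t u = static_cast<uint64_t>(v);
  if ((u & (u - 1)) != 0) return std::nullopt;  // more than one bit set
  int64_t n = 0;
  while (u > 1) {
    u >>= 1;
    ++n;
  }
  return n;
}
```

The `u & (u - 1)` trick clears the lowest set bit, so the result is zero exactly when a single bit was set.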

14.4.7 Register Allocation

Purpose: Assign virtual registers to IR values.

Algorithm: Linear scan with graph coloring hints

  1. Liveness Analysis: Compute live ranges for each node
  2. Interference Graph: Build graph where edges indicate simultaneous liveness
  3. Graph Coloring: Assign registers (colors) minimizing conflicts
  4. Coalescing: Merge nodes with no interference
  5. Spilling: Push excess values to stack

Configuration:

struct RegisterAllocationConfig {
  uint32_t num_general_registers = 16;
  uint32_t num_float_registers = 8;
  bool enable_coalescing = true;
};

Output:

// Maps IR nodes to allocated registers
std::unordered_map<const IRNode*, uint8_t> allocation_map_;
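The linear-scan core of steps 1-5 can be sketched over precomputed live intervals. This is a minimal illustration with no coalescing and no spill-victim selection (an interval that finds no free register is simply flagged):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A value's live range [start, end] in instruction positions. Intervals
// that overlap must receive distinct registers.
struct Interval {
  uint32_t start, end;
  uint8_t reg = 0;
  bool spilled = false;
};

void linear_scan(std::vector<Interval>& intervals, uint8_t num_regs) {
  std::sort(intervals.begin(), intervals.end(),
            [](const Interval& a, const Interval& b) {
              return a.start < b.start;
            });
  std::vector<Interval*> active;  // intervals currently holding a register
  std::vector<uint8_t> free_regs;
  for (uint8_t r = 0; r < num_regs; ++r) free_regs.push_back(r);

  for (auto& iv : intervals) {
    // Expire intervals that ended before this one starts, returning
    // their registers to the free pool.
    for (auto it = active.begin(); it != active.end();) {
      if ((*it)->end < iv.start) {
        free_regs.push_back((*it)->reg);
        it = active.erase(it);
      } else {
        ++it;
      }
    }
    if (free_regs.empty()) {
      iv.spilled = true;  // a real allocator would pick a spill victim
      continue;
    }
    iv.reg = free_regs.back();
    free_regs.pop_back();
    active.push_back(&iv);
  }
}
```

A production allocator would additionally spill the interval with the furthest end point and apply the coalescing hints mentioned above.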

14.4.8 Optimization Pipeline

The optimizer runs passes iteratively until fixed point:

auto Optimizer::optimize(std::shared_ptr<IRModule> module)
    -> std::shared_ptr<IRModule> {

  for (size_t iter = 0; iter < config_.max_iterations; ++iter) {
    auto any_changed = false;

    for (auto& pass : passes_) {
      if (!pass->enabled_at_level(config_.optimization_level)) {
        continue;
      }

      auto result = pass->run(module);
      if (result.changed) {
        any_changed = true;
        stats_.nodes_removed += result.nodes_removed;
      }
    }

    if (!any_changed) {
      break;  // Fixed point reached
    }
  }

  return module;
}

14.5 Code Generation

The code generation phase transforms optimized IR into CVM bytecode.

14.5.1 Generator Configuration

struct BytecodeGenOptions {
  bool optimize_constants = true;   // Deduplicate constant loads
  bool optimize_registers = true;   // Minimize register usage
  bool emit_debug_info = true;      // Source mapping
  uint8_t max_registers = 16;       // Available registers
};

14.5.2 Code Generation Algorithm

The generator uses a depth-first visitor pattern:

class BytecodeGenerator {
public:
  auto generate(const IRModule& module) -> std::unique_ptr<BytecodeModule>;

private:
  // Returns register containing result
  auto visit_(const IRNode& node) -> uint8_t;

  auto visit_constant_(const IRConstant& node) -> uint8_t;
  auto visit_parameter_(const IRParameter& node) -> uint8_t;
  auto visit_binary_operation_(const IRBinaryOperation& node) -> uint8_t;
  auto visit_unary_operation_(const IRUnaryOperation& node) -> uint8_t;
  auto visit_call_(const IRCall& node) -> uint8_t;
  auto visit_conditional_(const IRConditional& node) -> uint8_t;
  auto visit_cast_(const IRCast& node) -> uint8_t;
  auto visit_field_access_(const IRFieldAccess& node) -> uint8_t;
};

14.5.3 Register Management

The generator maintains a register allocator:

class BytecodeGenerator {
private:
  // Register allocation state
  std::bitset<16> registers_in_use_;
  std::deque<uint8_t> allocation_order_;  // FIFO for eviction

  auto allocate_register_() -> uint8_t;
  auto allocate_temporary_() -> uint8_t;
  void free_register_(uint8_t reg);
  auto evict_register_() -> uint8_t;  // Spill to stack
};

FIFO Eviction: When all registers are in use, the oldest allocated register is evicted to the stack:

auto BytecodeGenerator::evict_register_() -> uint8_t {
  auto reg = allocation_order_.front();
  allocation_order_.pop_front();

  // Spill to stack
  emit_(Opcode::kStoreReg, reg, next_stack_slot_++);

  return reg;
}

14.5.4 Type-Aware Opcode Selection

The generator selects type-specific opcodes when types are known:

auto BytecodeGenerator::select_comparison_opcode_(
    BinaryOperationType op, CVMType type) -> Opcode {

  if (type == CVMType::kInt64) {
    switch (op) {
      case BinaryOperationType::kEqual:
        return Opcode::kCompareEqualInt64;
      case BinaryOperationType::kLessThan:
        return Opcode::kCompareLessThanInt64;
      // ...
    }
  } else if (type == CVMType::kDouble) {
    switch (op) {
      case BinaryOperationType::kEqual:
        return Opcode::kCompareEqualFloat64;
      // ...
    }
  } else if (type == CVMType::kString) {
    switch (op) {
      case BinaryOperationType::kEqual:
        return Opcode::kCompareEqualString;
      // ...
    }
  }

  // Fallback to polymorphic
  return Opcode::kCompareEqualPolymorphic;
}

14.5.5 Constant Pool Management

Constants are deduplicated in the constant pool:

class BytecodeGenerator {
private:
  ConstantPool constant_pool_;

  // Deduplication maps
  std::unordered_map<std::string, uint16_t> string_map_;
  std::unordered_map<int64_t, uint16_t> int64_map_;
  std::unordered_map<uint64_t, uint16_t> double_map_;  // Bit pattern

  auto add_string_constant_(const std::string& str) -> uint16_t {
    auto it = string_map_.find(str);
    if (it != string_map_.end()) {
      return it->second;
    }
    auto index = constant_pool_.add_string(str);
    string_map_[str] = index;
    return index;
  }
};

14.5.6 Jump Target Resolution

Jumps use PC-relative offsets that are patched after code generation:

struct PendingJump {
  uint32_t instruction_offset;  // Where to patch
  std::string target_label;     // Target label
};

void BytecodeGenerator::emit_jump_(const std::string& label) {
  auto offset = current_offset_();

  // Emit placeholder
  emit_instruction_(encode_format_c(
      Opcode::kJump, 0, 0));  // offset = 0 (placeholder)

  // Record for later patching
  pending_jumps_.push_back({offset, label});
}

void BytecodeGenerator::resolve_jumps_() {
  for (const auto& jump : pending_jumps_) {
    auto target = label_offsets_[jump.target_label];
    auto relative = (target - jump.instruction_offset) / 4;
    module_->patch_jump_offset(jump.instruction_offset, relative);
  }
}

14.6 Bytecode Module Structure

The compiled bytecode is packaged in a BytecodeModule with metadata.

14.6.1 Module Header

struct BytecodeHeader {
  uint32_t magic;                  // 0x304D5643 ("CVM0")
  uint16_t version_major;          // Major version
  uint16_t version_minor;          // Minor version
  uint32_t flags;                  // Module flags
  uint32_t constant_pool_offset;   // Constant pool location
  uint32_t constant_pool_size;     // Constant pool size
  uint32_t code_offset;            // Code section location
  uint32_t code_size;              // Code section size
  uint32_t debug_offset;           // Debug info location
  uint32_t debug_size;             // Debug info size
  uint32_t entry_point;            // Entry point offset
  uint64_t expression_hash;        // For caching
  uint8_t max_registers;           // GPRs used
  uint8_t max_float_registers;     // FPRs used
  uint8_t stack_depth;             // Stack depth required
  uint8_t reserved[13];            // Future expansion
};
static_assert(sizeof(BytecodeHeader) == 64);
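A loader would typically validate the header before executing a module. The check below is a sketch (the function is not part of the documented API; the constants follow the header definition above):

```cpp
#include <cstdint>

// "CVM0" when the four bytes are read in little-endian order.
constexpr uint32_t kBytecodeMagic = 0x304D5643;

// Reject foreign files and modules built for an incompatible major
// version; minor-version differences are assumed backward compatible.
bool validate_header(uint32_t magic, uint16_t version_major,
                     uint16_t supported_major) {
  return magic == kBytecodeMagic && version_major == supported_major;
}
```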

14.6.2 Module Flags

enum class ModuleFlags : uint32_t {
  kNone = 0,
  kDebugInfo = 1 << 0,      // Contains debug info
  kOptimized = 1 << 1,      // Has been optimized
  kJitReady = 1 << 2,       // Prepared for JIT
  kSourceMapped = 1 << 3    // Contains source mapping
};

14.6.3 Constant Pool

The constant pool stores all constants referenced by bytecode:

enum class ConstantType : uint8_t {
  kNull = 0,       // NULL value
  kBool = 1,       // Boolean
  kInt64 = 2,      // 64-bit integer
  kDouble = 3,     // 64-bit float
  kString = 4,     // String (string table index)
  kFieldRef = 5,   // Field name reference
  kFuncRef = 6,    // Function reference
  kTypeId = 7,     // Type identifier
  kArray = 8       // Array of values
};

14.6.4 Debug Information

Debug info maps bytecode offsets to source locations:

struct DebugLocation {
  uint32_t code_offset;     // Bytecode offset
  uint32_t source_line;     // Source line (1-based)
  uint32_t source_column;   // Source column (1-based)
  uint32_t source_length;   // Source span length
};

struct DebugInfo {
  std::string source_name;                // Source identifier
  std::string source_text;                // Original source
  std::vector<DebugLocation> locations;   // Offset mapping
};
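Since locations are naturally ordered by code_offset, a runtime error reporter can map a bytecode offset back to its source span with a binary search. A sketch (find_location is illustrative, not the documented API):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct DebugLocation {
  uint32_t code_offset;   // Bytecode offset
  uint32_t source_line;   // Source line (1-based)
  uint32_t source_column; // Source column (1-based)
  uint32_t source_length; // Source span length
};

// Given locations sorted by code_offset, return the entry covering pc:
// the last location whose offset is <= pc, or nullptr if pc precedes
// the first mapped offset.
const DebugLocation* find_location(const std::vector<DebugLocation>& locs,
                                   uint32_t pc) {
  auto it = std::upper_bound(
      locs.begin(), locs.end(), pc,
      [](uint32_t v, const DebugLocation& d) { return v < d.code_offset; });
  if (it == locs.begin()) return nullptr;
  return &*std::prev(it);
}
```

This is the standard line-table lookup scheme: O(log n) per query with no per-instruction storage beyond the table itself.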

14.6.5 Module Layout

+------------------------------+
| BytecodeHeader (64 bytes)    |
+------------------------------+
| Constant pool                |
+------------------------------+
| Code section                 |
+------------------------------+
| Debug info (if kDebugInfo)   |
+------------------------------+

14.7 Compilation Examples

14.7.1 Simple Comparison: age >= 18

SQL AST:

BinaryExpr(GreaterEqual,
  ColumnRef("age"),
  Constant(18))

Lowered IR:

IRBinaryOperation(kGreaterEqual,
  IRParameter(index=0, name="age", type=kInt64),
  IRConstant(18))

Generated Bytecode:

LOAD_PARAM R0, 0              ; R0 = age
LOAD_CONST R1, pool[0]        ; R1 = 18
CMP_GE_I64 R2, R0, R1         ; R2 = (R0 >= R1)
RET_VALUE R2                   ; return R2

Bytecode Encoding:

0x0D 0x00 0x00 0x00    ; LOAD_PARAM R0, idx=0
0x07 0x10 0x00 0x00    ; LOAD_CONST R1, pool[0]
0x45 0x20 0x01 0x00    ; CMP_GE_I64 R2, R0, R1
0x78 0x20 0x00 0x00    ; RET_VALUE R2

14.7.2 Conditional: CASE WHEN age >= 18 THEN 'adult' ELSE 'minor' END

Lowered IR:

IRConditional(
  condition=IRBinaryOperation(kGreaterEqual, ...),
  then_branch=IRConstant("adult"),
  else_branch=IRConstant("minor"))

Generated Bytecode:

LOAD_PARAM R0, 0              ; R0 = age
LOAD_CONST R1, pool[0]        ; R1 = 18
CMP_GE_I64 R2, R0, R1         ; R2 = condition
JUMP_FALSE R2, else_label     ; if false, goto else
LOAD_CONST R3, pool[1]        ; R3 = "adult"
JUMP end_label
else_label:
LOAD_CONST R3, pool[2]        ; R3 = "minor"
end_label:
RET_VALUE R3                   ; return result

14.7.3 Constant Folding: (100 + 50) * 2

Before Optimization:

IRBinaryOperation(kMultiply,
  IRBinaryOperation(kAdd,
    IRConstant(100),
    IRConstant(50)),
  IRConstant(2))

After Constant Folding:

IRConstant(300)

Generated Bytecode (Optimized):

MOVE_I64 R0, 300              ; R0 = 300 (folded constant)
RET_VALUE R0

14.7.4 CSE: (a + b) + (a + b)

Before CSE:

IRBinaryOperation(kAdd,
  IRBinaryOperation(kAdd,     ; First (a + b)
    IRParameter(a),
    IRParameter(b)),
  IRBinaryOperation(kAdd,     ; Second (a + b) - duplicate!
    IRParameter(a),
    IRParameter(b)))

After CSE (DAG with sharing):

t1 = IRBinaryOperation(kAdd,
       IRParameter(a),
       IRParameter(b))
IRBinaryOperation(kAdd, t1, t1)  ; Reuse t1

Generated Bytecode:

LOAD_PARAM R0, 0              ; R0 = a
LOAD_PARAM R1, 1              ; R1 = b
ADD_I64 R2, R0, R1            ; R2 = a + b (computed once)
ADD_I64 R3, R2, R2            ; R3 = R2 + R2 (reused)
RET_VALUE R3

14.7.5 Function Call: UPPER(name)

Lowered IR:

IRCall(
  call_type=kBuiltin,
  function_name="upper",
  arguments=[IRParameter(index=1, name="name")])

Generated Bytecode:

LOAD_PARAM R0, 1              ; R0 = name
CALL_BUILTIN R1, funcid=42    ; R1 = upper(R0)
RET_VALUE R1

14.8 Compiler Integration

14.8.1 Compiler Class

The Compiler class provides the main compilation interface:

class Compiler {
public:
  // Main entry points
  auto compile_sql_expression(const sql::ast::Expr& expr)
      -> CompilationResult;

  auto compile_sql_expression(
      const sql::ast::Expr& expr,
      const std::vector<ColumnBinding>& bindings) -> CompilationResult;

  // Phase access for testing
  auto lower_sql_expression(const sql::ast::Expr& expr)
      -> std::shared_ptr<IRModule>;

  auto optimize_ir(std::shared_ptr<IRModule> module)
      -> std::shared_ptr<IRModule>;

  auto generate_bytecode(const IRModule& module)
      -> std::unique_ptr<BytecodeModule>;

private:
  CompilerOptions options_;
  std::unique_ptr<CompilationCache> cache_;
};

14.8.2 Compiler Options

struct CompilerOptions {
  // Optimization control
  OptimizationLevel optimization_level = OptimizationLevel::kBasic;
  bool fold_constants = true;
  bool eliminate_dead_code = true;
  bool enable_cse = true;
  bool strength_reduction = true;
  bool register_coalescing = true;

  // Debug output
  bool generate_debug_info = false;
  bool preserve_names = false;
  bool trace_compilation = false;

  // Code generation
  bool use_extended_opcodes = true;
  uint32_t max_registers = 16;

  // Cache
  bool enable_cache = true;
  size_t cache_max_entries = 1024;
};

14.8.3 Compilation Result

struct CompilationResult {
  bool success;
  std::unique_ptr<BytecodeModule> module;
  std::string error_message;
  uint32_t error_line;

  // Registered subqueries
  std::vector<const sql::ast::SubqueryExpr*> subqueries;

  // Statistics
  size_t ir_nodes_before_opt;
  size_t ir_nodes_after_opt;
  size_t bytecode_size;
  double compilation_time_ms;
};

14.8.4 Compilation Cache

The compiler caches compiled modules to avoid recompilation:

struct CompilationCacheKey {
  uint64_t hash;           // Expression hash
  std::string source_type; // "sql", "filter", etc.
};

class CompilationCache {
public:
  auto lookup(const CompilationCacheKey& key) const
      -> const BytecodeModule*;
  void insert(const CompilationCacheKey& key,
              std::unique_ptr<BytecodeModule> module);
  void clear();

  auto hit_count() const -> uint64_t;
  auto miss_count() const -> uint64_t;
  auto hit_rate() const -> double;

private:
  std::unordered_map<uint64_t, std::unique_ptr<BytecodeModule>> cache_;
  mutable uint64_t hits_ = 0;
  mutable uint64_t misses_ = 0;
};

Cache Effectiveness:

  • Same expression: 100% hit rate
  • Similar expressions (different literals): Cache miss
  • Typical mixed workload: 60-85% hit rate
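The lookup-or-compile flow around the cache can be sketched with a self-contained mock; MiniCache, Module, and the use of std::hash are illustrative (the real cache keys on the 64-bit expression hash from the header):

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Illustrative stand-in for a compiled BytecodeModule.
struct Module {
  std::string bytecode;
};

// Minimal sketch of the lookup-or-compile pattern with hit/miss counters.
class MiniCache {
 public:
  const Module& get_or_compile(
      const std::string& expr,
      const std::function<Module(const std::string&)>& compile) {
    uint64_t key = std::hash<std::string>{}(expr);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      ++hits_;
      return it->second;  // reuse previously compiled module
    }
    ++misses_;
    return cache_.emplace(key, compile(expr)).first->second;
  }
  uint64_t hits() const { return hits_; }
  uint64_t misses() const { return misses_; }

 private:
  std::unordered_map<uint64_t, Module> cache_;
  uint64_t hits_ = 0, misses_ = 0;
};
```

Because the key is a hash of the expression text, queries differing only in literal values produce distinct keys, which is why parameterized (prepared) statements cache far better than inlined literals.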

14.9 Performance Characteristics

14.9.1 Compilation Time

Expression Complexity          Typical Time
Simple comparison (a > 5)      50-100 µs
Medium (a > 5 AND b < 10)      100-200 µs
Complex (10+ operations)       200-500 µs
Very complex (subqueries)      500-1000 µs

14.9.2 Bytecode Size

Expression       Instructions   Module Size
age >= 18        4              ~100 bytes
a AND b AND c    8-10           ~150 bytes
Complex CASE     15-20          ~250 bytes

14.9.3 Optimization Impact

Optimization            Typical Reduction
Constant Folding        10-30% fewer nodes
Dead Code Elimination   5-15% fewer nodes
CSE                     5-20% fewer nodes
Strength Reduction      0-5% faster execution

14.10 Summary

The CVM compilation pipeline transforms SQL expressions into efficient bytecode through a well-structured sequence of phases:

  1. Lowering: SQL AST nodes are translated to IR nodes, establishing type information and column bindings.

  2. Optimization: Multiple passes improve the IR:

    • Constant folding evaluates compile-time expressions
    • Dead code elimination removes unreachable code
    • CSE deduplicates equivalent subexpressions
    • Strength reduction replaces expensive operations
    • Register allocation assigns virtual registers
  3. Code Generation: The optimized IR is translated to bytecode with type-aware opcode selection and constant pool management.

  4. Module Packaging: The bytecode, constant pool, and metadata are packaged into a BytecodeModule for execution.

The compilation pipeline balances compilation speed against execution performance, with configurable optimization levels allowing users to choose the appropriate trade-off for their workload.

Copyright (c) 2023-2026 Cognica, Inc.