Chapter 13: CVM Architecture

13.1 Introduction to the Cognica Virtual Machine

The Cognica Virtual Machine (CVM) is a specialized bytecode interpreter designed for high-performance query execution. Unlike general-purpose virtual machines such as the JVM or CLR, CVM is purpose-built for database operations, with native support for document manipulation, aggregation, joins, sorting, and other query primitives.

CVM occupies a unique position in the database execution landscape. Traditional database engines use either interpreted expression evaluation (slow but flexible) or compile queries directly to native code (fast but complex). CVM provides a middle ground: a bytecode representation that is faster than interpretation, simpler than native compilation, and provides a clean target for optimization passes.

13.1.1 Design Philosophy

CVM's architecture reflects several key design principles:

Query-Centric Instruction Set: Rather than general-purpose operations, CVM provides opcodes specifically designed for database operations—field access, aggregation, sorting, and joins are first-class operations.
Register-Based Architecture: Like modern CPUs and high-performance VMs (Lua, Dalvik), CVM uses registers rather than a stack machine. This reduces instruction count and enables better optimization.
Computed Goto Dispatch: The interpreter uses computed goto (threaded code) for 10-20% faster dispatch compared to switch-based interpretation.
Spillable Data Structures: Memory-intensive operations (sort, hash join, aggregation) can automatically spill to disk when memory is exhausted.
Zero-Copy Optimizations: Composite rows enable multi-table joins without deep copying documents.

13.1.2 CVM in the Query Pipeline

CVM integrates into the query execution pipeline as follows:

Loading diagram...

13.2 Architectural Overview

13.2.1 Core Components

The CVM system consists of several interconnected components:

Component	Location	Purpose
Interpreter	`interpreter/interpreter.hpp`	Bytecode execution with computed-goto dispatch
VMContext	`interpreter/context.hpp`	Runtime state (registers, stacks, buffers)
Opcode System	`bytecode/opcode.hpp`	512+ opcode definitions
Instruction Format	`bytecode/instruction.hpp`	32-bit instruction encoding
BytecodeModule	`bytecode/bytecode_module.hpp`	Compiled bytecode unit
Value System	`types/value.hpp`	24-byte tagged union
Type System	`types/type_system.hpp`	16 CVM types

13.2.2 Key Dimensions

Dimension	Value
Primary Opcodes	256 (0x00-0xFF)
Extended Opcodes	256 (via 0xFE prefix)
General Registers	16 (R0-R15)
Float Registers	8 (F0-F7)
CVM Types	16
VMValue Size	24 bytes
Max Call Stack	256 frames
Max Operand Stack	1024 values
Max Cursors	16

13.3 Type System

CVM implements a rich type system supporting SQL data types, JSON values, and internal execution types.

13.3.1 Type Enumeration

enum class CVMType : uint8_t {
  kNull = 0x00,         // SQL NULL / JSON null
  kBool = 0x01,         // Boolean true/false
  kInt64 = 0x02,        // 64-bit signed integer
  kDouble = 0x03,       // IEEE 754 double precision
  kString = 0x04,       // UTF-8 string
  kArray = 0x05,        // Ordered collection
  kDocument = 0x06,     // Key-value object
  kBinary = 0x07,       // Raw byte array (BYTEA)
  kTimestamp = 0x08,    // TIMESTAMP WITHOUT TIME ZONE
  kTimestampTZ = 0x09,  // TIMESTAMP WITH TIME ZONE
  kDate = 0x0A,         // Date (year-month-day)
  kTime = 0x0B,         // Time of day
  kInterval = 0x0C,     // Time interval
  kDecimal = 0x0D,      // Arbitrary precision decimal
  kUUID = 0x0E,         // 128-bit UUID
  kCompositeRow = 0x0F, // Zero-copy JOIN composite
  kAggState = 0x10,     // Aggregation state (internal)
  kVoid = 0xFE,         // No value (side-effect only)
  kUnknown = 0xFF       // Type unknown at compile time
};

13.3.2 Type Categories

Types are classified into categories for operation dispatch:

Numeric Types (support arithmetic):

kInt64: 64-bit signed integer
kDouble: IEEE 754 double-precision floating point
kDecimal: Arbitrary precision decimal (128-bit coefficient)

Temporal Types:

kTimestamp: Microseconds since Unix epoch (no timezone)
kTimestampTZ: UTC microseconds with timezone awareness
kDate: Days since Unix epoch
kTime: Microseconds since midnight
kInterval: PostgreSQL-style (months + days + microseconds)

Container Types:

kArray: Ordered collection of heterogeneous values
kDocument: Key-value map (JSON object)

Special Types:

kCompositeRow: Zero-copy reference to multiple documents (for joins)
kAggState: Internal aggregation state

13.3.3 VMValue Structure

All values in CVM are represented using a 24-byte tagged union:

struct VMValue {
  CVMType type;       // 1 byte: type discriminant
  uint8_t flags;      // 1 byte: kFlagOwned, kFlagConst
  uint16_t reserved;  // 2 bytes: alignment padding
  uint32_t padding;   // 4 bytes: alignment padding

  union {             // 16 bytes: value storage
    bool bool_val;
    int64_t int64_val;
    double double_val;
    StringRef string_val;      // (ptr, len)
    ArrayRef array_val;        // (ptr, len, cap)
    Document* doc_val;
    CompositeRow* composite_row_val;
    TimestampVal timestamp_val;
    DateVal date_val;
    TimeVal time_val;
    IntervalVal interval_val;
    DecimalVal decimal_val;
    UUIDVal uuid_val;
    BinaryRef binary_val;
    void* ptr_val;
    uint8_t raw_bytes[16];
  };
};

static_assert(sizeof(VMValue) == 24, "Cache-efficient size");

The 24-byte size is carefully chosen for cache efficiency—three VMValues fit exactly in a single 64-byte cache line on most modern processors.

13.3.4 Temporal Value Representations

Timestamp:

timestamp = microseconds_{since\_epoch}

Where epoch is 1970-01-01 00:00:00 UTC.

Date:

date = days_{since\_epoch}

Time:

time = microseconds_{since\_midnight}

Interval (PostgreSQL-compatible):

struct IntervalVal {
  int32_t months;        // Total months
  int32_t days;          // Total days (separate for DST)
  int64_t microseconds;  // Remaining microseconds
};

The three-component interval representation follows PostgreSQL's design, allowing correct handling of month boundaries and daylight saving time transitions.

Decimal (128-bit):

struct DecimalVal {
  int64_t coefficient_high;  // Upper 64 bits
  int64_t coefficient_low;   // Lower 64 bits
};

13.4 Instruction Encoding

CVM uses fixed-width 32-bit instructions with six primary formats, enabling efficient decoding and cache-friendly code layout.

13.4.1 Instruction Formats

Format A: 3-Register (32 bits)

Loading diagram...

Format B: 2-Register + Imm16 (32 bits)

Loading diagram...

Format C: Jump (32 bits)

Loading diagram...

Format D: Extended Imm64 (96 bits)

Loading diagram...

Format A (3-operand register operations):

R[dst] = R[src1] OP R[src2]

Used for arithmetic, comparison, and logical operations.

Format B (2-operand with 16-bit immediate):

R[dst] = R[src] OP sign_extend(imm16)

Used for immediate operations and pool index references.

Format C (conditional jump):

if R[cond]: PC += offset * 4

Used for branches and loops. The 16-bit signed offset is measured in words (4 bytes), providing a range of approximately +/- 128KB.

Format D (64-bit immediate):

R[dst] = imm64

Used for loading large constants (8-byte instruction total).

Format E (unary operations):

R[dst] = OP R[src]

Used for type conversions, negation, and other single-operand operations.

Format F (no operands):

HALT, NOP, YIELD

Used for control operations with no data dependencies.

13.4.2 Extended Instruction Formats

For complex operations like composite rows, CVM uses extended formats:

Format G (register + pool index):

[Opcode:8][Dst:4][Reserved:4][PoolIdx:8][Reserved:8]

Used for batch constant instructions and pool references.

Format H (two-word extended):

Word 1: [0xFE:8][ExtOpcode:8][Dst:8][Src:8]
Word 2: [Operand1:16][Operand2:16]

Used for composite row operations requiring 16-bit operands.

13.4.3 Encoding/Decoding

Instructions are encoded and decoded using helper functions:

// Encoding
auto encode_format_a(Opcode op, uint8_t dst, uint8_t src1,
                     uint8_t src2, uint8_t flags) -> uint32_t {
  return (static_cast<uint32_t>(op) << 24) |
         (static_cast<uint32_t>(dst) << 20) |
         (static_cast<uint32_t>(src1) << 16) |
         (static_cast<uint32_t>(src2) << 12) |
         (static_cast<uint32_t>(flags) << 8);
}

// Decoding
auto decode_opcode(uint32_t word) -> Opcode {
  return static_cast<Opcode>(word >> 24);
}

auto decode_dst(uint32_t word) -> uint8_t {
  return (word >> 20) & 0xF;
}

13.5 Opcode System

CVM defines over 512 opcodes organized into functional categories.

13.5.1 Opcode Organization

0x00-0x0F: Data Movement
0x10-0x1F: Integer Arithmetic
0x20-0x2F: Floating Point Arithmetic
0x30-0x3F: Bitwise Operations
0x40-0x4F: Integer Comparison
0x50-0x57: Float Comparison
0x58-0x5F: String Comparison
0x60-0x6F: Logical Operations & Debug
0x70-0x7F: Control Flow
0x80-0x8F: Type Operations
0x90-0x9F: Field Access
0xA0-0xAF: Array Operations
0xB0-0xBF: String Operations
0xC0-0xCF: Aggregation
0xD0-0xDF: Query Buffers (Window, Hash, Sort, Set)
0xE0-0xEF: Working Tables & Functions
0xF0-0xF9: Cursor Operations
0xFA-0xFD: External Calls
0xFE: Extended Opcode Prefix
0xFF: Undefined

13.5.2 Data Movement Operations (0x00-0x0F)

Opcode	Mnemonic	Operation
0x00	NOP	No operation
0x01	MOVE	R[dst] = R[src]
0x02	MOVE_INT64	R[dst] = imm64
0x03	MOVE_FLOAT64	R[dst] = imm64 (as double)
0x04	MOVE_NULL	R[dst] = NULL
0x05	MOVE_TRUE	R[dst] = true
0x06	MOVE_FALSE	R[dst] = false
0x07	LOAD_CONST	R[dst] = constant_pool[imm16]
0x08	LOAD_STRING	R[dst] = string_pool[imm16]
0x09	LOAD_ARRAY	R[dst] = array_pool[imm16]
0x0A	LOAD_DOC	R[dst] = doc_pool[imm16]
0x0B	STORE_REG	temp_stack[imm16] = R[src]
0x0C	LOAD_REG	R[dst] = temp_stack[imm16]
0x0D	LOAD_PARAM	R[dst] = parameters[imm16]
0x0E	LOAD_INPUT	R[dst] = input_document

13.5.3 Arithmetic Operations (0x10-0x2F)

Integer Arithmetic (0x10-0x1F):

Opcode	Mnemonic	Operation
0x10	ADD_I64	R[dst] = R[src1] + R[src2]
0x11	SUB_I64	R[dst] = R[src1] - R[src2]
0x12	MUL_I64	R[dst] = R[src1] * R[src2]
0x13	DIV_I64	R[dst] = R[src1] / R[src2]
0x14	MOD_I64	R[dst] = R[src1] % R[src2]
0x15	NEG_I64	R[dst] = -R[src]
0x16	ABS_I64	R[dst] = abs(R[src])
0x17	INC_I64	R[dst] = R[src] + 1
0x18	DEC_I64	R[dst] = R[src] - 1

Floating Point Arithmetic (0x20-0x2F):

Opcode	Mnemonic	Operation
0x20	ADD_F64	R[dst] = R[src1] + R[src2]
0x21	SUB_F64	R[dst] = R[src1] - R[src2]
0x22	MUL_F64	R[dst] = R[src1] * R[src2]
0x23	DIV_F64	R[dst] = R[src1] / R[src2]
0x24	NEG_F64	R[dst] = -R[src]
0x25	ABS_F64	R[dst] = abs(R[src])
0x26	SQRT	R[dst] = sqrt(R[src])
0x27	POW	R[dst] = pow(R[src1], R[src2])
0x28	FLOOR	R[dst] = floor(R[src])
0x29	CEIL	R[dst] = ceil(R[src])
0x2A	ROUND	R[dst] = round(R[src])

13.5.4 Comparison Operations (0x40-0x5F)

Integer Comparison (0x40-0x4F):

Opcode	Mnemonic	Operation
0x40	CMP_EQ_I64	R[dst] = (R[src1] == R[src2])
0x41	CMP_NE_I64	R[dst] = (R[src1] != R[src2])
0x42	CMP_LT_I64	R[dst] = (R[src1] < R[src2])
0x43	CMP_LE_I64	R[dst] = (R[src1] <= R[src2])
0x44	CMP_GT_I64	R[dst] = (R[src1] > R[src2])
0x45	CMP_GE_I64	R[dst] = (R[src1] >= R[src2])

String Comparison (0x58-0x5F):

Opcode	Mnemonic	Operation
0x58	CMP_EQ_STR	R[dst] = (R[src1] == R[src2])
0x59	CMP_NE_STR	R[dst] = (R[src1] != R[src2])
0x5A	CMP_LT_STR	R[dst] = (R[src1] < R[src2])
0x5B	CMP_LE_STR	R[dst] = (R[src1] <= R[src2])

13.5.5 Control Flow Operations (0x70-0x7F)

Opcode	Mnemonic	Operation
0x70	JUMP	PC += offset * 4
0x71	JUMP_TRUE	if R[cond]: PC += offset * 4
0x72	JUMP_FALSE	if !R[cond]: PC += offset * 4
0x73	JUMP_NULL	if R[cond] is NULL: PC += offset * 4
0x74	JUMP_NOT_NULL	if R[cond] is not NULL: PC += offset * 4
0x75	CALL	Push frame, jump to function
0x76	RETURN	Pop frame, return value
0x77	HALT	Stop execution
0x78	YIELD	Yield for streaming

13.5.6 Field Access Operations (0x90-0x9F)

Opcode	Mnemonic	Operation
0x90	GET_FIELD	R[dst] = R[doc].field[pool_idx]
0x91	SET_FIELD	R[doc].field[pool_idx] = R[src]
0x92	HAS_FIELD	R[dst] = R[doc].has(field[pool_idx])
0x93	GET_NESTED	R[dst] = R[doc].path[pool_idx]
0x94	DELETE_FIELD	R[doc].delete(field[pool_idx])

In addition to the primary field access opcodes above, the extended opcode GET_OUTER_FIELD (ExtOp 0x85) reads a field from the outer query's current row rather than the current document. This opcode is emitted by the plan lowering pass when compiling correlated subqueries — column references whose table alias does not belong to the inner query's local alias set are lowered to GET_OUTER_FIELD instead of GET_FIELD. The interpreter resolves the field from the outer row context, supporting both Document and CompositeRow outer rows. This enables the CVM to execute correlated subqueries natively without falling back to the Volcano executor.

13.5.7 Aggregation Operations (0xC0-0xCF)

Single Aggregation (0xC0-0xC7):

Opcode	Mnemonic	Operation
0xC0	AGG_INIT	Initialize aggregation state
0xC1	AGG_ACCUMULATE	Accumulate value into state
0xC2	AGG_FINALIZE	Compute final aggregate result
0xC3	AGG_MERGE	Merge two aggregate states

Aggregation Table (0xC8-0xCF):

Opcode	Mnemonic	Operation
0xC8	AGG_TABLE_NEW	Create aggregation table
0xC9	AGG_GET_OR_CREATE	Get/create group state
0xCA	AGG_ITER_INIT	Initialize table iterator
0xCB	AGG_ITER_NEXT	Get next group
0xCC	AGG_TABLE_NEW_MULTI	Multi-function agg table
0xCD	AGG_STATE_AT	Access state at index
0xCE	AGG_ITER_HAS_NEXT	Check for more groups

13.5.8 Query Buffer Operations (0xD0-0xDF)

Window Buffer (0xD0-0xD3):

Opcode	Mnemonic	Operation
0xD0	WIN_BUF_NEW	Create window buffer
0xD1	WIN_BUF_ADD	Add row to window
0xD2	WIN_COMPUTE	Compute window function
0xD3	WIN_BUF_NEXT	Get next windowed row

Hash Table (0xD4-0xD7):

Opcode	Mnemonic	Operation
0xD4	HASH_TABLE_NEW	Create hash table
0xD5	HASH_TABLE_INSERT	Insert key-value pair
0xD6	HASH_TABLE_PROBE	Probe for key
0xD7	HASH_TABLE_DESTROY	Destroy hash table

Sort Buffer (0xD8-0xDB):

Opcode	Mnemonic	Operation
0xD8	SORT_BUF_NEW	Create sort buffer
0xD9	SORT_BUF_ADD	Add row to buffer
0xDA	SORT_BUF_NEXT	Get next sorted row
0xDB	SORT_BUF_DESTROY	Destroy sort buffer

Set Operations (0xDC-0xDF):

Opcode	Mnemonic	Operation
0xDC	SET_OP_NEW	Create set operation buffer
0xDD	SET_OP_ADD	Add row from source
0xDE	SET_OP_NEXT	Get next result row
0xDF	SET_OP_DESTROY	Destroy set buffer

13.5.9 Cursor Operations (0xF0-0xF7)

Opcode	Mnemonic	Operation
0xF0	CURSOR_OPEN	Open collection cursor
0xF1	CURSOR_NEXT	Get next document
0xF2	CURSOR_CLOSE	Close cursor
0xF3	CURSOR_RESET	Reset to beginning
0xF4	CURSOR_IS_VALID	Check if more rows
0xF5	EMIT_ROW	Emit result row
0xF6	CURSOR_TAKE_DOC	Move document (no copy)

13.5.10 Extended Opcodes (0xFE prefix)

Extended opcodes use a two-byte opcode sequence (0xFE + extended opcode):

Array/Object Iteration (0x01-0x0A):

ITER_ARRAY_BEGIN, ITER_ARRAY_NEXT, ITER_ARRAY_END
ITER_OBJECT_BEGIN, ITER_OBJECT_NEXT_KEY, ITER_OBJECT_NEXT_VAL, ITER_OBJECT_END
ITER_RANGE_BEGIN, ITER_RANGE_NEXT, ITER_RANGE_END

Document Construction (0x0B-0x12):

DOC_NEW, DOC_FROM_JSON, DOC_TO_JSON
DOC_MERGE, DOC_DEEP_COPY

Composite Row Operations (0x13-0x1C):

COMPOSITE_NEW, COMPOSITE_ADD_SLOT
COMPOSITE_GET_FIELD, COMPOSITE_GET_SLOT
COMPOSITE_MATERIALIZE, COMPOSITE_EMIT

Batch/Vectorized Operations (0x20-0x5F):

BATCH_SCAN_OPEN, BATCH_SCAN_NEXT, BATCH_EMIT
BATCH_ADD_I64, BATCH_CMP_LT_I64, etc.
BATCH_AND, BATCH_OR, BATCH_SELECT
BATCH_HASH, BATCH_HASH_BUILD, BATCH_HASH_PROBE

Parallel Execution (0x5C-0x67):

PARALLEL_SCAN_OPEN, PARALLEL_FILTER_EVAL
PARALLEL_PARTITION, PARALLEL_MERGE, PARALLEL_BARRIER

Outer Context Operations (0x85):

GET_OUTER_FIELD: Read a field from the outer query's current row for correlated subquery support

13.6 Register Architecture

CVM employs a register-based architecture with 24 registers organized into two files.

13.6.1 Register Organization

Loading diagram...

General Purpose Registers (R0-R15):

16 registers for scalar values
Can hold any CVMType
Used for integers, booleans, pointers

Float Registers (F0-F7):

8 registers for floating-point values
Optimized for double-precision arithmetic
Separate to avoid type checks in hot loops

13.6.2 Register Storage

class VMContext {
private:
  std::array<VMValue, 16> registers_;        // R0-R15
  std::array<VMValue, 8> float_registers_;   // F0-F7
  uint32_t pc_;                              // Program counter
};

13.6.3 Register Access

auto VMContext::get_register(uint8_t reg) const -> const VMValue& {
  assert(reg < 16);
  return registers_[reg];
}

void VMContext::set_register(uint8_t reg, VMValue value) {
  assert(reg < 16);
  registers_[reg] = std::move(value);
}

// Convenience accessors for common types
auto VMContext::get_int64(uint8_t reg) const -> int64_t {
  assert(registers_[reg].type == CVMType::kInt64);
  return registers_[reg].int64_val;
}

void VMContext::set_int64(uint8_t reg, int64_t value) {
  registers_[reg].type = CVMType::kInt64;
  registers_[reg].int64_val = value;
}

13.6.4 Register vs Stack Architecture

CVM's register-based design offers several advantages over stack machines:

Property	Register Machine	Stack Machine
Instruction count	Lower	Higher
Operand encoding	Explicit registers	Implicit stack
Code size	Larger instructions	Smaller instructions
Optimization	Easier	Harder
Dispatch overhead	Lower	Higher

For database workloads, the reduced instruction count and better optimization potential outweigh the slightly larger code size.

13.7 Execution Context

The VMContext maintains all runtime state during bytecode execution.

13.7.1 State Categories

Loading diagram...

13.7.2 Context Structure

class VMContext {
private:
  // === Registers ===
  std::array<VMValue, 16> registers_;
  std::array<VMValue, 8> float_registers_;
  uint32_t pc_;

  // === Control Flow ===
  std::vector<VMValue> operand_stack_;
  std::vector<CallFrame> call_stack_;

  // === Status ===
  VMStatus status_;
  VMFlags flags_;
  std::optional<VMError> error_;

  // === Module ===
  const BytecodeModule* module_;

  // === I/O ===
  VMValue input_document_;
  RowEmitCallback row_emit_callback_;
  uint64_t rows_emitted_;
  uint64_t rows_scanned_;

  // === Cursors ===
  std::array<std::unique_ptr<Cursor>, 16> cursors_;
  std::array<VMValue, 16> current_documents_;
  std::array<bool, 16> cursor_valid_;

  // === Query Buffers (spillable) ===
  std::array<SortBuffer, 4> sort_buffers_;
  std::array<HashTable, 4> hash_tables_;
  std::array<AggregationTable, 4> agg_tables_;
  std::array<WindowBuffer, 4> window_buffers_;
  std::array<SetOpBuffer, 4> set_op_buffers_;
  std::array<WorkingTableSlot, 8> working_tables_;

  // === Memory Management ===
  std::vector<std::unique_ptr<Document>> owned_documents_;
  std::vector<std::unique_ptr<std::string>> owned_strings_;
  std::vector<std::unique_ptr<std::vector<VMValue>>> owned_arrays_;
  std::vector<std::unique_ptr<CompositeRow>> owned_composite_rows_;

  // === Parameters ===
  std::array<VMValue, 256> parameters_;
  std::bitset<256> parameters_bound_;

  // === Statistics ===
  uint64_t instructions_executed_;
};

13.7.3 Status and Flags

Execution Status:

enum class VMStatus : uint8_t {
  kRunning = 0,     // Normal execution
  kHalted = 1,      // Execution completed
  kError = 2,       // Runtime error
  kBreakpoint = 3,  // Hit debugger breakpoint
  kYielded = 4      // Yielded for streaming
};

CPU Flags (set by comparisons):

struct VMFlags {
  bool zero = false;      // Result was zero/equal
  bool negative = false;  // Result was negative
  bool overflow = false;  // Arithmetic overflow
  bool carry = false;     // Carry/borrow
};

13.7.4 Resource Limits

Resource	Limit	Purpose
Call Stack	256 frames	Prevent stack overflow
Operand Stack	1024 values	Function argument passing
Cursors	16	Concurrent table scans
Sort Buffers	4	Concurrent ORDER BY
Hash Tables	4	Concurrent joins
Agg Tables	4	Concurrent GROUP BY
Window Buffers	4	Window functions
Set Op Buffers	4	UNION/INTERSECT/EXCEPT
Working Tables	8	Recursive CTEs
Parameters	256	Prepared statement bindings

13.8 Interpreter Implementation

The CVM interpreter uses computed goto (threaded code) for efficient dispatch.

13.8.1 Dispatch Table

class Interpreter {
private:
  // Dispatch table: 256 void* pointers to instruction handlers
  static void* dispatch_table_[256];

public:
  auto execute(VMContext& ctx) -> ExecutionResult;
};

13.8.2 Computed Goto Dispatch

The interpreter main loop uses computed goto for minimal dispatch overhead:

auto Interpreter::execute(VMContext& ctx) -> ExecutionResult {
  // Initialize dispatch table (once)
  static void* dispatch_table[] = {
    &&op_nop,          // 0x00
    &&op_move,         // 0x01
    &&op_move_int64,   // 0x02
    // ... 256 entries
  };

  // Fetch first instruction
  auto word = ctx.fetch_instruction();
  auto opcode = word >> 24;
  goto *dispatch_table[opcode];

op_nop:
  ctx.advance_pc(4);
  word = ctx.fetch_instruction();
  opcode = word >> 24;
  goto *dispatch_table[opcode];

op_move:
  {
    auto dst = (word >> 20) & 0xF;
    auto src = (word >> 16) & 0xF;
    ctx.set_register(dst, ctx.get_register(src));
  }
  ctx.advance_pc(4);
  word = ctx.fetch_instruction();
  opcode = word >> 24;
  goto *dispatch_table[opcode];

op_add_int64:
  {
    auto dst = (word >> 20) & 0xF;
    auto src1 = (word >> 16) & 0xF;
    auto src2 = (word >> 12) & 0xF;
    auto a = ctx.get_int64(src1);
    auto b = ctx.get_int64(src2);
    ctx.set_int64(dst, a + b);
  }
  ctx.advance_pc(4);
  word = ctx.fetch_instruction();
  opcode = word >> 24;
  goto *dispatch_table[opcode];

// ... handlers for all 256 opcodes

op_halt:
  ctx.set_status(VMStatus::kHalted);
  return ExecutionResult::kSuccess;
}

13.8.3 Why Computed Goto?

Computed goto provides significant performance benefits:

No Bounds Check: Direct table lookup, no range validation
Direct Jumps: Each handler jumps directly to the next
Better Branch Prediction: Dedicated indirect branch per handler
10-20% Faster: Measured improvement over switch-case

Comparison with Switch:

// Switch-based dispatch (slower)
while (running) {
  auto opcode = fetch_opcode();
  switch (opcode) {
    case OP_ADD: /* handler */ break;
    case OP_SUB: /* handler */ break;
    // ...
  }
}

// Computed goto (faster)
goto *dispatch_table[opcode];
op_add:
  /* handler */
  goto *dispatch_table[next_opcode];

The switch version has a single indirect branch that must predict across all opcodes. Computed goto gives each handler its own branch, allowing the branch predictor to learn patterns specific to that instruction.

13.8.4 Execution Configuration

struct InterpreterConfig {
  uint64_t max_instructions = 100'000'000'000;  // Anti-DoS limit
  bool enable_tracing = false;                  // Debug tracing
  bool enable_profiling = false;                // Performance profiling
};

13.8.5 Execution Result

enum class ExecutionResult : uint8_t {
  kSuccess,          // Completed normally
  kError,            // Runtime error occurred
  kBreakpoint,       // Hit debugger breakpoint
  kMaxInstructions,  // Exceeded instruction limit
  kYield             // Yielded for streaming
};

13.9 Specialized Execution Buffers

CVM provides specialized data structures for memory-intensive query operations, all capable of spilling to disk when memory is exhausted.

13.9.1 Sort Buffer

The sort buffer implements ORDER BY with automatic external sort fallback:

class SortBuffer {
public:
  // Field-based sorting
  void configure(const std::vector<SortComparator>& comparators);

  // Value-based sorting (for expression keys)
  void configure_with_values(size_t num_keys,
                             const std::vector<bool>& ascending,
                             const std::vector<bool>& nulls_first);

  void add(Document* row);
  void add_with_values(Document* row, std::vector<SortableValue> keys);

  auto next() -> Document*;  // Sorts on first call
  void destroy();

private:
  std::vector<Document*> rows_;
  std::vector<SortComparator> comparators_;
  size_t memory_limit_;
  bool sorted_ = false;
};

Spill Strategy: When memory is exceeded, the buffer:

Sorts current in-memory data
Writes sorted run to temporary file
Clears memory
Continues accepting rows
On iteration, performs merge sort across runs

13.9.2 Hash Table

The hash table supports hash joins and hash aggregation:

class HashTable {
public:
  void insert(const VMValue& key, Document* value);
  auto probe_first(const VMValue& key) -> Document*;
  auto probe_next() -> Document*;  // For multi-match
  void destroy();

private:
  std::unordered_multimap<int64_t, Document*> entries_;
  size_t memory_limit_;
};

Spill Strategy (Grace Hash Join): When memory is exceeded:

Partition data by hash(key) into N partitions
Spill largest partition to disk
Continue with remaining partitions in memory
Process spilled partitions recursively

13.9.3 Aggregation Table

The aggregation table implements GROUP BY with multiple aggregation functions:

class AggregationTable {
public:
  void configure(const std::vector<AggregateFunction>& functions);

  auto get_or_create(const VMValue& key)
      -> std::vector<std::unique_ptr<AggregateState>>*;

  void iter_init();
  auto iter_next() -> std::pair<VMValue*, std::vector<AggregateState*>*>;
  auto iter_has_next() const -> bool;

private:
  std::unordered_map<int64_t, AggGroupEntry> groups_;
  std::vector<AggregateFunction> functions_;
  Iterator current_iter_;
};

Supported Aggregate Functions:

enum class AggregateFunction : uint8_t {
  kCount = 0,       // COUNT(expr)
  kSum = 1,         // SUM(expr)
  kAvg = 2,         // AVG(expr)
  kMin = 3,         // MIN(expr)
  kMax = 4,         // MAX(expr)
  kCountStar = 5,   // COUNT(*)
  kStddevPop = 6,   // STDDEV_POP(expr)
  kStddevSamp = 7,  // STDDEV_SAMP(expr)
  kVarPop = 8,      // VAR_POP(expr)
  kVarSamp = 9,     // VAR_SAMP(expr)
  kFirst = 10,      // FIRST(expr)
  kLast = 11,       // LAST(expr)
  kStringAgg = 12,  // STRING_AGG(expr, delimiter)
  kArrayAgg = 13    // ARRAY_AGG(expr)
};

13.9.4 Window Buffer

The window buffer implements window functions:

class WindowBuffer {
public:
  void add(Document* row);
  void compute(uint16_t window_spec_index);
  auto next() -> Document*;
  auto get_results() const -> const WindowResultMap*;

private:
  std::vector<Document*> rows_;
  std::vector<WindowSpec> window_specs_;
  WindowResultMap results_;
};

13.9.5 Set Operation Buffer

The set operation buffer implements UNION, INTERSECT, and EXCEPT:

class SetOpBuffer {
public:
  enum class Type { kUnion, kIntersect, kExcept };

  void configure(Type type, bool all);
  void add(const Document* doc, uint8_t source);  // source: 0=left, 1=right
  auto next() -> Document*;

private:
  Type type_;
  bool all_;  // ALL variant (no dedup)
  std::unordered_set<int64_t> seen_;
  std::vector<Document> results_;
};

13.9.6 Working Table (Recursive CTE)

The working table implements recursive common table expressions:

class WorkingTable {
public:
  void add(const Document* doc);
  void add(Document&& doc);  // Move semantics
  void swap(WorkingTable& other);
  auto scan_next() -> Document*;
  auto is_empty() const -> bool;
  void scan_reset();

private:
  std::vector<Document> rows_;
  size_t scan_position_ = 0;
};

Recursive CTE Execution Pattern:

1. Execute anchor query, add to working_table_0
2. While working_table_0 is not empty:
   a. Swap working_table_0 and working_table_1
   b. Clear working_table_0
   c. For each row in working_table_1:
      - Execute recursive query
      - Add results to working_table_0
   d. Emit rows from working_table_1

13.10 Composite Rows (Zero-Copy Joins)

Composite rows enable efficient multi-table joins without deep copying documents.

13.10.1 Motivation

In a traditional join implementation:

// Deep copy approach (expensive)
auto result = Document {};
for (const auto& field : left_doc) {
  result.set(field.name, field.value.deep_copy());
}
for (const auto& field : right_doc) {
  result.set(field.name, field.value.deep_copy());
}

This involves:

Memory allocation for new document
Deep copying all field values
String duplication
Array/nested document copying

13.10.2 Composite Row Structure

class CompositeRow {
public:
  void add_slot(const std::string& alias, Document* doc);
  auto get_field(const std::string& qualified_name) const -> const Value*;
  auto get_slot(uint8_t slot_index, uint16_t field_index) const -> const Value*;
  auto materialize() const -> Document;  // Create actual document

private:
  struct Slot {
    std::string alias;
    Document* doc;  // Non-owning pointer
  };
  std::vector<Slot> slots_;
};

13.10.3 Composite Row Operations

Opcode	Operation
COMPOSITE_NEW	Create empty CompositeRow
COMPOSITE_ADD_SLOT	Add aliased document reference
COMPOSITE_GET_FIELD	Get field by qualified name
COMPOSITE_GET_SLOT	Get field by slot + index
COMPOSITE_MATERIALIZE	Create concrete Document
COMPOSITE_EMIT	Emit with callback
COMPOSITE_CLEAR	Clear slots for reuse

13.10.4 Performance Impact

For a 3-way join producing 100,000 result rows:

Approach	Memory	Time
Deep Copy	~800 MB	~500 ms
Composite Row	~8 MB	~50 ms

Composite rows reduce both memory usage and execution time by approximately 10x for join-heavy queries.

13.11 Built-in Functions

CVM provides 200+ built-in functions accessible via the CALL_BUILTIN opcode.

13.11.1 Function Categories

Category	ID Range	Examples
Mathematical	0x0000-0x00FF	abs, floor, ceil, sqrt, pow, log
String	0x0100-0x01FF	length, upper, lower, substring, replace
Type Conversion	0x0200-0x02FF	to_int, to_float, to_string, typeof
Conditional	0x0300-0x03FF	min, max, coalesce, nullif, greatest
Array	0x0400-0x04FF	length, element, append, reverse, sort
JSON/Document	0x0500-0x05FF	json_extract, json_set, json_merge
Date/Time	0x0600-0x06FF	now, date_add, date_diff, extract

13.11.2 Native Operations

Some operations are implemented as dedicated opcodes for maximum performance:

Arithmetic: ADD, SUB, MUL, DIV, MOD, NEG, ABS Comparison: EQ, NE, LT, LE, GT, GE Bitwise: AND, OR, XOR, NOT, SHL, SHR Logical: AND, OR, NOT (with short-circuit) Type Casting: int64 <-> float64, string <-> numeric Field Access: GET_FIELD, SET_FIELD, HAS_FIELD Array Operations: LENGTH, GET, SET, PUSH, POP String Operations: LENGTH, CONCAT, SUBSTRING, LIKE

13.11.3 Aggregate State Machine

Aggregate functions use a three-phase state machine:

Loading diagram...

Aggregate State Structure:

class AggregateState {
public:
  void reset();
  void accumulate(const VMValue& value);
  auto finalize() -> VMValue;
  void merge(const AggregateState& other);

private:
  AggregateFunction function_;
  int64_t count_ = 0;
  int64_t sum_int64_ = 0;
  double sum_double_ = 0.0;
  double sum_squared_ = 0.0;  // For STDDEV/VARIANCE
  double mean_ = 0.0;         // Welford's algorithm
  VMValue min_value_, max_value_;
  VMValue first_value_, last_value_;
  std::string string_result_;  // STRING_AGG
  std::vector<VMValue> array_values_;  // ARRAY_AGG
};

13.12 Cursor System

Cursors provide the interface between CVM and the storage layer.

13.12.1 Cursor Operations

// Open cursor on collection
auto open_cursor(uint8_t slot, const std::string& name,
                 bool is_cte = false) -> bool;

// Advance to next document
auto cursor_next(uint8_t slot) -> VMValue;

// Check if cursor has more rows
auto cursor_is_valid(uint8_t slot) const -> bool;

// Reset cursor to beginning (for nested loops)
void cursor_reset(uint8_t slot);

// Close cursor and release resources
void cursor_close(uint8_t slot);

// Move document without copying
auto cursor_take_doc(uint8_t slot) -> VMValue;

13.12.2 Cursor Lifecycle

Loading diagram...

13.12.3 Iterator Operations

For UNNEST and lateral joins, CVM provides iterators:

Array Iterator:

ITER_ARRAY_BEGIN R1, R0      // Create iterator from array in R0
ITER_ARRAY_NEXT R2, R1, done // Get next element or jump to done
// Process R2
JUMP loop
done:
ITER_ARRAY_END R1            // Clean up

Object Iterator:

ITER_OBJECT_BEGIN R1, R0     // Create iterator from document
ITER_OBJECT_NEXT_KEY R2, R1, done  // Get next key
ITER_OBJECT_NEXT_VAL R3, R1        // Get value for current key
// Process key R2 and value R3
JUMP loop
done:
ITER_OBJECT_END R1

Range Iterator:

ITER_RANGE_BEGIN R1, start, end, step
ITER_RANGE_NEXT R2, R1, done  // Get next value
// Process R2
JUMP loop
done:
ITER_RANGE_END R1

13.13 Error Handling

CVM provides structured error handling for runtime errors.

13.13.1 Error Structure

struct VMError {
  std::string message;       // Error description
  uint32_t pc;               // Program counter at error
  uint32_t source_line;      // Source line if debug info
  Opcode opcode;             // Instruction that failed
};

13.13.2 Exception Handling Opcodes

Extended opcodes provide try-catch semantics:

Opcode	Operation
EXCEPTION_PUSH	Push exception handler
EXCEPTION_POP	Pop exception handler
RAISE_EXCEPTION	Raise exception
RERAISE	Re-raise current exception
GET_DIAGNOSTICS	Get exception details

13.13.3 Error Recovery

// Check for error after each instruction
if (ctx.status() == VMStatus::kError) {
  auto& error = ctx.error();
  log_error("CVM Error at PC {}: {} (opcode 0x{:02X})",
            error.pc, error.message,
            static_cast<uint8_t>(error.opcode));
  return ExecutionResult::kError;
}

13.14 Debugging and Profiling

CVM includes comprehensive debugging support.

13.14.1 Debug Operations

Opcode	Operation
DEBUG_PRINT	Print register value
DEBUG_BREAK	Trigger breakpoint
DEBUG_TRACE	Emit trace event
DEBUG_DUMP	Dump VM state
DEBUG_ASSERT	Assert condition
DEBUG_PROFILE	Profile section marker

13.14.2 Debug Callbacks

using DebugPrintCallback = std::function<void(const VMValue&)>;
using DebugTraceCallback = std::function<void(
    uint16_t label_index, const VMValue& value, uint32_t pc)>;
using DebugDumpCallback = std::function<void(
    uint32_t pc, std::span<const VMValue> registers)>;
using BreakpointCallback = std::function<bool(uint32_t pc)>;

13.14.3 Execution Statistics

auto instructions_executed() const -> uint64_t;
auto rows_emitted() const -> uint64_t;
auto rows_scanned() const -> uint64_t;

13.15 Summary

The Cognica Virtual Machine provides a high-performance bytecode execution environment optimized for database query processing:

Register-Based Architecture: 16 general-purpose + 8 floating-point registers minimize memory traffic and enable efficient optimization.
Rich Type System: 16 types covering SQL data types, JSON values, temporal types, and internal execution types.
Query-Centric Instructions: 512+ opcodes providing first-class support for aggregation, sorting, joins, window functions, and set operations.
Computed Goto Dispatch: Threaded interpretation delivers 10-20% better performance than switch-based dispatch.
Spillable Buffers: Sort, hash, and aggregation buffers automatically spill to disk when memory is exhausted.
Zero-Copy Joins: Composite rows eliminate deep copying during multi-table joins.
Comprehensive Cursor System: Efficient iteration over collections with support for nested loops and resets.

The CVM architecture strikes a balance between the flexibility of interpretation and the performance of native code, providing a clean compilation target for query optimization while maintaining reasonable execution speed.