Appendix A: CVM Opcode Reference
This appendix provides a comprehensive reference for the Cognica Virtual Machine (CVM) instruction set architecture, including all opcodes, their encodings, operand formats, and execution semantics.
A.1 Instruction Format Overview
The CVM uses a fixed-width 32-bit base instruction encoding optimized for cache efficiency and decode simplicity. Two formats append operand words to the base word: Format D adds an 8-byte word for its 64-bit immediate, and Format H adds a 4-byte word holding two 16-bit operands.
A.1.1 Format Types
The CVM defines eight instruction formats:
| Format | Description | Encoding |
|---|---|---|
| A | 3-operand register | [Opcode:8][Dst:4][Src1:4][Src2:4][Flags:4][Reserved:8] |
| B | 2-operand with immediate | [Opcode:8][Dst:4][Src:4][Imm16:16] |
| C | Conditional branch | [Opcode:8][Cond:4][Reserved:4][Offset16:16] |
| D | Extended 64-bit immediate | [Opcode:8][Dst:4][Reserved:20] + [Imm64:64] |
| E | Single/dual operand (unary) | [Opcode:8][Dst:4][Src:4][Reserved:16] |
| F | No operands | [Opcode:8][Reserved:24] |
| G | Register with pool index | [Opcode:8][Dst:4][Reserved:4][PoolIdx:8][Reserved:8] |
| H | Extended composite row | [0xFE:8][ExtOp:8][Dst:8][Src:8] + [Op1:16][Op2:16] |
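The field layouts above can be exercised with a small packing sketch. It is a Python model rather than the CVM's own decoder, and it assumes fields are packed most-significant-first in the order the table lists them; the source does not state the bit order. The helper names (`encode_format_a`, `decode_format_a`, `encode_format_b`) are hypothetical.

```python
def _check(name, value, bits):
    """Reject values that do not fit in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def encode_format_a(opcode, dst, src1, src2, flags=0):
    """Format A: [Opcode:8][Dst:4][Src1:4][Src2:4][Flags:4][Reserved:8]."""
    _check("opcode", opcode, 8)
    for name, value in (("dst", dst), ("src1", src1), ("src2", src2), ("flags", flags)):
        _check(name, value, 4)
    # Assumed bit order: opcode in the top byte, reserved byte left as zero.
    return (opcode << 24) | (dst << 20) | (src1 << 16) | (src2 << 12) | (flags << 8)


def decode_format_a(word):
    """Split a 32-bit Format A word back into its named fields."""
    return {
        "opcode": (word >> 24) & 0xFF,
        "dst": (word >> 20) & 0xF,
        "src1": (word >> 16) & 0xF,
        "src2": (word >> 12) & 0xF,
        "flags": (word >> 8) & 0xF,
    }


def encode_format_b(opcode, dst, src, imm16):
    """Format B: [Opcode:8][Dst:4][Src:4][Imm16:16]; imm16 is passed as a signed value."""
    _check("opcode", opcode, 8)
    _check("dst", dst, 4)
    _check("src", src, 4)
    if not -0x8000 <= imm16 <= 0x7FFF:
        raise ValueError(f"imm16={imm16} does not fit in 16 signed bits")
    return (opcode << 24) | (dst << 20) | (src << 16) | (imm16 & 0xFFFF)


# ADD_I64 R5, R1, R4 (opcode 0x10 from A.4)
add_word = encode_format_a(0x10, dst=5, src1=1, src2=4)
# ADD_I64_IMM R2, R1, -1 (opcode 0x17 from A.4)
add_imm_word = encode_format_b(0x17, dst=2, src=1, imm16=-1)
```

Under the assumed bit order, the 4-bit register fields are what limit directly addressable registers to R0-R15.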
A.1.2 Register Allocation
The CVM provides a virtualized register file with the following conventions:
| Register Range | Purpose |
|---|---|
| R0-R15 | General purpose registers |
| R16+ | Spill registers (allocated by register allocator) |
| F0-F15 | Floating-point registers (aliased to R0-R15 for bit-level operations) |
Insight
- The CVM uses a computed-goto dispatch mechanism for efficient opcode execution, achieving 2-5ns per instruction on modern processors.
- Register allocation is performed at compile time by the RegisterAllocationPass, using a linear-scan algorithm for hot paths.
- The 4-bit register fields support 16 logical registers; spill slots extend this to 256 virtual registers.
A.2 Opcode Space Layout
The 256-entry opcode space is organized into functional categories:
| Range | Category |
|---|---|
| 0x00-0x0F | Data Movement Operations |
| 0x10-0x1F | Integer Arithmetic |
| 0x20-0x2F | Floating-Point Arithmetic |
| 0x30-0x3F | Bitwise Operations |
| 0x40-0x4F | Integer Comparisons |
| 0x50-0x57 | Float Comparisons |
| 0x58-0x5F | String Comparisons |
| 0x60-0x6F | Logical and Debug Operations |
| 0x70-0x7F | Control Flow |
| 0x80-0x8F | Type Operations |
| 0x90-0x9F | Field Access |
| 0xA0-0xAF | Array Operations |
| 0xB0-0xBF | String Operations |
| 0xC0-0xCF | Aggregation Operations |
| 0xD0-0xDF | Query Buffer Operations (Window, Hash, Sort) |
| 0xE0-0xE7 | Working Table and Sort Extended |
| 0xE8-0xEF | Function Call Operations |
| 0xF0-0xF7 | Cursor Operations |
| 0xF8-0xFC | Subquery, External, Table Functions |
| 0xFD-0xFF | Error, Extended, Undefined |
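Because the ranges are contiguous and sorted, a disassembler or tracing tool can classify an opcode byte with a binary search over the range start values. The following Python sketch illustrates this using only the table above; the function name `opcode_category` is hypothetical.

```python
import bisect

# (first opcode in range, category), copied from the layout table.
_RANGES = [
    (0x00, "Data Movement Operations"),
    (0x10, "Integer Arithmetic"),
    (0x20, "Floating-Point Arithmetic"),
    (0x30, "Bitwise Operations"),
    (0x40, "Integer Comparisons"),
    (0x50, "Float Comparisons"),
    (0x58, "String Comparisons"),
    (0x60, "Logical and Debug Operations"),
    (0x70, "Control Flow"),
    (0x80, "Type Operations"),
    (0x90, "Field Access"),
    (0xA0, "Array Operations"),
    (0xB0, "String Operations"),
    (0xC0, "Aggregation Operations"),
    (0xD0, "Query Buffer Operations (Window, Hash, Sort)"),
    (0xE0, "Working Table and Sort Extended"),
    (0xE8, "Function Call Operations"),
    (0xF0, "Cursor Operations"),
    (0xF8, "Subquery, External, Table Functions"),
    (0xFD, "Error, Extended, Undefined"),
]
_STARTS = [start for start, _ in _RANGES]


def opcode_category(opcode):
    """Return the functional category whose range contains the opcode byte."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError(f"opcode {opcode:#x} is outside the 8-bit opcode space")
    # bisect_right finds the first range starting after the opcode; step back one.
    return _RANGES[bisect.bisect_right(_STARTS, opcode) - 1][1]
```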
A.3 Data Movement Operations (0x00-0x0F)
Data movement operations transfer values between registers, memory, and the constant pool.
A.3.1 Basic Movement
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x00 | NOP | F | No operation |
| 0x01 | MOVE | E | R[dst] = R[src] |
| 0x02 | MOVE_I64 | D | R[dst] = imm64 |
| 0x03 | MOVE_F64 | D | F[dst] = imm64 (bit reinterpret) |
| 0x04 | MOVE_NULL | E | R[dst] = NULL |
| 0x05 | MOVE_TRUE | E | R[dst] = true |
| 0x06 | MOVE_FALSE | E | R[dst] = false |
| 0x07 | LOAD_CONST | B | R[dst] = constant_pool[imm16] |
| 0x08 | COPY | E | R[dst] = deep_copy(R[src]) |
| 0x09 | SWAP | E | swap(R[dst], R[src]) |
| 0x0A | MOVE_IMM | B | R[dst] = sign_extend(imm16) |
| 0x0B | MOVE_F2R | E | R[dst] = F[src] (float to GPR) |
| 0x0C | MOVE_R2F | E | F[dst] = R[src] (GPR to float) |
| 0x0D | LOAD_PARAM | B | R[dst] = context.get_parameter(imm16) |
Example: Loading a string constant
LOAD_CONST R3, 42 ; R3 = pool[42] (string "hello")
A.4 Integer Arithmetic Operations (0x10-0x1F)
Integer arithmetic operates on 64-bit signed integers stored in general-purpose registers.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x10 | ADD_I64 | A | R[dst] = R[src1] + R[src2] |
| 0x11 | SUB_I64 | A | R[dst] = R[src1] - R[src2] |
| 0x12 | MUL_I64 | A | R[dst] = R[src1] * R[src2] |
| 0x13 | DIV_I64 | A | R[dst] = R[src1] / R[src2] |
| 0x14 | MOD_I64 | A | R[dst] = R[src1] % R[src2] |
| 0x15 | NEG_I64 | E | R[dst] = -R[src] |
| 0x16 | ABS_I64 | E | R[dst] = abs(R[src]) |
| 0x17 | ADD_I64_IMM | B | R[dst] = R[src] + sign_extend(imm16) |
| 0x18 | SUB_I64_IMM | B | R[dst] = R[src] - sign_extend(imm16) |
| 0x19 | MUL_I64_IMM | B | R[dst] = R[src] * sign_extend(imm16) |
| 0x1A | INC_I64 | E | R[dst] = R[src] + 1 |
| 0x1B | DEC_I64 | E | R[dst] = R[src] - 1 |
Overflow Semantics: Integer overflow wraps according to two's complement arithmetic. Division by zero raises an error.
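The wrap-around and sign-extension rules can be modeled in a few lines of Python, whose integers are unbounded and therefore need explicit masking. This is a sketch of the stated semantics, not the CVM implementation; the helper names are hypothetical, and truncation toward zero in `div_i64` is an assumption, since the source does not specify the rounding direction.

```python
MASK64 = (1 << 64) - 1


def wrap_i64(value):
    """Reduce an unbounded int to a signed 64-bit two's-complement value."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


def sign_extend16(imm16):
    """Interpret the low 16 bits as a signed immediate, as the *_IMM opcodes do."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def add_i64(a, b):
    return wrap_i64(a + b)


def mul_i64(a, b):
    return wrap_i64(a * b)


def div_i64(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero")
    # Assumption: quotient truncates toward zero (C-style), unlike Python's floor division.
    quotient = abs(a) // abs(b)
    return wrap_i64(-quotient if (a < 0) != (b < 0) else quotient)
```

For example, adding 1 to the largest int64 wraps to the smallest, and an encoded immediate of 0xFFFF is read as -1.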
A.5 Floating-Point Arithmetic (0x20-0x2F)
Floating-point operations follow IEEE 754 double precision semantics.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x20 | ADD_F64 | A | F[dst] = F[src1] + F[src2] |
| 0x21 | SUB_F64 | A | F[dst] = F[src1] - F[src2] |
| 0x22 | MUL_F64 | A | F[dst] = F[src1] * F[src2] |
| 0x23 | DIV_F64 | A | F[dst] = F[src1] / F[src2] |
| 0x24 | NEG_F64 | E | F[dst] = -F[src] |
| 0x25 | ABS_F64 | E | F[dst] = abs(F[src]) |
| 0x26 | SQRT_F64 | E | F[dst] = sqrt(F[src]) |
| 0x27 | FLOOR_F64 | E | F[dst] = floor(F[src]) |
| 0x28 | CEIL_F64 | E | F[dst] = ceil(F[src]) |
| 0x29 | ROUND_F64 | E | F[dst] = round(F[src]) |
| 0x2A | POW_F64 | A | F[dst] = pow(F[src1], F[src2]) |
| 0x2B | LOG_F64 | E | F[dst] = log(F[src]) |
| 0x2C | LOG10_F64 | E | F[dst] = log10(F[src]) |
| 0x2D | EXP_F64 | E | F[dst] = exp(F[src]) |
| 0x2E | MOD_F64 | A | F[dst] = fmod(F[src1], F[src2]) |
A.6 Bitwise Operations (0x30-0x3F)
Bitwise operations manipulate 64-bit integers at the bit level.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x30 | AND_I64 | A | R[dst] = R[src1] & R[src2] |
| 0x31 | OR_I64 | A | R[dst] = R[src1] | R[src2] |
| 0x32 | XOR_I64 | A | R[dst] = R[src1] ^ R[src2] |
| 0x33 | NOT_I64 | E | R[dst] = ~R[src] |
| 0x34 | SHL_I64 | A | R[dst] = R[src1] << R[src2] |
| 0x35 | SHR_I64 | A | R[dst] = R[src1] >> R[src2] (logical) |
| 0x36 | SAR_I64 | A | R[dst] = R[src1] >> R[src2] (arithmetic) |
| 0x37 | SHL_I64_IMM | B | R[dst] = R[src] << imm16 |
| 0x38 | SHR_I64_IMM | B | R[dst] = R[src] >> imm16 (logical) |
| 0x39 | SAR_I64_IMM | B | R[dst] = R[src] >> imm16 (arithmetic) |
| 0x3A | AND_I64_IMM | B | R[dst] = R[src] & imm16 |
| 0x3B | OR_I64_IMM | B | R[dst] = R[src] | imm16 |
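The difference between the logical (SHR) and arithmetic (SAR) right shifts only shows up for negative inputs. The Python sketch below models both on signed 64-bit values. The function names are hypothetical, and the sketch rejects shift counts outside 0-63 because the source does not define their behavior.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Reinterpret the low 64 bits of value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


def _check_count(count):
    # Assumption: out-of-range counts are treated as an error in this model.
    if not 0 <= count < 64:
        raise ValueError(f"shift count {count} outside 0-63")


def shr_i64(value, count):
    """Logical shift right: vacated high bits are filled with zeros."""
    _check_count(count)
    return to_signed64((value & MASK64) >> count)


def sar_i64(value, count):
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    _check_count(count)
    # Python's >> on a negative int already sign-extends.
    return to_signed64(value) >> count
```

Shifting -8 right by one position yields -4 with SAR but a large positive value with SHR, because the sign bit is replaced by zero.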
A.7 Comparison Operations (0x40-0x5F)
Comparison operations produce boolean results. The CVM supports type-specialized comparisons for optimal performance.
A.7.1 Integer Comparisons (0x40-0x4F)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x40 | CMP_EQ_I64 | A | R[dst] = (R[src1] == R[src2]) |
| 0x41 | CMP_NE_I64 | A | R[dst] = (R[src1] != R[src2]) |
| 0x42 | CMP_LT_I64 | A | R[dst] = (R[src1] < R[src2]) |
| 0x43 | CMP_LE_I64 | A | R[dst] = (R[src1] <= R[src2]) |
| 0x44 | CMP_GT_I64 | A | R[dst] = (R[src1] > R[src2]) |
| 0x45 | CMP_GE_I64 | A | R[dst] = (R[src1] >= R[src2]) |
| 0x46 | CMP_EQ_I64_IMM | B | R[dst] = (R[src] == imm16) |
| 0x47 | CMP_NE_I64_IMM | B | R[dst] = (R[src] != imm16) |
| 0x48 | CMP_LT_I64_IMM | B | R[dst] = (R[src] < imm16) |
| 0x49 | CMP_LE_I64_IMM | B | R[dst] = (R[src] <= imm16) |
| 0x4A | CMP_GT_I64_IMM | B | R[dst] = (R[src] > imm16) |
| 0x4B | CMP_GE_I64_IMM | B | R[dst] = (R[src] >= imm16) |
| 0x4C | CMP_LT_POLY | A | Runtime type dispatch for < |
| 0x4D | CMP_LE_POLY | A | Runtime type dispatch for <= |
| 0x4E | CMP_GT_POLY | A | Runtime type dispatch for > |
| 0x4F | CMP_GE_POLY | A | Runtime type dispatch for >= |
A.7.2 Float Comparisons (0x50-0x57)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x50 | CMP_EQ_F64 | A | R[dst] = (F[src1] == F[src2]) |
| 0x51 | CMP_NE_F64 | A | R[dst] = (F[src1] != F[src2]) |
| 0x52 | CMP_LT_F64 | A | R[dst] = (F[src1] < F[src2]) |
| 0x53 | CMP_LE_F64 | A | R[dst] = (F[src1] <= F[src2]) |
| 0x54 | CMP_GT_F64 | A | R[dst] = (F[src1] > F[src2]) |
| 0x55 | CMP_GE_F64 | A | R[dst] = (F[src1] >= F[src2]) |
A.7.3 String Comparisons (0x58-0x5F)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x58 | CMP_EQ_STR | A | Lexicographic equality |
| 0x59 | CMP_NE_STR | A | Lexicographic inequality |
| 0x5A | CMP_LT_STR | A | Lexicographic less-than |
| 0x5B | CMP_LE_STR | A | Lexicographic less-or-equal |
| 0x5C | CMP_GT_STR | A | Lexicographic greater-than |
| 0x5D | CMP_GE_STR | A | Lexicographic greater-or-equal |
| 0x5E | CMP_EQ_POLY | A | Runtime type dispatch for == |
| 0x5F | CMP_NE_POLY | A | Runtime type dispatch for != |
A.8 Logical and Debug Operations (0x60-0x6F)
A.8.1 Logical Operations (0x60-0x65)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x60 | AND | A | R[dst] = R[src1] && R[src2] |
| 0x61 | OR | A | R[dst] = R[src1] || R[src2] |
| 0x62 | NOT | E | R[dst] = !R[src] |
| 0x63 | AND_SC | C | Short-circuit: if !R[cond], skip |
| 0x64 | OR_SC | C | Short-circuit: if R[cond], skip |
| 0x65 | XOR | A | R[dst] = R[src1] XOR R[src2] |
A.8.2 Debug Operations (0x66-0x6B)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x66 | DBG_PRINT | E | Print R[src] to debug log |
| 0x67 | DBG_BREAK | F | Debugger breakpoint |
| 0x68 | DBG_TRACE | B | Trace with label imm16 |
| 0x69 | DBG_DUMP | F | Dump VM state |
| 0x6A | DBG_ASSERT | E | Assert R[src] is truthy |
| 0x6B | DBG_PROFILE | B | Profile section marker |
A.9 Control Flow Operations (0x70-0x7F)
Control flow operations manage the program counter and function calls.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x70 | JMP | C | PC += offset16 (unconditional) |
| 0x71 | JMP_TRUE | C | if R[cond]: PC += offset16 |
| 0x72 | JMP_FALSE | C | if !R[cond]: PC += offset16 |
| 0x73 | JMP_NULL | C | if R[cond] is NULL: PC += offset16 |
| 0x74 | JMP_NOT_NULL | C | if R[cond] is not NULL: PC += offset16 |
| 0x75 | JMP_ABS | D | PC = imm32 (absolute) |
| 0x76 | CALL | B | Push frame, PC = target |
| 0x77 | RET | F | Return from function (no value) |
| 0x78 | RET_VAL | E | Return R[src] |
| 0x79 | HALT | F | Stop execution |
| 0x7A | JMP_ZERO | C | if R[cond] == 0: PC += offset16 |
| 0x7B | JMP_NOT_ZERO | C | if R[cond] != 0: PC += offset16 |
| 0x7C | RET_NEXT | E | SRF: Add R[src] to result accumulator |
| 0x7D | RET_QUERY | B | SRF: Execute query pool[imm16], add results |
Branch Offset Encoding: The 16-bit signed offset is relative to the instruction following the branch, measured in 4-byte instruction units.
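As a worked illustration of this rule, the Python sketch below converts between byte addresses and encoded offsets. It assumes every instruction between the branch and its target occupies a single 4-byte word (Format D and H instructions take additional space), and the function names are hypothetical.

```python
INSTRUCTION_BYTES = 4


def branch_target(branch_address, offset16):
    """Byte address reached when the branch at branch_address is taken."""
    if not -0x8000 <= offset16 <= 0x7FFF:
        raise ValueError("offset does not fit in 16 signed bits")
    # The offset is counted from the instruction that follows the branch.
    return branch_address + INSTRUCTION_BYTES + offset16 * INSTRUCTION_BYTES


def branch_offset(branch_address, target_address):
    """Offset to encode so that the branch at branch_address reaches target_address."""
    delta = target_address - (branch_address + INSTRUCTION_BYTES)
    if delta % INSTRUCTION_BYTES:
        raise ValueError("target is not aligned to an instruction boundary")
    offset = delta // INSTRUCTION_BYTES
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError("target is out of range for a 16-bit offset")
    return offset
```

An offset of 0 therefore falls through to the next instruction, and a branch at byte 0x24 needs an offset of -9 to reach byte 0x04.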
A.10 Type Operations (0x80-0x8F)
Type operations handle runtime type checking, casting, and NULL handling.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x80 | TYPEOF | E | R[dst] = typeof(R[src]) as type ID |
| 0x81 | CAST_I64_F64 | E | F[dst] = (double)R[src] |
| 0x82 | CAST_F64_I64 | E | R[dst] = (int64_t)F[src] |
| 0x83 | CAST_STR_I64 | E | R[dst] = parse_int64(R[src]) |
| 0x84 | CAST_STR_F64 | E | F[dst] = parse_double(R[src]) |
| 0x85 | CAST_I64_STR | E | R[dst] = to_string(R[src]) |
| 0x86 | CAST_F64_STR | E | R[dst] = to_string(F[src]) |
| 0x87 | CAST_BOOL_I64 | E | R[dst] = R[src] ? 1 : 0 |
| 0x88 | CAST_I64_BOOL | E | R[dst] = R[src] != 0 |
| 0x89 | IS_NULL | E | R[dst] = (R[src] is NULL) |
| 0x8A | IS_NOT_NULL | E | R[dst] = (R[src] is not NULL) |
| 0x8B | COALESCE | A | R[dst] = R[src1] ?? R[src2] |
| 0x8C | NULLIF | A | R[dst] = (R[src1]==R[src2]) ? NULL : R[src1] |
| 0x8D | CAST_BOOL_STR | E | R[dst] = R[src] ? "true" : "false" |
| 0x8E | CAST_STR_BOOL | E | R[dst] = parse_bool(R[src]) |
| 0x8F | CAST | B | R[dst] = cast(R[src], target_type) |
A.11 Field Access Operations (0x90-0x9F)
Field access operations extract and modify document fields.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x90 | GET_FIELD | B | R[dst] = doc.field[pool[imm16]] |
| 0x91 | GET_FIELD_DYN | A | R[dst] = doc.field[R[src]] (dynamic) |
| 0x92 | SET_FIELD | B | doc.field[pool[imm16]] = R[src] |
| 0x93 | HAS_FIELD | B | R[dst] = doc.has(pool[imm16]) |
| 0x94 | DEL_FIELD | B | doc.remove(pool[imm16]) |
| 0x95 | GET_NESTED | B | R[dst] = doc.path(pool[imm16]) |
| 0x96 | SET_NESTED | B | doc.path(pool[imm16]) = R[src] |
| 0x97 | FIELD_COUNT | E | R[dst] = doc.field_count() |
| 0x98 | FIELD_NAMES | E | R[dst] = doc.field_names() as array |
| 0x99 | GET_DOC | E | R[dst] = context.input_document |
| 0x9A | GET_FIELD_IDX | B | R[dst] = doc.field_by_index(imm16) |
A.12 Array Operations (0xA0-0xAF)
Array operations manipulate ordered collections of values.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xA0 | ARR_NEW | E | R[dst] = new empty array |
| 0xA1 | ARR_LEN | E | R[dst] = R[src].length |
| 0xA2 | ARR_GET | A | R[dst] = R[src1][R[src2]] |
| 0xA3 | ARR_GET_IMM | B | R[dst] = R[src][imm16] |
| 0xA4 | ARR_SET | A | R[src1][R[src2]] = R[dst] |
| 0xA5 | ARR_SET_IMM | B | R[src][imm16] = R[dst] |
| 0xA6 | ARR_PUSH | A | R[dst].push(R[src]) |
| 0xA7 | ARR_POP | E | R[dst] = R[src].pop() |
| 0xA8 | ARR_SLICE | A | R[dst] = R[src1].slice(R[src2], R[flags]) |
| 0xA9 | ARR_CONCAT | A | R[dst] = R[src1].concat(R[src2]) |
| 0xAA | ARR_CONTAINS | A | R[dst] = R[src1].contains(R[src2]) |
| 0xAB | ARR_INDEXOF | A | R[dst] = R[src1].indexOf(R[src2]) |
| 0xAC | ARR_REVERSE | E | R[dst] = R[src].reverse() |
| 0xAD | ARR_SORT | E | R[dst] = R[src].sort() |
A.13 String Operations (0xB0-0xBF)
String operations handle UTF-8 encoded text.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xB0 | STR_LEN | E | R[dst] = R[src].length |
| 0xB1 | STR_CONCAT | A | R[dst] = R[src1] + R[src2] |
| 0xB2 | STR_SUBSTR | A | R[dst] = R[src1].substr(R[src2], R[flags]) |
| 0xB3 | STR_UPPER | E | R[dst] = R[src].toUpperCase() |
| 0xB4 | STR_LOWER | E | R[dst] = R[src].toLowerCase() |
| 0xB5 | STR_TRIM | E | R[dst] = R[src].trim() |
| 0xB6 | STR_LIKE | A | R[dst] = R[src1] LIKE R[src2] |
| 0xB7 | STR_ILIKE | A | R[dst] = R[src1] ILIKE R[src2] |
| 0xB8 | STR_REGEX | A | R[dst] = R[src1] ~ R[src2] (regex) |
| 0xB9 | STR_REPLACE | A | R[dst] = R[src1].replace(R[src2], R[flags]) |
| 0xBA | STR_SPLIT | A | R[dst] = R[src1].split(R[src2]) |
| 0xBB | STR_STARTS | A | R[dst] = R[src1].startsWith(R[src2]) |
| 0xBC | STR_ENDS | A | R[dst] = R[src1].endsWith(R[src2]) |
| 0xBD | STR_CONTAINS | A | R[dst] = R[src1].contains(R[src2]) |
| 0xBE | STR_INDEXOF | A | R[dst] = R[src1].indexOf(R[src2]) |
| 0xBF | STR_WILDCARD | A | R[dst] = R[src1] matches R[src2] (wildcard) |
A.14 Aggregation Operations (0xC0-0xCF)
Aggregation operations support SQL aggregate functions and GROUP BY processing.
A.14.1 Single Aggregation (0xC0-0xC7)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xC0 | AGG_INIT | B | R[dst] = new_agg_state(type=imm16) |
| 0xC1 | AGG_ACCUM | A | R[dst].accumulate(R[src]) |
| 0xC2 | AGG_ACCUM_COND | A | if R[src2]: R[dst].accumulate(R[src1]) |
| 0xC3 | AGG_FINAL | E | R[dst] = R[src].finalize() |
| 0xC4 | AGG_MERGE | A | R[dst].merge(R[src]) |
| 0xC5 | AGG_RESET | E | R[dst].reset() |
| 0xC6 | AGG_COUNT | E | R[dst] = R[src].count() |
| 0xC7 | AGG_SUM | E | R[dst] = R[src].sum() |
A.14.2 Aggregation Tables (0xC8-0xCF)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xC8 | AGG_TBL_NEW | B | Create aggregation table |
| 0xC9 | AGG_GET_CREATE | A | Get/create group for key |
| 0xCA | AGG_ITER_INIT | E | Initialize group iterator |
| 0xCB | AGG_ITER_NEXT | E | Get next (key, states) pair |
| 0xCC | AGG_TBL_NEW_MULTI | B | Create multi-function table |
| 0xCD | AGG_STATE_AT | B | Get state at index |
| 0xCE | AGG_ITER_HAS_NEXT | E | Check if more groups |
Aggregation Function Types:
| ID | Function | Description |
|---|---|---|
| 0 | COUNT | Count non-NULL values |
| 1 | SUM | Sum of values |
| 2 | AVG | Average (sum/count) |
| 3 | MIN | Minimum value |
| 4 | MAX | Maximum value |
| 5 | COUNT(*) | Count all rows |
| 6 | STDDEV_POP | Population standard deviation |
| 7 | STDDEV_SAMP | Sample standard deviation |
| 8 | VAR_POP | Population variance |
| 9 | VAR_SAMP | Sample variance |
| 10 | FIRST | First non-NULL value |
| 11 | LAST | Last non-NULL value |
| 12 | STRING_AGG | Concatenate strings |
| 13 | ARRAY_AGG | Collect into array |
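The AGG_INIT, AGG_ACCUM, AGG_MERGE, and AGG_FINAL opcodes imply a four-step state lifecycle. The Python sketch below models it for COUNT, SUM, and AVG only, using the function IDs from the table above. The class `AggState` is hypothetical, and two behaviors are assumptions borrowed from standard SQL rather than from the source: NULL inputs are skipped by SUM and AVG, and SUM or AVG over no values finalizes to NULL.

```python
COUNT, SUM, AVG = 0, 1, 2  # function IDs from the table above


class AggState:
    def __init__(self, func_id):  # models AGG_INIT
        if func_id not in (COUNT, SUM, AVG):
            raise NotImplementedError(f"function ID {func_id} is not modeled here")
        self.func_id = func_id
        self.count = 0
        self.total = 0

    def accumulate(self, value):  # models AGG_ACCUM
        if value is None:
            return  # assumption: NULL inputs do not contribute
        self.count += 1
        self.total += value

    def merge(self, other):  # models AGG_MERGE, e.g. combining partial states
        if other.func_id != self.func_id:
            raise ValueError("cannot merge states of different functions")
        self.count += other.count
        self.total += other.total

    def finalize(self):  # models AGG_FINAL
        if self.func_id == COUNT:
            return self.count
        if self.count == 0:
            return None  # assumption: SUM/AVG over no values is NULL
        return self.total if self.func_id == SUM else self.total / self.count


def aggregate(func_id, values):
    """Run the init/accumulate/finalize sequence over a list of values."""
    state = AggState(func_id)
    for value in values:
        state.accumulate(value)
    return state.finalize()
```

Keeping both a count and a running total in every state is what makes AGG_MERGE straightforward for AVG: two partial states combine by adding their fields.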
A.15 Query Buffer Operations (0xD0-0xDF)
Query buffer operations support window functions, hash joins, sorting, and set operations.
A.15.1 Window Buffer (0xD0-0xD3)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xD0 | WIN_NEW | E | Create window buffer |
| 0xD1 | WIN_ADD | E | Add row to window buffer |
| 0xD2 | WIN_COMPUTE | B | Compute window functions |
| 0xD3 | WIN_NEXT | E | Get next row with results |
A.15.2 Hash Table (0xD4-0xD7)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xD4 | HT_NEW | E | Create hash table |
| 0xD5 | HT_INSERT | A | Insert (key, value) |
| 0xD6 | HT_PROBE | A | Lookup key, get matches |
| 0xD7 | HT_DESTROY | E | Destroy hash table |
A.15.3 Sort Buffer (0xD8-0xDB)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xD8 | SORT_NEW | B | Create sort buffer |
| 0xD9 | SORT_ADD | E | Add row to buffer |
| 0xDA | SORT_NEXT | E | Get next sorted row |
| 0xDB | SORT_DESTROY | E | Destroy sort buffer |
A.15.4 Set Operations (0xDC-0xDF)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xDC | SET_OP_NEW | B | Create set operation buffer |
| 0xDD | SET_OP_ADD | A | Add document from source |
| 0xDE | SET_OP_NEXT | E | Get next result |
| 0xDF | SET_OP_DESTROY | E | Destroy buffer |
Set Operation Types (encoded in imm16):
- 0: UNION
- 1: INTERSECT
- 2: EXCEPT
A.16 Working Table Operations (0xE0-0xE7)
Working table operations support recursive Common Table Expressions (CTEs).
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xE0 | WT_NEW | B | Create working table |
| 0xE1 | WT_ADD | E | Add document to table |
| 0xE2 | WT_SWAP | E | Swap working/result tables |
| 0xE3 | WT_SCAN | E | Open scan, return first doc |
| 0xE4 | WT_EMPTY | E | Check if table is empty |
| 0xE5 | WT_DESTROY | E | Destroy working table |
| 0xE6 | SORT_NEW_VALUES | B | Create value-based sort buffer |
| 0xE7 | SORT_ADD_VALUES | A | Add row with computed keys |
A.17 Function Call Operations (0xE8-0xEF)
Function call operations invoke built-in, user-defined, and external functions.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xE8 | CALL_BUILTIN | B | R[dst] = builtin[imm16](args) |
| 0xE9 | CALL_SCALAR | B | R[dst] = scalar_func(args) |
| 0xEA | CALL_UDF | B | R[dst] = udf[imm16](args) |
| 0xEB | PUSH_ARG | E | Push R[src] to argument stack |
| 0xEC | POP_ARG | E | R[dst] = pop from argument stack |
| 0xED | CLEAR_ARGS | F | Clear argument stack |
| 0xEE | GET_ARG_COUNT | E | R[dst] = argument stack size |
| 0xEF | GET_ARG | B | R[dst] = argument_stack[imm16] |
A.18 Cursor Operations (0xF0-0xF7)
Cursor operations manage iteration over tables and CTEs.
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xF0 | CURSOR_OPEN | B | Open cursor, return first doc |
| 0xF1 | CURSOR_NEXT | E | Return current doc, advance |
| 0xF2 | CURSOR_CLOSE | E | Close cursor, release resources |
| 0xF3 | CURSOR_VALID | E | R[dst] = cursor.is_valid() |
| 0xF4 | CURSOR_RESET | E | Reset cursor to beginning |
| 0xF5 | EMIT_ROW | E | Emit row to output callback |
| 0xF6 | YIELD | F | Suspend for streaming results |
| 0xF7 | CURSOR_TAKE | E | Move document with ownership |
Cursor Open Flags (imm16 encoding):
- Bit 15: is_cte flag (1=CTE, 0=collection)
- Bits 0-14: constant pool index for name
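The flag layout can be checked with a short encode/decode sketch in Python. The function names are hypothetical; the bit positions are the ones listed above.

```python
IS_CTE_BIT = 1 << 15        # bit 15: 1 = CTE, 0 = collection
POOL_INDEX_MASK = 0x7FFF    # bits 0-14: constant pool index of the name


def encode_cursor_open_imm(pool_index, is_cte):
    """Build the imm16 operand for CURSOR_OPEN."""
    if not 0 <= pool_index <= POOL_INDEX_MASK:
        raise ValueError("pool index does not fit in 15 bits")
    return (IS_CTE_BIT if is_cte else 0) | pool_index


def decode_cursor_open_imm(imm16):
    """Return (is_cte, pool_index) from a CURSOR_OPEN imm16 operand."""
    return bool(imm16 & IS_CTE_BIT), imm16 & POOL_INDEX_MASK
```

One consequence of this layout is that names referenced by CURSOR_OPEN must sit in the first 32,768 constant pool slots.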
A.19 Subquery and External Operations (0xF8-0xFC)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xF8 | CALL_SUBQUERY | B | Execute subquery pool[imm16] |
| 0xF9 | CALL_EXTERNAL | B | Call external function |
| 0xFA | TABLEFUNC_OPEN | B | Open table function iterator |
| 0xFB | TABLEFUNC_NEXT | E | Get next row from table func |
| 0xFC | TABLEFUNC_CLOSE | E | Close table function iterator |
External Function IDs:
| ID | Function | Description |
|---|---|---|
| 0x0000 | kScriptEval | Evaluate Lua/Python script |
| 0x0100 | kFTSMatch | Full-text search match (@@ operator) |
| 0x0101 | kFTSScore | Full-text search relevance score |
The kFTSMatch external function implements the @@ (text search match) operator in WHERE clauses. When the planner encounters a column @@ to_tsquery('...') predicate, it lowers the expression to a CALL_EXTERNAL instruction with function ID 0x0100. The function accepts alternating field/query pairs on the argument stack and returns a boolean indicating whether the document matches the full-text search query. This enables CVM-compiled queries to evaluate FTS predicates inline without falling back to the Volcano executor.
A.20 Error and Extension (0xFD-0xFF)
| Opcode | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0xFD | ERROR | B | Raise error from pool[imm16] |
| 0xFE | EXTENDED | - | Extended opcode prefix |
| 0xFF | UNDEFINED | - | Invalid opcode (trap) |
A.21 Extended Opcodes (0xFE prefix)
Extended opcodes provide 256 additional instructions accessed via the 0xFE prefix.
A.21.1 Iteration Operations (0x01-0x0A)
| ExtOp | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x01 | ITER_ARR_BEGIN | E | Create array iterator |
| 0x02 | ITER_ARR_NEXT | E | Get next element or branch |
| 0x03 | ITER_ARR_END | E | Close array iterator |
| 0x04 | ITER_OBJ_BEGIN | E | Create object key iterator |
| 0x05 | ITER_OBJ_NEXT_KEY | E | Get next key or branch |
| 0x06 | ITER_OBJ_NEXT_VAL | E | Get value for current key |
| 0x07 | ITER_OBJ_END | E | Close object iterator |
| 0x08 | ITER_RANGE_BEGIN | A | Create range iterator |
| 0x09 | ITER_RANGE_NEXT | E | Get next value or branch |
| 0x0A | ITER_RANGE_END | E | Close range iterator |
A.21.2 Document Construction (0x0B-0x12)
| ExtOp | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x0B | DOC_NEW | E | Create empty document |
| 0x0C | DOC_FROM_JSON | E | Parse JSON to document |
| 0x0D | DOC_TO_JSON | E | Serialize document to JSON |
| 0x0E | DOC_CLONE | E | Deep clone document |
| 0x0F | DOC_MERGE | A | Merge two documents |
| 0x10 | DOC_PATCH | A | Apply JSON Patch |
| 0x11 | DOC_KEYS | E | Get keys as array |
| 0x12 | DOC_VALUES | E | Get values as array |
A.21.3 Composite Row Operations (0x13-0x1C)
Composite row operations enable zero-copy JOIN processing.
| ExtOp | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x13 | COMPOSITE_NEW | H | Create empty CompositeRow |
| 0x14 | COMPOSITE_ADD | H | Add document slot |
| 0x15 | COMPOSITE_GET | H | Get field by qualified name |
| 0x16 | COMPOSITE_GET_SLOT | H | Get field by slot index |
| 0x17 | COMPOSITE_MAT | H | Materialize to Document |
| 0x18 | COMPOSITE_EMIT | H | Emit composite row |
| 0x19 | COMPOSITE_CLEAR | H | Clear all slots |
| 0x1A | COMPOSITE_EMIT_MAPPED | H | Emit with column mapping |
| 0x1B | WT_SCAN_RESET | H | Reset working table scan |
| 0x1C | COMPOSITE_MAT_ALL_QUAL | H | Materialize with qualified names |
A.21.3a Outer Context Operations (0x85)
Outer context operations support correlated subquery execution within the CVM. When a subquery references columns from an outer query, GET_OUTER_FIELD resolves the outer column values without falling back to the Volcano executor.
| ExtOp | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x85 | GET_OUTER_FIELD | H | R[dst] = outer_context.get_field(pool[pool_idx]) |
GET_OUTER_FIELD reads a field from the outer query's current row. During plan lowering, column references whose table alias is not in the local alias set are emitted as GET_OUTER_FIELD instead of GET_FIELD. The interpreter resolves the field from the outer row context, which may be either a plain Document or a CompositeRow (for outer queries involving joins).
Example: Correlated subquery
SELECT d.name,
(SELECT COUNT(*) FROM employees e WHERE e.dept_id = d.id)
FROM departments d
The inner subquery's predicate e.dept_id = d.id compiles to:
GET_OUTER_FIELD R3, pool["d.id"] ; R3 = outer_row.d.id
GET_FIELD R4, pool["dept_id"] ; R4 = current_row.dept_id
CMP_EQ_POLY R5, R4, R3 ; R5 = (dept_id == d.id)
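The lowering decision described above reduces to a membership test on the subquery's local alias set. The Python sketch below is a hypothetical model of that rule, not the planner's code. It mirrors the listing by using the unqualified column name for local references and the qualified name for outer references, and it assumes that unqualified references are treated as local.

```python
def lower_column_ref(alias, column, local_aliases):
    """Choose the field-access opcode and pool key for a column reference."""
    if alias is None or alias in local_aliases:
        # Assumption: references without a table alias resolve locally.
        return ("GET_FIELD", column)
    # The alias belongs to an enclosing query, so read from the outer row.
    return ("GET_OUTER_FIELD", f"{alias}.{column}")


# Inside the subquery of the example, only alias "e" is local.
local = {"e"}
inner_ref = lower_column_ref("e", "dept_id", local)
outer_ref = lower_column_ref("d", "id", local)
```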
A.21.4 Vectorized/Batch Operations (0x20-0x67)
Batch operations enable SIMD-accelerated columnar processing.
Batch Scan (0x20-0x27):
| ExtOp | Mnemonic | Semantics |
|---|---|---|
| 0x20 | BATCH_SCAN_OPEN | Open columnar batch scan |
| 0x21 | BATCH_SCAN_NEXT | Get next ColumnBatch |
| 0x22 | BATCH_SCAN_CLOSE | Close batch scan |
| 0x23 | BATCH_EMIT | Emit column batch |
| 0x24 | BATCH_CONST_I64 | Create constant int64 batch |
| 0x25 | BATCH_CONST_F64 | Create constant float64 batch |
| 0x26 | BATCH_EXTRACT_COL | Extract column by index |
| 0x27 | BATCH_EXTRACT_COL_NAME | Extract column by name |
Batch Arithmetic (0x28-0x37):
| ExtOp | Mnemonic | Semantics |
|---|---|---|
| 0x28 | BATCH_ADD_I64 | Vectorized int64 addition |
| 0x29 | BATCH_SUB_I64 | Vectorized int64 subtraction |
| 0x2A | BATCH_MUL_I64 | Vectorized int64 multiply |
| 0x2B | BATCH_DIV_I64 | Vectorized int64 division |
| 0x30-0x37 | BATCH_*_F64 | Vectorized float64 ops |
Batch Comparison (0x38-0x47):
| ExtOp | Mnemonic | Semantics |
|---|---|---|
| 0x38-0x3D | BATCH_CMP_*_I64 | Vectorized int64 comparisons |
| 0x40-0x45 | BATCH_CMP_*_F64 | Vectorized float64 comparisons |
Batch Logical (0x48-0x4F):
| ExtOp | Mnemonic | Semantics |
|---|---|---|
| 0x48 | BATCH_AND | Selection vector intersection |
| 0x49 | BATCH_OR | Selection vector union |
| 0x4A | BATCH_NOT | Selection vector complement |
| 0x4B | BATCH_IS_NULL | Select null rows |
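The batch logical opcodes operate on selection vectors rather than on per-row booleans. The Python sketch below models a selection vector as a sorted list of selected row indices, which is an assumption about the representation; the source only names the set operations. The function names are hypothetical.

```python
def batch_and(left, right):
    """Selection vector intersection: rows selected by both inputs."""
    return sorted(set(left) & set(right))


def batch_or(left, right):
    """Selection vector union: rows selected by either input."""
    return sorted(set(left) | set(right))


def batch_not(selection, batch_size):
    """Selection vector complement relative to a batch of batch_size rows."""
    selected = set(selection)
    return [row for row in range(batch_size) if row not in selected]


def batch_is_null(column):
    """Select the rows whose column value is NULL (None in this model)."""
    return [row for row, value in enumerate(column) if value is None]
```

Note that the complement needs the batch size as an extra input, since a selection vector alone does not record how many rows the batch holds.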
Parallel Operations (0x5C-0x67):
| ExtOp | Mnemonic | Semantics |
|---|---|---|
| 0x5C | PARALLEL_SCAN_OPEN | Open parallel scan |
| 0x5D | PARALLEL_SCAN_NEXT | Get next filtered batch |
| 0x5E | PARALLEL_SCAN_CLOSE | Close parallel scan |
| 0x60 | PARALLEL_PARTITION | Partition batch |
| 0x61 | PARALLEL_MERGE | Merge batch results |
| 0x62 | PARALLEL_BARRIER | Wait for workers |
A.21.5 SPI Cursor Operations (0x68-0x6F)
SPI operations support PL/pgSQL FOR-query loops.
| ExtOp | Mnemonic | Format | Semantics |
|---|---|---|---|
| 0x68 | SPI_CURSOR_OPEN | G | Open SPI cursor for query |
| 0x69 | SPI_CURSOR_FETCH | E | Fetch next row |
| 0x6A | SPI_CURSOR_CLOSE | E | Close SPI cursor |
| 0x6B | SPI_CURSOR_VALID | E | Check if more rows |
| 0x6C | SPI_EXECUTE | A | Execute dynamic SQL |
| 0x6D | SPI_EXECUTE_INTO | A | Execute into variable |
| 0x6E | SPI_PERFORM | E | Execute, discard result |
| 0x6F | SPI_CALL | E | Call procedure |
A.21.6 Exception Handling (0x70-0x76)
| ExtOp | Mnemonic | Semantics |
|---|---|---|
| 0x70 | EXCEPTION_PUSH | Push exception handler |
| 0x71 | EXCEPTION_POP | Pop exception handler |
| 0x72 | RAISE_EXCEPTION | Raise exception |
| 0x73 | RERAISE | Re-raise current exception |
| 0x74 | GET_DIAGNOSTICS | Get diagnostic item |
| 0x75 | SET_DIAGNOSTICS | Set diagnostic item |
| 0x76 | ASSERT | Assert condition |
A.22 Runtime Type System
The CVM uses a dynamic type system with the following type enumeration:
| Type ID | Type Name | Size | Description |
|---|---|---|---|
| 0x00 | Null | 0 | SQL NULL / JSON null |
| 0x01 | Bool | 1 | Boolean true/false |
| 0x02 | Int64 | 8 | 64-bit signed integer |
| 0x03 | Double | 8 | IEEE 754 double |
| 0x04 | String | var | UTF-8 string (ptr + len) |
| 0x05 | Array | var | Ordered collection |
| 0x06 | Document | var | Key-value object |
| 0x07 | Binary | var | Raw byte array (BYTEA) |
| 0x08 | Timestamp | 8 | Microseconds since epoch |
| 0x09 | TimestampTZ | 8 | UTC microseconds |
| 0x0A | Date | 4 | Days since epoch |
| 0x0B | Time | 8 | Microseconds since midnight |
| 0x0C | Interval | 16 | months + days + microseconds |
| 0x0D | Decimal | 16 | 128-bit arbitrary precision |
| 0x0E | UUID | 16 | 128-bit UUID |
| 0x0F | CompositeRow | var | Zero-copy JOIN result |
| 0x10 | AggState | var | Aggregation state (internal) |
A.22.1 VMValue Structure
The VMValue structure is a 24-byte discriminated union:
struct VMValue {
    CVMType type; // 1 byte
    uint8_t flags; // 1 byte (kFlagOwned=0x01, kFlagConst=0x02)
    uint16_t reserved;
    uint32_t padding;
    union { // 16 bytes
        bool bool_val;
        int64_t int64_val;
        double double_val;
        StringRef string_val;
        ArrayRef array_val;
        Document* doc_val;
        CompositeRow* composite_row_val;
        TimestampVal timestamp_val;
        // ... other types
    };
};
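The 24-byte figure follows from an 8-byte header (1 + 1 + 2 + 4) plus the 16-byte union. The Python sketch below reproduces that arithmetic with the standard `struct` module and packs an Int64 value as an illustration. Little-endian byte order and placing `int64_val` in the first 8 payload bytes are assumptions, and the helper names are hypothetical.

```python
import struct

HEADER = "<BBHI"             # type, flags, reserved, padding (assumed little-endian)
INT64_LAYOUT = HEADER + "q8x"  # int64 payload followed by 8 unused union bytes

TYPE_INT64 = 0x02            # type ID from the A.22 table
FLAG_CONST = 0x02            # kFlagConst


def pack_int64_value(value, flags=0):
    """Serialize an Int64 VMValue into its 24-byte model layout."""
    return struct.pack(INT64_LAYOUT, TYPE_INT64, flags, 0, 0, value)


def unpack_header(buf):
    """Read the type tag and flags from the first 8 bytes."""
    type_id, flags, _reserved, _padding = struct.unpack_from(HEADER, buf)
    return type_id, flags


def unpack_int64(buf):
    """Read the int64 payload, checking the tag first as a discriminated union requires."""
    type_id, _flags = unpack_header(buf)
    if type_id != TYPE_INT64:
        raise TypeError(f"value has type ID {type_id:#x}, not Int64")
    return struct.unpack_from("<q", buf, struct.calcsize(HEADER))[0]


raw = pack_int64_value(-5, flags=FLAG_CONST)
```

Checking the tag before reading the payload is the essential discipline of a discriminated union: the same 16 bytes mean different things for each type ID.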
A.23 Builtin Function Reference
The CVM provides an extensive library of built-in functions organized by category.
A.23.1 Mathematical Functions (0x0000-0x00FF)
| ID | Function | Signature | Description |
|---|---|---|---|
| 0x0000 | abs | (x) -> num | Absolute value |
| 0x0001 | floor | (x) -> num | Floor (round down) |
| 0x0002 | ceil | (x) -> num | Ceiling (round up) |
| 0x0003 | round | (x) -> num | Round to nearest |
| 0x0004 | trunc | (x) -> num | Truncate toward zero |
| 0x0005 | sqrt | (x) -> num | Square root |
| 0x0006 | pow | (x, y) -> num | Power function |
| 0x0007 | exp | (x) -> num | Exponential (e^x) |
| 0x0008 | log | (x) -> num | Natural logarithm |
| 0x0009 | log10 | (x) -> num | Base-10 logarithm |
| 0x000A | log2 | (x) -> num | Base-2 logarithm |
| 0x000B-0x0010 | sin/cos/tan/asin/acos/atan | (x) -> num | Trigonometric |
| 0x0011 | atan2 | (y, x) -> num | Two-argument arctangent |
| 0x0012 | sign | (x) -> int | Sign (-1, 0, 1) |
| 0x0014 | random | () -> double | Random [0, 1) |
| 0x0016 | pi | () -> double | Pi constant |
A.23.2 String Functions (0x0100-0x01FF)
| ID | Function | Signature | Description |
|---|---|---|---|
| 0x0100 | length | (s) -> int | String length |
| 0x0101 | upper | (s) -> str | Uppercase |
| 0x0102 | lower | (s) -> str | Lowercase |
| 0x0103 | trim | (s) -> str | Trim whitespace |
| 0x0106 | substring | (s, start, len) -> str | Extract substring |
| 0x0107 | concat | (s1, s2, ...) -> str | Concatenation |
| 0x0108 | replace | (s, from, to) -> str | String replacement |
| 0x010D | position | (substr, s) -> int | Find substring (1-based) |
| 0x010E | starts_with | (s, prefix) -> bool | Prefix check |
| 0x010F | ends_with | (s, suffix) -> bool | Suffix check |
| 0x0115 | regex_match | (s, pattern) -> bool | Regex match |
| 0x0119 | md5 | (s) -> str | MD5 hash |
| 0x011A | sha256 | (s) -> str | SHA-256 hash |
A.23.3 JSON/Document Functions (0x0500-0x05FF)
| ID | Function | Signature | Description |
|---|---|---|---|
| 0x0500 | json_extract | (doc, path) -> val | Extract at path |
| 0x0504 | json_keys | (doc) -> array | Object keys |
| 0x0505 | json_values | (doc) -> array | Object values |
| 0x0506 | json_contains | (doc, val) -> bool | Containment check |
| 0x0508 | json_parse | (s) -> doc | Parse JSON string |
| 0x0509 | json_stringify | (doc) -> str | Serialize to JSON |
| 0x0511 | jsonb_path_exists | (doc, path) -> bool | JSONPath exists |
| 0x0514 | json_build_array | (...) -> array | Construct array |
| 0x0515 | json_build_object | (...) -> obj | Construct object |
A.23.4 Date/Time Functions (0x0600-0x06FF)
| ID | Function | Signature | Description |
|---|---|---|---|
| 0x0600 | now | () -> timestamp | Current timestamp |
| 0x0601 | current_date | () -> date | Current date |
| 0x0603 | date_part | (part, ts) -> num | Extract part |
| 0x0604 | date_trunc | (part, ts) -> ts | Truncate to unit |
| 0x0605 | date_add | (ts, interval) -> ts | Add interval |
| 0x0607 | date_diff | (part, t1, t2) -> int | Difference |
| 0x0608 | format_date | (ts, fmt) -> str | Format timestamp |
A.24 Table Function Reference
Table functions return multiple rows and are used in FROM clauses.
| ID | Function | Args | Description |
|---|---|---|---|
| 0x0001 | generate_series | (start, stop) | Integer series |
| 0x0002 | generate_series | (start, stop, step) | With step |
| 0x0003 | generate_series | (start, stop, interval) | Timestamp series |
| 0x0010 | unnest | (array) | Expand array to rows |
| 0x0020 | json_each | (json) | Key-value pairs |
| 0x0030 | json_array_elements | (json) | Array to rows |
| 0x0040 | regexp_matches | (text, pattern) | Regex captures |
| 0x0041 | regexp_split_to_table | (text, pattern) | Split by regex |
| 0x0050 | string_to_table | (text, delim) | Split by delimiter |
A.25 Execution Examples
A.25.1 Simple Arithmetic Query
SELECT a + b * 2 FROM t
Compiled Bytecode:
00: CURSOR_OPEN R0, 0, 42 ; Open cursor for table t
04: JMP_NULL R0, +8 ; Exit if exhausted (08 + 8*4 = 28)
08: GET_FIELD R1, 43 ; R1 = doc.a
0C: GET_FIELD R2, 44 ; R2 = doc.b
10: MOVE_IMM R3, 2 ; R3 = 2
14: MUL_I64 R4, R2, R3 ; R4 = b * 2
18: ADD_I64 R5, R1, R4 ; R5 = a + (b * 2)
1C: EMIT_ROW R5 ; Output result
20: CURSOR_NEXT R0, 0 ; Advance cursor
24: JMP -9 ; Loop back to the NULL check (28 - 9*4 = 04)
28: CURSOR_CLOSE 0 ; Close cursor
2C: HALT ; Done
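To make the register data flow of this listing concrete, the Python sketch below interprets its straight-line body once per input document. It is a toy model under stated simplifications: a Python `for` loop stands in for the CURSOR_OPEN, CURSOR_NEXT, and branch instructions, the constant pool is reduced to the two field names, and 64-bit wrap-around is not modeled. The function name `run_projection` is hypothetical.

```python
def run_projection(rows):
    """Evaluate a + b * 2 for each document using the listing's register plan."""
    pool = {43: "a", 44: "b"}  # constant pool slots used by the listing
    body = [
        ("GET_FIELD", 1, 43),
        ("GET_FIELD", 2, 44),
        ("MOVE_IMM", 3, 2),
        ("MUL_I64", 4, 2, 3),
        ("ADD_I64", 5, 1, 4),
        ("EMIT_ROW", 5),
    ]
    emitted = []
    for doc in rows:  # stands in for the cursor loop and its branches
        regs = [None] * 16
        for op, *args in body:
            if op == "GET_FIELD":
                dst, pool_idx = args
                regs[dst] = doc[pool[pool_idx]]
            elif op == "MOVE_IMM":
                dst, imm = args
                regs[dst] = imm
            elif op == "MUL_I64":
                dst, src1, src2 = args
                regs[dst] = regs[src1] * regs[src2]
            elif op == "ADD_I64":
                dst, src1, src2 = args
                regs[dst] = regs[src1] + regs[src2]
            elif op == "EMIT_ROW":
                (src,) = args
                emitted.append(regs[src])
            else:
                raise ValueError(f"unsupported opcode {op}")
    return emitted


results = run_projection([{"a": 1, "b": 2}, {"a": 10, "b": -3}])
```

The multiplication is emitted before the addition because the compiler has already resolved operator precedence; the VM simply executes instructions in order.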
A.25.2 Aggregation Query
SELECT SUM(amount) FROM orders GROUP BY customer_id
Compiled Bytecode:
00: AGG_TBL_NEW R0, 1 ; Create agg table (SUM)
04: CURSOR_OPEN R1, 0, 50 ; Open orders cursor
08: JMP_NULL R1, +7 ; Scan done: go to group iteration (0C + 7*4 = 28)
0C: GET_FIELD R2, 51 ; R2 = customer_id
10: GET_FIELD R3, 52 ; R3 = amount
14: AGG_GET_CREATE R4, R0, R2 ; Get/create group for key
18: AGG_STATE_AT R5, R4, 0 ; Get SUM state
1C: AGG_ACCUM R5, R3 ; Accumulate amount
20: CURSOR_NEXT R1, 0 ; Advance
24: JMP -8 ; Loop back to the NULL check (28 - 8*4 = 08)
28: AGG_ITER_INIT R0 ; Init group iterator
2C: AGG_ITER_NEXT R6, R0 ; Get next group
30: JMP_NULL R6, +3 ; Done if null (34 + 3*4 = 40)
34: AGG_FINAL R7, R6 ; Finalize SUM
38: EMIT_ROW R7 ; Output
3C: JMP -5 ; Next group (40 - 5*4 = 2C)
40: CURSOR_CLOSE 0 ; Close cursor
44: HALT ; Done
A.26 Performance Characteristics
A.26.1 Instruction Timing
| Category | Typical Cycles | Notes |
|---|---|---|
| Data movement | 1-2 | Register-to-register |
| Integer arithmetic | 1 | Single-cycle ALU |
| Float arithmetic | 3-5 | FPU latency |
| Comparison | 1 | Produces boolean |
| Branch (taken) | 3-5 | Pipeline flush |
| Branch (not taken) | 0 | Predicted fall-through |
| Function call | 10-20 | Stack frame setup |
| Hash table probe | 5-15 | Cache-dependent |
| Field access | 10-50 | Document traversal |
A.26.2 Opcode Frequency Analysis
Typical query workloads show the following opcode distribution:
| Category | Frequency | Optimization Target |
|---|---|---|
| Field access | 25-35% | Column pruning, caching |
| Comparisons | 15-25% | Predicate pushdown |
| Control flow | 15-20% | Branch prediction |
| Arithmetic | 10-15% | SIMD vectorization |
| Data movement | 10-15% | Register allocation |
| Aggregation | 5-10% | Parallel execution |
Insight
- The CVM achieves 2-5 million instructions per second for typical OLTP workloads through computed-goto dispatch and careful cache optimization.
- Vectorized batch operations (0x20-0x5F extended) can process 1000+ rows per opcode execution, achieving 10-50x throughput for analytical queries.
- Copy-and-patch JIT compilation elevates hot bytecode sequences to native code with 2-5x additional speedup at ~100us compilation latency.
A.27 Summary
The CVM instruction set provides a comprehensive foundation for executing SQL queries:
- A 256-entry core opcode space organized into functional categories
- Up to 256 extended opcodes via the 0xFE prefix for advanced operations
- Fixed 32-bit encoding for cache efficiency and fast decode
- Type-specialized operations for integer, float, and string processing
- Query-specific operations for cursors, aggregation, joins, and window functions
- Vectorized batch operations for SIMD-accelerated analytical processing
- PL/pgSQL support via SPI and exception handling opcodes
- Correlated subquery support via GET_OUTER_FIELD for outer row field access
- Full-text search integration via CALL_EXTERNAL with kFTSMatch/kFTSScore for inline @@ operator evaluation
The instruction set balances:
- Decode efficiency through fixed-width encoding
- Expressiveness through comprehensive operation coverage
- Performance through type specialization and batch operations
- Extensibility through the extended opcode mechanism