Appendix A: CVM Opcode Reference

This appendix provides a comprehensive reference for the Cognica Virtual Machine (CVM) instruction set architecture, including all opcodes, their encodings, operand formats, and execution semantics.

A.1 Instruction Format Overview

The CVM uses a fixed-width 32-bit instruction encoding optimized for cache efficiency and decode simplicity. Instructions that need a 64-bit immediate (format D) append an 8-byte word. Extended composite-row instructions (format H) append a 32-bit operand word.

A.1.1 Format Types

The CVM defines eight instruction formats:

Format | Description | Encoding
A | 3-operand register | [Opcode:8][Dst:4][Src1:4][Src2:4][Flags:4][Reserved:8]
B | 2-operand with immediate | [Opcode:8][Dst:4][Src:4][Imm16:16]
C | Conditional branch | [Opcode:8][Cond:4][Reserved:4][Offset16:16]
D | Extended 64-bit immediate | [Opcode:8][Dst:4][Reserved:20] + [Imm64:64]
E | Single/dual operand (unary) | [Opcode:8][Dst:4][Src:4][Reserved:16]
F | No operands | [Opcode:8][Reserved:24]
G | Register with pool index | [Opcode:8][Dst:4][Reserved:4][PoolIdx:8][Reserved:8]
H | Extended composite row | [0xFE:8][ExtOp:8][Dst:8][Src:8] + [Op1:16][Op2:16]
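The following is a minimal C++ sketch of decoding formats A, B, and C. The table gives field widths but not bit order, so this sketch assumes fields are packed most-significant first in the listed order (opcode in bits 31..24). The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: fields are packed most-significant first, so the opcode is bits 31..24.
struct FormatA { uint8_t opcode, dst, src1, src2, flags; };
struct FormatB { uint8_t opcode, dst, src; int16_t imm16; };
struct FormatC { uint8_t opcode, cond; int16_t offset16; };

inline FormatA decode_a(uint32_t w) {
  return {uint8_t(w >> 24), uint8_t((w >> 20) & 0xF), uint8_t((w >> 16) & 0xF),
          uint8_t((w >> 12) & 0xF), uint8_t((w >> 8) & 0xF)};  // low 8 bits reserved
}

inline FormatB decode_b(uint32_t w) {
  return {uint8_t(w >> 24), uint8_t((w >> 20) & 0xF), uint8_t((w >> 16) & 0xF),
          static_cast<int16_t>(w & 0xFFFF)};  // raw 16-bit immediate, read as signed
}

inline FormatC decode_c(uint32_t w) {
  return {uint8_t(w >> 24), uint8_t((w >> 20) & 0xF),  // bits 19..16 reserved
          static_cast<int16_t>(w & 0xFFFF)};           // signed branch offset
}

// Inverse of decode_a under the same layout assumption.
inline uint32_t encode_a(uint8_t op, uint8_t dst, uint8_t s1, uint8_t s2, uint8_t flags = 0) {
  return (uint32_t(op) << 24) | (uint32_t(dst & 0xF) << 20) | (uint32_t(s1 & 0xF) << 16) |
         (uint32_t(s2 & 0xF) << 12) | (uint32_t(flags & 0xF) << 8);
}
```

Under this assumed layout, `ADD_I64 R5, R1, R4` encodes as 0x10514000.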

A.1.2 Register Allocation

The CVM provides a virtualized register file with the following conventions:

Register Range | Purpose
R0-R15 | General-purpose registers
R16+ | Spill registers (allocated by the register allocator)
F0-F15 | Floating-point registers (aliased to R0-R15 for bit-level operations)

Insight

  • The CVM uses a computed-goto dispatch mechanism for efficient opcode execution, achieving 2-5ns per instruction on modern processors.
  • Register allocation is performed at compile time by the RegisterAllocationPass, using a linear-scan algorithm for hot paths.
  • The 4-bit register fields support 16 logical registers; spill slots extend this to 256 virtual registers.
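The computed-goto bullet above can be made concrete with a toy threaded interpreter in C++. It uses the GNU "labels as values" extension (GCC and Clang) and implements only MOVE_IMM, ADD_I64, MUL_I64, and HALT. It assumes the same most-significant-first field layout used for illustration elsewhere. It is a sketch of the technique, not the CVM's actual dispatch loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy computed-goto interpreter. Assumptions: opcode in bits 31..24, 4-bit
// register fields below it, and every program ends with HALT (0x79).
int64_t run(const std::vector<uint32_t>& code) {
  int64_t R[16] = {};
  std::size_t pc = 0;
  uint32_t w = 0;
  void* dispatch[256];
  for (void*& slot : dispatch) slot = &&op_halt;  // unimplemented opcodes stop the VM
  dispatch[0x0A] = &&op_move_imm;
  dispatch[0x10] = &&op_add_i64;
  dispatch[0x12] = &&op_mul_i64;

// Every handler ends with its own indirect jump to the next handler, so there
// is no central switch and the branch predictor sees one jump site per handler.
#define NEXT() do { w = code[pc++]; goto *dispatch[w >> 24]; } while (0)
#define DST ((w >> 20) & 0xF)
#define SRC1 ((w >> 16) & 0xF)
#define SRC2 ((w >> 12) & 0xF)

  NEXT();
op_move_imm:  // R[dst] = sign_extend(imm16)
  R[DST] = static_cast<int16_t>(w & 0xFFFF);
  NEXT();
op_add_i64:   // wrapping two's-complement add via unsigned arithmetic
  R[DST] = static_cast<int64_t>(static_cast<uint64_t>(R[SRC1]) + static_cast<uint64_t>(R[SRC2]));
  NEXT();
op_mul_i64:   // wrapping two's-complement multiply
  R[DST] = static_cast<int64_t>(static_cast<uint64_t>(R[SRC1]) * static_cast<uint64_t>(R[SRC2]));
  NEXT();
op_halt:      // by convention in this sketch, the result is left in R0
  return R[0];
}

#undef NEXT
#undef DST
#undef SRC1
#undef SRC2
```

For example, `MOVE_IMM R1, 5; MOVE_IMM R2, 3; MUL_I64 R3, R1, R2; MOVE_IMM R4, 2; ADD_I64 R0, R3, R4; HALT` leaves 17 in R0.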

A.2 Opcode Space Layout

The 256-entry opcode space is organized into functional categories:

Range | Category
0x00-0x0F | Data Movement Operations
0x10-0x1F | Integer Arithmetic
0x20-0x2F | Floating-Point Arithmetic
0x30-0x3F | Bitwise Operations
0x40-0x4F | Integer Comparisons
0x50-0x57 | Float Comparisons
0x58-0x5F | String Comparisons
0x60-0x6F | Logical and Debug Operations
0x70-0x7F | Control Flow
0x80-0x8F | Type Operations
0x90-0x9F | Field Access
0xA0-0xAF | Array Operations
0xB0-0xBF | String Operations
0xC0-0xCF | Aggregation Operations
0xD0-0xDF | Query Buffer Operations (Window, Hash, Sort)
0xE0-0xE7 | Working Table and Sort Extended
0xE8-0xEF | Function Call Operations
0xF0-0xF7 | Cursor Operations
0xF8-0xFC | Subquery, External, Table Functions
0xFD-0xFF | Error, Extended, Undefined

A.3 Data Movement Operations (0x00-0x0F)

Data movement operations transfer values between registers, memory, and the constant pool.

A.3.1 Basic Movement

Opcode | Mnemonic | Format | Semantics
0x00 | NOP | F | No operation
0x01 | MOVE | E | R[dst] = R[src]
0x02 | MOVE_I64 | D | R[dst] = imm64
0x03 | MOVE_F64 | D | F[dst] = imm64 (bit reinterpret)
0x04 | MOVE_NULL | E | R[dst] = NULL
0x05 | MOVE_TRUE | E | R[dst] = true
0x06 | MOVE_FALSE | E | R[dst] = false
0x07 | LOAD_CONST | B | R[dst] = constant_pool[imm16]
0x08 | COPY | E | R[dst] = deep_copy(R[src])
0x09 | SWAP | E | swap(R[dst], R[src])
0x0A | MOVE_IMM | B | R[dst] = sign_extend(imm16)
0x0B | MOVE_F2R | E | R[dst] = F[src] (float to GPR)
0x0C | MOVE_R2F | E | F[dst] = R[src] (GPR to float)
0x0D | LOAD_PARAM | B | R[dst] = context.get_parameter(imm16)

Example: Loading a string constant

LOAD_CONST  R3, 42    ; R3 = pool[42] (string "hello")
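The two immediate-loading forms in the table interpret their immediates differently. MOVE_F64 copies the raw IEEE 754 bit pattern, and MOVE_IMM sign-extends 16 bits to 64. The following is a short C++ sketch; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// MOVE_F64: imm64 holds the bit pattern of a double; copy the bits, do not convert the value.
inline double move_f64(uint64_t imm64) {
  double d;
  std::memcpy(&d, &imm64, sizeof d);  // well-defined bit reinterpretation
  return d;
}

// MOVE_IMM: reinterpret the 16 bits as signed, then widen (sign extension).
inline int64_t move_imm(uint16_t imm16) {
  return static_cast<int16_t>(imm16);  // 0xFFFF -> -1, 0x7FFF -> 32767
}
```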

A.4 Integer Arithmetic Operations (0x10-0x1F)

Integer arithmetic operates on 64-bit signed integers stored in general-purpose registers.

Opcode | Mnemonic | Format | Semantics
0x10 | ADD_I64 | A | R[dst] = R[src1] + R[src2]
0x11 | SUB_I64 | A | R[dst] = R[src1] - R[src2]
0x12 | MUL_I64 | A | R[dst] = R[src1] * R[src2]
0x13 | DIV_I64 | A | R[dst] = R[src1] / R[src2]
0x14 | MOD_I64 | A | R[dst] = R[src1] % R[src2]
0x15 | NEG_I64 | E | R[dst] = -R[src]
0x16 | ABS_I64 | E | R[dst] = abs(R[src])
0x17 | ADD_I64_IMM | B | R[dst] = R[src] + sign_extend(imm16)
0x18 | SUB_I64_IMM | B | R[dst] = R[src] - sign_extend(imm16)
0x19 | MUL_I64_IMM | B | R[dst] = R[src] * sign_extend(imm16)
0x1A | INC_I64 | E | R[dst] = R[src] + 1
0x1B | DEC_I64 | E | R[dst] = R[src] - 1

Overflow Semantics: Integer overflow wraps according to two's complement arithmetic. Division by zero raises an error.
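Signed overflow is undefined behaviour in C++, so a C++ sketch of these rules does the wrapping in `uint64_t`. The sketch makes two assumptions that are not documented CVM behaviour: `INT64_MIN / -1` wraps to `INT64_MIN`, and the error is signalled with `std::domain_error`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Two's-complement wrapping: unsigned arithmetic is defined modulo 2^64.
inline int64_t add_i64(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) + static_cast<uint64_t>(b));
}
inline int64_t mul_i64(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) * static_cast<uint64_t>(b));
}

// DIV_I64: division by zero raises an error. Dividing by -1 is done as a wrapping
// negate, because INT64_MIN / -1 overflows (undefined behaviour in C++).
inline int64_t div_i64(int64_t a, int64_t b) {
  if (b == 0) throw std::domain_error("division by zero");
  if (b == -1) return static_cast<int64_t>(0 - static_cast<uint64_t>(a));
  return a / b;  // C++ division truncates toward zero
}
```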

A.5 Floating-Point Arithmetic (0x20-0x2F)

Floating-point operations follow IEEE 754 double precision semantics.

Opcode | Mnemonic | Format | Semantics
0x20 | ADD_F64 | A | F[dst] = F[src1] + F[src2]
0x21 | SUB_F64 | A | F[dst] = F[src1] - F[src2]
0x22 | MUL_F64 | A | F[dst] = F[src1] * F[src2]
0x23 | DIV_F64 | A | F[dst] = F[src1] / F[src2]
0x24 | NEG_F64 | E | F[dst] = -F[src]
0x25 | ABS_F64 | E | F[dst] = abs(F[src])
0x26 | SQRT_F64 | E | F[dst] = sqrt(F[src])
0x27 | FLOOR_F64 | E | F[dst] = floor(F[src])
0x28 | CEIL_F64 | E | F[dst] = ceil(F[src])
0x29 | ROUND_F64 | E | F[dst] = round(F[src])
0x2A | POW_F64 | A | F[dst] = pow(F[src1], F[src2])
0x2B | LOG_F64 | E | F[dst] = log(F[src])
0x2C | LOG10_F64 | E | F[dst] = log10(F[src])
0x2D | EXP_F64 | E | F[dst] = exp(F[src])
0x2E | MOD_F64 | A | F[dst] = fmod(F[src1], F[src2])

A.6 Bitwise Operations (0x30-0x3F)

Bitwise operations manipulate 64-bit integers at the bit level.

Opcode | Mnemonic | Format | Semantics
0x30 | AND_I64 | A | R[dst] = R[src1] & R[src2]
0x31 | OR_I64 | A | R[dst] = R[src1] | R[src2]
0x32 | XOR_I64 | A | R[dst] = R[src1] ^ R[src2]
0x33 | NOT_I64 | E | R[dst] = ~R[src]
0x34 | SHL_I64 | A | R[dst] = R[src1] << R[src2]
0x35 | SHR_I64 | A | R[dst] = R[src1] >> R[src2] (logical)
0x36 | SAR_I64 | A | R[dst] = R[src1] >> R[src2] (arithmetic)
0x37 | SHL_I64_IMM | B | R[dst] = R[src] << imm16
0x38 | SHR_I64_IMM | B | R[dst] = R[src] >> imm16 (logical)
0x39 | SAR_I64_IMM | B | R[dst] = R[src] >> imm16 (arithmetic)
0x3A | AND_I64_IMM | B | R[dst] = R[src] & imm16
0x3B | OR_I64_IMM | B | R[dst] = R[src] | imm16
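The logical (SHR) and arithmetic (SAR) right shifts differ only for negative values: SHR fills with zeros, and SAR replicates the sign bit. This C++ sketch assumes shift counts are masked to their low 6 bits. The table does not say how the CVM treats counts of 64 or more.

```cpp
#include <cassert>
#include <cstdint>

// SHR_I64: logical shift, zero-filling from the left.
inline int64_t shr_i64(int64_t v, int64_t n) {
  return static_cast<int64_t>(static_cast<uint64_t>(v) >> (n & 63));  // mask: assumption
}

// SAR_I64: arithmetic shift, sign-filling from the left. Right-shifting a negative
// signed value is arithmetic on mainstream compilers and guaranteed since C++20.
inline int64_t sar_i64(int64_t v, int64_t n) {
  return v >> (n & 63);
}
```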

A.7 Comparison Operations (0x40-0x5F)

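Comparison operations produce boolean results. The type-specialized variants (_I64, _F64, _STR) avoid runtime type checks when operand types are known at compile time. The _POLY variants dispatch on the runtime types of their operands.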

A.7.1 Integer Comparisons (0x40-0x4F)

Opcode | Mnemonic | Format | Semantics
0x40 | CMP_EQ_I64 | A | R[dst] = (R[src1] == R[src2])
0x41 | CMP_NE_I64 | A | R[dst] = (R[src1] != R[src2])
0x42 | CMP_LT_I64 | A | R[dst] = (R[src1] < R[src2])
0x43 | CMP_LE_I64 | A | R[dst] = (R[src1] <= R[src2])
0x44 | CMP_GT_I64 | A | R[dst] = (R[src1] > R[src2])
0x45 | CMP_GE_I64 | A | R[dst] = (R[src1] >= R[src2])
0x46 | CMP_EQ_I64_IMM | B | R[dst] = (R[src] == imm16)
0x47 | CMP_NE_I64_IMM | B | R[dst] = (R[src] != imm16)
0x48 | CMP_LT_I64_IMM | B | R[dst] = (R[src] < imm16)
0x49 | CMP_LE_I64_IMM | B | R[dst] = (R[src] <= imm16)
0x4A | CMP_GT_I64_IMM | B | R[dst] = (R[src] > imm16)
0x4B | CMP_GE_I64_IMM | B | R[dst] = (R[src] >= imm16)
0x4C | CMP_LT_POLY | A | Runtime type dispatch for <
0x4D | CMP_LE_POLY | A | Runtime type dispatch for <=
0x4E | CMP_GT_POLY | A | Runtime type dispatch for >
0x4F | CMP_GE_POLY | A | Runtime type dispatch for >=

A.7.2 Float Comparisons (0x50-0x57)

Opcode | Mnemonic | Format | Semantics
0x50 | CMP_EQ_F64 | A | R[dst] = (F[src1] == F[src2])
0x51 | CMP_NE_F64 | A | R[dst] = (F[src1] != F[src2])
0x52 | CMP_LT_F64 | A | R[dst] = (F[src1] < F[src2])
0x53 | CMP_LE_F64 | A | R[dst] = (F[src1] <= F[src2])
0x54 | CMP_GT_F64 | A | R[dst] = (F[src1] > F[src2])
0x55 | CMP_GE_F64 | A | R[dst] = (F[src1] >= F[src2])

A.7.3 String Comparisons (0x58-0x5F)

Opcode | Mnemonic | Format | Semantics
0x58 | CMP_EQ_STR | A | Lexicographic equality
0x59 | CMP_NE_STR | A | Lexicographic inequality
0x5A | CMP_LT_STR | A | Lexicographic less-than
0x5B | CMP_LE_STR | A | Lexicographic less-or-equal
0x5C | CMP_GT_STR | A | Lexicographic greater-than
0x5D | CMP_GE_STR | A | Lexicographic greater-or-equal
0x5E | CMP_EQ_POLY | A | Runtime type dispatch for ==
0x5F | CMP_NE_POLY | A | Runtime type dispatch for !=

A.8 Logical and Debug Operations (0x60-0x6F)

A.8.1 Logical Operations (0x60-0x65)

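Opcode | Mnemonic | Format | Semantics
0x60 | AND | A | R[dst] = R[src1] && R[src2]
0x61 | OR | A | R[dst] = R[src1] || R[src2]
0x62 | NOT | E | R[dst] = !R[src]
0x63 | AND_SC | C | Short-circuit: if !R[cond]: PC += offset16 (skip the right operand)
0x64 | OR_SC | C | Short-circuit: if R[cond]: PC += offset16 (skip the right operand)
0x65 | XOR | A | R[dst] = R[src1] XOR R[src2]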

A.8.2 Debug Operations (0x66-0x6B)

Opcode | Mnemonic | Format | Semantics
0x66 | DBG_PRINT | E | Print R[src] to debug log
0x67 | DBG_BREAK | F | Debugger breakpoint
0x68 | DBG_TRACE | B | Trace with label imm16
0x69 | DBG_DUMP | F | Dump VM state
0x6A | DBG_ASSERT | E | Assert R[src] is truthy
0x6B | DBG_PROFILE | B | Profile section marker

A.9 Control Flow Operations (0x70-0x7F)

Control flow operations manage the program counter and function calls.

Opcode | Mnemonic | Format | Semantics
0x70 | JMP | C | PC += offset16 (unconditional)
0x71 | JMP_TRUE | C | if R[cond]: PC += offset16
0x72 | JMP_FALSE | C | if !R[cond]: PC += offset16
0x73 | JMP_NULL | C | if R[cond] is NULL: PC += offset16
0x74 | JMP_NOT_NULL | C | if R[cond] is not NULL: PC += offset16
0x75 | JMP_ABS | D | PC = imm32 (absolute)
0x76 | CALL | B | Push frame, PC = target
0x77 | RET | F | Return from function (no value)
0x78 | RET_VAL | E | Return R[src]
0x79 | HALT | F | Stop execution
0x7A | JMP_ZERO | C | if R[cond] == 0: PC += offset16
0x7B | JMP_NOT_ZERO | C | if R[cond] != 0: PC += offset16
0x7C | RET_NEXT | E | Set-returning function (SRF): add R[src] to the result accumulator
0x7D | RET_QUERY | B | Set-returning function (SRF): execute query pool[imm16], add its results

Branch Offset Encoding: The 16-bit signed offset is relative to the instruction following the branch, measured in 4-byte instruction units.
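As arithmetic, the rule is: target = (address of the branch + 4) + 4 * offset16. Here is a small C++ helper pair with hypothetical names. One function computes a target, and the other computes the offset an assembler would emit.

```cpp
#include <cassert>
#include <cstdint>

// pc is the byte address of the branch instruction itself.
inline uint32_t branch_target(uint32_t pc, int16_t offset16) {
  const int64_t next = static_cast<int64_t>(pc) + 4;  // fall-through address
  return static_cast<uint32_t>(next + int64_t{offset16} * 4);
}

// Offset (in 4-byte instruction units) needed to reach `target` from a branch at `pc`.
inline int16_t branch_offset(uint32_t pc, uint32_t target) {
  const int64_t delta = static_cast<int64_t>(target) - (static_cast<int64_t>(pc) + 4);
  return static_cast<int16_t>(delta / 4);
}
```

For example, a JMP at byte address 0x24 with offset -9 lands on 0x04.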

A.10 Type Operations (0x80-0x8F)

Type operations handle runtime type checking, casting, and NULL handling.

Opcode | Mnemonic | Format | Semantics
0x80 | TYPEOF | E | R[dst] = typeof(R[src]) as type ID
0x81 | CAST_I64_F64 | E | F[dst] = (double)R[src]
0x82 | CAST_F64_I64 | E | R[dst] = (int64_t)F[src]
0x83 | CAST_STR_I64 | E | R[dst] = parse_int64(R[src])
0x84 | CAST_STR_F64 | E | F[dst] = parse_double(R[src])
0x85 | CAST_I64_STR | E | R[dst] = to_string(R[src])
0x86 | CAST_F64_STR | E | R[dst] = to_string(F[src])
0x87 | CAST_BOOL_I64 | E | R[dst] = R[src] ? 1 : 0
0x88 | CAST_I64_BOOL | E | R[dst] = R[src] != 0
0x89 | IS_NULL | E | R[dst] = (R[src] is NULL)
0x8A | IS_NOT_NULL | E | R[dst] = (R[src] is not NULL)
0x8B | COALESCE | A | R[dst] = R[src1] ?? R[src2]
0x8C | NULLIF | A | R[dst] = (R[src1] == R[src2]) ? NULL : R[src1]
0x8D | CAST_BOOL_STR | E | R[dst] = R[src] ? "true" : "false"
0x8E | CAST_STR_BOOL | E | R[dst] = parse_bool(R[src])
0x8F | CAST | B | R[dst] = cast(R[src], target_type)

A.11 Field Access Operations (0x90-0x9F)

Field access operations extract and modify document fields.

Opcode | Mnemonic | Format | Semantics
0x90 | GET_FIELD | B | R[dst] = doc.field[pool[imm16]]
0x91 | GET_FIELD_DYN | A | R[dst] = doc.field[R[src]] (dynamic)
0x92 | SET_FIELD | B | doc.field[pool[imm16]] = R[src]
0x93 | HAS_FIELD | B | R[dst] = doc.has(pool[imm16])
0x94 | DEL_FIELD | B | doc.remove(pool[imm16])
0x95 | GET_NESTED | B | R[dst] = doc.path(pool[imm16])
0x96 | SET_NESTED | B | doc.path(pool[imm16]) = R[src]
0x97 | FIELD_COUNT | E | R[dst] = doc.field_count()
0x98 | FIELD_NAMES | E | R[dst] = doc.field_names() as array
0x99 | GET_DOC | E | R[dst] = context.input_document
0x9A | GET_FIELD_IDX | B | R[dst] = doc.field_by_index(imm16)

A.12 Array Operations (0xA0-0xAF)

Array operations manipulate ordered collections of values.

Opcode | Mnemonic | Format | Semantics
0xA0 | ARR_NEW | E | R[dst] = new empty array
0xA1 | ARR_LEN | E | R[dst] = R[src].length
0xA2 | ARR_GET | A | R[dst] = R[src1][R[src2]]
0xA3 | ARR_GET_IMM | B | R[dst] = R[src][imm16]
0xA4 | ARR_SET | A | R[src1][R[src2]] = R[dst]
0xA5 | ARR_SET_IMM | B | R[src][imm16] = R[dst]
0xA6 | ARR_PUSH | A | R[dst].push(R[src])
0xA7 | ARR_POP | E | R[dst] = R[src].pop()
0xA8 | ARR_SLICE | A | R[dst] = R[src1].slice(R[src2], R[flags])
0xA9 | ARR_CONCAT | A | R[dst] = R[src1].concat(R[src2])
0xAA | ARR_CONTAINS | A | R[dst] = R[src1].contains(R[src2])
0xAB | ARR_INDEXOF | A | R[dst] = R[src1].indexOf(R[src2])
0xAC | ARR_REVERSE | E | R[dst] = R[src].reverse()
0xAD | ARR_SORT | E | R[dst] = R[src].sort()

A.13 String Operations (0xB0-0xBF)

String operations handle UTF-8 encoded text.

Opcode | Mnemonic | Format | Semantics
0xB0 | STR_LEN | E | R[dst] = R[src].length
0xB1 | STR_CONCAT | A | R[dst] = R[src1] + R[src2]
0xB2 | STR_SUBSTR | A | R[dst] = R[src1].substr(R[src2], R[flags])
0xB3 | STR_UPPER | E | R[dst] = R[src].toUpperCase()
0xB4 | STR_LOWER | E | R[dst] = R[src].toLowerCase()
0xB5 | STR_TRIM | E | R[dst] = R[src].trim()
0xB6 | STR_LIKE | A | R[dst] = R[src1] LIKE R[src2]
0xB7 | STR_ILIKE | A | R[dst] = R[src1] ILIKE R[src2]
0xB8 | STR_REGEX | A | R[dst] = R[src1] ~ R[src2] (regex)
0xB9 | STR_REPLACE | A | R[dst] = R[src1].replace(R[src2], R[flags])
0xBA | STR_SPLIT | A | R[dst] = R[src1].split(R[src2])
0xBB | STR_STARTS | A | R[dst] = R[src1].startsWith(R[src2])
0xBC | STR_ENDS | A | R[dst] = R[src1].endsWith(R[src2])
0xBD | STR_CONTAINS | A | R[dst] = R[src1].contains(R[src2])
0xBE | STR_INDEXOF | A | R[dst] = R[src1].indexOf(R[src2])
0xBF | STR_WILDCARD | A | R[dst] = R[src1] matches R[src2] (wildcard)
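STR_LIKE evaluates the SQL LIKE wildcards: `%` matches any run of characters, and `_` matches exactly one. One common way to match them is the greedy, single-backtrack wildcard algorithm sketched below in C++. This is an illustration, not the CVM's matcher. It assumes byte-wise comparison (so `_` matches one byte rather than one UTF-8 code point) and no ESCAPE clause. An ILIKE variant would apply the same loop after case folding.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

inline bool like_match(std::string_view s, std::string_view p) {
  std::size_t si = 0, pi = 0;
  std::size_t star = std::string_view::npos;  // position of the most recent '%' in p
  std::size_t mark = 0;                       // position in s where that '%' resumes
  while (si < s.size()) {
    if (pi < p.size() && p[pi] == '%') {
      star = pi++;                 // first let '%' match the empty string
      mark = si;
    } else if (pi < p.size() && (p[pi] == '_' || p[pi] == s[si])) {
      ++si;                        // '_' or a literal consumes one character
      ++pi;
    } else if (star != std::string_view::npos) {
      pi = star + 1;               // backtrack: '%' absorbs one more character
      si = ++mark;
    } else {
      return false;
    }
  }
  while (pi < p.size() && p[pi] == '%') ++pi;  // trailing '%' matches the empty tail
  return pi == p.size();
}
```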

A.14 Aggregation Operations (0xC0-0xCF)

Aggregation operations support SQL aggregate functions and GROUP BY processing.

A.14.1 Single Aggregation (0xC0-0xC7)

Opcode | Mnemonic | Format | Semantics
0xC0 | AGG_INIT | B | R[dst] = new_agg_state(type=imm16)
0xC1 | AGG_ACCUM | A | R[dst].accumulate(R[src])
0xC2 | AGG_ACCUM_COND | A | if R[src2]: R[dst].accumulate(R[src1])
0xC3 | AGG_FINAL | E | R[dst] = R[src].finalize()
0xC4 | AGG_MERGE | A | R[dst].merge(R[src])
0xC5 | AGG_RESET | E | R[dst].reset()
0xC6 | AGG_COUNT | E | R[dst] = R[src].count()
0xC7 | AGG_SUM | E | R[dst] = R[src].sum()

A.14.2 Aggregation Tables (0xC8-0xCF)

Opcode | Mnemonic | Format | Semantics
0xC8 | AGG_TBL_NEW | B | Create aggregation table
0xC9 | AGG_GET_CREATE | A | Get/create group for key
0xCA | AGG_ITER_INIT | E | Initialize group iterator
0xCB | AGG_ITER_NEXT | E | Get next (key, states) pair
0xCC | AGG_TBL_NEW_MULTI | B | Create multi-function table
0xCD | AGG_STATE_AT | B | Get state at index
0xCE | AGG_ITER_HAS_NEXT | E | Check if more groups

Aggregation Function Types:

ID | Function | Description
0 | COUNT | Count non-NULL values
1 | SUM | Sum of values
2 | AVG | Average (sum/count)
3 | MIN | Minimum value
4 | MAX | Maximum value
5 | COUNT(*) | Count all rows
6 | STDDEV_POP | Population standard deviation
7 | STDDEV_SAMP | Sample standard deviation
8 | VAR_POP | Population variance
9 | VAR_SAMP | Sample variance
10 | FIRST | First non-NULL value
11 | LAST | Last non-NULL value
12 | STRING_AGG | Concatenate strings
13 | ARRAY_AGG | Collect into array
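For AGG_MERGE to combine partial states for AVG, VAR_*, and STDDEV_* correctly, the state needs more than a running sum. A common technique is Welford's online update combined with the pairwise merge formula of Chan et al. The C++ sketch below uses that technique. It assumes the CVM keeps comparable state, which this appendix does not specify, and `MomentState` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>

struct MomentState {
  int64_t n = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the running mean

  // AGG_ACCUM: NULL inputs (std::nullopt) are skipped, as SQL aggregates do.
  void accumulate(std::optional<double> x) {
    if (!x) return;
    ++n;
    double delta = *x - mean;
    mean += delta / static_cast<double>(n);
    m2 += delta * (*x - mean);  // uses the updated mean (Welford)
  }

  // AGG_MERGE: fold in a partial state, e.g. from another worker (Chan et al.).
  void merge(const MomentState& o) {
    if (o.n == 0) return;
    double na = static_cast<double>(n), nb = static_cast<double>(o.n);
    double delta = o.mean - mean;
    mean += delta * nb / (na + nb);
    m2 += o.m2 + delta * delta * na * nb / (na + nb);
    n += o.n;
  }

  // AGG_FINAL for a few function IDs; too few inputs yield NULL.
  std::optional<double> avg() const {
    if (n == 0) return std::nullopt;
    return mean;
  }
  std::optional<double> var_pop() const {
    if (n == 0) return std::nullopt;
    return m2 / static_cast<double>(n);
  }
  std::optional<double> var_samp() const {
    if (n < 2) return std::nullopt;
    return m2 / static_cast<double>(n - 1);
  }
  std::optional<double> stddev_samp() const {
    auto v = var_samp();
    if (!v) return std::nullopt;
    return std::sqrt(*v);
  }
};
```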

A.15 Query Buffer Operations (0xD0-0xDF)

Query buffer operations support window functions, hash joins, sorting, and set operations.

A.15.1 Window Buffer (0xD0-0xD3)

Opcode | Mnemonic | Format | Semantics
0xD0 | WIN_NEW | E | Create window buffer
0xD1 | WIN_ADD | E | Add row to window buffer
0xD2 | WIN_COMPUTE | B | Compute window functions
0xD3 | WIN_NEXT | E | Get next row with results

A.15.2 Hash Table (0xD4-0xD7)

Opcode | Mnemonic | Format | Semantics
0xD4 | HT_NEW | E | Create hash table
0xD5 | HT_INSERT | A | Insert (key, value)
0xD6 | HT_PROBE | A | Lookup key, get matches
0xD7 | HT_DESTROY | E | Destroy hash table
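The HT_* opcodes follow the build/probe pattern of a hash join: insert every row of one input keyed on the join column, then probe with each row of the other. Below is a C++ sketch with simplified stand-in types (`Row` and `JoinHashTable` are hypothetical). It uses `std::unordered_multimap` so that duplicate build keys keep all their matches.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct Row { int64_t key; std::string payload; };

class JoinHashTable {
 public:
  void insert(const Row& r) { table_.emplace(r.key, r); }  // HT_INSERT (build side)

  std::vector<Row> probe(int64_t key) const {             // HT_PROBE (probe side)
    std::vector<Row> out;
    auto [lo, hi] = table_.equal_range(key);  // every build row sharing this key
    for (auto it = lo; it != hi; ++it) out.push_back(it->second);
    return out;
  }

 private:
  std::unordered_multimap<int64_t, Row> table_;
};
```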

A.15.3 Sort Buffer (0xD8-0xDB)

Opcode | Mnemonic | Format | Semantics
0xD8 | SORT_NEW | B | Create sort buffer
0xD9 | SORT_ADD | E | Add row to buffer
0xDA | SORT_NEXT | E | Get next sorted row
0xDB | SORT_DESTROY | E | Destroy sort buffer

A.15.4 Set Operations (0xDC-0xDF)

Opcode | Mnemonic | Format | Semantics
0xDC | SET_OP_NEW | B | Create set operation buffer
0xDD | SET_OP_ADD | A | Add document from source
0xDE | SET_OP_NEXT | E | Get next result
0xDF | SET_OP_DESTROY | E | Destroy buffer

Set Operation Types (encoded in imm16):

  • 0: UNION
  • 1: INTERSECT
  • 2: EXCEPT

A.16 Working Table Operations (0xE0-0xE7)

Working table operations (0xE0-0xE5) support recursive Common Table Expressions (CTEs). Opcodes 0xE6-0xE7 extend the sort buffer with value-based sort keys.

Opcode | Mnemonic | Format | Semantics
0xE0 | WT_NEW | B | Create working table
0xE1 | WT_ADD | E | Add document to table
0xE2 | WT_SWAP | E | Swap working/result tables
0xE3 | WT_SCAN | E | Open scan, return first doc
0xE4 | WT_EMPTY | E | Check if table is empty
0xE5 | WT_DESTROY | E | Destroy working table
0xE6 | SORT_NEW_VALUES | B | Create value-based sort buffer
0xE7 | SORT_ADD_VALUES | A | Add row with computed keys

A.17 Function Call Operations (0xE8-0xEF)

Function call operations invoke built-in, user-defined, and external functions.

Opcode | Mnemonic | Format | Semantics
0xE8 | CALL_BUILTIN | B | R[dst] = builtin[imm16](args)
0xE9 | CALL_SCALAR | B | R[dst] = scalar_func(args)
0xEA | CALL_UDF | B | R[dst] = udf[imm16](args)
0xEB | PUSH_ARG | E | Push R[src] to argument stack
0xEC | POP_ARG | E | R[dst] = pop from argument stack
0xED | CLEAR_ARGS | F | Clear argument stack
0xEE | GET_ARG_COUNT | E | R[dst] = argument stack size
0xEF | GET_ARG | B | R[dst] = argument_stack[imm16]

A.18 Cursor Operations (0xF0-0xF7)

Cursor operations manage iteration over tables and CTEs.

Opcode | Mnemonic | Format | Semantics
0xF0 | CURSOR_OPEN | B | Open cursor, return first doc
0xF1 | CURSOR_NEXT | E | Return current doc, advance
0xF2 | CURSOR_CLOSE | E | Close cursor, release resources
0xF3 | CURSOR_VALID | E | R[dst] = cursor.is_valid()
0xF4 | CURSOR_RESET | E | Reset cursor to beginning
0xF5 | EMIT_ROW | E | Emit row to output callback
0xF6 | YIELD | F | Suspend for streaming results
0xF7 | CURSOR_TAKE | E | Move document with ownership

Cursor Open Flags (imm16 encoding):

  • Bit 15: is_cte flag (1=CTE, 0=collection)
  • Bits 0-14: constant pool index for name
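The following C++ sketch packs and unpacks this imm16 (helper names are illustrative). Because the index has 15 bits, cursor names must come from the first 32,768 constant-pool entries.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kCteBit = 0x8000;  // bit 15: 1 = CTE, 0 = collection

constexpr uint16_t encode_cursor_imm(bool is_cte, uint16_t pool_idx) {
  return static_cast<uint16_t>((is_cte ? kCteBit : 0) | (pool_idx & 0x7FFF));
}
constexpr bool cursor_is_cte(uint16_t imm16) { return (imm16 & kCteBit) != 0; }
constexpr uint16_t cursor_pool_idx(uint16_t imm16) {
  return static_cast<uint16_t>(imm16 & 0x7FFF);  // bits 0-14: name's pool index
}
```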

A.19 Subquery and External Operations (0xF8-0xFC)

Opcode | Mnemonic | Format | Semantics
0xF8 | CALL_SUBQUERY | B | Execute subquery pool[imm16]
0xF9 | CALL_EXTERNAL | B | Call external function
0xFA | TABLEFUNC_OPEN | B | Open table function iterator
0xFB | TABLEFUNC_NEXT | E | Get next row from table function
0xFC | TABLEFUNC_CLOSE | E | Close table function iterator

External Function IDs:

ID | Function | Description
0x0000 | kScriptEval | Evaluate Lua/Python script
0x0100 | kFTSMatch | Full-text search match (@@ operator)
0x0101 | kFTSScore | Full-text search relevance score

The kFTSMatch external function implements the @@ (text search match) operator in WHERE clauses. When the planner encounters a column @@ to_tsquery('...') predicate, it lowers the expression to a CALL_EXTERNAL instruction with function ID 0x0100. The function accepts alternating field/query pairs on the argument stack and returns a boolean indicating whether the document matches the full-text search query. This enables CVM-compiled queries to evaluate FTS predicates inline without falling back to the Volcano executor.

A.20 Error and Extension (0xFD-0xFF)

Opcode | Mnemonic | Format | Semantics
0xFD | ERROR | B | Raise error from pool[imm16]
0xFE | EXTENDED | - | Extended opcode prefix
0xFF | UNDEFINED | - | Invalid opcode (trap)

A.21 Extended Opcodes (0xFE prefix)

Extended opcodes provide 256 additional instructions accessed via the 0xFE prefix.

A.21.1 Iteration Operations (0x01-0x0A)

ExtOp | Mnemonic | Format | Semantics
0x01 | ITER_ARR_BEGIN | E | Create array iterator
0x02 | ITER_ARR_NEXT | E | Get next element or branch
0x03 | ITER_ARR_END | E | Close array iterator
0x04 | ITER_OBJ_BEGIN | E | Create object key iterator
0x05 | ITER_OBJ_NEXT_KEY | E | Get next key or branch
0x06 | ITER_OBJ_NEXT_VAL | E | Get value for current key
0x07 | ITER_OBJ_END | E | Close object iterator
0x08 | ITER_RANGE_BEGIN | A | Create range iterator
0x09 | ITER_RANGE_NEXT | E | Get next value or branch
0x0A | ITER_RANGE_END | E | Close range iterator

A.21.2 Document Construction (0x0B-0x12)

ExtOp | Mnemonic | Format | Semantics
0x0B | DOC_NEW | E | Create empty document
0x0C | DOC_FROM_JSON | E | Parse JSON to document
0x0D | DOC_TO_JSON | E | Serialize document to JSON
0x0E | DOC_CLONE | E | Deep clone document
0x0F | DOC_MERGE | A | Merge two documents
0x10 | DOC_PATCH | A | Apply JSON Patch
0x11 | DOC_KEYS | E | Get keys as array
0x12 | DOC_VALUES | E | Get values as array

A.21.3 Composite Row Operations (0x13-0x1C)

Composite row operations enable zero-copy JOIN processing.

ExtOp | Mnemonic | Format | Semantics
0x13 | COMPOSITE_NEW | H | Create empty CompositeRow
0x14 | COMPOSITE_ADD | H | Add document slot
0x15 | COMPOSITE_GET | H | Get field by qualified name
0x16 | COMPOSITE_GET_SLOT | H | Get field by slot index
0x17 | COMPOSITE_MAT | H | Materialize to Document
0x18 | COMPOSITE_EMIT | H | Emit composite row
0x19 | COMPOSITE_CLEAR | H | Clear all slots
0x1A | COMPOSITE_EMIT_MAPPED | H | Emit with column mapping
0x1B | WT_SCAN_RESET | H | Reset working table scan
0x1C | COMPOSITE_MAT_ALL_QUAL | H | Materialize with qualified names

A.21.3a Outer Context Operations (0x85)

Outer context operations support correlated subquery execution within the CVM. When a subquery references columns from an outer query, GET_OUTER_FIELD resolves the outer column values without falling back to the Volcano executor.

ExtOp | Mnemonic | Format | Semantics
0x85 | GET_OUTER_FIELD | H | R[dst] = outer_context.get_field(pool[pool_idx])

GET_OUTER_FIELD reads a field from the outer query's current row. During plan lowering, column references whose table alias is not in the local alias set are emitted as GET_OUTER_FIELD instead of GET_FIELD. The interpreter resolves the field from the outer row context, which may be either a plain Document or a CompositeRow (for outer queries involving joins).

Example: Correlated subquery

SELECT d.name,
       (SELECT COUNT(*) FROM employees e WHERE e.dept_id = d.id)
FROM departments d

The inner subquery's reference to d.id compiles to:

GET_OUTER_FIELD  R3, pool["d.id"]   ; R3 = outer_row.d.id
GET_FIELD        R4, pool["dept_id"] ; R4 = current_row.dept_id
CMP_EQ_POLY      R5, R4, R3         ; R5 = (dept_id == d.id)
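The lowering rule described above reduces to a set-membership test on the column's table alias. In this C++ sketch, `ColumnRef` and `choose_field_op` are hypothetical. Treating unqualified columns as local is also an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class FieldOp { kGetField, kGetOuterField };  // GET_FIELD (0x90) vs. 0xFE 0x85

struct ColumnRef { std::string alias; std::string column; };

// Aliases bound in the subquery's own FROM clause are local; anything else is outer.
inline FieldOp choose_field_op(const ColumnRef& ref, const std::set<std::string>& local_aliases) {
  if (ref.alias.empty() || local_aliases.count(ref.alias) > 0) return FieldOp::kGetField;
  return FieldOp::kGetOuterField;
}
```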

A.21.4 Vectorized/Batch Operations (0x20-0x67)

Batch operations enable SIMD-accelerated columnar processing.

Batch Scan (0x20-0x27):

ExtOp | Mnemonic | Semantics
0x20 | BATCH_SCAN_OPEN | Open columnar batch scan
0x21 | BATCH_SCAN_NEXT | Get next ColumnBatch
0x22 | BATCH_SCAN_CLOSE | Close batch scan
0x23 | BATCH_EMIT | Emit column batch
0x24 | BATCH_CONST_I64 | Create constant int64 batch
0x25 | BATCH_CONST_F64 | Create constant float64 batch
0x26 | BATCH_EXTRACT_COL | Extract column by index
0x27 | BATCH_EXTRACT_COL_NAME | Extract column by name

Batch Arithmetic (0x28-0x37):

ExtOp | Mnemonic | Semantics
0x28 | BATCH_ADD_I64 | Vectorized int64 addition
0x29 | BATCH_SUB_I64 | Vectorized int64 subtraction
0x2A | BATCH_MUL_I64 | Vectorized int64 multiply
0x2B | BATCH_DIV_I64 | Vectorized int64 division
0x30-0x37 | BATCH_*_F64 | Vectorized float64 ops

Batch Comparison (0x38-0x47):

ExtOp | Mnemonic | Semantics
0x38-0x3D | BATCH_CMP_*_I64 | Vectorized int64 comparisons
0x40-0x45 | BATCH_CMP_*_F64 | Vectorized float64 comparisons

Batch Logical (0x48-0x4F):

ExtOp | Mnemonic | Semantics
0x48 | BATCH_AND | Selection vector intersection
0x49 | BATCH_OR | Selection vector union
0x4A | BATCH_NOT | Selection vector complement
0x4B | BATCH_IS_NULL | Select null rows
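If a selection vector is stored as a sorted list of qualifying row indices within a batch (an assumption; a bitmap is the other common representation), BATCH_AND, BATCH_OR, and BATCH_NOT reduce to standard sorted-set algorithms. Here is a C++ sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

using SelVec = std::vector<uint32_t>;  // sorted, duplicate-free row indices

inline SelVec sel_and(const SelVec& a, const SelVec& b) {  // BATCH_AND
  SelVec out;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
  return out;
}

inline SelVec sel_or(const SelVec& a, const SelVec& b) {   // BATCH_OR
  SelVec out;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
  return out;
}

inline SelVec sel_not(const SelVec& a, uint32_t batch_size) {  // BATCH_NOT
  SelVec out;
  std::size_t j = 0;
  for (uint32_t i = 0; i < batch_size; ++i) {
    if (j < a.size() && a[j] == i) { ++j; continue; }  // selected in a: leave it out
    out.push_back(i);
  }
  return out;
}
```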

Parallel Operations (0x5C-0x67):

ExtOp | Mnemonic | Semantics
0x5C | PARALLEL_SCAN_OPEN | Open parallel scan
0x5D | PARALLEL_SCAN_NEXT | Get next filtered batch
0x5E | PARALLEL_SCAN_CLOSE | Close parallel scan
0x60 | PARALLEL_PARTITION | Partition batch
0x61 | PARALLEL_MERGE | Merge batch results
0x62 | PARALLEL_BARRIER | Wait for workers

A.21.5 SPI Cursor Operations (0x68-0x6F)

SPI operations support PL/pgSQL FOR-query loops, dynamic SQL (EXECUTE), PERFORM, and procedure calls.

ExtOp | Mnemonic | Format | Semantics
0x68 | SPI_CURSOR_OPEN | G | Open SPI cursor for query
0x69 | SPI_CURSOR_FETCH | E | Fetch next row
0x6A | SPI_CURSOR_CLOSE | E | Close SPI cursor
0x6B | SPI_CURSOR_VALID | E | Check if more rows
0x6C | SPI_EXECUTE | A | Execute dynamic SQL
0x6D | SPI_EXECUTE_INTO | A | Execute into variable
0x6E | SPI_PERFORM | E | Execute, discard result
0x6F | SPI_CALL | E | Call procedure

A.21.6 Exception Handling (0x70-0x76)

ExtOp | Mnemonic | Semantics
0x70 | EXCEPTION_PUSH | Push exception handler
0x71 | EXCEPTION_POP | Pop exception handler
0x72 | RAISE_EXCEPTION | Raise exception
0x73 | RERAISE | Re-raise current exception
0x74 | GET_DIAGNOSTICS | Get diagnostic item
0x75 | SET_DIAGNOSTICS | Set diagnostic item
0x76 | ASSERT | Assert condition

A.22 Runtime Type System

The CVM uses a dynamic type system with the following type enumeration:

Type ID | Type Name | Size (bytes) | Description
0x00 | Null | 0 | SQL NULL / JSON null
0x01 | Bool | 1 | Boolean true/false
0x02 | Int64 | 8 | 64-bit signed integer
0x03 | Double | 8 | IEEE 754 double
0x04 | String | var | UTF-8 string (ptr + len)
0x05 | Array | var | Ordered collection
0x06 | Document | var | Key-value object
0x07 | Binary | var | Raw byte array (BYTEA)
0x08 | Timestamp | 8 | Microseconds since epoch
0x09 | TimestampTZ | 8 | Microseconds since epoch, UTC
0x0A | Date | 4 | Days since epoch
0x0B | Time | 8 | Microseconds since midnight
0x0C | Interval | 16 | Months + days + microseconds
0x0D | Decimal | 16 | 128-bit decimal
0x0E | UUID | 16 | 128-bit UUID
0x0F | CompositeRow | var | Zero-copy JOIN result
0x10 | AggState | var | Aggregation state (internal)
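Timestamp (0x08) counts microseconds and Date (0x0A) counts days, so converting between them needs floor division. C++ integer division truncates toward zero, which would put instants just before the epoch on the wrong day. Here is a sketch with illustrative function names; it does not depend on which calendar date the CVM uses as its epoch.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kMicrosPerDay = 86'400'000'000;  // 86,400 s * 1,000,000 us

// Timestamp (microseconds) -> Date (days), rounding toward negative infinity.
constexpr int32_t timestamp_to_date(int64_t micros) {
  int64_t q = micros / kMicrosPerDay;
  if (micros % kMicrosPerDay < 0) --q;  // turn truncation into floor
  return static_cast<int32_t>(q);
}

// Date (days) -> Timestamp at midnight of that day.
constexpr int64_t date_to_timestamp(int32_t days) { return int64_t{days} * kMicrosPerDay; }
```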

A.22.1 VMValue Structure

The VMValue structure is a 24-byte discriminated union:

struct VMValue {
  CVMType type;     // 1 byte
  uint8_t flags;    // 1 byte (kFlagOwned=0x01, kFlagConst=0x02)
  uint16_t reserved;
  uint32_t padding;
  union {           // 16 bytes
    bool bool_val;
    int64_t int64_val;
    double double_val;
    StringRef string_val;
    ArrayRef array_val;
    Document* doc_val;
    CompositeRow* composite_row_val;
    TimestampVal timestamp_val;
    // ... other types
  };
};
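The 24-byte figure follows from an 8-byte header (type, flags, reserved, padding) plus a 16-byte union. This compilable C++ sketch checks that arithmetic with stand-in payload types. `StringRefSketch` and `IntervalSketch` are assumptions sized after the type table, not the real `StringRef` or interval layout.

```cpp
#include <cassert>
#include <cstdint>

enum class CVMType : uint8_t { Null = 0x00, Bool, Int64, Double, String /* ... */ };

struct StringRefSketch { const char* data; uint64_t len; };              // 16 bytes
struct IntervalSketch { int32_t months; int32_t days; int64_t micros; };  // 16 bytes

struct VMValueSketch {
  CVMType type;       // 1 byte
  uint8_t flags;      // 1 byte
  uint16_t reserved;  // 2 bytes
  uint32_t padding;   // 4 bytes: header totals 8, keeping the union 8-byte aligned
  union {             // 16 bytes: the largest members set the size
    bool bool_val;
    int64_t int64_val;
    double double_val;
    StringRefSketch string_val;
    IntervalSketch interval_val;
  };
};

static_assert(sizeof(VMValueSketch) == 24, "8-byte header + 16-byte payload");
```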

A.23 Builtin Function Reference

The CVM provides an extensive library of built-in functions organized by category.

A.23.1 Mathematical Functions (0x0000-0x00FF)

ID | Function signature | Description
0x0000 | abs(x) -> num | Absolute value
0x0001 | floor(x) -> num | Floor (round down)
0x0002 | ceil(x) -> num | Ceiling (round up)
0x0003 | round(x) -> num | Round to nearest
0x0004 | trunc(x) -> num | Truncate toward zero
0x0005 | sqrt(x) -> num | Square root
0x0006 | pow(x, y) -> num | Power function
0x0007 | exp(x) -> num | Exponential (e^x)
0x0008 | log(x) -> num | Natural logarithm
0x0009 | log10(x) -> num | Base-10 logarithm
0x000A | log2(x) -> num | Base-2 logarithm
0x000B-0x0010 | sin/cos/tan/asin/acos/atan(x) -> num | Trigonometric functions
0x0011 | atan2(y, x) -> num | Two-argument arctangent
0x0012 | sign(x) -> int | Sign (-1, 0, 1)
0x0014 | random() -> double | Random value in [0, 1)
0x0016 | pi() -> double | Pi constant

A.23.2 String Functions (0x0100-0x01FF)

ID | Function signature | Description
0x0100 | length(s) -> int | String length
0x0101 | upper(s) -> str | Uppercase
0x0102 | lower(s) -> str | Lowercase
0x0103 | trim(s) -> str | Trim whitespace
0x0106 | substring(s, start, len) -> str | Extract substring
0x0107 | concat(s1, s2, ...) -> str | Concatenation
0x0108 | replace(s, from, to) -> str | String replacement
0x010D | position(substr, s) -> int | Find substring (1-based)
0x010E | starts_with(s, prefix) -> bool | Prefix check
0x010F | ends_with(s, suffix) -> bool | Suffix check
0x0115 | regex_match(s, pattern) -> bool | Regex match
0x0119 | md5(s) -> str | MD5 hash
0x011A | sha256(s) -> str | SHA-256 hash

A.23.3 JSON/Document Functions (0x0500-0x05FF)

ID | Function signature | Description
0x0500 | json_extract(doc, path) -> val | Extract at path
0x0504 | json_keys(doc) -> array | Object keys
0x0505 | json_values(doc) -> array | Object values
0x0506 | json_contains(doc, val) -> bool | Containment check
0x0508 | json_parse(s) -> doc | Parse JSON string
0x0509 | json_stringify(doc) -> str | Serialize to JSON
0x0511 | jsonb_path_exists(doc, path) -> bool | JSONPath exists
0x0514 | json_build_array(...) -> array | Construct array
0x0515 | json_build_object(...) -> obj | Construct object

A.23.4 Date/Time Functions (0x0600-0x06FF)

ID | Function signature | Description
0x0600 | now() -> timestamp | Current timestamp
0x0601 | current_date() -> date | Current date
0x0603 | date_part(part, ts) -> num | Extract part
0x0604 | date_trunc(part, ts) -> ts | Truncate to unit
0x0605 | date_add(ts, interval) -> ts | Add interval
0x0607 | date_diff(part, t1, t2) -> int | Difference
0x0608 | format_date(ts, fmt) -> str | Format timestamp

A.24 Table Function Reference

Table functions return multiple rows and are used in FROM clauses.

ID | Function (arguments) | Description
0x0001 | generate_series(start, stop) | Integer series
0x0002 | generate_series(start, stop, step) | Integer series with step
0x0003 | generate_series(start, stop, interval) | Timestamp series
0x0010 | unnest(array) | Expand array to rows
0x0020 | json_each(json) | Key-value pairs
0x0030 | json_array_elements(json) | Array to rows
0x0040 | regexp_matches(text, pattern) | Regex captures
0x0041 | regexp_split_to_table(text, pattern) | Split by regex
0x0050 | string_to_table(text, delim) | Split by delimiter
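The TABLEFUNC_OPEN / TABLEFUNC_NEXT / TABLEFUNC_CLOSE protocol suggests a pull-style iterator. Here is a C++ sketch for integer generate_series. It assumes PostgreSQL-style semantics: the stop value is inclusive, a negative step counts down, and a zero step is an error. The class name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

class GenerateSeries {
 public:
  GenerateSeries(int64_t start, int64_t stop, int64_t step = 1)
      : cur_(start), stop_(stop), step_(step),
        done_(step > 0 ? start > stop : start < stop) {  // empty series
    if (step == 0) throw std::invalid_argument("step size cannot equal zero");
  }

  // TABLEFUNC_NEXT analogue: the next value, or std::nullopt when exhausted.
  std::optional<int64_t> next() {
    if (done_) return std::nullopt;
    int64_t value = cur_;
    // Remaining distance to stop and the step magnitude, in uint64_t so that
    // bounds near INT64_MIN / INT64_MAX cannot overflow.
    uint64_t room = step_ > 0 ? uint64_t(stop_) - uint64_t(cur_)
                              : uint64_t(cur_) - uint64_t(stop_);
    uint64_t mag = step_ > 0 ? uint64_t(step_) : 0 - uint64_t(step_);
    if (room < mag) done_ = true;  // another step would pass stop
    else cur_ += step_;
    return value;
  }

 private:
  int64_t cur_, stop_, step_;
  bool done_;
};
```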

A.25 Execution Examples

A.25.1 Simple Arithmetic Query

SELECT a + b * 2 FROM t

Compiled Bytecode:

00: CURSOR_OPEN     R0, 0, 42     ; Open cursor for table t (R0 = first doc)
04: JMP_NULL        R0, +8        ; Exhausted: jump to 0x28
08: GET_FIELD       R1, 43        ; R1 = doc.a
0C: GET_FIELD       R2, 44        ; R2 = doc.b
10: MOVE_IMM        R3, 2         ; R3 = 2
14: MUL_I64         R4, R2, R3    ; R4 = b * 2
18: ADD_I64         R5, R1, R4    ; R5 = a + (b * 2)
1C: EMIT_ROW        R5            ; Output result
20: CURSOR_NEXT     R0, 0         ; Advance cursor
24: JMP             -9            ; Loop back to 0x04 (exhaustion check)
28: CURSOR_CLOSE    0             ; Close cursor
2C: HALT                          ; Done

A.25.2 Aggregation Query

SELECT SUM(amount) FROM orders GROUP BY customer_id

Compiled Bytecode:

00: AGG_TBL_NEW     R0, 1         ; Create agg table (SUM)
04: CURSOR_OPEN     R1, 0, 50     ; Open orders cursor
08: JMP_NULL        R1, +7        ; Exhausted: jump to 0x28
0C: GET_FIELD       R2, 51        ; R2 = customer_id
10: GET_FIELD       R3, 52        ; R3 = amount
14: AGG_GET_CREATE  R4, R0, R2    ; Get/create group for key
18: AGG_STATE_AT    R5, R4, 0     ; Get SUM state
1C: AGG_ACCUM       R5, R3        ; Accumulate amount
20: CURSOR_NEXT     R1, 0         ; Advance
24: JMP             -8            ; Loop back to 0x08
28: AGG_ITER_INIT   R0            ; Init group iterator
2C: AGG_ITER_NEXT   R6, R0        ; Get next group
30: JMP_NULL        R6, +3        ; No more groups: jump to 0x40
34: AGG_FINAL       R7, R6        ; Finalize SUM
38: EMIT_ROW        R7            ; Output
3C: JMP             -5            ; Next group (back to 0x2C)
40: CURSOR_CLOSE    0
44: HALT

A.26 Performance Characteristics

A.26.1 Instruction Timing

Category | Typical Cycles | Notes
Data movement | 1-2 | Register-to-register
Integer arithmetic | 1 | Single-cycle ALU
Float arithmetic | 3-5 | FPU latency
Comparison | 1 | Produces boolean
Branch (taken) | 3-5 | Pipeline flush
Branch (not taken) | 0 | Predicted fall-through
Function call | 10-20 | Stack frame setup
Hash table probe | 5-15 | Cache-dependent
Field access | 10-50 | Document traversal

A.26.2 Opcode Frequency Analysis

Typical query workloads show the following opcode distribution:

Category | Frequency | Optimization Target
Field access | 25-35% | Column pruning, caching
Comparisons | 15-25% | Predicate pushdown
Control flow | 15-20% | Branch prediction
Arithmetic | 10-15% | SIMD vectorization
Data movement | 10-15% | Register allocation
Aggregation | 5-10% | Parallel execution

Insight

  • Computed-goto dispatch and careful cache optimization keep the dispatch cost at the 2-5 ns per instruction cited in A.1.2 (roughly 200-500 million simple instructions per second) for typical OLTP workloads.
  • Vectorized batch operations (extended opcodes 0x20-0x67) can process 1000+ rows per opcode execution, achieving 10-50x higher throughput on analytical queries.
  • Copy-and-patch JIT compilation elevates hot bytecode sequences to native code for a further 2-5x speedup, at roughly 100 µs of compilation latency.

A.27 Summary

The CVM instruction set provides a comprehensive foundation for executing SQL queries:

  1. A 256-entry primary opcode space organized into functional categories
  2. Up to 256 extended opcodes via the 0xFE prefix for advanced operations
  3. Fixed 32-bit base encoding for cache efficiency and fast decode, with an extension word only for formats D and H
  4. Type-specialized operations for integer, float, and string processing
  5. Query-specific operations for cursors, aggregation, joins, and window functions
  6. Vectorized batch operations for SIMD-accelerated analytical processing
  7. PL/pgSQL support via SPI and exception handling opcodes
  8. Correlated subquery support via GET_OUTER_FIELD for outer row field access
  9. Full-text search integration via CALL_EXTERNAL with kFTSMatch/kFTSScore for inline @@ operator evaluation

The instruction set balances:

  • Decode efficiency through fixed-width encoding
  • Expressiveness through comprehensive operation coverage
  • Performance through type specialization and batch operations
  • Extensibility through the extended opcode mechanism

Copyright (c) 2023-2026 Cognica, Inc.