Appendix A: CVM Opcode Reference

This appendix provides a comprehensive reference for the Cognica Virtual Machine (CVM) instruction set architecture, including all opcodes, their encodings, operand formats, and execution semantics.

A.1 Instruction Format Overview

The CVM uses a fixed-width 32-bit instruction encoding optimized for cache efficiency and decode simplicity. Two formats append a second word: Format D carries a 64-bit immediate in an additional 8-byte word, and Format H carries two 16-bit operands in an additional 4-byte word.

A.1.1 Format Types

The CVM defines eight instruction formats:

Format	Description	Encoding
A	3-operand register	`[Opcode:8][Dst:4][Src1:4][Src2:4][Flags:4][Reserved:8]`
B	2-operand with immediate	`[Opcode:8][Dst:4][Src:4][Imm16:16]`
C	Conditional branch	`[Opcode:8][Cond:4][Reserved:4][Offset16:16]`
D	Extended 64-bit immediate	`[Opcode:8][Dst:4][Reserved:20]` + `[Imm64:64]`
E	Single/dual operand (unary)	`[Opcode:8][Dst:4][Src:4][Reserved:16]`
F	No operands	`[Opcode:8][Reserved:24]`
G	Register with pool index	`[Opcode:8][Dst:4][Reserved:4][PoolIdx:8][Reserved:8]`
H	Extended composite row	`[0xFE:8][ExtOp:8][Dst:8][Src:8]` + `[Op1:16][Op2:16]`
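
The layouts above can be packed and unpacked with shifts and masks. The following is a minimal sketch for Formats A and B, not the CVM's actual encoder. It assumes fields are packed from the most significant bit downward in the order the table lists them (the appendix does not state the bit order), and the function names `encode_a`, `encode_b`, `opcode_of`, `dst_of`, and `imm16_of` are hypothetical.

```cpp
#include <cstdint>

// Assumption: fields are packed from the most significant bit downward, in the
// order the format table lists them. The appendix does not state the bit order.

// Format A: [Opcode:8][Dst:4][Src1:4][Src2:4][Flags:4][Reserved:8]
constexpr std::uint32_t encode_a(std::uint8_t opcode, std::uint8_t dst,
                                 std::uint8_t src1, std::uint8_t src2,
                                 std::uint8_t flags = 0) {
  return (static_cast<std::uint32_t>(opcode) << 24) |
         (static_cast<std::uint32_t>(dst & 0xF) << 20) |
         (static_cast<std::uint32_t>(src1 & 0xF) << 16) |
         (static_cast<std::uint32_t>(src2 & 0xF) << 12) |
         (static_cast<std::uint32_t>(flags & 0xF) << 8);
}

// Format B: [Opcode:8][Dst:4][Src:4][Imm16:16]
constexpr std::uint32_t encode_b(std::uint8_t opcode, std::uint8_t dst,
                                 std::uint8_t src, std::uint16_t imm16) {
  return (static_cast<std::uint32_t>(opcode) << 24) |
         (static_cast<std::uint32_t>(dst & 0xF) << 20) |
         (static_cast<std::uint32_t>(src & 0xF) << 16) |
         static_cast<std::uint32_t>(imm16);
}

// Field extractors that work for both formats.
constexpr std::uint8_t opcode_of(std::uint32_t word) {
  return static_cast<std::uint8_t>(word >> 24);
}
constexpr std::uint8_t dst_of(std::uint32_t word) {
  return static_cast<std::uint8_t>((word >> 20) & 0xF);
}
constexpr std::uint16_t imm16_of(std::uint32_t word) {
  return static_cast<std::uint16_t>(word & 0xFFFF);
}
```

Under this assumed layout, `ADD_I64 R5, R1, R4` (opcode 0x10) packs to `0x10514000` and `LOAD_CONST R3, 42` (opcode 0x07) packs to `0x0730002A`.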

A.1.2 Register Allocation

The CVM provides a virtualized register file with the following conventions:

Register Range	Purpose
R0-R15	General purpose registers
R16+	Spill registers (allocated by register allocator)
F0-F15	Floating-point registers (aliased to R0-R15 for bit-level operations)

Insight

The CVM uses a computed-goto dispatch mechanism for efficient opcode execution, achieving 2-5ns per instruction on modern processors.

Register allocation is performed at compile time by the RegisterAllocationPass, using a linear-scan algorithm for hot paths.

The 4-bit register fields support 16 logical registers; spill slots extend this to 256 virtual registers.

A.2 Opcode Space Layout

The 256-entry opcode space is organized into functional categories:

Range	Category
0x00-0x0F	Data Movement Operations
0x10-0x1F	Integer Arithmetic
0x20-0x2F	Floating-Point Arithmetic
0x30-0x3F	Bitwise Operations
0x40-0x4F	Integer Comparisons
0x50-0x57	Float Comparisons
0x58-0x5F	String Comparisons
0x60-0x6F	Logical and Debug Operations
0x70-0x7F	Control Flow
0x80-0x8F	Type Operations
0x90-0x9F	Field Access
0xA0-0xAF	Array Operations
0xB0-0xBF	String Operations
0xC0-0xCF	Aggregation Operations
0xD0-0xDF	Query Buffer Operations (Window, Hash, Sort)
0xE0-0xE7	Working Table and Sort Extended
0xE8-0xEF	Function Call Operations
0xF0-0xF7	Cursor Operations
0xF8-0xFC	Subquery, External, Table Functions
0xFD-0xFF	Error, Extended, Undefined
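
Because the ranges are contiguous and cover the whole byte, a disassembler or profiler can classify an opcode with a small lookup over the range upper bounds. The sketch below is illustrative only; `OpcodeRange`, `kOpcodeRanges`, and `opcode_category` are hypothetical names, and the category strings are shortened from the table above.

```cpp
#include <cstdint>
#include <string_view>

struct OpcodeRange {
  std::uint8_t last;           // inclusive upper bound of the range
  std::string_view category;   // category name from the layout table
};

// Upper bounds in ascending order, taken from the opcode space layout table.
constexpr OpcodeRange kOpcodeRanges[] = {
    {0x0F, "Data Movement"},       {0x1F, "Integer Arithmetic"},
    {0x2F, "Floating-Point Arithmetic"}, {0x3F, "Bitwise"},
    {0x4F, "Integer Comparisons"}, {0x57, "Float Comparisons"},
    {0x5F, "String Comparisons"},  {0x6F, "Logical and Debug"},
    {0x7F, "Control Flow"},        {0x8F, "Type"},
    {0x9F, "Field Access"},        {0xAF, "Array"},
    {0xBF, "String"},              {0xCF, "Aggregation"},
    {0xDF, "Query Buffer"},        {0xE7, "Working Table and Sort Extended"},
    {0xEF, "Function Call"},       {0xF7, "Cursor"},
    {0xFC, "Subquery, External, Table Functions"},
    {0xFF, "Error, Extended, Undefined"},
};

// Returns the category of the first range whose upper bound covers the opcode.
constexpr std::string_view opcode_category(std::uint8_t opcode) {
  for (const OpcodeRange& range : kOpcodeRanges) {
    if (opcode <= range.last) return range.category;
  }
  return "Unknown";  // unreachable: the last range ends at 0xFF
}
```

A linear scan over 20 entries is enough for tooling; an interpreter would dispatch on the opcode directly and never need the category.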

A.3 Data Movement Operations (0x00-0x0F)

Data movement operations transfer values between registers, memory, and the constant pool.

A.3.1 Basic Movement

Opcode	Mnemonic	Format	Semantics
0x00	`NOP`	F	No operation
0x01	`MOVE`	E	`R[dst] = R[src]`
0x02	`MOVE_I64`	D	`R[dst] = imm64`
0x03	`MOVE_F64`	D	`F[dst] = imm64` (bit reinterpret)
0x04	`MOVE_NULL`	E	`R[dst] = NULL`
0x05	`MOVE_TRUE`	E	`R[dst] = true`
0x06	`MOVE_FALSE`	E	`R[dst] = false`
0x07	`LOAD_CONST`	B	`R[dst] = constant_pool[imm16]`
0x08	`COPY`	E	`R[dst] = deep_copy(R[src])`
0x09	`SWAP`	E	`swap(R[dst], R[src])`
0x0A	`MOVE_IMM`	B	`R[dst] = sign_extend(imm16)`
0x0B	`MOVE_F2R`	E	`R[dst] = F[src]` (float to GPR)
0x0C	`MOVE_R2F`	E	`F[dst] = R[src]` (GPR to float)
0x0D	`LOAD_PARAM`	B	`R[dst] = context.get_parameter(imm16)`

Example: Loading a string constant

LOAD_CONST  R3, 42    ; R3 = pool[42] (string "hello")
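
`MOVE_F64` is described as a bit reinterpretation, and `MOVE_F2R` / `MOVE_R2F` appear to move the same 64 bits between the aliased register views rather than convert the number (numeric conversion is the job of `CAST_I64_F64` and `CAST_F64_I64` in A.10). A minimal sketch of that distinction, assuming IEEE 754 doubles; the helper names are hypothetical:

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret 64 raw bits as a double without changing any bit.
inline double bits_to_f64(std::uint64_t bits) {
  double value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// Reinterpret a double as its 64 raw bits without changing any bit.
inline std::uint64_t f64_to_bits(double value) {
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return bits;
}
```

`std::memcpy` is used because it is the portable way to reinterpret object representations in C++; compilers typically reduce it to a single register move.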

A.4 Integer Arithmetic Operations (0x10-0x1F)

Integer arithmetic operates on 64-bit signed integers stored in general-purpose registers.

Opcode	Mnemonic	Format	Semantics
0x10	`ADD_I64`	A	`R[dst] = R[src1] + R[src2]`
0x11	`SUB_I64`	A	`R[dst] = R[src1] - R[src2]`
0x12	`MUL_I64`	A	`R[dst] = R[src1] * R[src2]`
0x13	`DIV_I64`	A	`R[dst] = R[src1] / R[src2]`
0x14	`MOD_I64`	A	`R[dst] = R[src1] % R[src2]`
0x15	`NEG_I64`	E	`R[dst] = -R[src]`
0x16	`ABS_I64`	E	`R[dst] = abs(R[src])`
0x17	`ADD_I64_IMM`	B	`R[dst] = R[src] + sign_extend(imm16)`
0x18	`SUB_I64_IMM`	B	`R[dst] = R[src] - sign_extend(imm16)`
0x19	`MUL_I64_IMM`	B	`R[dst] = R[src] * sign_extend(imm16)`
0x1A	`INC_I64`	E	`R[dst] = R[src] + 1`
0x1B	`DEC_I64`	E	`R[dst] = R[src] - 1`

Overflow Semantics: Integer overflow wraps according to two's complement arithmetic. Division by zero raises an error.
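
Signed overflow is undefined behavior in C++, so an interpreter written in C++ cannot get wrapping semantics from plain `int64_t` arithmetic. One common approach, sketched here under that assumption, is to compute in `uint64_t` and reinterpret the result. The function names are hypothetical, and the treatment of `INT64_MIN / -1` as a wrapping result is an assumption that the appendix does not confirm.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>

// Reinterpret an unsigned result as a two's complement signed value.
inline std::int64_t from_bits(std::uint64_t bits) {
  std::int64_t value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// ADD_I64-style addition: unsigned arithmetic wraps modulo 2^64.
inline std::int64_t add_i64_wrapping(std::int64_t a, std::int64_t b) {
  return from_bits(static_cast<std::uint64_t>(a) + static_cast<std::uint64_t>(b));
}

// MUL_I64-style multiplication: the low 64 bits match the wrapped signed product.
inline std::int64_t mul_i64_wrapping(std::int64_t a, std::int64_t b) {
  return from_bits(static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b));
}

// DIV_I64-style division: division by zero raises an error.
inline std::int64_t div_i64_checked(std::int64_t a, std::int64_t b) {
  if (b == 0) throw std::domain_error("division by zero");
  // Assumption: INT64_MIN / -1 wraps back to INT64_MIN instead of trapping.
  if (a == std::numeric_limits<std::int64_t>::min() && b == -1) return a;
  return a / b;
}
```

With these helpers, adding 1 to the largest `int64_t` yields the smallest `int64_t`, and `div_i64_checked(1, 0)` throws.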

A.5 Floating-Point Arithmetic (0x20-0x2F)

Floating-point operations follow IEEE 754 double precision semantics.

Opcode	Mnemonic	Format	Semantics
0x20	`ADD_F64`	A	`F[dst] = F[src1] + F[src2]`
0x21	`SUB_F64`	A	`F[dst] = F[src1] - F[src2]`
0x22	`MUL_F64`	A	`F[dst] = F[src1] * F[src2]`
0x23	`DIV_F64`	A	`F[dst] = F[src1] / F[src2]`
0x24	`NEG_F64`	E	`F[dst] = -F[src]`
0x25	`ABS_F64`	E	`F[dst] = abs(F[src])`
0x26	`SQRT_F64`	E	`F[dst] = sqrt(F[src])`
0x27	`FLOOR_F64`	E	`F[dst] = floor(F[src])`
0x28	`CEIL_F64`	E	`F[dst] = ceil(F[src])`
0x29	`ROUND_F64`	E	`F[dst] = round(F[src])`
0x2A	`POW_F64`	A	`F[dst] = pow(F[src1], F[src2])`
0x2B	`LOG_F64`	E	`F[dst] = log(F[src])`
0x2C	`LOG10_F64`	E	`F[dst] = log10(F[src])`
0x2D	`EXP_F64`	E	`F[dst] = exp(F[src])`
0x2E	`MOD_F64`	A	`F[dst] = fmod(F[src1], F[src2])`

A.6 Bitwise Operations (0x30-0x3F)

Bitwise operations manipulate 64-bit integers at the bit level.

Opcode	Mnemonic	Format	Semantics
0x30	`AND_I64`	A	`R[dst] = R[src1] & R[src2]`
0x31	`OR_I64`	A	`R[dst] = R[src1] | R[src2]`
0x32	`XOR_I64`	A	`R[dst] = R[src1] ^ R[src2]`
0x33	`NOT_I64`	E	`R[dst] = ~R[src]`
0x34	`SHL_I64`	A	`R[dst] = R[src1] << R[src2]`
0x35	`SHR_I64`	A	`R[dst] = R[src1] >> R[src2]` (logical)
0x36	`SAR_I64`	A	`R[dst] = R[src1] >> R[src2]` (arithmetic)
0x37	`SHL_I64_IMM`	B	`R[dst] = R[src] << imm16`
0x38	`SHR_I64_IMM`	B	`R[dst] = R[src] >> imm16` (logical)
0x39	`SAR_I64_IMM`	B	`R[dst] = R[src] >> imm16` (arithmetic)
0x3A	`AND_I64_IMM`	B	`R[dst] = R[src] & imm16`
0x3B	`OR_I64_IMM`	B	`R[dst] = R[src] | imm16`
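
`SHR_I64` and `SAR_I64` differ only in what fills the vacated high bits: zeros for the logical shift, copies of the sign bit for the arithmetic shift. A minimal sketch of both follows. It assumes the shift count is reduced modulo 64, which the appendix does not specify, and the function names are hypothetical.

```cpp
#include <cstdint>

// Logical shift right: vacated high bits are filled with zeros.
// The final cast relies on two's complement conversion (guaranteed since C++20,
// and the behavior of mainstream compilers before that).
constexpr std::int64_t shr_logical(std::int64_t value, unsigned count) {
  const unsigned n = count & 63;  // assumption: counts wrap modulo 64
  return static_cast<std::int64_t>(static_cast<std::uint64_t>(value) >> n);
}

// Arithmetic shift right: vacated high bits are filled with the sign bit.
constexpr std::int64_t sar_arithmetic(std::int64_t value, unsigned count) {
  const unsigned n = count & 63;  // assumption: counts wrap modulo 64
  const std::uint64_t bits = static_cast<std::uint64_t>(value);
  if (value >= 0 || n == 0) return static_cast<std::int64_t>(bits >> n);
  // Negative input: set the top n bits that the logical shift cleared.
  const std::uint64_t sign_fill = ~std::uint64_t{0} << (64 - n);
  return static_cast<std::int64_t>((bits >> n) | sign_fill);
}
```

For example, shifting -8 right by one gives -4 arithmetically and `0x7FFFFFFFFFFFFFFC` logically.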

A.7 Comparison Operations (0x40-0x5F)

Comparison operations produce boolean results. The CVM supports type-specialized comparisons for optimal performance.

A.7.1 Integer Comparisons (0x40-0x4F)

Opcode	Mnemonic	Format	Semantics
0x40	`CMP_EQ_I64`	A	`R[dst] = (R[src1] == R[src2])`
0x41	`CMP_NE_I64`	A	`R[dst] = (R[src1] != R[src2])`
0x42	`CMP_LT_I64`	A	`R[dst] = (R[src1] < R[src2])`
0x43	`CMP_LE_I64`	A	`R[dst] = (R[src1] <= R[src2])`
0x44	`CMP_GT_I64`	A	`R[dst] = (R[src1] > R[src2])`
0x45	`CMP_GE_I64`	A	`R[dst] = (R[src1] >= R[src2])`
0x46	`CMP_EQ_I64_IMM`	B	`R[dst] = (R[src] == imm16)`
0x47	`CMP_NE_I64_IMM`	B	`R[dst] = (R[src] != imm16)`
0x48	`CMP_LT_I64_IMM`	B	`R[dst] = (R[src] < imm16)`
0x49	`CMP_LE_I64_IMM`	B	`R[dst] = (R[src] <= imm16)`
0x4A	`CMP_GT_I64_IMM`	B	`R[dst] = (R[src] > imm16)`
0x4B	`CMP_GE_I64_IMM`	B	`R[dst] = (R[src] >= imm16)`
0x4C	`CMP_LT_POLY`	A	Runtime type dispatch for `<`
0x4D	`CMP_LE_POLY`	A	Runtime type dispatch for `<=`
0x4E	`CMP_GT_POLY`	A	Runtime type dispatch for `>`
0x4F	`CMP_GE_POLY`	A	Runtime type dispatch for `>=`

A.7.2 Float Comparisons (0x50-0x57)

Opcode	Mnemonic	Format	Semantics
0x50	`CMP_EQ_F64`	A	`R[dst] = (F[src1] == F[src2])`
0x51	`CMP_NE_F64`	A	`R[dst] = (F[src1] != F[src2])`
0x52	`CMP_LT_F64`	A	`R[dst] = (F[src1] < F[src2])`
0x53	`CMP_LE_F64`	A	`R[dst] = (F[src1] <= F[src2])`
0x54	`CMP_GT_F64`	A	`R[dst] = (F[src1] > F[src2])`
0x55	`CMP_GE_F64`	A	`R[dst] = (F[src1] >= F[src2])`

A.7.3 String Comparisons (0x58-0x5F)

Opcode	Mnemonic	Format	Semantics
0x58	`CMP_EQ_STR`	A	Lexicographic equality
0x59	`CMP_NE_STR`	A	Lexicographic inequality
0x5A	`CMP_LT_STR`	A	Lexicographic less-than
0x5B	`CMP_LE_STR`	A	Lexicographic less-or-equal
0x5C	`CMP_GT_STR`	A	Lexicographic greater-than
0x5D	`CMP_GE_STR`	A	Lexicographic greater-or-equal
0x5E	`CMP_EQ_POLY`	A	Runtime type dispatch for `==`
0x5F	`CMP_NE_POLY`	A	Runtime type dispatch for `!=`
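
The `CMP_*_POLY` opcodes are described only as runtime type dispatch. The sketch below shows one plausible shape for `CMP_LT_POLY` using `std::variant`. Everything in it is an assumption made for illustration: the `Value` alias, promoting mixed integer/double operands to double, and returning an empty result (standing in for NULL) when an operand is NULL or the types are not comparable.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical runtime value: NULL, integer, double, or string.
using Value = std::variant<std::monostate, std::int64_t, double, std::string>;

// Sketch of CMP_LT_POLY: pick the comparison from the operands' runtime types.
inline std::optional<bool> cmp_lt_poly(const Value& lhs, const Value& rhs) {
  if (const auto* a = std::get_if<std::int64_t>(&lhs)) {
    if (const auto* b = std::get_if<std::int64_t>(&rhs)) return *a < *b;
    if (const auto* b = std::get_if<double>(&rhs)) return static_cast<double>(*a) < *b;
    return std::nullopt;
  }
  if (const auto* a = std::get_if<double>(&lhs)) {
    if (const auto* b = std::get_if<double>(&rhs)) return *a < *b;
    if (const auto* b = std::get_if<std::int64_t>(&rhs)) return *a < static_cast<double>(*b);
    return std::nullopt;
  }
  if (const auto* a = std::get_if<std::string>(&lhs)) {
    // std::string's operator< compares lexicographically by code unit.
    if (const auto* b = std::get_if<std::string>(&rhs)) return *a < *b;
  }
  return std::nullopt;  // NULL operand or incomparable types
}
```

The type-specialized opcodes (`CMP_LT_I64`, `CMP_LT_F64`, `CMP_LT_STR`) skip this dispatch entirely, which is presumably why the compiler prefers them whenever operand types are known statically.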

A.8 Logical and Debug Operations (0x60-0x6F)

A.8.1 Logical Operations (0x60-0x65)

Opcode	Mnemonic	Format	Semantics
0x60	`AND`	A	`R[dst] = R[src1] && R[src2]`
0x61	`OR`	A	`R[dst] = R[src1] || R[src2]`
0x62	`NOT`	E	`R[dst] = !R[src]`
0x63	`AND_SC`	C	Short-circuit: `if !R[cond]: PC += offset16`
0x64	`OR_SC`	C	Short-circuit: `if R[cond]: PC += offset16`
0x65	`XOR`	A	`R[dst] = R[src1] XOR R[src2]`

A.8.2 Debug Operations (0x66-0x6B)

Opcode	Mnemonic	Format	Semantics
0x66	`DBG_PRINT`	E	Print `R[src]` to debug log
0x67	`DBG_BREAK`	F	Debugger breakpoint
0x68	`DBG_TRACE`	B	Trace with label `imm16`
0x69	`DBG_DUMP`	F	Dump VM state
0x6A	`DBG_ASSERT`	E	Assert `R[src]` is truthy
0x6B	`DBG_PROFILE`	B	Profile section marker

A.9 Control Flow Operations (0x70-0x7F)

Control flow operations manage the program counter and function calls.

Opcode	Mnemonic	Format	Semantics
0x70	`JMP`	C	`PC += offset16` (unconditional)
0x71	`JMP_TRUE`	C	`if R[cond]: PC += offset16`
0x72	`JMP_FALSE`	C	`if !R[cond]: PC += offset16`
0x73	`JMP_NULL`	C	`if R[cond] is NULL: PC += offset16`
0x74	`JMP_NOT_NULL`	C	`if R[cond] is not NULL: PC += offset16`
0x75	`JMP_ABS`	D	`PC = imm32` (absolute)
0x76	`CALL`	B	Push frame, `PC = target`
0x77	`RET`	F	Return from function (no value)
0x78	`RET_VAL`	E	Return `R[src]`
0x79	`HALT`	F	Stop execution
0x7A	`JMP_ZERO`	C	`if R[cond] == 0: PC += offset16`
0x7B	`JMP_NOT_ZERO`	C	`if R[cond] != 0: PC += offset16`
0x7C	`RET_NEXT`	E	Set-returning function (SRF): add `R[src]` to result accumulator
0x7D	`RET_QUERY`	B	SRF: execute query `pool[imm16]`, add its results

Branch Offset Encoding: The 16-bit signed offset is relative to the instruction following the branch, measured in 4-byte instruction units.
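
The rule translates directly into arithmetic on byte addresses: the base is the address of the instruction after the branch, and the offset is scaled by the 4-byte instruction unit. A small sketch, with hypothetical function names, where `pc` is the byte address of the branch instruction itself:

```cpp
#include <cstdint>

constexpr std::int64_t kInstructionBytes = 4;

// Resolve a relative branch: base is the instruction following the branch,
// and offset16 counts 4-byte instruction units.
constexpr std::int64_t branch_target(std::int64_t pc, std::int16_t offset16) {
  return pc + kInstructionBytes + std::int64_t{offset16} * kInstructionBytes;
}

// Inverse: the offset an assembler would emit to reach `target` from `pc`.
// The caller must ensure the result fits in 16 signed bits.
constexpr std::int16_t branch_offset(std::int64_t pc, std::int64_t target) {
  return static_cast<std::int16_t>((target - (pc + kInstructionBytes)) /
                                   kInstructionBytes);
}
```

For instance, a `JMP` at byte address 0x24 with offset -9 lands at 0x04, and a branch at 0x04 with offset +8 lands at 0x28. The 16-bit field therefore reaches roughly 128 KiB of bytecode in either direction.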

A.10 Type Operations (0x80-0x8F)

Type operations handle runtime type checking, casting, and NULL handling.

Opcode	Mnemonic	Format	Semantics
0x80	`TYPEOF`	E	`R[dst] = typeof(R[src])` as type ID
0x81	`CAST_I64_F64`	E	`F[dst] = (double)R[src]`
0x82	`CAST_F64_I64`	E	`R[dst] = (int64_t)F[src]`
0x83	`CAST_STR_I64`	E	`R[dst] = parse_int64(R[src])`
0x84	`CAST_STR_F64`	E	`F[dst] = parse_double(R[src])`
0x85	`CAST_I64_STR`	E	`R[dst] = to_string(R[src])`
0x86	`CAST_F64_STR`	E	`R[dst] = to_string(F[src])`
0x87	`CAST_BOOL_I64`	E	`R[dst] = R[src] ? 1 : 0`
0x88	`CAST_I64_BOOL`	E	`R[dst] = R[src] != 0`
0x89	`IS_NULL`	E	`R[dst] = (R[src] is NULL)`
0x8A	`IS_NOT_NULL`	E	`R[dst] = (R[src] is not NULL)`
0x8B	`COALESCE`	A	`R[dst] = R[src1] ?? R[src2]`
0x8C	`NULLIF`	A	`R[dst] = (R[src1]==R[src2]) ? NULL : R[src1]`
0x8D	`CAST_BOOL_STR`	E	`R[dst] = R[src] ? "true" : "false"`
0x8E	`CAST_STR_BOOL`	E	`R[dst] = parse_bool(R[src])`
0x8F	`CAST`	B	`R[dst] = cast(R[src], target_type)`
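
The NULL-handling opcodes map naturally onto `std::optional`. The sketch below models `COALESCE` and `NULLIF` for nullable integers only. It assumes SQL semantics for `NULLIF` when an operand is NULL (the comparison is not true, so the first operand is returned), which the table does not spell out; `NullableI64` is a hypothetical alias.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical nullable integer register value; an empty optional is NULL.
using NullableI64 = std::optional<std::int64_t>;

// COALESCE: first operand unless it is NULL, otherwise the second.
inline NullableI64 coalesce(NullableI64 a, NullableI64 b) {
  return a.has_value() ? a : b;
}

// NULLIF: NULL when both operands are non-NULL and equal, otherwise the first.
inline NullableI64 nullif(NullableI64 a, NullableI64 b) {
  if (a && b && *a == *b) return std::nullopt;
  return a;
}
```

`coalesce(NULL, 5)` yields 5, `nullif(3, 3)` yields NULL, and `nullif(3, 4)` yields 3.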

A.11 Field Access Operations (0x90-0x9F)

Field access operations extract and modify document fields.

Opcode	Mnemonic	Format	Semantics
0x90	`GET_FIELD`	B	`R[dst] = doc.field[pool[imm16]]`
0x91	`GET_FIELD_DYN`	A	`R[dst] = doc.field[R[src1]]` (dynamic)
0x92	`SET_FIELD`	B	`doc.field[pool[imm16]] = R[src]`
0x93	`HAS_FIELD`	B	`R[dst] = doc.has(pool[imm16])`
0x94	`DEL_FIELD`	B	`doc.remove(pool[imm16])`
0x95	`GET_NESTED`	B	`R[dst] = doc.path(pool[imm16])`
0x96	`SET_NESTED`	B	`doc.path(pool[imm16]) = R[src]`
0x97	`FIELD_COUNT`	E	`R[dst] = doc.field_count()`
0x98	`FIELD_NAMES`	E	`R[dst] = doc.field_names()` as array
0x99	`GET_DOC`	E	`R[dst] = context.input_document`
0x9A	`GET_FIELD_IDX`	B	`R[dst] = doc.field_by_index(imm16)`

A.12 Array Operations (0xA0-0xAF)

Array operations manipulate ordered collections of values.

Opcode	Mnemonic	Format	Semantics
0xA0	`ARR_NEW`	E	`R[dst] = new empty array`
0xA1	`ARR_LEN`	E	`R[dst] = R[src].length`
0xA2	`ARR_GET`	A	`R[dst] = R[src1][R[src2]]`
0xA3	`ARR_GET_IMM`	B	`R[dst] = R[src][imm16]`
0xA4	`ARR_SET`	A	`R[src1][R[src2]] = R[dst]`
0xA5	`ARR_SET_IMM`	B	`R[src][imm16] = R[dst]`
0xA6	`ARR_PUSH`	A	`R[dst].push(R[src1])`
0xA7	`ARR_POP`	E	`R[dst] = R[src].pop()`
0xA8	`ARR_SLICE`	A	`R[dst] = R[src1].slice(R[src2], R[flags])`
0xA9	`ARR_CONCAT`	A	`R[dst] = R[src1].concat(R[src2])`
0xAA	`ARR_CONTAINS`	A	`R[dst] = R[src1].contains(R[src2])`
0xAB	`ARR_INDEXOF`	A	`R[dst] = R[src1].indexOf(R[src2])`
0xAC	`ARR_REVERSE`	E	`R[dst] = R[src].reverse()`
0xAD	`ARR_SORT`	E	`R[dst] = R[src].sort()`

A.13 String Operations (0xB0-0xBF)

String operations handle UTF-8 encoded text.

Opcode	Mnemonic	Format	Semantics
0xB0	`STR_LEN`	E	`R[dst] = R[src].length`
0xB1	`STR_CONCAT`	A	`R[dst] = R[src1] + R[src2]`
0xB2	`STR_SUBSTR`	A	`R[dst] = R[src1].substr(R[src2], R[flags])`
0xB3	`STR_UPPER`	E	`R[dst] = R[src].toUpperCase()`
0xB4	`STR_LOWER`	E	`R[dst] = R[src].toLowerCase()`
0xB5	`STR_TRIM`	E	`R[dst] = R[src].trim()`
0xB6	`STR_LIKE`	A	`R[dst] = R[src1] LIKE R[src2]`
0xB7	`STR_ILIKE`	A	`R[dst] = R[src1] ILIKE R[src2]`
0xB8	`STR_REGEX`	A	`R[dst] = R[src1] ~ R[src2]` (regex)
0xB9	`STR_REPLACE`	A	`R[dst] = R[src1].replace(R[src2], R[flags])`
0xBA	`STR_SPLIT`	A	`R[dst] = R[src1].split(R[src2])`
0xBB	`STR_STARTS`	A	`R[dst] = R[src1].startsWith(R[src2])`
0xBC	`STR_ENDS`	A	`R[dst] = R[src1].endsWith(R[src2])`
0xBD	`STR_CONTAINS`	A	`R[dst] = R[src1].contains(R[src2])`
0xBE	`STR_INDEXOF`	A	`R[dst] = R[src1].indexOf(R[src2])`
0xBF	`STR_WILDCARD`	A	`R[dst] = R[src1] matches R[src2]` (wildcard)

A.14 Aggregation Operations (0xC0-0xCF)

Aggregation operations support SQL aggregate functions and GROUP BY processing.

A.14.1 Single Aggregation (0xC0-0xC7)

Opcode	Mnemonic	Format	Semantics
0xC0	`AGG_INIT`	B	`R[dst] = new_agg_state(type=imm16)`
0xC1	`AGG_ACCUM`	A	`R[dst].accumulate(R[src1])`
0xC2	`AGG_ACCUM_COND`	A	`if R[src2]: R[dst].accumulate(R[src1])`
0xC3	`AGG_FINAL`	E	`R[dst] = R[src].finalize()`
0xC4	`AGG_MERGE`	A	`R[dst].merge(R[src1])`
0xC5	`AGG_RESET`	E	`R[dst].reset()`
0xC6	`AGG_COUNT`	E	`R[dst] = R[src].count()`
0xC7	`AGG_SUM`	E	`R[dst] = R[src].sum()`

A.14.2 Aggregation Tables (0xC8-0xCF)

Opcode	Mnemonic	Format	Semantics
0xC8	`AGG_TBL_NEW`	B	Create aggregation table
0xC9	`AGG_GET_CREATE`	A	Get/create group for key
0xCA	`AGG_ITER_INIT`	E	Initialize group iterator
0xCB	`AGG_ITER_NEXT`	E	Get next (key, states) pair
0xCC	`AGG_TBL_NEW_MULTI`	B	Create multi-function table
0xCD	`AGG_STATE_AT`	B	Get state at index
0xCE	`AGG_ITER_HAS_NEXT`	E	Check if more groups

Aggregation Function Types:

ID	Function	Description
0	`COUNT`	Count non-NULL values
1	`SUM`	Sum of values
2	`AVG`	Average (sum/count)
3	`MIN`	Minimum value
4	`MAX`	Maximum value
5	`COUNT(*)`	Count all rows
6	`STDDEV_POP`	Population standard deviation
7	`STDDEV_SAMP`	Sample standard deviation
8	`VAR_POP`	Population variance
9	`VAR_SAMP`	Sample variance
10	`FIRST`	First non-NULL value
11	`LAST`	Last non-NULL value
12	`STRING_AGG`	Concatenate strings
13	`ARRAY_AGG`	Collect into array

A.15 Query Buffer Operations (0xD0-0xDF)

Query buffer operations support window functions, hash joins, sorting, and set operations.

A.15.1 Window Buffer (0xD0-0xD3)

Opcode	Mnemonic	Format	Semantics
0xD0	`WIN_NEW`	E	Create window buffer
0xD1	`WIN_ADD`	E	Add row to window buffer
0xD2	`WIN_COMPUTE`	B	Compute window functions
0xD3	`WIN_NEXT`	E	Get next row with results

A.15.2 Hash Table (0xD4-0xD7)

Opcode	Mnemonic	Format	Semantics
0xD4	`HT_NEW`	E	Create hash table
0xD5	`HT_INSERT`	A	Insert (key, value)
0xD6	`HT_PROBE`	A	Lookup key, get matches
0xD7	`HT_DESTROY`	E	Destroy hash table

A.15.3 Sort Buffer (0xD8-0xDB)

Opcode	Mnemonic	Format	Semantics
0xD8	`SORT_NEW`	B	Create sort buffer
0xD9	`SORT_ADD`	E	Add row to buffer
0xDA	`SORT_NEXT`	E	Get next sorted row
0xDB	`SORT_DESTROY`	E	Destroy sort buffer

A.15.4 Set Operations (0xDC-0xDF)

Opcode	Mnemonic	Format	Semantics
0xDC	`SET_OP_NEW`	B	Create set operation buffer
0xDD	`SET_OP_ADD`	A	Add document from source
0xDE	`SET_OP_NEXT`	E	Get next result
0xDF	`SET_OP_DESTROY`	E	Destroy buffer

Set Operation Types (encoded in imm16):

0: UNION
1: INTERSECT
2: EXCEPT

A.16 Working Table Operations (0xE0-0xE7)

Working table operations support recursive Common Table Expressions (CTEs).

Opcode	Mnemonic	Format	Semantics
0xE0	`WT_NEW`	B	Create working table
0xE1	`WT_ADD`	E	Add document to table
0xE2	`WT_SWAP`	E	Swap working/result tables
0xE3	`WT_SCAN`	E	Open scan, return first doc
0xE4	`WT_EMPTY`	E	Check if table is empty
0xE5	`WT_DESTROY`	E	Destroy working table
0xE6	`SORT_NEW_VALUES`	B	Create value-based sort buffer
0xE7	`SORT_ADD_VALUES`	A	Add row with computed keys

A.17 Function Call Operations (0xE8-0xEF)

Function call operations invoke built-in, user-defined, and external functions.

Opcode	Mnemonic	Format	Semantics
0xE8	`CALL_BUILTIN`	B	`R[dst] = builtin[imm16](args)`
0xE9	`CALL_SCALAR`	B	`R[dst] = scalar_func(args)`
0xEA	`CALL_UDF`	B	`R[dst] = udf[imm16](args)`
0xEB	`PUSH_ARG`	E	Push `R[src]` to argument stack
0xEC	`POP_ARG`	E	`R[dst] = pop from argument stack`
0xED	`CLEAR_ARGS`	F	Clear argument stack
0xEE	`GET_ARG_COUNT`	E	`R[dst] = argument stack size`
0xEF	`GET_ARG`	B	`R[dst] = argument_stack[imm16]`

A.18 Cursor Operations (0xF0-0xF7)

Cursor operations manage iteration over tables and CTEs.

Opcode	Mnemonic	Format	Semantics
0xF0	`CURSOR_OPEN`	B	Open cursor, return first doc
0xF1	`CURSOR_NEXT`	E	Return current doc, advance
0xF2	`CURSOR_CLOSE`	E	Close cursor, release resources
0xF3	`CURSOR_VALID`	E	`R[dst] = cursor.is_valid()`
0xF4	`CURSOR_RESET`	E	Reset cursor to beginning
0xF5	`EMIT_ROW`	E	Emit row to output callback
0xF6	`YIELD`	F	Suspend for streaming results
0xF7	`CURSOR_TAKE`	E	Move document with ownership

Cursor Open Flags (imm16 encoding):

Bit 15: is_cte flag (1=CTE, 0=collection)
Bits 0-14: constant pool index for name
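
Decoding this operand is a mask and a test on bit 15. A minimal sketch, with hypothetical names for the struct, constants, and functions:

```cpp
#include <cstdint>

// Decoded view of the CURSOR_OPEN imm16 operand.
struct CursorOpenOperand {
  bool is_cte;               // bit 15: 1 = CTE, 0 = collection
  std::uint16_t name_index;  // bits 0-14: constant pool index of the name
};

constexpr std::uint16_t kCteFlag = 0x8000;
constexpr std::uint16_t kNameIndexMask = 0x7FFF;

constexpr CursorOpenOperand decode_cursor_open(std::uint16_t imm16) {
  return {(imm16 & kCteFlag) != 0,
          static_cast<std::uint16_t>(imm16 & kNameIndexMask)};
}

constexpr std::uint16_t encode_cursor_open(bool is_cte, std::uint16_t name_index) {
  return static_cast<std::uint16_t>((is_cte ? kCteFlag : 0) |
                                    (name_index & kNameIndexMask));
}
```

One consequence of the encoding is that cursor names must live in the first 32,768 constant pool slots, since only 15 bits remain for the index.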

A.19 Subquery and External Operations (0xF8-0xFC)

Opcode	Mnemonic	Format	Semantics
0xF8	`CALL_SUBQUERY`	B	Execute subquery `pool[imm16]`
0xF9	`CALL_EXTERNAL`	B	Call external function
0xFA	`TABLEFUNC_OPEN`	B	Open table function iterator
0xFB	`TABLEFUNC_NEXT`	E	Get next row from table func
0xFC	`TABLEFUNC_CLOSE`	E	Close table function iterator

External Function IDs:

ID	Function	Description
0x0000	`kScriptEval`	Evaluate Lua/Python script
0x0100	`kFTSMatch`	Full-text search match (`@@` operator)
0x0101	`kFTSScore`	Full-text search relevance score

The `kFTSMatch` external function implements the `@@` (text search match) operator in WHERE clauses. When the planner encounters a `column @@ to_tsquery('...')` predicate, it lowers the expression to a `CALL_EXTERNAL` instruction with function ID `0x0100`. The function takes alternating field/query pairs from the argument stack and returns a boolean indicating whether the document matches the full-text search query. CVM-compiled queries can therefore evaluate FTS predicates inline without falling back to the Volcano executor.

A.20 Error and Extension (0xFD-0xFF)

Opcode	Mnemonic	Format	Semantics
0xFD	`ERROR`	B	Raise error from `pool[imm16]`
0xFE	`EXTENDED`	-	Extended opcode prefix
0xFF	`UNDEFINED`	-	Invalid opcode (trap)

A.21 Extended Opcodes (0xFE prefix)

Extended opcodes provide 256 additional instructions accessed via the 0xFE prefix.

A.21.1 Iteration Operations (0x01-0x0A)

ExtOp	Mnemonic	Format	Semantics
0x01	`ITER_ARR_BEGIN`	E	Create array iterator
0x02	`ITER_ARR_NEXT`	E	Get next element or branch
0x03	`ITER_ARR_END`	E	Close array iterator
0x04	`ITER_OBJ_BEGIN`	E	Create object key iterator
0x05	`ITER_OBJ_NEXT_KEY`	E	Get next key or branch
0x06	`ITER_OBJ_NEXT_VAL`	E	Get value for current key
0x07	`ITER_OBJ_END`	E	Close object iterator
0x08	`ITER_RANGE_BEGIN`	A	Create range iterator
0x09	`ITER_RANGE_NEXT`	E	Get next value or branch
0x0A	`ITER_RANGE_END`	E	Close range iterator

A.21.2 Document Construction (0x0B-0x12)

ExtOp	Mnemonic	Format	Semantics
0x0B	`DOC_NEW`	E	Create empty document
0x0C	`DOC_FROM_JSON`	E	Parse JSON to document
0x0D	`DOC_TO_JSON`	E	Serialize document to JSON
0x0E	`DOC_CLONE`	E	Deep clone document
0x0F	`DOC_MERGE`	A	Merge two documents
0x10	`DOC_PATCH`	A	Apply JSON Patch
0x11	`DOC_KEYS`	E	Get keys as array
0x12	`DOC_VALUES`	E	Get values as array

A.21.3 Composite Row Operations (0x13-0x1C)

Composite row operations enable zero-copy JOIN processing.

ExtOp	Mnemonic	Format	Semantics
0x13	`COMPOSITE_NEW`	H	Create empty CompositeRow
0x14	`COMPOSITE_ADD`	H	Add document slot
0x15	`COMPOSITE_GET`	H	Get field by qualified name
0x16	`COMPOSITE_GET_SLOT`	H	Get field by slot index
0x17	`COMPOSITE_MAT`	H	Materialize to Document
0x18	`COMPOSITE_EMIT`	H	Emit composite row
0x19	`COMPOSITE_CLEAR`	H	Clear all slots
0x1A	`COMPOSITE_EMIT_MAPPED`	H	Emit with column mapping
0x1B	`WT_SCAN_RESET`	H	Reset working table scan
0x1C	`COMPOSITE_MAT_ALL_QUAL`	H	Materialize with qualified names

A.21.3a Outer Context Operations (0x85)

Outer context operations support correlated subquery execution within the CVM. When a subquery references columns from an outer query, the `GET_OUTER_FIELD` opcode resolves the outer column values without falling back to the Volcano executor.

ExtOp	Mnemonic	Format	Semantics
0x85	`GET_OUTER_FIELD`	H	`R[dst] = outer_context.get_field(pool[pool_idx])`

`GET_OUTER_FIELD` reads a field from the outer query's current row. During plan lowering, a column reference whose table alias is not in the local alias set is emitted as `GET_OUTER_FIELD` instead of `GET_FIELD`. The interpreter resolves the field from the outer row context, which may be either a plain `Document` or a `CompositeRow` (for outer queries involving joins).

Example: Correlated subquery

SELECT d.name,
       (SELECT COUNT(*) FROM employees e WHERE e.dept_id = d.id)
FROM departments d

The inner subquery's reference to d.id compiles to:

GET_OUTER_FIELD  R3, pool["d.id"]   ; R3 = outer_row.d.id
GET_FIELD        R4, pool["dept_id"] ; R4 = current_row.dept_id
CMP_EQ_POLY      R5, R4, R3         ; R5 = (dept_id == d.id)
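
The lowering decision described above reduces to a set-membership test on the table alias. The sketch below illustrates that test only; `FieldOpcode` and `select_field_opcode` are hypothetical names, and the real planner presumably tracks more context than a set of strings.

```cpp
#include <set>
#include <string>

enum class FieldOpcode { kGetField, kGetOuterField };

// Choose the opcode for a column reference: aliases bound by the subquery
// itself are local; any other alias must come from an enclosing query.
inline FieldOpcode select_field_opcode(const std::string& table_alias,
                                       const std::set<std::string>& local_aliases) {
  return local_aliases.count(table_alias) != 0 ? FieldOpcode::kGetField
                                               : FieldOpcode::kGetOuterField;
}
```

In the example query, the subquery's local alias set is `{e}`, so `e.dept_id` lowers to `GET_FIELD` and `d.id` lowers to `GET_OUTER_FIELD`.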

A.21.4 Vectorized/Batch Operations (0x20-0x67)

Batch operations enable SIMD-accelerated columnar processing.

Batch Scan (0x20-0x27):

ExtOp	Mnemonic	Semantics
0x20	`BATCH_SCAN_OPEN`	Open columnar batch scan
0x21	`BATCH_SCAN_NEXT`	Get next ColumnBatch
0x22	`BATCH_SCAN_CLOSE`	Close batch scan
0x23	`BATCH_EMIT`	Emit column batch
0x24	`BATCH_CONST_I64`	Create constant int64 batch
0x25	`BATCH_CONST_F64`	Create constant float64 batch
0x26	`BATCH_EXTRACT_COL`	Extract column by index
0x27	`BATCH_EXTRACT_COL_NAME`	Extract column by name

Batch Arithmetic (0x28-0x37):

ExtOp	Mnemonic	Semantics
0x28	`BATCH_ADD_I64`	Vectorized int64 addition
0x29	`BATCH_SUB_I64`	Vectorized int64 subtraction
0x2A	`BATCH_MUL_I64`	Vectorized int64 multiply
0x2B	`BATCH_DIV_I64`	Vectorized int64 division
0x30-0x37	`BATCH_*_F64`	Vectorized float64 ops

Batch Comparison (0x38-0x47):

ExtOp	Mnemonic	Semantics
0x38-0x3D	`BATCH_CMP_*_I64`	Vectorized int64 comparisons
0x40-0x45	`BATCH_CMP_*_F64`	Vectorized float64 comparisons

Batch Logical (0x48-0x4F):

ExtOp	Mnemonic	Semantics
0x48	`BATCH_AND`	Selection vector intersection
0x49	`BATCH_OR`	Selection vector union
0x4A	`BATCH_NOT`	Selection vector complement
0x4B	`BATCH_IS_NULL`	Select null rows
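
The batch logical opcodes are described as set operations on selection vectors. Assuming a selection vector is a sorted list of selected row indices within a batch (a common representation, but an assumption here), the three operations can be sketched with standard algorithms. The alias and function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Assumption: a selection vector holds sorted, unique row indices.
using SelectionVector = std::vector<std::uint32_t>;

// BATCH_AND: rows selected by both inputs.
inline SelectionVector batch_and(const SelectionVector& a, const SelectionVector& b) {
  SelectionVector out;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
  return out;
}

// BATCH_OR: rows selected by either input.
inline SelectionVector batch_or(const SelectionVector& a, const SelectionVector& b) {
  SelectionVector out;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
  return out;
}

// BATCH_NOT: rows in [0, batch_size) that the input does not select.
inline SelectionVector batch_not(const SelectionVector& sel, std::uint32_t batch_size) {
  SelectionVector out;
  std::size_t next = 0;  // position of the next selected row to skip
  for (std::uint32_t row = 0; row < batch_size; ++row) {
    if (next < sel.size() && sel[next] == row) {
      ++next;
    } else {
      out.push_back(row);
    }
  }
  return out;
}
```

Combining predicates this way touches only row indices, so column data is read once per predicate and never copied while filters are composed.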

Parallel Operations (0x5C-0x67):

ExtOp	Mnemonic	Semantics
0x5C	`PARALLEL_SCAN_OPEN`	Open parallel scan
0x5D	`PARALLEL_SCAN_NEXT`	Get next filtered batch
0x5E	`PARALLEL_SCAN_CLOSE`	Close parallel scan
0x60	`PARALLEL_PARTITION`	Partition batch
0x61	`PARALLEL_MERGE`	Merge batch results
0x62	`PARALLEL_BARRIER`	Wait for workers

A.21.5 SPI Cursor Operations (0x68-0x6F)

SPI (Server Programming Interface) operations support PL/pgSQL FOR-query loops, dynamic SQL execution, and procedure calls.

ExtOp	Mnemonic	Format	Semantics
0x68	`SPI_CURSOR_OPEN`	G	Open SPI cursor for query
0x69	`SPI_CURSOR_FETCH`	E	Fetch next row
0x6A	`SPI_CURSOR_CLOSE`	E	Close SPI cursor
0x6B	`SPI_CURSOR_VALID`	E	Check if more rows
0x6C	`SPI_EXECUTE`	A	Execute dynamic SQL
0x6D	`SPI_EXECUTE_INTO`	A	Execute into variable
0x6E	`SPI_PERFORM`	E	Execute, discard result
0x6F	`SPI_CALL`	E	Call procedure

A.21.6 Exception Handling (0x70-0x76)

ExtOp	Mnemonic	Semantics
0x70	`EXCEPTION_PUSH`	Push exception handler
0x71	`EXCEPTION_POP`	Pop exception handler
0x72	`RAISE_EXCEPTION`	Raise exception
0x73	`RERAISE`	Re-raise current exception
0x74	`GET_DIAGNOSTICS`	Get diagnostic item
0x75	`SET_DIAGNOSTICS`	Set diagnostic item
0x76	`ASSERT`	Assert condition

A.22 Runtime Type System

The CVM uses a dynamic type system with the following type enumeration:

Type ID	Type Name	Size	Description
0x00	`Null`	0	SQL NULL / JSON null
0x01	`Bool`	1	Boolean true/false
0x02	`Int64`	8	64-bit signed integer
0x03	`Double`	8	IEEE 754 double
0x04	`String`	var	UTF-8 string (ptr + len)
0x05	`Array`	var	Ordered collection
0x06	`Document`	var	Key-value object
0x07	`Binary`	var	Raw byte array (BYTEA)
0x08	`Timestamp`	8	Microseconds since epoch
0x09	`TimestampTZ`	8	Microseconds since epoch (UTC)
0x0A	`Date`	4	Days since epoch
0x0B	`Time`	8	Microseconds since midnight
0x0C	`Interval`	16	months + days + microseconds
0x0D	`Decimal`	16	128-bit decimal
0x0E	`UUID`	16	128-bit UUID
0x0F	`CompositeRow`	var	Zero-copy JOIN result
0x10	`AggState`	var	Aggregation state (internal)

A.22.1 VMValue Structure

The VMValue structure is a 24-byte discriminated union:

struct VMValue {
  CVMType type;     // 1 byte
  uint8_t flags;    // 1 byte (kFlagOwned=0x01, kFlagConst=0x02)
  uint16_t reserved;
  uint32_t padding;
  union {           // 16 bytes
    bool bool_val;
    int64_t int64_val;
    double double_val;
    StringRef string_val;
    ArrayRef array_val;
    Document* doc_val;
    CompositeRow* composite_row_val;
    TimestampVal timestamp_val;
    // ... other types
  };
};
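
The 24-byte figure follows from an 8-byte header (1 + 1 + 2 + 4) plus a 16-byte payload. The self-contained sketch below checks that arithmetic at compile time on a 64-bit target. It is a reduced stand-in, not the real definition: `VMValueSketch`, the trimmed `CVMType` enum, the `StringRef` layout (taken from the "ptr + len" note in the type table), and the helper functions are all assumptions.

```cpp
#include <cstdint>

// Hypothetical stand-ins for types the appendix names but does not define.
enum class CVMType : std::uint8_t { kNull = 0x00, kBool = 0x01, kInt64 = 0x02,
                                    kDouble = 0x03, kString = 0x04 };
struct StringRef {
  const char* data;    // pointer to UTF-8 bytes
  std::uint64_t size;  // length in bytes
};

constexpr std::uint8_t kFlagOwned = 0x01;
constexpr std::uint8_t kFlagConst = 0x02;

struct VMValueSketch {
  CVMType type;            // 1 byte
  std::uint8_t flags;      // 1 byte
  std::uint16_t reserved;  // 2 bytes
  std::uint32_t padding;   // 4 bytes, brings the header to 8
  union {                  // 16 bytes, sized by the largest member
    bool bool_val;
    std::int64_t int64_val;
    double double_val;
    StringRef string_val;
  };
};

// Holds on typical 64-bit targets, where pointers are 8 bytes.
static_assert(sizeof(VMValueSketch) == 24, "8-byte header + 16-byte payload");

// Build an Int64 value; the header is zeroed before the fields are set.
inline VMValueSketch make_int64(std::int64_t v) {
  VMValueSketch value{};
  value.type = CVMType::kInt64;
  value.int64_val = v;
  return value;
}

inline bool is_owned(const VMValueSketch& value) {
  return (value.flags & kFlagOwned) != 0;
}
```

Only the member selected by `type` may be read; the tag is what makes the union a discriminated one.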

A.23 Builtin Function Reference

The CVM provides an extensive library of built-in functions organized by category.

A.23.1 Mathematical Functions (0x0000-0x00FF)

ID	Function	Signature	Description
0x0000	`abs`	`(x) -> num`	Absolute value
0x0001	`floor`	`(x) -> num`	Floor (round down)
0x0002	`ceil`	`(x) -> num`	Ceiling (round up)
0x0003	`round`	`(x) -> num`	Round to nearest
0x0004	`trunc`	`(x) -> num`	Truncate toward zero
0x0005	`sqrt`	`(x) -> num`	Square root
0x0006	`pow`	`(x, y) -> num`	Power function
0x0007	`exp`	`(x) -> num`	Exponential (e^x)
0x0008	`log`	`(x) -> num`	Natural logarithm
0x0009	`log10`	`(x) -> num`	Base-10 logarithm
0x000A	`log2`	`(x) -> num`	Base-2 logarithm
0x000B-0x0010	`sin/cos/tan/asin/acos/atan`	`(x) -> num`	Trigonometric
0x0011	`atan2`	`(y, x) -> num`	Two-argument arctangent
0x0012	`sign`	`(x) -> int`	Sign (-1, 0, 1)
0x0014	`random`	`() -> double`	Random [0, 1)
0x0016	`pi`	`() -> double`	Pi constant

A.23.2 String Functions (0x0100-0x01FF)

ID	Function	Signature	Description
0x0100	`length`	`(s) -> int`	String length
0x0101	`upper`	`(s) -> str`	Uppercase
0x0102	`lower`	`(s) -> str`	Lowercase
0x0103	`trim`	`(s) -> str`	Trim whitespace
0x0106	`substring`	`(s, start, len) -> str`	Extract substring
0x0107	`concat`	`(s1, s2, ...) -> str`	Concatenation
0x0108	`replace`	`(s, from, to) -> str`	String replacement
0x010D	`position`	`(substr, s) -> int`	Find substring (1-based)
0x010E	`starts_with`	`(s, prefix) -> bool`	Prefix check
0x010F	`ends_with`	`(s, suffix) -> bool`	Suffix check
0x0115	`regex_match`	`(s, pattern) -> bool`	Regex match
0x0119	`md5`	`(s) -> str`	MD5 hash
0x011A	`sha256`	`(s) -> str`	SHA-256 hash

A.23.3 JSON/Document Functions (0x0500-0x05FF)

ID	Function	Signature	Description
0x0500	`json_extract`	`(doc, path) -> val`	Extract at path
0x0504	`json_keys`	`(doc) -> array`	Object keys
0x0505	`json_values`	`(doc) -> array`	Object values
0x0506	`json_contains`	`(doc, val) -> bool`	Containment check
0x0508	`json_parse`	`(s) -> doc`	Parse JSON string
0x0509	`json_stringify`	`(doc) -> str`	Serialize to JSON
0x0511	`jsonb_path_exists`	`(doc, path) -> bool`	JSONPath exists
0x0514	`json_build_array`	`(...) -> array`	Construct array
0x0515	`json_build_object`	`(...) -> obj`	Construct object

A.23.4 Date/Time Functions (0x0600-0x06FF)

ID	Function	Signature	Description
0x0600	`now`	`() -> timestamp`	Current timestamp
0x0601	`current_date`	`() -> date`	Current date
0x0603	`date_part`	`(part, ts) -> num`	Extract part
0x0604	`date_trunc`	`(part, ts) -> ts`	Truncate to unit
0x0605	`date_add`	`(ts, interval) -> ts`	Add interval
0x0607	`date_diff`	`(part, t1, t2) -> int`	Difference
0x0608	`format_date`	`(ts, fmt) -> str`	Format timestamp

A.24 Table Function Reference

Table functions return multiple rows and are used in FROM clauses.

ID	Function	Args	Description
0x0001	`generate_series`	`(start, stop)`	Integer series
0x0002	`generate_series`	`(start, stop, step)`	With step
0x0003	`generate_series`	`(start, stop, interval)`	Timestamp series
0x0010	`unnest`	`(array)`	Expand array to rows
0x0020	`json_each`	`(json)`	Key-value pairs
0x0030	`json_array_elements`	`(json)`	Array to rows
0x0040	`regexp_matches`	`(text, pattern)`	Regex captures
0x0041	`regexp_split_to_table`	`(text, pattern)`	Split by regex
0x0050	`string_to_table`	`(text, delim)`	Split by delimiter
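
A table function is consumed through the open / next / close protocol of `TABLEFUNC_OPEN`, `TABLEFUNC_NEXT`, and `TABLEFUNC_CLOSE`, so it behaves like an iterator rather than a function that returns a whole result set. The sketch below models the integer forms of `generate_series` that way. It assumes PostgreSQL-style semantics (inclusive stop, negative steps count down, a zero step is an error); the class and helper names are hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Iterator-style model of generate_series(start, stop[, step]).
class SeriesIterator {
 public:
  SeriesIterator(std::int64_t start, std::int64_t stop, std::int64_t step = 1)
      : current_(start), stop_(stop), step_(step) {
    if (step == 0) throw std::invalid_argument("step must be non-zero");
  }

  // Returns the next row, or an empty optional once the series is exhausted.
  std::optional<std::int64_t> next() {
    const bool in_range = step_ > 0 ? current_ <= stop_ : current_ >= stop_;
    if (!in_range) return std::nullopt;
    const std::int64_t row = current_;
    current_ += step_;  // overflow near the int64 limits is not handled here
    return row;
  }

 private:
  std::int64_t current_;
  std::int64_t stop_;
  std::int64_t step_;
};

// Pull every row, as a loop around TABLEFUNC_NEXT would.
inline std::vector<std::int64_t> drain(SeriesIterator it) {
  std::vector<std::int64_t> rows;
  while (auto row = it.next()) rows.push_back(*row);
  return rows;
}
```

Producing one row per call keeps memory use constant regardless of the series length, which matters when the series feeds a join or a `LIMIT`.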

A.25 Execution Examples

A.25.1 Simple Arithmetic Query

SELECT a + b * 2 FROM t

Compiled Bytecode (addresses are hexadecimal byte offsets; branch operands are signed offsets in 4-byte instruction units, relative to the instruction after the branch, as defined in A.9):

00: CURSOR_OPEN     R0, 0, 42     ; Open cursor for table t
04: JMP_NULL        R0, +8        ; Jump to 28 if exhausted
08: GET_FIELD       R1, 43        ; R1 = doc.a
0C: GET_FIELD       R2, 44        ; R2 = doc.b
10: MOVE_IMM        R3, 2         ; R3 = 2
14: MUL_I64         R4, R2, R3    ; R4 = b * 2
18: ADD_I64         R5, R1, R4    ; R5 = a + (b * 2)
1C: EMIT_ROW        R5            ; Output result
20: CURSOR_NEXT     R0, 0         ; Advance cursor
24: JMP             -9            ; Loop back to 04
28: CURSOR_CLOSE    0             ; Close cursor
2C: HALT                          ; Done

A.25.2 Aggregation Query

SELECT SUM(amount) FROM orders GROUP BY customer_id

Compiled Bytecode (addresses are hexadecimal byte offsets; branch operands are signed offsets in 4-byte instruction units, relative to the instruction after the branch, as defined in A.9):

00: AGG_TBL_NEW     R0, 1         ; Create agg table (SUM)
04: CURSOR_OPEN     R1, 0, 50     ; Open orders cursor
08: JMP_NULL        R1, +7        ; Jump to 28 if exhausted
0C: GET_FIELD       R2, 51        ; R2 = customer_id
10: GET_FIELD       R3, 52        ; R3 = amount
14: AGG_GET_CREATE  R4, R0, R2    ; Get/create group for key
18: AGG_STATE_AT    R5, R4, 0     ; Get SUM state
1C: AGG_ACCUM       R5, R3        ; Accumulate amount
20: CURSOR_NEXT     R1, 0         ; Advance
24: JMP             -8            ; Loop back to 08
28: AGG_ITER_INIT   R0            ; Init group iterator
2C: AGG_ITER_NEXT   R6, R0        ; Get next group
30: JMP_NULL        R6, +3        ; Jump to 40 when no groups remain
34: AGG_FINAL       R7, R6        ; Finalize SUM
38: EMIT_ROW        R7            ; Output
3C: JMP             -5            ; Next group (back to 2C)
40: CURSOR_CLOSE    0             ; Close cursor
44: HALT                          ; Done

A.26 Performance Characteristics

A.26.1 Instruction Timing

Category	Typical Cycles	Notes
Data movement	1-2	Register-to-register
Integer arithmetic	1	Single-cycle ALU
Float arithmetic	3-5	FPU latency
Comparison	1	Produces boolean
Branch (taken)	3-5	Pipeline flush
Branch (not taken)	0	Predicted fall-through
Function call	10-20	Stack frame setup
Hash table probe	5-15	Cache-dependent
Field access	10-50	Document traversal

A.26.2 Opcode Frequency Analysis

Typical query workloads show the following opcode distribution:

Category	Frequency	Optimization Target
Field access	25-35%	Column pruning, caching
Comparisons	15-25%	Predicate pushdown
Control flow	15-20%	Branch prediction
Arithmetic	10-15%	SIMD vectorization
Data movement	10-15%	Register allocation
Aggregation	5-10%	Parallel execution

Insight

The CVM dispatches an instruction in 2-5 ns (see A.1.2), which corresponds to roughly 200-500 million simple instructions per second, through computed-goto dispatch and careful cache optimization.

Vectorized batch operations (extended opcodes 0x20-0x67, see A.21.4) can process 1000+ rows per opcode execution, achieving 10-50x throughput for analytical queries.

Copy-and-patch JIT compilation translates hot bytecode sequences to native code, for an additional 2-5x speedup at roughly 100 µs of compilation latency.

A.27 Summary

The CVM instruction set provides a comprehensive foundation for executing SQL queries:

A 256-entry core opcode space organized into functional categories
Up to 256 extended opcodes via the 0xFE prefix for advanced operations
Fixed 32-bit encoding for cache efficiency and fast decode
Type-specialized operations for integer, float, and string processing
Query-specific operations for cursors, aggregation, joins, and window functions
Vectorized batch operations for SIMD-accelerated analytical processing
PL/pgSQL support via SPI and exception handling opcodes
Correlated subquery support via GET_OUTER_FIELD for outer row field access
Full-text search integration via CALL_EXTERNAL with kFTSMatch/kFTSScore for inline @@ operator evaluation

The instruction set balances:

Decode efficiency through fixed-width encoding
Expressiveness through comprehensive operation coverage
Performance through type specialization and batch operations
Extensibility through the extended opcode mechanism