Chapter 17: Zero-Copy JOIN Implementation

17.1 Introduction

JOIN operations are fundamental to relational database systems, combining rows from multiple tables based on related columns. However, naive JOIN implementations suffer from significant memory overhead—materializing intermediate results for every matching pair of rows consumes memory proportional to the join cardinality, which can be enormous for large tables.

Cognica implements a zero-copy JOIN strategy that avoids materializing intermediate results until absolutely necessary. Instead of copying row data between operators, the system uses lightweight wrapper objects that reference original documents through pointers. This approach reduces memory consumption by orders of magnitude while maintaining the familiar SQL semantics.

This chapter explores Cognica's zero-copy JOIN implementation, covering the composite row abstraction, JOIN algorithms, bytecode generation, and memory management strategies for handling datasets larger than available memory.

17.1.1 The Cost of Materialization

Consider a JOIN between two tables with N and M rows, respectively, producing K result rows. Traditional implementations materialize each joined row:

Memory Cost Analysis:

M_{\text{traditional}} = K \cdot (S_{\text{left}} + S_{\text{right}})

where S_{\text{left}} and S_{\text{right}} are the average row sizes. For a JOIN with high selectivity (many matches), this can quickly exceed available memory.

Example: Joining a 1M-row users table with a 10M-row orders table where each user has 10 orders on average:

  • Result cardinality: K = 10,000,000
  • Average row size: 200 bytes per side
  • Memory required: 10^7 × 400 bytes = 4 GB

This memory pressure forces expensive disk spills and degrades performance significantly.
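The cost model above is simple enough to check directly. The helper below is a stand-alone sketch (not part of Cognica's codebase) that reproduces the arithmetic for the users/orders example:

```cpp
#include <cstdint>

// Traditional materialization cost: every matched pair copies both rows.
// M_traditional = K * (S_left + S_right)
constexpr uint64_t materialized_bytes(uint64_t k, uint64_t s_left,
                                      uint64_t s_right) {
  return k * (s_left + s_right);
}
```

For K = 10^7 and 200-byte rows on each side, this yields 4 × 10^9 bytes, i.e. 4 GB.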

17.1.2 The Zero-Copy Insight

The key insight is that most JOIN result columns come from source tables unchanged—we simply select which columns to include in the output. Rather than copying data, we can:

  1. Reference the original documents through pointers
  2. Defer materialization until output is required
  3. Reuse composite row structures across iterations

M_{\text{zero-copy}} = O(1) \text{ per iteration}

The composite row stores only pointers (8 bytes each) plus metadata, reducing memory from gigabytes to kilobytes for the working set.

17.2 The CompositeRow Abstraction

17.2.1 Design Overview

The CompositeRow class represents a joined row as a collection of references to source documents:

class CompositeRow {
  std::vector<AliasedDocument> slots_;

public:
  void add_slot(std::string_view alias, const Document* doc);
  void clear();

  auto get_field(std::string_view qualified_name) const -> Value;
  auto get_field_by_slot(size_t slot, std::string_view field) const -> Value;

  auto materialize(const ColumnMappingList& mappings) const -> Document;
  auto materialize_all() const -> Document;
  auto materialize_all_qualified() const -> Document;
};

AliasedDocument: A lightweight wrapper pairing a table alias with a document pointer:

struct AliasedDocument {
  std::string_view alias;   // Table alias (e.g., "u", "o")
  const Document* doc;      // Pointer to source document

  // Total size: 24 bytes (16 for string_view + 8 for pointer)
};
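On a typical LP64 platform the 24-byte figure can be verified directly; the Document forward declaration below is only a stand-in for sizing purposes:

```cpp
#include <string_view>

struct Document;  // opaque stand-in; only the pointer size matters here

struct AliasedDocument {
  std::string_view alias;   // 16 bytes: pointer + length
  const Document* doc;      // 8 bytes
};

static_assert(sizeof(AliasedDocument) == 24,
              "expected layout on LP64 platforms");
```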

17.2.2 Memory Layout

For a two-table JOIN, the CompositeRow contains just two slots:

CompositeRow for: SELECT u.id, u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id

Total overhead: ~64 bytes vs. potentially KB for materialized row

17.2.3 Field Access

Field access in CompositeRow supports two modes:

Qualified Name Lookup: Search through slots for matching alias:

auto CompositeRow::get_field(std::string_view qualified_name) const -> Value {
  // Parse "alias.field" format
  auto dot_pos = qualified_name.find('.');
  auto alias = qualified_name.substr(0, dot_pos);
  auto field = qualified_name.substr(dot_pos + 1);

  // Search slots for matching alias
  for (const auto& slot : slots_) {
    if (slot.alias == alias) {
      return slot.doc->get(field);
    }
  }
  return Value::null();
}

Complexity: O(S), where S is the number of slots (typically 2-5).

Direct Slot Access: When the slot index is known at compile time:

auto CompositeRow::get_field_by_slot(size_t slot_idx,
                                      std::string_view field) const -> Value {
  return slots_[slot_idx].doc->get(field);
}

Complexity: O(1); preferred when the planner can resolve slot indices.

17.2.4 Deferred Materialization

Materialization creates a concrete Document from the composite references:

auto CompositeRow::materialize(const ColumnMappingList& mappings) const
    -> Document {
  Document result;

  for (const auto& mapping : mappings.mappings) {
    // Get value from source slot
    const auto& slot = slots_[mapping.source_slot];
    Value value = slot.doc->get(mapping.source_field);

    // Add to result with mapped name
    result.set(mapping.output_name, std::move(value));
  }

  return result;
}

Materialization is triggered only when:

  • Output must be emitted to the client
  • Data must persist beyond cursor lifetime
  • Working tables require owned copies
  • Column reordering or renaming is needed

17.3 Column Mapping

17.3.1 The ColumnMappingList Structure

Column mappings define how source fields map to output columns:

struct ColumnMapping {
  uint8_t source_slot;              // Index into CompositeRow slots
  std::string source_field;         // Field name in source document
  std::string output_name;          // Output column name
  std::optional<uint16_t> field_index;  // Optimization: cached field index
};

struct ColumnMappingList {
  std::vector<ColumnMapping> mappings;
};

17.3.2 Mapping Example

For the query:

SELECT u.id AS user_id, u.name, o.amount, o.created_at
FROM users u
JOIN orders o ON u.id = o.user_id

The ColumnMappingList contains:

Index  source_slot  source_field  output_name
0      0 (users)    id            user_id
1      0 (users)    name          name
2      1 (orders)   amount        amount
3      1 (orders)   created_at    created_at
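The table above corresponds to a mapping list the planner might build. The sketch below populates the structures from §17.3.1 by hand (field_index caching omitted):

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct ColumnMapping {
  uint8_t source_slot;
  std::string source_field;
  std::string output_name;
  std::optional<uint16_t> field_index;
};

struct ColumnMappingList {
  std::vector<ColumnMapping> mappings;
};

// Mapping list for: SELECT u.id AS user_id, u.name, o.amount, o.created_at
inline ColumnMappingList example_mappings() {
  return ColumnMappingList{{
      {0, "id", "user_id", std::nullopt},             // u.id AS user_id
      {0, "name", "name", std::nullopt},              // u.name
      {1, "amount", "amount", std::nullopt},          // o.amount
      {1, "created_at", "created_at", std::nullopt},  // o.created_at
  }};
}
```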

17.3.3 Handling Name Collisions

When both tables have columns with the same name, qualified names resolve ambiguity:

auto CompositeRow::materialize_all_qualified() const -> Document {
  Document result;

  for (const auto& slot : slots_) {
    for (const auto& [field, value] : *slot.doc) {
      // Prefix field with alias: "u.id", "o.id"
      std::string qualified_name =
          std::string(slot.alias) + "." + field;
      result.set(qualified_name, value);
    }
  }

  return result;
}

17.4 JOIN Algorithms

17.4.1 Hash Join

Hash join is the preferred algorithm for equi-joins on large tables. It operates in two phases:

Build Phase: Scan the build side (typically smaller table), hash the join key, and store document pointers in a hash table.

Probe Phase: Scan the probe side (typically larger table), compute the hash, look up matching documents, and emit composite rows.

Hash Table Structure:

struct HashTable {
  std::unordered_multimap<int64_t, Document*> entries;
  bool active = false;
};

The multimap supports multiple documents with the same key hash, handling duplicate join keys correctly.

Build Phase Implementation:

void hash_table_insert(VMContext& ctx, uint8_t slot,
                       const VMValue& key, Document* doc) {
  int64_t hash = compute_hash(key);
  ctx.hash_tables_[slot].entries.insert({hash, doc});
}

Probe Phase Implementation:

auto hash_table_probe(VMContext& ctx, uint8_t slot,
                      const VMValue& key) -> Document* {
  int64_t hash = compute_hash(key);
  auto it = ctx.hash_tables_[slot].entries.find(hash);
  if (it != ctx.hash_tables_[slot].entries.end()) {
    // Returns the first match; when the build side may contain duplicate
    // keys, the caller iterates the remaining entries via equal_range().
    return it->second;
  }
  return nullptr;
}

Complexity Analysis:

Phase  Time          Space
Build  O(N)          O(N) pointers
Probe  O(M) average  O(1) working set
Total  O(N + M)      O(N)

where N is the build side cardinality and M is the probe side cardinality.
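The build/probe pattern can be sketched end-to-end with standard containers. The toy User/Order types and field names here are illustrative only; equal_range is used so that duplicate build keys produce one output row per match:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-ins for documents; the real engine stores Document pointers.
struct User  { int64_t id; std::string name; };
struct Order { int64_t user_id; int64_t amount; };

// Hash join sketch: build on users, probe with orders, emit (name, amount).
inline std::vector<std::pair<std::string, int64_t>>
hash_join(const std::vector<User>& users, const std::vector<Order>& orders) {
  // Build phase: store pointers, not copies.
  std::unordered_multimap<int64_t, const User*> table;
  for (const auto& u : users) table.insert({u.id, &u});

  // Probe phase: equal_range surfaces every entry for a duplicate key.
  std::vector<std::pair<std::string, int64_t>> out;
  for (const auto& o : orders) {
    auto [lo, hi] = table.equal_range(o.user_id);
    for (auto it = lo; it != hi; ++it)
      out.emplace_back(it->second->name, o.amount);
  }
  return out;
}
```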

17.4.2 Nested Loop Join

Nested loop join is used for small tables or when index-based lookups are available:

// Outer loop
for (const auto& outer_doc : outer_cursor) {
  // Inner loop - reset for each outer row
  inner_cursor.reset();

  for (const auto& inner_doc : inner_cursor) {
    if (evaluate_join_condition(outer_doc, inner_doc)) {
      CompositeRow composite;
      composite.add_slot(outer_alias, &outer_doc);
      composite.add_slot(inner_alias, &inner_doc);
      emit(composite);
    }
  }
}

Complexity: O(N × M), where N is the outer cardinality and M is the inner cardinality.

When to Use:

  • Small inner table (fits in cache)
  • Index available on inner table
  • Complex join conditions that cannot use hash lookup
  • Cross joins (no join condition)

17.4.3 Build Side Selection

The optimizer selects the build side based on cardinality estimates:

\text{Build Side} = \arg\min(|R|, |S|)

The smaller relation becomes the build side because:

  1. Hash table memory is proportional to build side size
  2. Smaller hash table has better cache locality
  3. Probe phase processes more data in streaming fashion

auto select_build_side(const JoinNode& join) -> size_t {
  auto left_card = estimate_cardinality(join.left);
  auto right_card = estimate_cardinality(join.right);

  return (left_card <= right_card) ? 0 : 1;  // 0 = left, 1 = right
}

17.5 CVM JOIN Opcodes

17.5.1 Composite Row Operations

The CVM provides dedicated opcodes for composite row manipulation:

Opcode                           Code  Description
COMPOSITE_NEW                    0x13  Create empty CompositeRow
COMPOSITE_ADD_SLOT               0x14  Add document reference with alias
COMPOSITE_GET_FIELD              0x15  Get field by qualified name
COMPOSITE_GET_SLOT               0x16  Get field by slot index + field
COMPOSITE_MATERIALIZE            0x17  Create Document with mappings
COMPOSITE_EMIT                   0x18  Emit composite row to output
COMPOSITE_CLEAR                  0x19  Clear slots for reuse
COMPOSITE_EMIT_MAPPED            0x1A  Emit with column mapping
COMPOSITE_MATERIALIZE_QUALIFIED  0x1C  Materialize with alias prefixes

17.5.2 Hash Table Operations

Opcode      Code  Description
HT_NEW      0x10  Create new hash table in slot
HT_INSERT   0x11  Insert key-document pair
HT_PROBE    0x12  Probe for matching document
HT_DESTROY  0x1F  Destroy hash table

17.5.3 Instruction Encoding

Composite and hash table operations use Format H (extended format):

Word 1: [0xFE:8][ExtOpcode:8][Dst:8][Src:8]
Word 2: [Operand1:16][Operand2:16]
Total:  64 bits

The 16-bit operand fields support:

  • Up to 65,536 constant pool entries
  • Up to 65,536 column mapping indices
  • Up to 256 slots per composite row
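Assuming the bit layout exactly as listed above (0xFE prefix in the top byte of word 1, operands packed high-to-low in word 2), a round-trip encoder/decoder sketch looks like:

```cpp
#include <cstdint>

// Format H fields; layout assumed from the description in the text.
struct FormatH {
  uint8_t ext_opcode, dst, src;
  uint16_t op1, op2;
};

// Pack word 1 into the upper 32 bits and word 2 into the lower 32 bits.
inline uint64_t encode_h(const FormatH& i) {
  uint64_t w1 = (0xFEull << 24) | (uint64_t(i.ext_opcode) << 16) |
                (uint64_t(i.dst) << 8) | i.src;
  uint64_t w2 = (uint64_t(i.op1) << 16) | i.op2;
  return (w1 << 32) | w2;
}

inline FormatH decode_h(uint64_t word) {
  return {uint8_t(word >> 48), uint8_t(word >> 40), uint8_t(word >> 32),
          uint16_t(word >> 16), uint16_t(word)};
}
```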

17.5.4 Example: Hash Join Bytecode

For the query:

SELECT u.name, o.amount
FROM users u
JOIN orders o ON u.id = o.user_id

Generated bytecode:

; Phase 1: BUILD (users table)
  HT_NEW          R10                    ; Create hash table in slot 0
  CURSOR_OPEN     R1, users_cursor       ; Open users cursor

build_loop:
  CURSOR_NEXT     R2, R1                 ; R2 = next users document
  JUMP_NULL       R2, build_done         ; Exit if no more rows
  GET_FIELD       R3, R2, "id"           ; R3 = u.id (join key)
  HT_INSERT       R10, R3, R2            ; hash_table[hash(R3)] = R2
  JUMP            build_loop

build_done:
  ; Keep users cursor open (documents referenced by hash table)

; Phase 2: PROBE (orders table)
  CURSOR_OPEN     R4, orders_cursor      ; Open orders cursor
  COMPOSITE_NEW   R5                     ; Create reusable composite row

probe_loop:
  CURSOR_NEXT     R6, R4                 ; R6 = next orders document
  JUMP_NULL       R6, probe_done         ; Exit if no more rows
  GET_FIELD       R7, R6, "user_id"      ; R7 = o.user_id (join key)
  HT_PROBE        R8, R10, R7            ; R8 = matching users doc or null
  JUMP_NULL       R8, probe_loop         ; Skip if no match

  ; Build composite row
  COMPOSITE_CLEAR R5                     ; Clear for reuse
  COMPOSITE_ADD   R5, R8, "u"            ; Add users doc with alias "u"
  COMPOSITE_ADD   R5, R6, "o"            ; Add orders doc with alias "o"

  ; Emit result
  COMPOSITE_EMIT_MAPPED R5, mapping_0    ; Emit with column mappings
  JUMP            probe_loop

probe_done:
  CURSOR_CLOSE    R4                     ; Close orders cursor
  CURSOR_CLOSE    R1                     ; Close users cursor
  HT_DESTROY      R10                    ; Free hash table
  HALT

17.6 Outer JOIN Support

17.6.1 LEFT OUTER JOIN

Left outer joins must emit unmatched left-side rows with NULL right-side values:

// Track whether current outer row matched any inner row
bool matched = false;

for (const auto& inner_doc : inner_cursor) {
  if (evaluate_condition(outer_doc, inner_doc)) {
    emit_composite(outer_doc, inner_doc);
    matched = true;
  }
}

// Emit unmatched outer row with null inner
if (!matched) {
  emit_composite(outer_doc, null_document);
}
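As a self-contained illustration of the pattern (toy types, with std::optional standing in for NULL):

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct User  { int64_t id; std::string name; };
struct Order { int64_t user_id; int64_t amount; };

// LEFT OUTER JOIN sketch: unmatched users pair with nullopt (SQL NULL).
inline std::vector<std::pair<std::string, std::optional<int64_t>>>
left_outer_join(const std::vector<User>& users,
                const std::vector<Order>& orders) {
  std::vector<std::pair<std::string, std::optional<int64_t>>> out;
  for (const auto& u : users) {
    bool matched = false;
    for (const auto& o : orders) {
      if (o.user_id == u.id) {
        out.emplace_back(u.name, o.amount);
        matched = true;
      }
    }
    if (!matched) out.emplace_back(u.name, std::nullopt);  // NULL right side
  }
  return out;
}
```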

Bytecode Pattern: in the hash-join form below, the probe side is the preserved (left) side, so unmatched probe rows are emitted with an empty build-side document:

probe_loop:
  CURSOR_NEXT     R6, R4
  JUMP_NULL       R6, probe_done
  MOV_FALSE       R9                     ; matched = false
  GET_FIELD       R7, R6, "user_id"
  HT_PROBE        R8, R10, R7
  JUMP_NULL       R8, check_unmatched

  ; Match found
  COMPOSITE_CLEAR R5
  COMPOSITE_ADD   R5, R8, "u"
  COMPOSITE_ADD   R5, R6, "o"
  COMPOSITE_EMIT  R5
  MOV_TRUE        R9                     ; matched = true
  JUMP            probe_loop

check_unmatched:
  JUMP_TRUE       R9, probe_loop         ; Skip if matched
  ; Emit with NULL right side
  DOC_NEW         R11                    ; Create empty document
  COMPOSITE_CLEAR R5
  COMPOSITE_ADD   R5, R11, "u"           ; NULL users
  COMPOSITE_ADD   R5, R6, "o"            ; orders doc
  COMPOSITE_EMIT  R5
  JUMP            probe_loop

17.6.2 RIGHT OUTER JOIN

Right outer joins require tracking which build-side rows were matched: each hash-table entry carries a matched flag, and after the probe phase a final pass emits never-matched build rows paired with a NULL probe side.

17.6.3 FULL OUTER JOIN

Full outer joins combine left and right outer join logic:

  1. Build hash table from right side with match tracking
  2. Probe with left side, mark matched right rows
  3. Emit matches and unmatched left rows
  4. Emit unmatched right rows
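The four steps above can be sketched over plain integer keys; the matched flag stored alongside each build-side entry is the essential addition:

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// (left key, right key) with nullopt standing in for SQL NULL.
using Row = std::pair<std::optional<int64_t>, std::optional<int64_t>>;

inline std::vector<Row> full_outer_join(const std::vector<int64_t>& left,
                                        const std::vector<int64_t>& right) {
  struct Entry { bool matched = false; };

  // Step 1: build from the right side with match tracking.
  std::unordered_multimap<int64_t, Entry> table;
  for (int64_t r : right) table.insert({r, {}});

  // Steps 2-3: probe with the left side, mark matched right rows,
  // and emit matches plus unmatched left rows.
  std::vector<Row> out;
  for (int64_t l : left) {
    auto [lo, hi] = table.equal_range(l);
    if (lo == hi) { out.push_back({l, std::nullopt}); continue; }
    for (auto it = lo; it != hi; ++it) {
      it->second.matched = true;
      out.push_back({l, it->first});
    }
  }

  // Step 4: emit unmatched right rows.
  for (auto& [key, e] : table)
    if (!e.matched) out.push_back({std::nullopt, key});
  return out;
}
```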

17.7 Memory Management

17.7.1 VMContext Resources

The VMContext manages JOIN-related resources:

class VMContext {
  // Hash tables for hash joins (up to 4 concurrent)
  static constexpr size_t kMaxHashTables = 4;
  std::array<HashTable, kMaxHashTables> hash_tables_;

  // Owned composite rows
  std::vector<std::unique_ptr<CompositeRow>> owned_composite_rows_;

  // Working tables for CTEs and subqueries
  std::vector<std::unique_ptr<WorkingTable>> working_tables_;

public:
  auto create_hash_table(uint8_t slot) -> HashTable*;
  void destroy_hash_table(uint8_t slot);

  auto create_composite_row() -> CompositeRow*;
  void reset();  // Clears all owned resources
};

17.7.2 Document Lifetime

Zero-copy JOINs depend on source documents remaining valid throughout execution:

Lifetime Guarantees:

  1. Cursor documents: Valid until cursor advances or closes
  2. Hash table entries: Build cursor remains open during probe phase
  3. Composite row references: Valid as long as source cursors are open

Invalidation Prevention:

  • Build cursor is NOT closed until probe phase completes
  • Composite rows are cleared (not destroyed) between iterations
  • Working tables own document copies when persistence is needed

17.7.3 CompositeRow Reuse

The COMPOSITE_CLEAR operation enables efficient memory reuse:

void CompositeRow::clear() {
  slots_.clear();  // O(1) - vector size reset to 0
  // Capacity retained - no reallocation
}

Memory Pattern:

Without reuse (bad):
  Iteration 1: allocate CompositeRow
  Iteration 2: allocate CompositeRow
  ...
  Iteration N: allocate CompositeRow
  Memory: O(N) allocations

With reuse (good):
  Once: allocate CompositeRow
  Iteration 1: clear + add_slot
  Iteration 2: clear + add_slot
  ...
  Iteration N: clear + add_slot
  Memory: O(1) allocation
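The reuse pattern relies on a standard property of std::vector: clear() destroys elements but does not release capacity, so steady-state iterations perform no allocation. A minimal demonstration:

```cpp
#include <vector>

// clear() resets size to zero while retaining capacity, so refilling the
// vector up to its previous size cannot trigger a reallocation.
inline bool clear_keeps_capacity() {
  std::vector<int> slots;
  slots.reserve(8);
  slots.assign({1, 2, 3});
  const auto cap_before = slots.capacity();
  slots.clear();
  return slots.empty() && slots.capacity() == cap_before;
}
```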

17.8 External Hash Join

17.8.1 Memory Overflow Handling

When the build side exceeds available memory, Cognica falls back to external (Grace) hash join:

class ExternalHashJoiner {
  struct Config {
    uint64_t memory_limit = 256 * 1024 * 1024;  // 256MB
    int32_t num_partitions = 64;
    int32_t max_recursion_depth = 4;
    CompressionType compression = CompressionType::ZSTD;
  };

  // In-memory hash table (used when data fits)
  std::unordered_map<std::string, std::vector<Document>> hash_table_;

  // Partition files (used when memory exceeded)
  std::vector<std::unique_ptr<HashPartition>> build_partitions_;
  std::vector<std::unique_ptr<HashPartition>> probe_partitions_;
};

17.8.2 Partitioning Strategy

Grace hash join partitions both relations by hash value:

\text{partition}(r) = \text{hash}(r.\text{key}) \bmod P

where P is the number of partitions.

Key Property: Matching rows are guaranteed to be in the same partition number, enabling independent processing of each partition pair.
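Using std::hash as a stand-in for the engine's hash function, the partitioning rule looks like the following; because both relations are partitioned by the same function, equal keys always land in the same partition pair:

```cpp
#include <cstdint>
#include <functional>

// partition(r) = hash(r.key) mod P; std::hash stands in for the engine's
// hash function here.
inline uint32_t partition_of(int64_t key, uint32_t num_partitions) {
  return static_cast<uint32_t>(std::hash<int64_t>{}(key) % num_partitions);
}
```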

17.8.3 Recursive Partitioning

If a partition still exceeds memory, recursive partitioning splits it further:

void process_partition(HashPartition& build_part, HashPartition& probe_part,
                       int depth) {
  if (build_part.size_bytes() <= config_.memory_limit) {
    // Process in memory
    process_in_memory(build_part, probe_part);
  } else if (depth < config_.max_recursion_depth) {
    // Recursively partition (with a different hash function) and
    // process each resulting sub-partition pair
    auto sub_partitions = repartition(build_part, probe_part);
    for (auto& [sub_build, sub_probe] : sub_partitions) {
      process_partition(*sub_build, *sub_probe, depth + 1);
    }
  } else {
    // Fall back to nested loop for this partition
    nested_loop_join(build_part, probe_part);
  }
}

17.8.4 Spill File Management

Partitions are stored in temporary files with optional compression:

class HashPartition {
  std::string temp_file_path_;
  std::unique_ptr<FileWriter> writer_;
  CompressionType compression_;

public:
  void add_document(Document&& doc) {
    auto serialized = serialize(doc, compression_);
    writer_->write(serialized);
  }

  auto read_all() -> std::vector<Document> {
    std::vector<Document> result;
    FileReader reader(temp_file_path_, compression_);
    while (auto doc = reader.read_next()) {
      result.push_back(std::move(*doc));
    }
    return result;
  }
};

Compression Benefits:

  • ZSTD compression typically achieves 3-5x compression ratio
  • Reduces I/O time for large partitions
  • Trades CPU for I/O bandwidth (usually worthwhile)

17.9 Working Tables

17.9.1 Purpose and Use Cases

Working tables provide temporary storage for:

  • Recursive CTEs: Accumulating results across iterations
  • Multi-pass algorithms: Storing intermediate results
  • Subquery materialization: Caching correlated subquery results

class WorkingTable {
  std::vector<Document> rows_;   // Owned document copies
  size_t scan_index_ = 0;        // Current scan position

public:
  void add_row(Document&& doc);  // Move semantics
  void swap_with(WorkingTable& other);

  void open_scan();
  auto scan_next() -> Document*;
  void close_scan();
};

17.9.2 Recursive CTE Execution

Recursive CTEs use working tables to iterate until fixpoint:

WITH RECURSIVE employee_hierarchy AS (
  -- Base case
  SELECT id, name, manager_id, 1 AS level
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case
  SELECT e.id, e.name, e.manager_id, h.level + 1
  FROM employees e
  JOIN employee_hierarchy h ON e.manager_id = h.id
)
SELECT * FROM employee_hierarchy;

Execution Pattern:

working_table = execute(base_case)
result_table = copy(working_table)

while (!working_table.empty()):
  temp_table = empty
  for row in working_table:
    for match in execute_recursive(row):
      temp_table.add(match)
      result_table.add(match)
  working_table = temp_table

return result_table
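The same pattern in ordinary C++, over a toy employees table (field names are illustrative; the code assumes the hierarchy is acyclic, since a cycle would never reach the fixpoint):

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

struct Employee { int64_t id; std::optional<int64_t> manager_id; };

// Fixpoint iteration for the recursive CTE: emits (id, level) pairs.
inline std::vector<std::pair<int64_t, int>>
hierarchy_levels(const std::vector<Employee>& emps) {
  std::vector<std::pair<int64_t, int>> result, working;

  // Base case: employees with no manager start at level 1.
  for (const auto& e : emps)
    if (!e.manager_id) working.push_back({e.id, 1});
  result = working;

  // Iterate until the working table is empty (fixpoint).
  while (!working.empty()) {
    std::vector<std::pair<int64_t, int>> next;
    for (const auto& [mgr, level] : working)
      for (const auto& e : emps)            // recursive case: the self-join
        if (e.manager_id == mgr) next.push_back({e.id, level + 1});
    for (const auto& row : next) result.push_back(row);
    working = std::move(next);
  }
  return result;
}
```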

17.9.3 Working Table Bytecode

; Initialize
  WORKING_TABLE_NEW    R1                ; Working table (current iteration)
  WORKING_TABLE_NEW    R2                ; Result table (accumulated output)
  WORKING_TABLE_NEW    R7                ; Next table (rows for next iteration)

; Execute base case, populate working and result tables
  ; ... base case bytecode ...
  WORKING_TABLE_ADD    R1, R3            ; Add to working
  WORKING_TABLE_ADD    R2, R3            ; Add to result

iteration_loop:
  WORKING_TABLE_SCAN_OPEN R1             ; Open scan on working

scan_loop:
  WORKING_TABLE_SCAN_NEXT R4, R1         ; Get next row
  JUMP_NULL            R4, iteration_done

  ; Execute recursive case with R4 as input
  ; ... recursive bytecode ...
  WORKING_TABLE_ADD    R7, R5            ; Add to next (next iteration)
  WORKING_TABLE_ADD    R2, R5            ; Add to result (accumulate)
  JUMP                 scan_loop

iteration_done:
  WORKING_TABLE_SCAN_CLOSE R1
  WORKING_TABLE_SWAP   R1, R7            ; Next becomes working
  WORKING_TABLE_CLEAR  R7                ; Clear next for the following pass
  WORKING_TABLE_EMPTY  R6, R1            ; Check if working is empty
  JUMP_FALSE           R6, iteration_loop

  ; R2 now contains all results
  EMIT_WORKING_TABLE   R2
  HALT

17.10 Multi-Way JOIN Optimization

17.10.1 Join Order Enumeration

For multi-way JOINs, the optimizer enumerates possible join orderings:

SELECT *
FROM A
JOIN B ON A.x = B.x
JOIN C ON B.y = C.y
JOIN D ON C.z = D.z

Possible orderings include:

  • ((A \bowtie B) \bowtie C) \bowtie D
  • (A \bowtie (B \bowtie C)) \bowtie D
  • (A \bowtie B) \bowtie (C \bowtie D)
  • ... and many more

The number of distinct binary join-tree shapes for n tables is the Catalan number:

C_{n-1} = \frac{1}{n}\binom{2(n-1)}{n-1}

For 4 tables this gives C_3 = 5 tree shapes; counting permutations of the tables as well, there are 4! \cdot C_3 = 120 distinct orderings.
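These counts grow quickly, which is why exhaustive enumeration becomes infeasible beyond a handful of tables. The standard overflow-safe iterative computation of Catalan numbers (a small helper, not from the Cognica codebase):

```cpp
#include <cstdint>

// C_0 = 1, C_{k+1} = C_k * 2(2k + 1) / (k + 2); the division is exact
// at every step, so integer arithmetic suffices.
inline uint64_t catalan(unsigned n) {
  uint64_t c = 1;
  for (unsigned k = 0; k < n; ++k) {
    c = c * 2 * (2 * k + 1) / (k + 2);
  }
  return c;
}
```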

17.10.2 Cost-Based Selection

The optimizer selects the ordering with minimum estimated cost:

\text{Cost}(R \bowtie S) = |R| + |S| + |R \bowtie S|

For hash join specifically:

\text{Cost}_{\text{hash}}(R \bowtie S) = C_{\text{build}} \cdot |R| + C_{\text{probe}} \cdot |S|

where C_{\text{build}} > C_{\text{probe}} due to hash table construction overhead.

17.10.3 Intermediate Result Handling

Multi-way JOINs may require materializing intermediate results between pipeline stages.

The planner decides when to materialize based on:

  • Memory pressure
  • Reuse opportunities
  • Pipeline boundaries

17.11 Performance Characteristics

17.11.1 Memory Efficiency

Approach                  Memory per Match  10M Matches
Full Materialization      400 bytes         4 GB
Zero-Copy (CompositeRow)  48 bytes          480 MB
With Reuse                48 bytes total    48 bytes

17.11.2 Execution Time

Operation        Traditional       Zero-Copy
Build composite  memcpy (100+ ns)  pointer assign (1 ns)
Field access     direct (1 ns)     indirect (3-5 ns)
Materialization  N/A               on-demand (100+ ns)

Trade-off: Zero-copy adds 2-4 ns per field access but saves 100+ ns per row construction. For JOINs outputting many columns, field access overhead is negligible compared to avoided copies.

17.11.3 When Zero-Copy Excels

  • High cardinality JOINs: Millions of matching pairs
  • Wide rows: Many columns per table
  • Selective output: Few columns needed from many available
  • Pipeline processing: Results consumed immediately

17.11.4 When Materialization Is Preferred

  • Repeated access: Same row accessed multiple times
  • Complex expressions: Derived columns requiring multiple source fields
  • Persistence required: Results stored in working tables
  • Very narrow rows: Pointer overhead exceeds data size

17.12 Summary

Zero-copy JOIN implementation is a critical optimization that enables Cognica to process large JOIN results without excessive memory consumption. The key concepts covered in this chapter are:

CompositeRow Abstraction: Lightweight wrapper storing pointers to source documents with table aliases. Field access resolves references on-demand, avoiding data copying until materialization is required.

Deferred Materialization: Documents are created only when output is emitted, working tables require owned copies, or column reordering is needed. This lazy approach minimizes memory allocations in the critical path.

Hash Join Implementation: Build phase inserts document pointers (not copies) into hash table. Probe phase looks up matching pointers and creates composite rows. The build cursor remains open throughout to keep referenced documents valid.

CVM Opcodes: Dedicated instructions for composite row operations (COMPOSITE_NEW, COMPOSITE_ADD, COMPOSITE_CLEAR, COMPOSITE_EMIT) and hash table operations (HT_NEW, HT_INSERT, HT_PROBE) enable efficient bytecode generation.

Memory Reuse: The COMPOSITE_CLEAR operation resets composite rows for reuse, reducing allocations from O(N) to O(1) per JOIN execution.

External Hash Join: Grace hash join with partition spilling handles datasets exceeding available memory. Recursive partitioning and compression minimize I/O overhead.

Working Tables: Provide owned document storage for recursive CTEs, multi-pass algorithms, and subquery materialization where zero-copy semantics cannot be maintained.

The combination of zero-copy references, deferred materialization, and memory reuse enables Cognica to achieve 10-100x memory efficiency gains for large multi-way JOINs while maintaining familiar SQL semantics.

References

  1. Graefe, G. (1993). Query Evaluation Techniques for Large Databases. ACM Computing Surveys.
  2. Shapiro, L. D. (1986). Join Processing in Database Systems with Large Main Memories. ACM TODS.
  3. Kim, C., Kaldewey, T., Lee, V. W., et al. (2009). Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. PVLDB.
  4. Barber, R., Lohman, G., Mohan, C., et al. (2014). In-Memory BLU Acceleration in IBM DB2 and dashDB. CIDR.
  5. Leis, V., Gubichev, A., Mirchev, A., et al. (2015). How Good Are Query Optimizers, Really? PVLDB.

Copyright (c) 2023-2026 Cognica, Inc.