Chapter 6: Document Storage and Schema Management

This chapter examines Cognica's document storage layer, which provides flexible schema management atop the LSM-tree foundation established in Chapter 5. We explore how JSON documents are encoded for efficient storage, how schemas define structure and constraints, and how indexes accelerate queries across diverse access patterns.

6.1 The Document Model

Document databases emerged from the recognition that many applications work with semi-structured data that doesn't fit neatly into relational tables. Rather than forcing data into rigid schemas, document databases store self-describing records that can vary in structure.

6.1.1 JSON as Universal Data Format

Cognica adopts JSON (JavaScript Object Notation) as its document format. JSON provides:

Simplicity: Human-readable syntax with just six data types:

  • Objects (key-value maps)
  • Arrays (ordered sequences)
  • Strings
  • Numbers
  • Booleans
  • Null

Universality: Supported in virtually every programming language, HTTP API, and configuration system.

Nestability: Documents can contain nested documents and arrays to arbitrary depth.

Example Document:

{
  "_id": "user_12345",
  "name": "Alice Chen",
  "email": "alice@example.com",
  "profile": {
    "bio": "Database enthusiast",
    "location": {
      "city": "San Francisco",
      "country": "USA"
    }
  },
  "tags": ["developer", "researcher"],
  "created_at": "2024-01-15T10:30:00Z"
}

6.1.2 Document vs Relational Trade-offs

The document model trades normalization for locality:

Relational Model:

\text{User} \xrightarrow{\text{JOIN}} \text{Profile} \xrightarrow{\text{JOIN}} \text{Location} \xrightarrow{\text{JOIN}} \text{Tags}

Data is normalized across multiple tables, eliminating redundancy but requiring joins for reconstruction.

Document Model:

\text{User Document} = \text{User} \cup \text{Profile} \cup \text{Location} \cup \text{Tags}

Related data is embedded within a single document, enabling single-read retrieval at the cost of potential redundancy.

Access Pattern Optimization:

Pattern                 | Relational | Document
Read user with profile  | 3+ JOINs   | 1 read
Update user's city      | 1 update   | Read-modify-write
Find users in city      | Index scan | Index scan
Aggregate across users  | Efficient  | Efficient

Documents excel when data is read together more often than updated independently.

6.1.3 RapidJSON Integration

Cognica uses RapidJSON, a high-performance JSON library, as its in-memory document representation:

class Document : public rapidjson::GenericDocument<
    rapidjson::UTF8<>,
    DocumentAllocator
> {
  // Extended with Cognica-specific operations
};

Performance Characteristics:

Operation            | Complexity | Notes
Parse JSON string    | O(n)       | Single pass, in-situ possible
Access field by name | O(m)       | Linear scan, m = object size
Access array element | O(1)       | Direct index
Iterate all fields   | O(m)       | Sequential scan
Serialize to string  | O(n)       | Single pass

RapidJSON's DOM (Document Object Model) representation stores parsed JSON in memory, enabling random access and modification.

6.1.4 Custom Allocator

Cognica employs a custom memory allocator for document operations:

Benefits:

  • Pool allocation: Reduces malloc/free overhead
  • Arena semantics: Bulk deallocation when document is destroyed
  • Cache locality: Related allocations are contiguous

Allocation Strategy:

\text{Block Size} = \max(\text{requested}, \text{arena\_block\_size})

Small allocations come from the current arena block; large allocations get dedicated blocks.
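The arena strategy above can be sketched in a few dozen lines. This is a minimal illustration, not Cognica's actual DocumentAllocator: small requests are bump-allocated from the current block, oversized requests get a dedicated block, and everything is freed in one pass when the arena is destroyed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Minimal arena allocator sketch (illustrative only). Small requests are
// bump-allocated from the current block; requests larger than the block
// size get a dedicated block.
class Arena {
 public:
  explicit Arena(std::size_t block_size = 4096) : block_size_(block_size) {}

  ~Arena() {
    // Arena semantics: one bulk deallocation frees every allocation.
    for (void* block : blocks_) std::free(block);
  }

  void* allocate(std::size_t n) {
    // Block Size = max(requested, arena_block_size)
    if (n > block_size_) {
      void* dedicated = std::malloc(n);
      blocks_.push_back(dedicated);
      return dedicated;
    }
    if (current_ == nullptr || offset_ + n > block_size_) {
      current_ = static_cast<char*>(std::malloc(block_size_));
      blocks_.push_back(current_);
      offset_ = 0;
    }
    void* p = current_ + offset_;
    offset_ += n;
    return p;
  }

  std::size_t block_count() const { return blocks_.size(); }

 private:
  std::size_t block_size_;
  char* current_ = nullptr;
  std::size_t offset_ = 0;
  std::vector<void*> blocks_;
};
```

Because successive small allocations come from the same block, values parsed together end up adjacent in memory, which is where the cache-locality benefit comes from.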

6.2 Document Encoding

Storing JSON documents directly would be inefficient. Cognica encodes documents into a compact binary format optimized for storage and retrieval.

6.2.1 Type Encoding

Each value is prefixed with a type marker:

Type   | Code | Description
Object | 0x01 | Nested document
Array  | 0x02 | Ordered sequence
Null   | 0x03 | Null value
False  | 0x04 | Boolean false
True   | 0x05 | Boolean true
Int64  | 0x06 | 64-bit signed integer
UInt64 | 0x07 | 64-bit unsigned integer
Double | 0x08 | IEEE 754 double
String | 0x09 | UTF-8 string

Type-Length-Value (TLV) Encoding:

\text{Encoded Value} = \text{Type}(1) \| \text{Length}(\text{var}) \| \text{Data}(\text{length})

Variable-length encoding uses continuation bits to minimize space for small values:

\text{Encoded Length} = \begin{cases} 1 \text{ byte} & \text{if } length < 128 \\ 2 \text{ bytes} & \text{if } length < 16384 \\ \vdots & \vdots \end{cases}
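A minimal sketch of this continuation-bit scheme (LEB128-style, assuming little-endian 7-bit groups; the exact byte order Cognica uses is not specified here):

```cpp
#include <cstdint>
#include <string>

// Variable-length length encoding sketch: 7 payload bits per byte, with the
// high bit set as a continuation marker. Values < 128 take one byte,
// values < 16384 take two, and so on.
std::string encode_varint(std::uint64_t value) {
  std::string out;
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));  // continuation
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));  // final byte, high bit clear
  return out;
}

std::uint64_t decode_varint(const std::string& in) {
  std::uint64_t value = 0;
  int shift = 0;
  for (unsigned char byte : in) {
    value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) break;  // last byte reached
    shift += 7;
  }
  return value;
}
```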

6.2.2 Primitive Encoding

Integers:

Signed integers use a sign-bit flip to preserve sort order:

\text{encode}(n) = n \oplus \text{0x8000000000000000}

This transforms the two's complement representation so that:

\text{encode}(-1) < \text{encode}(0) < \text{encode}(1)

Lexicographic comparison of encoded bytes yields correct numeric ordering.
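The sign-bit flip can be sketched directly; flipping the top bit maps the two's complement range [-2^63, 2^63-1] onto the unsigned range so that unsigned (and big-endian byte-wise) comparison matches signed numeric order:

```cpp
#include <cstdint>
#include <cstring>

// Order-preserving encoding for signed 64-bit integers: flip the sign bit
// so that unsigned comparison of the result matches signed numeric order.
std::uint64_t encode_int64(std::int64_t n) {
  std::uint64_t bits;
  std::memcpy(&bits, &n, sizeof(bits));  // reinterpret two's complement bits
  return bits ^ 0x8000000000000000ULL;   // flip the sign bit
}
```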

Floating-Point Numbers:

IEEE 754 doubles require special handling for sortable encoding:

\text{encode}(d) = \begin{cases} \text{bits}(d) \oplus \text{0x8000000000000000} & \text{if } d \geq 0 \\ \text{bits}(d) \oplus \text{0xFFFFFFFFFFFFFFFF} & \text{if } d < 0 \end{cases}

where \text{bits}(d) interprets the 64-bit IEEE 754 representation as an unsigned integer.
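A sketch of the double transform, testing the sign bit of the representation rather than the value (which also handles -0.0 consistently):

```cpp
#include <cstdint>
#include <cstring>

// Order-preserving encoding for IEEE 754 doubles. Non-negative values get
// the sign bit set; negative values get all bits flipped, because negative
// doubles sort in reverse of their raw bit patterns.
std::uint64_t encode_double(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));   // bits(d)
  if (bits & 0x8000000000000000ULL) {
    return ~bits;                          // negative: flip all bits
  }
  return bits | 0x8000000000000000ULL;     // non-negative: set sign bit
}
```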

Strings:

Strings are encoded with length prefix followed by UTF-8 bytes:

\text{Encoded String} = \text{VarInt}(\text{length}) \| \text{UTF8 bytes}

For key comparison, null-terminated encoding is used:

\text{Key String} = \text{UTF8 bytes} \| \text{0x00}

6.2.3 Composite Encoding

Objects:

Objects encode as sequences of key-value pairs:

\text{Object} = \text{0x01} \| \text{VarInt}(\text{count}) \| \text{KV}_1 \| \text{KV}_2 \| \dots \| \text{KV}_n

Each key-value pair:

\text{KV} = \text{VarInt}(\text{key\_len}) \| \text{key} \| \text{encoded value}

Arrays:

Arrays encode as sequences of values:

\text{Array} = \text{0x02} \| \text{VarInt}(\text{count}) \| \text{Value}_1 \| \text{Value}_2 \| \dots \| \text{Value}_n
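Putting the TLV pieces together, a simplified encoder for an object of string fields might look like this. It is a sketch using the type codes from the table above; for brevity the length helper handles lengths under 128 (a single varint byte) only:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Append a one-byte varint length (assumes n < 128 for brevity).
void append_len(std::string& out, std::size_t n) {
  out.push_back(static_cast<char>(n));
}

// Encoded Value = Type(1) || Length(var) || Data(length)
std::string encode_string_value(const std::string& s) {
  std::string out;
  out.push_back('\x09');       // Type: String
  append_len(out, s.size());   // Length
  out += s;                    // Data
  return out;
}

// Object = 0x01 || VarInt(count) || (VarInt(key_len) || key || value)*
std::string encode_object(
    const std::vector<std::pair<std::string, std::string>>& fields) {
  std::string out;
  out.push_back('\x01');            // Type: Object
  append_len(out, fields.size());   // field count
  for (const auto& [key, value] : fields) {
    append_len(out, key.size());    // key length
    out += key;                     // key bytes
    out += encode_string_value(value);  // encoded value
  }
  return out;
}
```

Encoding {"name": "Alice"} this way yields 0x01, count 1, key length 4, "name", then the string TLV for "Alice".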

6.2.4 Document Layout

Complete documents include a header with metadata:

[Diagram: document layout]

Header Fields:

Field     | Size    | Purpose
Timestamp | 8 bytes | Creation/modification time
TTL       | 4 bytes | Time-to-live in seconds (0 = never expires)
Flags     | 1 byte  | Metadata flags (deleted, migrating, etc.)

Space Efficiency:

Consider encoding the example user document:

Component           | JSON Size | Encoded Size
Field names         | 89 bytes  | 89 bytes
String values       | 78 bytes  | 82 bytes
Structural overhead | 45 bytes  | 15 bytes
Total               | 212 bytes | 186 bytes

Binary encoding typically achieves 10-30% size reduction through eliminated whitespace and compact length encoding.

6.3 Schema Definition

While documents can vary in structure, schemas define expectations and constraints that enable optimization and validation.

6.3.1 Schema Structure

A Cognica schema specifies:

collection: users
workspace: default

primary_key:
  fields: [_id]
  unique: true

secondary_keys:
  - name: email_idx
    fields: [email]
    unique: true
    type: secondary_key

  - name: location_idx
    fields: [profile.location.country, profile.location.city]
    type: secondary_key

  - name: content_idx
    fields: [profile.bio]
    type: full_text_search

comment: "User accounts with profile information"

6.3.2 Schema Components

Primary Key:

Every collection has exactly one primary key that uniquely identifies documents:

\text{PK}: \mathcal{D} \rightarrow \mathcal{K}

The primary key maps each document to a unique key value. Primary keys can be:

  • Single field: _id
  • Composite: (tenant_id, user_id)
  • Auto-generated: UUID or sequence

Secondary Keys:

Secondary keys create additional access paths:

\text{SK}: \mathcal{D} \rightarrow 2^{\mathcal{K}}

Unlike primary keys, secondary keys can map to sets (for non-unique indexes) and support:

  • B-tree indexes: For range queries and sorting
  • Full-text indexes: For text search
  • Clustered indexes: Storing document data with the index

6.3.3 Schema Builder Pattern

Schemas are constructed programmatically using the builder pattern:

auto schema = SchemaBuilder{}
    .set_workspace_id(workspace_id)
    .set_collection_id(collection_id)
    .set_collection_name("users")
    .set_primary_key({"_id"}, PrimaryKeyOptions{.unique = true})
    .add_secondary_key("email_idx", {"email"}, SecondaryKeyOptions{
        .unique = true,
        .type = IndexType::kSecondaryKey
    })
    .add_secondary_key("content_idx", {"profile.bio"}, SecondaryKeyOptions{
        .type = IndexType::kFullTextSearchIndex
    })
    .set_comment("User accounts")
    .build();

The builder validates constraints during construction:

  • Primary key must have at least one field
  • Secondary key names must be unique
  • Field paths must be valid dot notation

6.3.4 Schema Flexibility

Cognica supports schema-on-read semantics: documents can contain fields not defined in the schema. The schema defines:

  1. Indexed fields: Fields with associated indexes
  2. Type hints: Expected types for validation
  3. Constraints: Uniqueness, nullability

Documents may include additional fields that are stored but not indexed. This enables gradual schema evolution without migration.

6.4 Key Encoding

Keys must be encoded to preserve ordering in the LSM-tree while supporting composite keys and nullable fields.

6.4.1 Primary Key Encoding

Primary keys are encoded with a prefix identifying the collection:

\text{PK Storage Key} = \text{Prefix}(14) \| \text{Encoded PK Fields}

Prefix Structure:

Component     | Bytes | Purpose
Database Type | 1     | Distinguishes document DB from others
Category      | 1     | Data category (user data = 2)
Workspace ID  | 4     | Multi-tenant isolation
Collection ID | 4     | Collection identification
Index ID      | 4     | Primary key index (always 0)

Field Encoding:

For composite primary keys (field_1, field_2, ...):

\text{Encoded PK} = \text{enc}(field_1) \| \text{enc}(field_2) \| \dots

Each field is encoded with its type-specific encoding, ensuring lexicographic order matches logical order.
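Building the 14-byte prefix can be sketched as follows. The function names are illustrative; the key point is that the IDs are appended big-endian, so byte-wise comparison groups keys by workspace, then collection, then index:

```cpp
#include <cstdint>
#include <string>

// Append a 32-bit ID in big-endian order so lexicographic comparison of the
// prefix matches numeric ID order.
void append_u32_be(std::string& out, std::uint32_t v) {
  out.push_back(static_cast<char>(v >> 24));
  out.push_back(static_cast<char>(v >> 16));
  out.push_back(static_cast<char>(v >> 8));
  out.push_back(static_cast<char>(v));
}

// Prefix(14) = db type (1) || category (1) || workspace (4) ||
//              collection (4) || index (4)
std::string make_prefix(std::uint8_t db_type, std::uint8_t category,
                        std::uint32_t workspace_id,
                        std::uint32_t collection_id, std::uint32_t index_id) {
  std::string prefix;
  prefix.push_back(static_cast<char>(db_type));
  prefix.push_back(static_cast<char>(category));
  append_u32_be(prefix, workspace_id);
  append_u32_be(prefix, collection_id);
  append_u32_be(prefix, index_id);  // 0 for the primary key index
  return prefix;
}
```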

6.4.2 Secondary Key Encoding

Secondary keys include both the secondary key fields and the primary key (for uniqueness):

\text{SK Storage Key} = \text{Prefix}(14) \| \text{Encoded SK Fields} \| \text{Encoded PK}

Example:

For index location_idx on (country, city) with primary key _id:

Key: [prefix][country][city][_id]
     [14 bytes][var][var][var]

This encoding enables:

  • Prefix scans: Find all users in a country
  • Range scans: Find users in countries A-M
  • Exact lookup: Find user with specific country+city+id

6.4.3 Nullable Field Handling

Nullable fields require special encoding to maintain sort order:

\text{enc}_{nullable}(v) = \begin{cases} \text{0x00} & \text{if } v = \text{null} \\ \text{0x01} \| \text{enc}(v) & \text{otherwise} \end{cases}

Null values sort before all non-null values (or after, depending on configuration).
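A minimal sketch of the nulls-first variant, using a string value for illustration:

```cpp
#include <optional>
#include <string>

// Nullable field encoding sketch: null encodes as a single 0x00 byte, and
// every non-null value is prefixed with 0x01, so nulls sort before all
// non-null values under byte-wise comparison.
std::string encode_nullable(const std::optional<std::string>& v) {
  if (!v.has_value()) {
    return std::string(1, '\x00');       // null marker
  }
  return std::string(1, '\x01') + *v;    // presence marker || encoded value
}
```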

6.4.4 Sort Order Preservation

The encoding must satisfy:

v_1 < v_2 \implies \text{enc}(v_1) <_{lex} \text{enc}(v_2)

where <_{lex} is lexicographic (byte-wise) comparison.

Descending Order:

For descending sorts, the encoding is inverted:

\text{enc}_{desc}(v) = \text{complement}(\text{enc}_{asc}(v))

where complement flips all bits. This reverses the sort order while maintaining the comparison-by-bytes property.

6.5 Index Architecture

Indexes are the primary mechanism for accelerating queries. Cognica supports multiple index types optimized for different access patterns.

6.5.1 Index Type Hierarchy

[Diagram: index type hierarchy]

6.5.2 Index Types

Type                | Code | Use Case
Primary Key         | 0    | Unique document identification
Secondary Key       | 1    | Traditional B-tree index
Clustered Secondary | 2    | Secondary index with embedded data
Full-Text Search    | 3    | Text search with posting lists
Clustered FTS       | 4    | FTS with embedded document data

Primary Key Index:

The primary key index stores complete documents:

\text{Key} = \text{PK} \quad \text{Value} = \text{Encoded Document}

Secondary Key Index:

Secondary indexes store only the mapping:

\text{Key} = \text{SK} \| \text{PK} \quad \text{Value} = \text{TTL Metadata}

Lookups require two steps:

  1. Find PK via secondary index
  2. Fetch document via primary key

Clustered Secondary Index:

Clustered secondaries embed document data:

\text{Key} = \text{SK} \| \text{PK} \quad \text{Value} = \text{Encoded Document}

This eliminates the second lookup at the cost of storage duplication.

6.5.3 Index Descriptor

The IndexDescriptor manages all indexes for a collection:

class IndexDescriptor {
  PrimaryKey primary_key_;
  std::vector<SecondaryKey> secondary_keys_;
  mutable std::shared_mutex mutex_;

  // Operations
  auto get_primary_key() const -> const PrimaryKey&;
  auto get_secondary_key(IndexID id) const -> const SecondaryKey*;
  auto find_by_name(std::string_view name) const -> const SecondaryKey*;
  void add_secondary_key(SecondaryKey&& sk);
  void remove_secondary_key(IndexID id);
};

Thread Safety:

The descriptor uses a shared mutex for concurrent access:

  • Multiple readers can access concurrently
  • Writers acquire exclusive access
  • Index additions/removals are atomic

6.5.4 Index Statistics

Each index tracks usage statistics for query optimization:

struct IndexStatistics {
  std::atomic<int64_t> accessed;   // Query count
  std::atomic<int64_t> added;      // Insert count
  std::atomic<int64_t> updated;    // Update count
  std::atomic<int64_t> deleted;    // Delete count
  std::atomic<int64_t> merged;     // Merge operation count

  TimePoint accessed_at;  // Last query time
  TimePoint added_at;     // Last insert time
  TimePoint updated_at;   // Last update time
  TimePoint deleted_at;   // Last delete time
  TimePoint merged_at;    // Last merge time
};

Statistics inform:

  • Index selection: Prefer frequently-used indexes
  • Maintenance scheduling: Identify cold indexes for optimization
  • Capacity planning: Track growth rates

6.6 Collection Operations

Collections are the primary interface for document manipulation, providing ACID operations through the transaction layer.

6.6.1 Collection Architecture

[Diagram: collection architecture]

6.6.2 CRUD Operations

Insert:

Status Collection::insert(const Document& doc) {
  // 1. Extract primary key
  auto pk = extract_primary_key(doc);

  // 2. Check uniqueness
  if (pk_reader_->exists(pk)) {
    return Status::AlreadyExists("Duplicate primary key");
  }

  // 3. Encode document
  auto encoded = encode_document(doc);

  // 4. Write to primary index
  pk_writer_->put(pk, encoded);

  // 5. Update secondary indexes
  for (auto& sk_writer : sk_writers_) {
    auto sk = extract_secondary_key(doc, sk_writer->descriptor());
    sk_writer->put(sk, pk);
  }

  return Status::OK();
}

Find:

Cursor Collection::find(const Document& query) {
  // 1. Analyze query
  auto plan = query_planner_.plan(query);

  // 2. Select best index
  auto index = plan.best_index();

  // 3. Create cursor
  if (index.is_primary_key()) {
    return pk_reader_->scan(plan.key_range());
  } else {
    return sk_readers_[index.id()]->scan(plan.key_range());
  }
}

Update:

Status Collection::update(const Document& filter, const Document& updates) {
  // 1. Find matching documents
  auto cursor = find(filter);

  // 2. Apply updates
  while (cursor.valid()) {
    auto old_doc = cursor.document();
    auto doc = old_doc;  // copy; the original is needed for index maintenance

    // 3. Apply update operators
    apply_updates(doc, updates);

    // 4. Rewrite document
    auto pk = extract_primary_key(doc);
    pk_writer_->put(pk, encode_document(doc));

    // 5. Update secondary indexes if affected fields changed
    update_secondary_indexes(old_doc, doc);

    cursor.next();
  }

  return Status::OK();
}

Delete:

Status Collection::remove(const Document& filter) {
  auto cursor = find(filter);

  while (cursor.valid()) {
    auto doc = cursor.document();
    auto pk = extract_primary_key(doc);

    // 1. Delete from primary index
    pk_writer_->del(pk);

    // 2. Delete from secondary indexes
    for (auto& sk_writer : sk_writers_) {
      auto sk = extract_secondary_key(doc, sk_writer->descriptor());
      sk_writer->del(sk, pk);
    }

    cursor.next();
  }

  return Status::OK();
}

6.6.3 Batch Operations

For bulk inserts, batch operations amortize overhead:

Status Collection::insert_parallel(const std::vector<Document>& docs) {
  // 1. Partition documents across threads
  auto partitions = partition(docs, thread_count_);

  // 2. Process partitions in parallel
  parallel_for(partitions, [this](auto& partition) {
    auto batch = begin_write_batch();

    for (auto& doc : partition) {
      batch.insert(doc);
    }

    batch.commit();
  });

  return Status::OK();
}

Performance Characteristics:

Operation    | Single  | Batch (1000 docs)
Insert       | 100 us  | 50 ms (50 us/doc)
Index update | 50 us   | 25 ms (25 us/doc)
Total        | 150 us  | 75 ms
Throughput   | 6,600/s | 13,300/s

Batching doubles throughput by amortizing transaction overhead.

6.6.4 Transaction Support

Collections support ACID transactions:

auto txn = collection.begin_transaction();

try {
  txn.insert(doc1);
  txn.update(filter, updates);
  txn.remove(filter2);

  txn.commit();
} catch (...) {
  txn.rollback();
}

Isolation Levels:

Level            | Dirty Read | Non-Repeatable Read | Phantom
Read Uncommitted | Yes        | Yes                 | Yes
Read Committed   | No         | Yes                 | Yes
Repeatable Read  | No         | No                  | Yes
Serializable     | No         | No                  | No

Cognica defaults to Snapshot Isolation, which prevents dirty reads and non-repeatable reads and most phantoms, though write-skew anomalies remain possible.

6.7 Index Reader and Writer

The index reader/writer abstraction separates query and mutation operations.

6.7.1 Index Reader Interface

class IndexReader {
public:
  // Point lookup
  virtual auto get(const Slice& key) -> std::optional<Document> = 0;

  // Existence check
  virtual auto exists(const Slice& key) -> bool = 0;

  // Range scan
  virtual auto scan(const KeyRange& range) -> Cursor = 0;

  // Prefix scan
  virtual auto scan_prefix(const Slice& prefix) -> Cursor = 0;

  // Count
  virtual auto count(const KeyRange& range) -> size_t = 0;
};

6.7.2 Index Writer Interface

class IndexWriter {
public:
  // Insert
  virtual auto put(const Slice& key, const Slice& value) -> Status = 0;

  // Delete
  virtual auto del(const Slice& key) -> Status = 0;

  // Batch operations
  virtual auto put_batch(const std::vector<KV>& kvs) -> Status = 0;
  virtual auto del_batch(const std::vector<Slice>& keys) -> Status = 0;
};

6.7.3 Key Codec

The key codec handles encoding and decoding of index keys:

Primary Key Codec:

struct PrimaryKeyIndexKeyCodec {
  static auto encode(
    const PrimaryKey& pk_desc,
    const Slice& pk
  ) -> std::string {
    std::string key;
    // Add 14-byte prefix
    append_prefix(key, pk_desc.guid());
    // Add encoded primary key fields
    key.append(pk.data(), pk.size());
    return key;
  }

  static auto decode(
    const PrimaryKey& pk_desc,
    const Slice& storage_key
  ) -> Slice {
    // Skip 14-byte prefix
    return storage_key.substr(14);
  }
};

Secondary Key Codec:

struct SecondaryKeyIndexKeyCodec {
  static auto encode(
    const PrimaryKey& pk_desc,
    const SecondaryKey& sk_desc,
    const Slice& pk,
    const Document& doc,
    bool nullable
  ) -> std::string {
    std::string key;
    // Add 14-byte prefix with SK index ID
    append_prefix(key, sk_desc.guid());
    // Add encoded secondary key fields
    for (const auto& field : sk_desc.fields()) {
      auto value = doc.find(field);
      encode_field(key, value, nullable);
    }
    // Append primary key for uniqueness
    key.append(pk.data(), pk.size());
    return key;
  }
};

6.7.4 Index Affinity Score

The query optimizer uses affinity scores to select the best index:

\text{Affinity}(Q, I) = \sum_{f \in \text{fields}(Q) \cap \text{fields}(I)} w(f, I)

where w(f, I) is the weight of field f in index I (higher for earlier positions).

Scoring Algorithm:

double Index::compute_affinity_score(const FieldNames& query_fields) const {
  double score = 0.0;
  size_t position = 0;

  for (const auto& field : fields_) {
    if (query_fields.contains(field)) {
      // Higher weight for earlier positions (prefix selectivity)
      score += 1.0 / (position + 1);
    } else {
      // Gap in index prefix reduces usefulness
      break;
    }
    position++;
  }

  return score;
}

6.8 Dot Notation and Nested Documents

Cognica supports dot notation for accessing nested fields, enabling queries and indexes on deeply nested data.

6.8.1 Path Syntax

Dot notation uses periods to separate nested field names:

Path                  | Meaning
name                  | Top-level field
profile.bio           | Nested field
profile.location.city | Deeply nested field
tags[0]               | Array element
tags[*]               | All array elements

6.8.2 Path Resolution

class DotNotationSupport {
public:
  // Find nested member
  auto find_member(const Document& doc, std::string_view path)
      -> std::optional<Value>;

  // Add nested member (creating intermediate objects)
  auto add_member(Document& doc, std::string_view path, Value value)
      -> Status;

  // Check existence
  auto has_member(const Document& doc, std::string_view path)
      -> bool;

  // Remove nested member
  auto remove_member(Document& doc, std::string_view path)
      -> Status;
};

Resolution Algorithm:

find_member(doc, "profile.location.city"):
  1. Split path: ["profile", "location", "city"]
  2. current = doc
  3. For each segment:
     - If current is object and has segment:
       current = current[segment]
     - Else: return null
  4. Return current
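The resolution algorithm above can be sketched over a toy nested structure (a real implementation walks a RapidJSON value; the Node type here is purely illustrative):

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Toy nested document: a node is a leaf string when it has no children,
// otherwise an object mapping field names to child nodes.
struct Node {
  std::string leaf;
  std::map<std::string, Node> children;
};

// Walk a dot-notation path segment by segment; return the leaf value, or
// nullopt if any segment is missing or the path ends at an object.
std::optional<std::string> find_member(const Node& doc,
                                       std::string_view path) {
  const Node* current = &doc;
  std::size_t start = 0;
  while (true) {
    std::size_t dot = path.find('.', start);
    std::string segment(path.substr(
        start, dot == std::string_view::npos ? std::string_view::npos
                                             : dot - start));
    auto it = current->children.find(segment);
    if (it == current->children.end()) return std::nullopt;  // missing
    current = &it->second;
    if (dot == std::string_view::npos) break;  // last segment reached
    start = dot + 1;
  }
  return current->children.empty()
             ? std::optional<std::string>(current->leaf)
             : std::nullopt;  // path resolves to an object, not a value
}
```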

6.8.3 Nested Index Creation

Indexes on nested fields work identically to top-level fields:

secondary_keys:
  - name: city_idx
    fields: [profile.location.city]
    type: secondary_key

The index stores the nested value directly, enabling efficient lookups:

SELECT * FROM users WHERE profile.location.city = 'San Francisco'

Uses city_idx for O(log n) lookup rather than O(n) full scan.

6.8.4 Array Handling

Arrays require special handling for indexing:

Multi-Key Index:

For a document with array field:

{"_id": "1", "tags": ["developer", "researcher"]}

A multi-key index creates entries for each array element:

\text{Index Entries} = \{(\text{"developer"}, \text{"1"}), (\text{"researcher"}, \text{"1"})\}

Query Semantics:

SELECT * FROM users WHERE tags = 'developer'

Matches any document where tags contains "developer".
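Multi-key entry generation can be sketched as follows (illustrative; the real index stores encoded keys, not raw strings): each array element yields one (element, primary key) entry, so a containment query becomes a point lookup on the element value.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// (element value, primary key) pair, simplified to raw strings.
using IndexEntry = std::pair<std::string, std::string>;

// Emit one index entry per array element for a multi-key index.
std::set<IndexEntry> multikey_entries(const std::string& pk,
                                      const std::vector<std::string>& tags) {
  std::set<IndexEntry> entries;
  for (const auto& tag : tags) {
    entries.insert({tag, pk});  // one entry per array element
  }
  return entries;
}
```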

6.9 Catalog Management

The catalog stores metadata about collections, indexes, and schemas.

6.9.1 Catalog Structure

[Diagram: catalog structure]

6.9.2 Catalog Operations

Operation         | Description
create_collection | Register new collection with schema
drop_collection   | Remove collection and all data
get_collection    | Retrieve collection metadata
list_collections  | Enumerate workspace collections
create_index      | Add secondary index
drop_index        | Remove secondary index
get_index         | Retrieve index metadata

6.9.3 Schema Versioning

Schemas evolve over time. Cognica tracks schema versions:

\text{Schema}_{v+1} = \text{migrate}(\text{Schema}_v, \text{changes})

Compatible Changes (no migration needed):

  • Adding nullable fields
  • Adding secondary indexes
  • Adding new collections

Incompatible Changes (require migration):

  • Changing primary key fields
  • Changing field types
  • Removing required fields

6.9.4 Metadata Persistence

Catalog metadata is stored in the system database category:

\text{Key} = \text{0x00} \| \text{type} \| \text{workspace\_id} \| \text{collection\_id}

Type | Purpose
0x01 | Collection schema
0x02 | Index descriptor
0x03 | Statistics
0x04 | Access control

6.10 Query Context and Projection

Query context carries execution state through the query pipeline.

6.10.1 Query Context Structure

struct QueryContext {
  // Execution mode
  bool is_single_document;
  bool is_streaming;

  // Field projection
  FieldProjectMap projection;

  // Transaction state
  Transaction* transaction;
  Snapshot* snapshot;

  // Statistics
  QueryStatistics stats;
};

6.10.2 Field Projection

Projections limit which fields are returned, reducing I/O and network transfer:

SELECT name, email FROM users WHERE status = 'active'

Projection Encoding:

struct FieldProjectMap {
  enum Mode { kInclude, kExclude };

  Mode mode;
  std::unordered_set<std::string> fields;

  bool should_include(std::string_view field) const {
    bool in_set = fields.contains(field);
    return (mode == kInclude) ? in_set : !in_set;
  }
};

Projection Optimization:

For queries touching only indexed fields, the query can be answered from the index alone (covering index):

\text{Covering} \iff \text{projected fields} \subseteq \text{index fields}

Covering queries avoid the primary key lookup entirely.
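The subset test can be sketched in a few lines (illustrative names; dotted field paths would be compared as strings in the same way):

```cpp
#include <string>
#include <unordered_set>

// A query is covered when every projected field appears in the index, so
// results can be served from index entries alone, without fetching the
// base document via the primary key.
bool is_covering(const std::unordered_set<std::string>& projected,
                 const std::unordered_set<std::string>& index_fields) {
  for (const auto& field : projected) {
    if (index_fields.count(field) == 0) return false;  // needs base document
  }
  return true;
}
```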

6.10.3 Query Statistics

Each query collects execution statistics:

struct QueryStatistics {
  size_t documents_scanned;
  size_t documents_returned;
  size_t index_keys_examined;
  size_t bytes_read;

  Duration parse_time;
  Duration plan_time;
  Duration execution_time;

  std::string selected_index;
};

Statistics enable:

  • Query debugging: Identify slow queries
  • Index tuning: Find missing indexes
  • Capacity planning: Predict resource usage

6.11 Summary

This chapter explored Cognica's document storage layer, from JSON representation through binary encoding to index management. Key takeaways:

  1. JSON documents provide flexible schema with nested structure, encoded efficiently in binary format for storage.

  2. Key encoding preserves sort order for composite keys, enabling efficient range scans in the LSM-tree.

  3. Multiple index types (primary, secondary, full-text, clustered) optimize for different access patterns.

  4. Schema management balances flexibility (schema-on-read) with optimization (indexed fields, constraints).

  5. Collection operations provide ACID guarantees through the transaction layer, with batch optimization for bulk workloads.

  6. Dot notation enables seamless access to nested fields, with multi-key indexes for arrays.

  7. Catalog management tracks metadata with support for schema evolution.

The document layer provides the structured data interface that applications interact with, while the next chapter explores how full-text search indexes enable efficient text queries across document collections.

Copyright (c) 2023-2026 Cognica, Inc.