Chapter 6: Document Storage and Schema Management
This chapter examines Cognica's document storage layer, which provides flexible schema management atop the LSM-tree foundation established in Chapter 5. We explore how JSON documents are encoded for efficient storage, how schemas define structure and constraints, and how indexes accelerate queries across diverse access patterns.
6.1 The Document Model
Document databases emerged from the recognition that many applications work with semi-structured data that doesn't fit neatly into relational tables. Rather than forcing data into rigid schemas, document databases store self-describing records that can vary in structure.
6.1.1 JSON as Universal Data Format
Cognica adopts JSON (JavaScript Object Notation) as its document format. JSON provides:
Simplicity: Human-readable syntax with just six data types:
- Objects (key-value maps)
- Arrays (ordered sequences)
- Strings
- Numbers
- Booleans
- Null
Universality: Native support in every programming language, HTTP APIs, and configuration systems.
Nestability: Documents can contain nested documents and arrays to arbitrary depth.
Example Document:
{
  "_id": "user_12345",
  "name": "Alice Chen",
  "email": "alice@example.com",
  "profile": {
    "bio": "Database enthusiast",
    "location": {
      "city": "San Francisco",
      "country": "USA"
    }
  },
  "tags": ["developer", "researcher"],
  "created_at": "2024-01-15T10:30:00Z"
}
6.1.2 Document vs Relational Trade-offs
The document model trades normalization for locality:
Relational Model:
Data is normalized across multiple tables, eliminating redundancy but requiring joins for reconstruction.
Document Model:
Related data is embedded within a single document, enabling single-read retrieval at the cost of potential redundancy.
Access Pattern Optimization:
| Pattern | Relational | Document |
|---|---|---|
| Read user with profile | 3+ JOINs | 1 read |
| Update user's city | 1 update | Read-modify-write |
| Find users in city | Index scan | Index scan |
| Aggregate across users | Efficient | Efficient |
Documents excel when data is read together more often than updated independently.
6.1.3 RapidJSON Integration
Cognica uses RapidJSON, a high-performance JSON library, as its in-memory document representation:
class Document : public rapidjson::GenericDocument<
    rapidjson::UTF8<>,
    DocumentAllocator
> {
  // Extended with Cognica-specific operations
};
Performance Characteristics:
| Operation | Complexity | Notes |
|---|---|---|
| Parse JSON string | O(n) | Single pass; in-situ parsing possible |
| Access field by name | O(m) | Linear scan over the object's m members |
| Access array element | O(1) | Direct index |
| Iterate all fields | O(m) | Sequential scan |
| Serialize to string | O(n) | Single pass |
RapidJSON's DOM (Document Object Model) representation stores parsed JSON in memory, enabling random access and modification.
6.1.4 Custom Allocator
Cognica employs a custom memory allocator for document operations:
Benefits:
- Pool allocation: Reduces malloc/free overhead
- Arena semantics: Bulk deallocation when document is destroyed
- Cache locality: Related allocations are contiguous
Allocation Strategy:
Small allocations come from the current arena block; large allocations get dedicated blocks.
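The arena strategy can be sketched in a few lines. This is a simplified illustration, not Cognica's actual allocator; the `Arena` class, its block size, and the omission of alignment handling are all choices made for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Minimal arena: small requests are bump-allocated from the current
// block; everything is freed at once when the arena is destroyed.
// Alignment is ignored for brevity.
class Arena {
 public:
  explicit Arena(size_t block_size = 4096) : block_size_(block_size) {}

  void* allocate(size_t size) {
    // Large requests get a dedicated block.
    if (size > block_size_) {
      blocks_.push_back(std::make_unique<char[]>(size));
      return blocks_.back().get();
    }
    // Start a new block if the current one cannot satisfy the request.
    if (blocks_.empty() || offset_ + size > block_size_) {
      blocks_.push_back(std::make_unique<char[]>(block_size_));
      current_ = blocks_.back().get();
      offset_ = 0;
    }
    void* p = current_ + offset_;
    offset_ += size;
    return p;
  }

  size_t block_count() const { return blocks_.size(); }

 private:
  size_t block_size_;
  std::vector<std::unique_ptr<char[]>> blocks_;
  char* current_ = nullptr;
  size_t offset_ = 0;
};
```

Consecutive small allocations land contiguously in one block (cache locality), and destroying the arena releases every block in one sweep (bulk deallocation).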
6.2 Document Encoding
Storing JSON documents directly would be inefficient. Cognica encodes documents into a compact binary format optimized for storage and retrieval.
6.2.1 Type Encoding
Each value is prefixed with a type marker:
| Type | Code | Description |
|---|---|---|
| Object | 0x01 | Nested document |
| Array | 0x02 | Ordered sequence |
| Null | 0x03 | Null value |
| False | 0x04 | Boolean false |
| True | 0x05 | Boolean true |
| Int64 | 0x06 | 64-bit signed integer |
| UInt64 | 0x07 | 64-bit unsigned integer |
| Double | 0x08 | IEEE 754 double |
| String | 0x09 | UTF-8 string |
Type-Length-Value (TLV) Encoding:
[type: 1 byte][length: varint][value: length bytes]
The length uses a variable-length encoding with continuation bits to minimize space for small values.
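Continuation-bit encoding can be illustrated with a LEB128-style varint codec. This is a common scheme used by many storage formats; Cognica's exact wire format may differ:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Unsigned LEB128-style varint: 7 payload bits per byte, with the
// high bit set on every byte except the last.
std::string encode_varint(uint64_t v) {
  std::string out;
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
  return out;
}

uint64_t decode_varint(const std::string& in) {
  uint64_t v = 0;
  int shift = 0;
  for (unsigned char byte : in) {
    v |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) break;  // high bit clear: last byte
    shift += 7;
  }
  return v;
}
```

Values below 128 take a single byte, so the short strings and small arrays typical of documents pay minimal length overhead.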
6.2.2 Primitive Encoding
Integers:
Signed integers use sign-flip encoding to preserve sort order:
encoded = uint64(value) XOR 0x8000000000000000
This flips the sign bit of the two's complement representation, so that lexicographic comparison of the encoded (big-endian) bytes yields correct numeric ordering.
Floating-Point Numbers:
IEEE 754 doubles require special handling for sortable encoding:
encoded = NOT bits(d)                       if d is negative (sign bit set)
encoded = bits(d) XOR 0x8000000000000000    otherwise
where bits(d) interprets the 64-bit IEEE 754 representation as an unsigned integer.
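Both sortable transformations can be demonstrated in a short sketch. The helper names (`encode_i64`, `encode_f64`) are illustrative, and big-endian serialization is assumed so that byte-wise comparison matches numeric order:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Big-endian serialization so byte-wise comparison matches
// unsigned numeric comparison.
std::string to_big_endian(uint64_t v) {
  std::string out(8, '\0');
  for (int i = 7; i >= 0; --i) {
    out[i] = static_cast<char>(v & 0xFF);
    v >>= 8;
  }
  return out;
}

// Sign-flip encoding: XOR the sign bit so negative values order
// before positive ones under unsigned comparison.
std::string encode_i64(int64_t v) {
  return to_big_endian(static_cast<uint64_t>(v) ^ 0x8000000000000000ULL);
}

// IEEE 754 doubles: flip all bits of negatives (reversing their
// order), flip only the sign bit of non-negatives.
std::string encode_f64(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));
  if (bits & 0x8000000000000000ULL) {
    bits = ~bits;
  } else {
    bits ^= 0x8000000000000000ULL;
  }
  return to_big_endian(bits);
}
```

`std::string` comparison uses unsigned byte semantics, so the assertions below hold exactly when the encoding is order-preserving.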
Strings:
Strings are encoded with a length prefix followed by the UTF-8 bytes:
[length: varint][UTF-8 bytes]
For key comparison, a null-terminated encoding is used instead:
[UTF-8 bytes][0x00]
6.2.3 Composite Encoding
Objects:
Objects encode as sequences of key-value pairs:
[0x01][pair_1][pair_2]...[pair_n]
Each key-value pair is an encoded key string followed by an encoded value:
[key][value]
Arrays:
Arrays encode as sequences of values:
[0x02][value_1][value_2]...[value_n]
6.2.4 Document Layout
Complete documents include a header with metadata before the encoded body:
[timestamp: 8 bytes][ttl: 4 bytes][flags: 1 byte][encoded document]
Header Fields:
| Field | Size | Purpose |
|---|---|---|
| Timestamp | 8 bytes | Creation/modification time |
| TTL | 4 bytes | Time-to-live in seconds (0 = never expires) |
| Flags | 1 byte | Metadata flags (deleted, migrating, etc.) |
Space Efficiency:
Consider encoding the example user document:
| Component | JSON Size | Encoded Size |
|---|---|---|
| Field names | 89 bytes | 89 bytes |
| String values | 78 bytes | 82 bytes |
| Structural overhead | 45 bytes | 15 bytes |
| Total | 212 bytes | 186 bytes |
Binary encoding typically achieves 10-30% size reduction through eliminated whitespace and compact length encoding.
6.3 Schema Definition
While documents can vary in structure, schemas define expectations and constraints that enable optimization and validation.
6.3.1 Schema Structure
A Cognica schema specifies:
collection: users
workspace: default
primary_key:
  fields: [_id]
  unique: true
secondary_keys:
  - name: email_idx
    fields: [email]
    unique: true
    type: secondary_key
  - name: location_idx
    fields: [profile.location.country, profile.location.city]
    type: secondary_key
  - name: content_idx
    fields: [profile.bio]
    type: full_text_search
comment: "User accounts with profile information"
6.3.2 Schema Components
Primary Key:
Every collection has exactly one primary key that uniquely identifies documents. The primary key maps each document to a unique key value. Primary keys can be:
- Single field: _id
- Composite: (tenant_id, user_id)
- Auto-generated: UUID or sequence
Secondary Keys:
Secondary keys create additional access paths from indexed field values to primary keys. Unlike primary keys, secondary keys can map one key value to a set of documents (for non-unique indexes) and support:
- B-tree indexes: For range queries and sorting
- Full-text indexes: For text search
- Clustered indexes: Storing document data with the index
6.3.3 Schema Builder Pattern
Schemas are constructed programmatically using the builder pattern:
auto schema = SchemaBuilder{}
    .set_workspace_id(workspace_id)
    .set_collection_id(collection_id)
    .set_collection_name("users")
    .set_primary_key({"_id"}, PrimaryKeyOptions{.unique = true})
    .add_secondary_key("email_idx", {"email"}, SecondaryKeyOptions{
        .unique = true,
        .type = IndexType::kSecondaryKey
    })
    .add_secondary_key("content_idx", {"profile.bio"}, SecondaryKeyOptions{
        .type = IndexType::kFullTextSearchIndex
    })
    .set_comment("User accounts")
    .build();
The builder validates constraints during construction:
- Primary key must have at least one field
- Secondary key names must be unique
- Field paths must be valid dot notation
6.3.4 Schema Flexibility
Cognica supports schema-on-read semantics: documents can contain fields not defined in the schema. The schema defines:
- Indexed fields: Fields with associated indexes
- Type hints: Expected types for validation
- Constraints: Uniqueness, nullability
Documents may include additional fields that are stored but not indexed. This enables gradual schema evolution without migration.
6.4 Key Encoding
Keys must be encoded to preserve ordering in the LSM-tree while supporting composite keys and nullable fields.
6.4.1 Primary Key Encoding
Primary keys are encoded with a prefix identifying the collection, followed by the encoded key fields:
[prefix: 14 bytes][encoded primary key fields]
Prefix Structure:
| Component | Bytes | Purpose |
|---|---|---|
| Database Type | 1 | Distinguishes document DB from others |
| Category | 1 | Data category (user data = 2) |
| Workspace ID | 4 | Multi-tenant isolation |
| Collection ID | 4 | Collection identification |
| Index ID | 4 | Primary key index (always 0) |
Field Encoding:
For composite primary keys (field_1, field_2, ...), each field is encoded with its type-specific encoding and the results are concatenated:
[encode(field_1)][encode(field_2)]...[encode(field_n)]
This ensures lexicographic order of the concatenated bytes matches logical ordering by (field_1, field_2, ...).
6.4.2 Secondary Key Encoding
Secondary keys include both the secondary key fields and the primary key (to keep entries unique):
[prefix: 14 bytes][encoded secondary key fields][encoded primary key]
Example:
For index location_idx on (country, city) with primary key _id:
Key: [prefix][country][city][_id]
[14 bytes][var][var][var]
This encoding enables:
- Prefix scans: Find all users in a country
- Range scans: Find users in countries A-M
- Exact lookup: Find user with specific country+city+id
6.4.3 Nullable Field Handling
Nullable fields require special encoding to maintain sort order. Each nullable field is prefixed with a presence byte:
[0x00]                   null
[0x01][encoded value]    non-null
With this scheme, null values sort before all non-null values (or after, depending on configuration).
6.4.4 Sort Order Preservation
The encoding must satisfy:
a < b  ⟺  encode(a) <_lex encode(b)
where <_lex is lexicographic (byte-wise) comparison.
Descending Order:
For descending sorts, the encoding is inverted:
encode_desc(x) = complement(encode(x))
where complement flips all bits. This reverses the sort order while preserving the compare-by-bytes property.
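The descending inversion is small enough to sketch directly (the helper name is illustrative):

```cpp
#include <cassert>
#include <string>

// Descending variant: complement every byte of the ascending encoding.
// Byte-wise comparison of the result reverses the original order, and
// applying the complement twice recovers the original key.
std::string invert_for_descending(const std::string& ascending) {
  std::string out = ascending;
  for (char& c : out) {
    c = static_cast<char>(~static_cast<unsigned char>(c));
  }
  return out;
}
```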
6.5 Index Architecture
Indexes are the primary mechanism for accelerating queries. Cognica supports multiple index types optimized for different access patterns.
6.5.1 Index Type Hierarchy
Index
├── Primary Key Index
├── Secondary Key Index
│   └── Clustered Secondary Index
└── Full-Text Search Index
    └── Clustered Full-Text Search Index
6.5.2 Index Types
| Type | Code | Use Case |
|---|---|---|
| Primary Key | 0 | Unique document identification |
| Secondary Key | 1 | Traditional B-tree index |
| Clustered Secondary | 2 | Secondary index with embedded data |
| Full-Text Search | 3 | Text search with posting lists |
| Clustered FTS | 4 | FTS with embedded document data |
Primary Key Index:
The primary key index stores complete documents:
[prefix][encoded primary key] → [encoded document]
Secondary Key Index:
Secondary indexes store only the mapping from secondary key to primary key:
[prefix][encoded secondary key fields][encoded primary key] → (empty)
Lookups require two steps:
1. Find the primary key via the secondary index
2. Fetch the document via the primary key index
Clustered Secondary Index:
Clustered secondaries embed the document data directly:
[prefix][encoded secondary key fields][encoded primary key] → [encoded document]
This eliminates the second lookup at the cost of storage duplication.
6.5.3 Index Descriptor
The IndexDescriptor manages all indexes for a collection:
class IndexDescriptor {
 public:
  // Operations
  auto get_primary_key() const -> const PrimaryKey&;
  auto get_secondary_key(IndexID id) const -> const SecondaryKey*;
  auto find_by_name(std::string_view name) const -> const SecondaryKey*;
  void add_secondary_key(SecondaryKey&& sk);
  void remove_secondary_key(IndexID id);

 private:
  PrimaryKey primary_key_;
  std::vector<SecondaryKey> secondary_keys_;
  mutable std::shared_mutex mutex_;
};
Thread Safety:
The descriptor uses a shared mutex for concurrent access:
- Multiple readers can access concurrently
- Writers acquire exclusive access
- Index additions/removals are atomic
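The reader/writer pattern behind the descriptor can be sketched with `std::shared_mutex`. The `IndexRegistry` class here is a simplified stand-in for the real descriptor, tracking only index names:

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Many concurrent readers, exclusive writers.
class IndexRegistry {
 public:
  void add(const std::string& name) {
    std::unique_lock lock(mutex_);  // exclusive: blocks all readers
    names_.push_back(name);
  }

  bool contains(const std::string& name) const {
    std::shared_lock lock(mutex_);  // shared: readers run concurrently
    for (const auto& n : names_) {
      if (n == name) return true;
    }
    return false;
  }

  size_t size() const {
    std::shared_lock lock(mutex_);
    return names_.size();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::vector<std::string> names_;
};
```

Because index metadata is read on every query but modified only on DDL operations, the shared lock keeps the common path contention-free.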
6.5.4 Index Statistics
Each index tracks usage statistics for query optimization:
struct IndexStatistics {
  std::atomic<int64_t> accessed;  // Query count
  std::atomic<int64_t> added;     // Insert count
  std::atomic<int64_t> updated;   // Update count
  std::atomic<int64_t> deleted;   // Delete count
  std::atomic<int64_t> merged;    // Merge operation count
  TimePoint accessed_at;  // Last query time
  TimePoint added_at;     // Last insert time
  TimePoint updated_at;   // Last update time
  TimePoint deleted_at;   // Last delete time
  TimePoint merged_at;    // Last merge time
};
Statistics inform:
- Index selection: Prefer frequently-used indexes
- Maintenance scheduling: Identify cold indexes for optimization
- Capacity planning: Track growth rates
6.6 Collection Operations
Collections are the primary interface for document manipulation, providing ACID operations through the transaction layer.
6.6.1 Collection Architecture
A collection bundles a primary key reader/writer, one reader/writer per secondary index, and a query planner that selects an index for each operation.
6.6.2 CRUD Operations
Insert:
Status Collection::insert(const Document& doc) {
  // 1. Extract primary key
  auto pk = extract_primary_key(doc);
  // 2. Check uniqueness
  if (pk_reader_->exists(pk)) {
    return Status::AlreadyExists("Duplicate primary key");
  }
  // 3. Encode document
  auto encoded = encode_document(doc);
  // 4. Write to primary index
  pk_writer_->put(pk, encoded);
  // 5. Update secondary indexes
  for (auto& sk_writer : sk_writers_) {
    auto sk = extract_secondary_key(doc, sk_writer->descriptor());
    sk_writer->put(sk, pk);
  }
  return Status::OK();
}
Find:
Cursor Collection::find(const Document& query) {
  // 1. Analyze query
  auto plan = query_planner_.plan(query);
  // 2. Select best index
  auto index = plan.best_index();
  // 3. Create cursor
  if (index.is_primary_key()) {
    return pk_reader_->scan(plan.key_range());
  } else {
    return sk_readers_[index.id()]->scan(plan.key_range());
  }
}
Update:
Status Collection::update(const Document& filter, const Document& updates) {
  // 1. Find matching documents
  auto cursor = find(filter);
  // 2. Apply updates
  while (cursor.valid()) {
    auto doc = cursor.document();
    auto old_doc = doc;  // keep pre-update state for index maintenance
    // 3. Apply update operators
    apply_updates(doc, updates);
    // 4. Rewrite document
    auto pk = extract_primary_key(doc);
    pk_writer_->put(pk, encode_document(doc));
    // 5. Update secondary indexes if affected fields changed
    update_secondary_indexes(old_doc, doc);
    cursor.next();
  }
  return Status::OK();
}
Delete:
Status Collection::remove(const Document& filter) {
  auto cursor = find(filter);
  while (cursor.valid()) {
    auto doc = cursor.document();
    auto pk = extract_primary_key(doc);
    // 1. Delete from primary index
    pk_writer_->del(pk);
    // 2. Delete from secondary indexes (the encoded key embeds the pk)
    for (auto& sk_writer : sk_writers_) {
      auto sk = extract_secondary_key(doc, sk_writer->descriptor());
      sk_writer->del(sk);
    }
    cursor.next();
  }
  return Status::OK();
}
6.6.3 Batch Operations
For bulk inserts, batch operations amortize overhead:
Status Collection::insert_parallel(const std::vector<Document>& docs) {
  // 1. Partition documents across threads
  auto partitions = partition(docs, thread_count_);
  // 2. Process partitions in parallel
  parallel_for(partitions, [this](auto& partition) {
    auto batch = begin_write_batch();
    for (auto& doc : partition) {
      batch.insert(doc);
    }
    batch.commit();
  });
  return Status::OK();
}
Performance Characteristics:
| Operation | Single | Batch (1000 docs) |
|---|---|---|
| Insert | 100 us | 50 ms (50 us/doc) |
| Index update | 50 us | 25 ms (25 us/doc) |
| Total | 150 us | 75 ms |
| Throughput | 6,600/s | 13,300/s |
Batching doubles throughput by amortizing transaction overhead.
6.6.4 Transaction Support
Collections support ACID transactions:
auto txn = collection.begin_transaction();
try {
  txn.insert(doc1);
  txn.update(filter, updates);
  txn.remove(filter2);
  txn.commit();
} catch (...) {
  txn.rollback();
  throw;  // propagate the error after rolling back
}
Isolation Levels:
| Level | Dirty Read | Non-Repeatable | Phantom |
|---|---|---|---|
| Read Uncommitted | Yes | Yes | Yes |
| Read Committed | No | Yes | Yes |
| Repeatable Read | No | No | Yes |
| Serializable | No | No | No |
Cognica defaults to Snapshot Isolation, which prevents dirty reads and non-repeatable reads (and most phantoms) but permits write-skew anomalies.
6.7 Index Reader and Writer
The index reader/writer abstraction separates query and mutation operations.
6.7.1 Index Reader Interface
class IndexReader {
 public:
  virtual ~IndexReader() = default;
  // Point lookup
  virtual auto get(const Slice& key) -> std::optional<Document> = 0;
  // Existence check
  virtual auto exists(const Slice& key) -> bool = 0;
  // Range scan
  virtual auto scan(const KeyRange& range) -> Cursor = 0;
  // Prefix scan
  virtual auto scan_prefix(const Slice& prefix) -> Cursor = 0;
  // Count
  virtual auto count(const KeyRange& range) -> size_t = 0;
};
6.7.2 Index Writer Interface
class IndexWriter {
 public:
  virtual ~IndexWriter() = default;
  // Insert
  virtual auto put(const Slice& key, const Slice& value) -> Status = 0;
  // Delete
  virtual auto del(const Slice& key) -> Status = 0;
  // Batch operations
  virtual auto put_batch(const std::vector<KV>& kvs) -> Status = 0;
  virtual auto del_batch(const std::vector<Slice>& keys) -> Status = 0;
};
6.7.3 Key Codec
The key codec handles encoding and decoding of index keys:
Primary Key Codec:
struct PrimaryKeyIndexKeyCodec {
  static auto encode(
      const PrimaryKey& pk_desc,
      const Slice& pk
  ) -> std::string {
    std::string key;
    // Add 14-byte prefix
    append_prefix(key, pk_desc.guid());
    // Add encoded primary key fields
    key.append(pk.data(), pk.size());
    return key;
  }

  static auto decode(
      const PrimaryKey& pk_desc,
      const Slice& storage_key
  ) -> Slice {
    // Skip 14-byte prefix
    return storage_key.substr(14);
  }
};
Secondary Key Codec:
struct SecondaryKeyIndexKeyCodec {
  static auto encode(
      const PrimaryKey& pk_desc,
      const SecondaryKey& sk_desc,
      const Slice& pk,
      const Document& doc,
      bool nullable
  ) -> std::string {
    std::string key;
    // Add 14-byte prefix with SK index ID
    append_prefix(key, sk_desc.guid());
    // Add encoded secondary key fields
    for (const auto& field : sk_desc.fields()) {
      auto value = doc.find(field);
      encode_field(key, value, nullable);
    }
    // Append primary key for uniqueness
    key.append(pk.data(), pk.size());
    return key;
  }
};
6.7.4 Index Affinity Score
The query optimizer uses affinity scores to select the best index:
score(I, Q) = Σ_{i=0}^{k-1} 1 / (i + 1)
where k is the length of the longest prefix of index I's fields that all appear in the query's field set Q. Earlier index positions carry higher weight, reflecting prefix selectivity.
Scoring Algorithm:
double Index::compute_affinity_score(const FieldNames& query_fields) const {
  double score = 0.0;
  size_t position = 0;
  for (const auto& field : fields_) {
    if (query_fields.contains(field)) {
      // Higher weight for earlier positions (prefix selectivity)
      score += 1.0 / (position + 1);
    } else {
      // Gap in index prefix reduces usefulness
      break;
    }
    position++;
  }
  return score;
}
6.8 Dot Notation and Nested Documents
Cognica supports dot notation for accessing nested fields, enabling queries and indexes on deeply nested data.
6.8.1 Path Syntax
Dot notation uses periods to separate nested field names:
| Path | Meaning |
|---|---|
| name | Top-level field |
| profile.bio | Nested field |
| profile.location.city | Deeply nested field |
| tags[0] | Array element |
| tags[*] | All array elements |
6.8.2 Path Resolution
class DotNotationSupport {
 public:
  // Find nested member
  auto find_member(const Document& doc, std::string_view path)
      -> std::optional<Value>;
  // Add nested member (creating intermediate objects)
  auto add_member(Document& doc, std::string_view path, Value value)
      -> Status;
  // Check existence
  auto has_member(const Document& doc, std::string_view path)
      -> bool;
  // Remove nested member
  auto remove_member(Document& doc, std::string_view path)
      -> Status;
};
Resolution Algorithm:
find_member(doc, "profile.location.city"):
1. Split path: ["profile", "location", "city"]
2. current = doc
3. For each segment:
   - If current is an object and has segment: current = current[segment]
   - Else: return null
4. Return current
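The resolution algorithm can be sketched over a minimal JSON-like value type. Real documents use RapidJSON values; the `Value`/`Object` types here are stand-ins, and this `find_member` only returns string leaves for brevity:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <variant>

// Minimal JSON-like value: either a string leaf or a nested object.
struct Value;
using Object = std::map<std::string, Value, std::less<>>;
struct Value {
  std::variant<std::string, Object> data;
};

// Walk "a.b.c" one segment at a time, descending into nested objects.
std::optional<std::string> find_member(const Object& doc,
                                       std::string_view path) {
  const Object* current = &doc;
  size_t start = 0;
  while (true) {
    size_t dot = path.find('.', start);
    std::string_view segment = path.substr(start, dot - start);
    auto it = current->find(segment);
    if (it == current->end()) return std::nullopt;  // path not present
    if (dot == std::string_view::npos) {
      // Final segment: expect a leaf string.
      if (auto* s = std::get_if<std::string>(&it->second.data)) return *s;
      return std::nullopt;
    }
    // Intermediate segment: must be a nested object.
    auto* obj = std::get_if<Object>(&it->second.data);
    if (!obj) return std::nullopt;
    current = obj;
    start = dot + 1;
  }
}
```

A missing intermediate segment short-circuits to "not found" rather than failing, matching the algorithm's "Else: return null" step.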
6.8.3 Nested Index Creation
Indexes on nested fields work identically to top-level fields:
secondary_keys:
  - name: city_idx
    fields: [profile.location.city]
    type: secondary_key
The index stores the nested value directly, enabling efficient lookups:
SELECT * FROM users WHERE profile.location.city = 'San Francisco'
Uses city_idx for O(log n) lookup rather than O(n) full scan.
6.8.4 Array Handling
Arrays require special handling for indexing:
Multi-Key Index:
For a document with array field:
{"_id": "1", "tags": ["developer", "researcher"]}
A multi-key index creates one entry per array element, each ending with the primary key:
[prefix]["developer"]["1"]
[prefix]["researcher"]["1"]
Query Semantics:
SELECT * FROM users WHERE tags = 'developer'
Matches any document where tags contains "developer".
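The multi-key expansion itself is mechanical and can be sketched as follows. This is a simplified illustration that uses a readable null separator instead of the binary key codec from Section 6.4; the function name is made up for this example:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One index entry per array element, each entry ending with the
// primary key so that duplicate element values stay unique.
std::vector<std::string> multikey_entries(
    const std::vector<std::string>& tags, const std::string& pk) {
  std::vector<std::string> entries;
  for (const auto& tag : tags) {
    entries.push_back(tag + '\0' + pk);  // [element][separator][pk]
  }
  return entries;
}
```

A lookup for "developer" scans the index prefix for that element value; any document whose array contains it has an entry there.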
6.9 Catalog Management
The catalog stores metadata about collections, indexes, and schemas.
6.9.1 Catalog Structure
The catalog is organized hierarchically: each workspace contains collections, and each collection carries its schema, index descriptors, and statistics.
6.9.2 Catalog Operations
| Operation | Description |
|---|---|
| create_collection | Register new collection with schema |
| drop_collection | Remove collection and all data |
| get_collection | Retrieve collection metadata |
| list_collections | Enumerate workspace collections |
| create_index | Add secondary index |
| drop_index | Remove secondary index |
| get_index | Retrieve index metadata |
6.9.3 Schema Versioning
Schemas evolve over time. Cognica tracks schema versions:
Compatible Changes (no migration needed):
- Adding nullable fields
- Adding secondary indexes
- Adding new collections
Incompatible Changes (require migration):
- Changing primary key fields
- Changing field types
- Removing required fields
6.9.4 Metadata Persistence
Catalog metadata is stored in the system database category:
| Type | Purpose |
|---|---|
| 0x01 | Collection schema |
| 0x02 | Index descriptor |
| 0x03 | Statistics |
| 0x04 | Access control |
6.10 Query Context and Projection
Query context carries execution state through the query pipeline.
6.10.1 Query Context Structure
struct QueryContext {
  // Execution mode
  bool is_single_document;
  bool is_streaming;
  // Field projection
  FieldProjectMap projection;
  // Transaction state
  Transaction* transaction;
  Snapshot* snapshot;
  // Statistics
  QueryStatistics stats;
};
6.10.2 Field Projection
Projections limit which fields are returned, reducing I/O and network transfer:
SELECT name, email FROM users WHERE status = 'active'
Projection Encoding:
struct FieldProjectMap {
  enum Mode { kInclude, kExclude };
  Mode mode;
  std::unordered_set<std::string> fields;

  bool should_include(std::string_view field) const {
    bool in_set = fields.count(std::string(field)) > 0;
    return (mode == kInclude) ? in_set : !in_set;
  }
};
Projection Optimization:
For queries touching only indexed fields, the answer can be produced from the index alone (a covering index). Covering queries avoid the primary key lookup entirely.
6.10.3 Query Statistics
Each query collects execution statistics:
struct QueryStatistics {
  size_t documents_scanned;
  size_t documents_returned;
  size_t index_keys_examined;
  size_t bytes_read;
  Duration parse_time;
  Duration plan_time;
  Duration execution_time;
  std::string selected_index;
};
Statistics enable:
- Query debugging: Identify slow queries
- Index tuning: Find missing indexes
- Capacity planning: Predict resource usage
6.11 Summary
This chapter explored Cognica's document storage layer, from JSON representation through binary encoding to index management. Key takeaways:
- JSON documents provide flexible schema with nested structure, encoded efficiently in binary format for storage.
- Key encoding preserves sort order for composite keys, enabling efficient range scans in the LSM-tree.
- Multiple index types (primary, secondary, full-text, clustered) optimize for different access patterns.
- Schema management balances flexibility (schema-on-read) with optimization (indexed fields, constraints).
- Collection operations provide ACID guarantees through the transaction layer, with batch optimization for bulk workloads.
- Dot notation enables seamless access to nested fields, with multi-key indexes for arrays.
- Catalog management tracks metadata with support for schema evolution.
The document layer provides the structured data interface that applications interact with, while the next chapter explores how full-text search indexes enable efficient text queries across document collections.