Chapter 34: Context-Isolated Architecture
Database systems that rely on global state face fundamental barriers to multi-tenancy, testability, and safe shutdown. When subsystems access shared resources through singletons and static variables, the system cannot host multiple independent database instances within a single process, tests cannot run in isolation, and shutdown sequences become fragile races against dangling references. This chapter examines Cognica's systematic refactoring from a singleton-based architecture to a context-isolated model where a single owner object — ServerContext — holds the complete lifecycle of every subsystem.
34.1 Introduction — From Singletons to Context Isolation
34.1.1 The Global State Problem
Cognica's original architecture followed a common pattern in database systems: each subsystem exposed a pair of free functions — initialize() and uninitialize() — backed by static variables in an anonymous or detail namespace. The storage engine, document database, scheduler, logger, configuration parser, session registry, compilation cache, and replication manager all followed this pattern:
// Original pattern: free functions backed by static state
namespace cognica::db::storage {
namespace detail {
  static std::filesystem::path db_path = "data/cognica.db";
  static rdb::Options options {};
  static std::unique_ptr<TransactionDBWrapper> db {};
  static ChainedCommitObserver commit_observer {};
  static bool fast_shutdown = false;
}  // namespace detail

bool initialize(const std::filesystem::path& path);
bool uninitialize(bool flush_db = false, bool force = false);
TransactionDB* get_db();
}  // namespace cognica::db::storage
This design has several inherent limitations:
- Single-instance constraint. A process can host exactly one database instance. Running two independent databases for multi-tenant isolation requires separate processes.
- Test coupling. Tests share global state, creating implicit dependencies between test suites. A test that modifies the compilation cache or session registry affects every subsequent test.
- Shutdown fragility. Uninitializing subsystems in the wrong order causes use-after-free crashes. The correct order is the reverse of initialization, but nothing enforces this when each subsystem manages its own lifetime.
- Hidden dependencies. When any function can call DocumentDB::instance() or config::get_options(), the dependency graph is invisible. Refactoring becomes dangerous because dependencies are not declared in function signatures.
34.1.2 The Default Context Bridge Anti-Pattern
A naive approach to removing singletons would change every call site simultaneously — a change touching hundreds of files in a single commit. Cognica instead adopted a phased migration using the default context bridge pattern:
// Bridge pattern: free function delegates to a default context
namespace cognica::db::storage {
TransactionDB* get_db() {
  return ServerContext::get_default()
      ->get_storage_engine_context()
      ->get_db();
}
}  // namespace cognica::db::storage
During migration, each singleton's free-function API was preserved but rewritten to delegate through ServerContext::get_default(). This allowed incremental migration: existing callers continued to work while new code received the context through parameters. Once all callers were migrated, the bridge was removed.
The bridge is an anti-pattern in the final architecture — it reintroduces global access through a different mechanism. Its value is purely transitional: it enables a multi-month refactoring to proceed in small, testable increments without breaking the entire codebase.
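The following is a minimal, self-contained sketch of how such a bridge can coexist with migrated code. The types are reduced stand-ins, and `set_default()`, `db_path`, and both `get_db_path()` overloads are hypothetical names introduced only for illustration; the real Cognica bridge may be structured differently.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, reduced stand-in for the real storage context.
struct StorageEngineContext {
  std::string db_path;
};

class ServerContext {
public:
  explicit ServerContext(std::string path) : storage_{std::move(path)} {}

  auto get_storage_engine_context() -> StorageEngineContext* {
    return &storage_;
  }

  // Transitional global access point; deleted once migration completes.
  static auto get_default() -> ServerContext* { return default_; }
  static void set_default(ServerContext* ctx) { default_ = ctx; }

private:
  StorageEngineContext storage_;
  inline static ServerContext* default_ = nullptr;
};

// Legacy free function: the signature is unchanged, but the body now
// delegates through the default context instead of a static variable.
inline auto get_db_path() -> const std::string& {
  auto* ctx = ServerContext::get_default();
  assert(ctx != nullptr && "default context must be installed first");
  return ctx->get_storage_engine_context()->db_path;
}

// Migrated code: receives the context explicitly and never touches
// the bridge.
inline auto get_db_path(StorageEngineContext* storage) -> const std::string& {
  assert(storage != nullptr);
  return storage->db_path;
}
```

Both overloads return the same value while the bridge exists. Deleting the zero-argument overload then turns every unmigrated caller into a compile error, which is one way to confirm that a migration phase is complete.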
34.2 ServerContext Ownership Model
34.2.1 ServerContext as the Top-Level Owner
The ServerContext is the single root object that owns every subsystem's state. It is constructed during service::initialize() and destroyed during service::uninitialize(). No subsystem outlives ServerContext, and no subsystem is initialized before it.
The ownership model follows the C++ RAII principle: construction acquires resources, destruction releases them. ServerContext holds each subsystem through std::unique_ptr, and C++ destroys non-static data members in reverse declaration order. As long as the members are declared in initialization order, destruction matches the reverse-of-initialization requirement automatically.
34.2.2 Construction Order and Dependency Graph
Subsystem initialization must respect a strict partial order dictated by dependencies. The configuration must be parsed before the storage engine can read its options. The storage engine must be open before the document database can create collections. The document database must exist before the SQL session registry can bootstrap system catalogs.
The initialization sequence in service::initialize() encodes this order: configuration first, then the storage engine, then the document database, then the SQL layer.
Destruction proceeds in the reverse order. Of the subsystems with background activity, the replication manager shuts down first (to stop accepting writes), followed by the scheduler (to stop background tasks), followed by the document database, and finally the storage engine.
34.2.3 RAII-Based Lifetime Management
Each subsystem context is held through std::unique_ptr, ensuring deterministic destruction:
class ServerContext final {
public:
  ServerContext();
  ~ServerContext();

  auto get_config_context() -> ConfigContext*;
  auto get_storage_engine_context() -> StorageEngineContext*;
  auto get_document_db() -> db::document::DocumentDB*;
  auto get_sql_context() -> SQLContext*;
  auto get_scheduler_context() -> SchedulerContext*;
  auto get_replication_manager() -> replication::ReplicationManager*;
  auto get_keyspace_manager() -> db::kv::KeyspaceManager*;

  // Optional component: set after construction if replication is enabled
  void set_replication_manager(
      std::unique_ptr<replication::ReplicationManager> manager);

private:
  std::unique_ptr<ConfigContext> config_context_;
  std::unique_ptr<StorageEngineContext> storage_engine_context_;
  std::unique_ptr<db::document::DocumentDB> document_db_;
  std::unique_ptr<SQLContext> sql_context_;
  std::unique_ptr<SchedulerContext> scheduler_context_;
  std::unique_ptr<replication::ReplicationManager> replication_manager_;
  std::unique_ptr<db::kv::KeyspaceManager> keyspace_manager_;
};
The std::unique_ptr members are destroyed in reverse declaration order when ServerContext is destroyed, and each one deletes the subsystem it owns. This automatically enforces the correct shutdown sequence: keyspace_manager_ is destroyed before replication_manager_, which is destroyed before scheduler_context_, and so on.
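The reverse-declaration rule can be observed directly. The sketch below uses a hypothetical `Subsystem` class that appends to a log on construction and destruction, and a three-member `MiniServerContext` standing in for the real owner; none of these names come from the Cognica codebase.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subsystem that records its own construction and destruction.
class Subsystem {
public:
  Subsystem(std::string name, std::vector<std::string>* log)
      : name_{std::move(name)}, log_{log} {
    assert(log_ != nullptr);
    log_->push_back("init " + name_);
  }
  ~Subsystem() { log_->push_back("destroy " + name_); }

private:
  std::string name_;
  std::vector<std::string>* log_;
};

// Members are declared in dependency order, so the compiler-generated
// destructor tears them down in the opposite order.
class MiniServerContext {
public:
  explicit MiniServerContext(std::vector<std::string>* log)
      : config_{std::make_unique<Subsystem>("config", log)},
        storage_{std::make_unique<Subsystem>("storage", log)},
        document_db_{std::make_unique<Subsystem>("document_db", log)} {}

private:
  std::unique_ptr<Subsystem> config_;
  std::unique_ptr<Subsystem> storage_;
  std::unique_ptr<Subsystem> document_db_;
};

// Builds and destroys one context, returning the recorded event sequence.
inline auto run_lifecycle() -> std::vector<std::string> {
  std::vector<std::string> log;
  { MiniServerContext ctx{&log}; }
  return log;
}
```

Calling `run_lifecycle()` yields three `init` entries in declaration order followed by three `destroy` entries in the opposite order. Reordering the member declarations changes the teardown sequence, which is why the declaration order in the class is effectively part of its contract.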
34.3 Context Hierarchy
34.3.1 Overview
The context hierarchy forms a tree rooted at ServerContext. Each context encapsulates the state that was previously spread across static variables in various translation units.
34.3.2 ConfigContext
ConfigContext owns the parsed configuration tree and the Options struct derived from it. It replaces the static std::optional<Options> and static std::optional<json> that previously lived in options.cpp.
RuntimeOptions is deliberately kept as a process-global static. Test harnesses set RuntimeOptions::is_unit_test before ServerContext exists, creating a chicken-and-egg dependency that is resolved by keeping RuntimeOptions outside the context hierarchy:
namespace cognica::config {
// Process-global: set before ServerContext construction
RuntimeOptions* get_runtime_options();
} // namespace cognica::config
34.3.3 StorageEngineContext
StorageEngineContext owns the RocksDB instance, thread pools, commit observer chain, and optional encryption environment. It replaces seven static variables from storage_engine.cpp.
The commit observer uses a chained pattern where multiple observers are registered and invoked in order on each commit. Because ChainedCommitObserver has deleted move operations, StorageEngineContext must be allocated through std::unique_ptr rather than held by value.
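A short sketch of why a non-movable member pushes the owner toward heap allocation. The `ChainedCommitObserver` below is a hypothetical reduction (callbacks taking an integer sequence number), not the real class; only the deleted copy and move operations are taken from the description above.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical observer chain: callbacks run in registration order.
class ChainedCommitObserver {
public:
  ChainedCommitObserver() = default;
  ChainedCommitObserver(const ChainedCommitObserver&) = delete;
  ChainedCommitObserver& operator=(const ChainedCommitObserver&) = delete;
  ChainedCommitObserver(ChainedCommitObserver&&) = delete;
  ChainedCommitObserver& operator=(ChainedCommitObserver&&) = delete;

  void add(std::function<void(int)> observer) {
    observers_.push_back(std::move(observer));
  }
  void on_commit(int sequence) const {
    for (const auto& observer : observers_) observer(sequence);
  }

private:
  std::vector<std::function<void(int)>> observers_;
};

struct StorageEngineContext {
  ChainedCommitObserver commit_observer;
};

// The non-movable member makes the whole context non-movable...
static_assert(!std::is_move_constructible_v<StorageEngineContext>);
// ...but a unique_ptr to it can still change owners freely.
static_assert(
    std::is_move_constructible_v<std::unique_ptr<StorageEngineContext>>);

// Registers two observers, transfers ownership, then fires one commit.
inline auto commit_order() -> std::vector<int> {
  auto context = std::make_unique<StorageEngineContext>();
  std::vector<int> calls;
  context->commit_observer.add(
      [&calls](int seq) { calls.push_back(seq * 10 + 1); });
  context->commit_observer.add(
      [&calls](int seq) { calls.push_back(seq * 10 + 2); });
  auto owner = std::move(context);  // ownership moves; the object does not
  assert(context == nullptr);
  owner->commit_observer.on_commit(7);
  return calls;
}
```

Because the object itself never relocates, any pointer an observer holds into the context stays valid across the ownership transfer.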
Thread pools are named and specialized:
enum class ThreadPoolName {
  kGeneric = 0,
  kBatchWrite = 1,
  kQuery = 2,
  kSchema = 3,
  kDiskIO = 4,
};
Each pool is sized according to configuration and workload characteristics. The generic pool handles miscellaneous background work; the batch write pool serializes concurrent write batches; the query pool executes parallel scan operators; the schema pool handles DDL operations; and the disk I/O pool manages compaction and flush operations.
34.3.4 SQLContext
SQLContext is the largest context, owning ten components that were previously singletons or static locals scattered across the SQL layer:
| Component | Previous Location | Role |
|---|---|---|
| SessionRegistry | session_registry.cpp static | Tracks active SQL sessions for pg_stat_activity |
| DatabaseMetadataManager | database_metadata.cpp static | Maps database names to workspace IDs |
| CompilationCache | compilation_cache.cpp static | LRU cache for JIT-compiled expressions |
| TableFunctionRegistry | table_function_registry.cpp static | Registry of SQL table functions |
| GraphAdjacencyCache | graph_functions.cpp static | Caches graph adjacency lists |
| JSONPathCache | jsonpath_cache.cpp static local | Caches parsed JSONPath expressions |
| DeoptStats | deopt.cpp static global | Tracks JIT deoptimization events |
| AggregateFunctionRegistry | expr_utils.cpp static local | Registry of aggregate functions |
| JITRuntimeConfig | jit_config.cpp static | JIT compiler configuration |
| StandardAnalyzer | fts_functions.cpp / fts_matcher.cpp statics | FTS text analysis pipeline |
Construction order within SQLContext matters. GraphAdjacencyCache must be constructed before TableFunctionRegistry because graph table functions reference the cache during registration. DatabaseMetadataManager must be constructed last because its constructor performs database I/O to load persisted metadata.
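The ordering constraint between the cache and the registry can be sketched as follows. This is an illustration under simplifying assumptions: the components are held by value rather than through std::unique_ptr, and `MiniSQLContext`, `capacity`, and the function name `graph_neighbors` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in; the real cache carries far more state.
struct GraphAdjacencyCache {
  int capacity = 1024;
};

class TableFunctionRegistry {
public:
  // Graph table functions capture the cache while registering, so the
  // cache must already be alive when this constructor runs.
  explicit TableFunctionRegistry(GraphAdjacencyCache* cache) : cache_{cache} {
    assert(cache_ != nullptr);
    names_.push_back("graph_neighbors");
  }
  auto cache() const -> GraphAdjacencyCache* { return cache_; }
  auto names() const -> const std::vector<std::string>& { return names_; }

private:
  GraphAdjacencyCache* cache_;
  std::vector<std::string> names_;
};

class MiniSQLContext {
public:
  // The order written in the initializer list is cosmetic: members are
  // always initialized in the order they are declared below.
  MiniSQLContext() : graph_cache_{}, table_functions_{&graph_cache_} {}

  // Non-copyable: the registry stores a pointer into this object.
  MiniSQLContext(const MiniSQLContext&) = delete;
  MiniSQLContext& operator=(const MiniSQLContext&) = delete;

  auto get_graph_cache() -> GraphAdjacencyCache* { return &graph_cache_; }
  auto get_table_functions() -> TableFunctionRegistry* {
    return &table_functions_;
  }

private:
  GraphAdjacencyCache graph_cache_;        // declared first: constructed first
  TableFunctionRegistry table_functions_;  // may safely reference the cache
};
```

Swapping the two member declarations would hand the registry a pointer to a cache whose constructor has not yet run. Most compilers only warn about a mismatched initializer list (for example `-Wreorder` in GCC and Clang), so the declaration order is the thing to review.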
A key implementation detail: several of these components use private constructors with friend SQLContext to prevent unauthorized instantiation. The standard std::make_unique<T>() cannot access private constructors, even when the calling code is a friend. The workaround uses the raw new operator:
// std::make_unique cannot access private constructors via friend
// This does not compile:
//   session_registry_ = std::make_unique<SessionRegistry>();
//
// The correct approach:
session_registry_ = std::unique_ptr<SessionRegistry>(
    new SessionRegistry {});
34.3.5 SchedulerContext
SchedulerContext owns the task groups and individual tasks that were previously held in static variables in scheduler.cpp. It is the simplest context — four static variables (a mutex, two vectors, and a map) moved into a class.
Task groups are configured from YAML and started during construction. Individual tasks are registered after construction by subsystems that need periodic background work (statistics collection, compaction scheduling, index maintenance).
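As a rough illustration of moving "a mutex, two vectors, and a map" into a class, the sketch below shows one possible shape. The member roles, `register_task()`, `run_all_once()`, and `group_count()` are assumptions made for the example; the real SchedulerContext reads its groups from YAML and runs tasks periodically rather than on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class SchedulerContext {
public:
  using Task = std::function<void()>;

  // Task groups would come from configuration; here they are passed in.
  explicit SchedulerContext(std::vector<std::string> groups)
      : groups_{std::move(groups)} {}

  // Called by subsystems after construction to request background work.
  // Returns false if a task with the same name is already registered.
  auto register_task(const std::string& name, Task task) -> bool {
    std::lock_guard<std::mutex> lock{mutex_};
    if (index_.count(name) != 0) return false;
    index_[name] = tasks_.size();
    tasks_.push_back(std::move(task));
    return true;
  }

  // Runs every registered task once. Tasks are copied out under the lock
  // and executed outside it, so a task may register further tasks.
  void run_all_once() {
    std::vector<Task> snapshot;
    {
      std::lock_guard<std::mutex> lock{mutex_};
      snapshot = tasks_;
    }
    for (auto& task : snapshot) task();
  }

  auto group_count() const -> std::size_t { return groups_.size(); }

private:
  std::mutex mutex_;                          // formerly a static mutex
  std::vector<std::string> groups_;           // formerly a static vector
  std::vector<Task> tasks_;                   // formerly a static vector
  std::map<std::string, std::size_t> index_;  // formerly a static map
};
```

Because the state now lives in an object, two schedulers in one process no longer share a task list, and destroying the context releases every registered task.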
34.3.6 LoggerContext
LoggerContext owns the spdlog logger instances and their sink configurations. It replaces the static vectors and atomic flag in logger.cpp.
The logger is kept as a process-global facility despite being encapsulated in a context object. The LOGGER_INFO, LOGGER_WARN, and similar macros expand to calls to logger::get(category), which is invoked from over 120 call sites across the codebase. Threading a logger context through every function signature would impose unacceptable noise for a diagnostic facility that is inherently process-scoped.
34.3.7 ReplicationManager
The ReplicationManager is an optional component — it is only constructed when replication is enabled in the configuration. Unlike other contexts that are constructed during ServerContext::initialize(), the replication manager uses a setter:
// Optional: constructed only when replication is enabled
if (options.replication.enabled) {
  auto manager = std::make_unique<ReplicationManager>(options);
  auto status = manager->initialize();
  if (status.ok()) {
    server_context->set_replication_manager(std::move(manager));
  }
  // If initialization fails, the manager is destroyed here and
  // get_replication_manager() keeps returning nullptr.
}
Code that accesses the replication manager must check for null:
auto* manager = server_context->get_replication_manager();
if (manager != nullptr && manager->is_leader()) {
  // Replicate the write
}
34.3.8 KeyspaceManager
The KeyspaceManager maps collection names to keyspace IDs within the storage engine. It is owned by ServerContext and passed to subsystems that need to resolve collection metadata. The compaction filter receives it through a factory setter, and service layer components receive it through their constructors.
34.4 Bridge Removal Strategy
34.4.1 The Systematic Approach
Removing the default context bridge required migrating every call site to receive its context through function parameters. The migration proceeded in a disciplined order, working from the outermost layers inward:
- Service layer. PostgreSQL protocol handlers and Flight SQL handlers received ServerContext* or specific context pointers through their constructors.
- SQL layer. The SQL execution chain (SQLSession, SQLExecutor, ExecutionContext, PhysicalPlan, PlanBuilder) was threaded with SQLContext* parameters.
- Storage layer. The storage engine context was threaded through the replication module (manager, replicator, log writer, state machine, applier) and the database core (transactions, iterators, compaction filters).
- Test fixtures. Approximately 80 test and benchmark files were migrated from DocumentDB::instance() to service::get_server_context()->get_document_db().
34.4.2 Signature Cascading
Adding a context parameter to a function cascades through its callers. When is_aggregate_function() was changed to accept SQLContext*, the signature change propagated through six layers of function calls.
Each intermediate function must accept and forward the context parameter even if it does not use the context directly. This is an unavoidable cost of explicit dependency threading. The benefit is that the dependency graph is now visible in function signatures — every function declares exactly which subsystems it requires.
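A compressed illustration of the cascade, using three layers instead of six. Only `SQLContext`, `AggregateFunctionRegistry`, and `is_aggregate_function` are names taken from the text; `get_aggregate_registry()`, `classify_call()`, `plan_expression()`, and the registry contents are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, reduced registry with a fixed set of aggregate names.
class AggregateFunctionRegistry {
public:
  AggregateFunctionRegistry() : names_{"count", "sum", "avg"} {}
  auto contains(const std::string& name) const -> bool {
    return names_.count(name) != 0;
  }

private:
  std::set<std::string> names_;
};

class SQLContext {
public:
  auto get_aggregate_registry() -> AggregateFunctionRegistry* {
    return &registry_;
  }

private:
  AggregateFunctionRegistry registry_;
};

// Leaf: the only function that actually uses the context.
inline auto is_aggregate_function(SQLContext* ctx, const std::string& name)
    -> bool {
  assert(ctx != nullptr);
  return ctx->get_aggregate_registry()->contains(name);
}

// Intermediate layer: accepts the context only to forward it.
inline auto classify_call(SQLContext* ctx, const std::string& name)
    -> std::string {
  return is_aggregate_function(ctx, name) ? "aggregate" : "scalar";
}

// Outer layer: same story, one level further from the leaf.
inline auto plan_expression(SQLContext* ctx, const std::string& name)
    -> std::string {
  return "plan(" + classify_call(ctx, name) + ")";
}
```

Neither `classify_call` nor `plan_expression` touches the registry, yet both signatures changed. In exchange, a reader of `plan_expression` can see from the signature alone that planning depends on SQL-layer state.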
34.4.3 Conditional Fallback During Transition
During the migration period, both the bridge and the direct context path coexisted. Call sites used a conditional fallback pattern to maintain compatibility:
auto* cache = (sql_context != nullptr)
    ? sql_context->get_compilation_cache()
    : &CompilationCache::instance();
This pattern was strictly transitional. Once all callers of a given singleton were migrated, the instance() method and the bridge function were deleted.
34.4.4 Bridges Kept by Design
Four bridges were retained permanently because their call sites cannot receive a context parameter:
- config::get_runtime_options(): Process-global configuration set before ServerContext exists. Test harnesses configure runtime options (e.g., is_unit_test = true) during static initialization.
- logger::get(): Process-global diagnostic facility accessed through macros at 120+ call sites. Threading a logger context through every function would add noise without meaningful benefit.
- Deopt stats bridge: A C extern function called from JIT-generated machine code. The JIT emits direct function calls that cannot pass context parameters through the calling convention.
- FAISS I/O bridge: FAISS uses a callback mechanism for custom I/O that does not support user-data parameters. The bridge provides the only path to the storage engine from within FAISS callbacks.
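The last two cases share a shape: a callee with a fixed signature and no slot for a context pointer. The sketch below illustrates that shape in general terms. It is not the actual deopt or FAISS bridge; `DeoptStats::events`, `bridge::install()`, `example_record_deopt()`, and `simulate_jit_calls()` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter block standing in for the real DeoptStats.
struct DeoptStats {
  std::uint64_t events = 0;
};

namespace bridge {
// The one remaining global: a non-owning pointer installed at startup
// and cleared at shutdown. Ownership stays with the context.
inline DeoptStats* g_deopt_stats = nullptr;

inline void install(DeoptStats* stats) { g_deopt_stats = stats; }
}  // namespace bridge

extern "C" {
// A bare function address is all the generated code knows about.
using BareCallback = void (*)();

// C linkage and a fixed, argument-free signature: there is nowhere to
// pass a context pointer, so the function consults the bridge.
void example_record_deopt() {
  if (bridge::g_deopt_stats != nullptr) {
    ++bridge::g_deopt_stats->events;
  }
}
}  // extern "C"

// Simulates generated code invoking the callback by address.
inline void simulate_jit_calls(BareCallback callback, int times) {
  assert(callback != nullptr);
  for (int i = 0; i < times; ++i) callback();
}
```

This sketch ignores thread safety and supports only one installed target per process, which is exactly the limitation that makes such bridges exceptions rather than the norm.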
34.5 Testing Implications
34.5.1 Migrating Test Fixtures
The test migration was the largest single phase of the refactoring. Approximately 80 test and benchmark files accessed DocumentDB::instance() to obtain a handle to the document database. Each was migrated to the explicit path:
// Before migration
void SetUp() override {
  db_ = DocumentDB::instance();
}

// After migration
void SetUp() override {
  db_ = service::get_server_context()->get_document_db();
}
The service::get_server_context() function returns the ServerContext that was created during test initialization. Tests that use the full service layer (SQL integration tests, session tests, replication tests) initialize the complete ServerContext in their SetUpTestSuite() method. Lightweight tests that only need the document database can initialize a minimal context.
34.5.2 Null-Safety Patterns
Context isolation introduced a new failure mode: null context pointers. Components that previously relied on singletons being globally available must now handle the case where a context was not provided. Two patterns address this:
Guard-and-skip: For optional functionality that degrades gracefully.
void execute_with_parallelism(ThreadPool* pool, ...) {
  if (pool == nullptr) {
    // Fall back to sequential execution
    execute_sequential(...);
    return;
  }
  pool->submit([&] { ... });
}
Assert-and-fail: For required dependencies that indicate a programming error if absent.
auto* storage = txn->get_storage_engine_context();
assert(storage != nullptr && "Transaction must have a storage context");
The BasicCounter class (used for RocksDB statistics) required null-safety because lightweight test fixtures running in in-memory-only mode do not initialize a full storage engine context. The IndexIntersectionCursor required a null snapshot guard for the same reason.
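A guard-and-skip counter might look like the sketch below. It is loosely modeled on the situation described for BasicCounter but is not its implementation: `NullSafeCounter` and `StatisticsSink` are hypothetical names, and the real class reports to RocksDB statistics rather than to a plain struct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical statistics sink owned by a full storage engine context.
struct StatisticsSink {
  std::uint64_t total = 0;
};

// Counter that keeps working when no sink was provided.
class NullSafeCounter {
public:
  explicit NullSafeCounter(StatisticsSink* sink) : sink_{sink} {}

  void add(std::uint64_t delta) {
    local_ += delta;
    if (sink_ == nullptr) {
      return;  // lightweight fixture: nothing to report to
    }
    sink_->total += delta;
  }

  auto value() const -> std::uint64_t { return local_; }

private:
  StatisticsSink* sink_;  // non-owning; may be null
  std::uint64_t local_ = 0;
};
```

The local tally keeps the counter observable in tests even when the sink is absent, so an in-memory fixture can still assert on counts without constructing a storage engine.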
34.5.3 Lightweight Test Fixtures vs Full Server Context
The context-isolated architecture enables a spectrum of test configurations:
| Test Type | Context Required | Subsystems Initialized |
|---|---|---|
| Unit tests (pure logic) | None | None |
| Document DB tests | StorageEngineContext + DocumentDB | Storage, collections |
| SQL integration tests | Full ServerContext | All subsystems |
| Replication tests | Full ServerContext + ReplicationManager | All + Raft consensus |
Lightweight fixtures that skip unnecessary subsystems run faster and have fewer failure modes. A document database test does not need the SQL compilation cache or the replication manager.
34.6 Design Principles
34.6.1 Prefer Threading Context Through Parameters
The central design principle is explicit dependency passing: every function declares its dependencies through parameters rather than reaching into global state. This principle has a cost — deeper call chains require more parameters — but produces three benefits:
- Visible dependencies. Reading a function signature reveals which subsystems it touches. Code review can verify that a function does not access subsystems it should not.
- Testable in isolation. A function that receives its dependencies as parameters can be tested with mock or minimal implementations.
- Multi-instance capable. Two ServerContext instances can coexist in the same process, each with independent storage engines, document databases, and SQL caches.
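The multi-instance property is easy to demonstrate on a toy scale. In the sketch below, `DocumentDB` is reduced to an in-memory map and `register_tenant()` is a hypothetical piece of business logic; only the idea of passing the database in as a parameter reflects the text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical, heavily reduced document store.
class DocumentDB {
public:
  void put(const std::string& key, const std::string& value) {
    docs_[key] = value;
  }
  auto count() const -> std::size_t { return docs_.size(); }

private:
  std::map<std::string, std::string> docs_;
};

class ServerContext {
public:
  ServerContext() : document_db_{std::make_unique<DocumentDB>()} {}
  auto get_document_db() -> DocumentDB* { return document_db_.get(); }

private:
  std::unique_ptr<DocumentDB> document_db_;
};

// Business logic names its dependency instead of reaching for a global,
// so the caller decides which instance is affected.
inline void register_tenant(DocumentDB* db, const std::string& tenant) {
  assert(db != nullptr);
  db->put("tenant", tenant);
}
```

Writing through one context leaves the other untouched. With a `DocumentDB::instance()` singleton, the same call would have had exactly one possible target.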
34.6.2 Accessor Naming Conventions
Context accessor methods use the get_ prefix consistently:
auto get_config_context() -> ConfigContext*;
auto get_storage_engine_context() -> StorageEngineContext*;
auto get_document_db() -> db::document::DocumentDB*;
auto get_sql_context() -> SQLContext*;
This convention distinguishes accessors from factory methods (which create new objects) and mutation methods (which modify state). The get_ prefix signals that the caller receives a non-owning pointer to an existing object.
34.6.3 Optional Components and Null Checks
Not every subsystem is required in every deployment. The replication manager is only present when replication is enabled. The subscription manager is only present when GraphQL subscriptions are configured. These optional components use a setter method on ServerContext and are accessed through nullable pointers:
// Setter: called conditionally during initialization
void set_replication_manager(
    std::unique_ptr<replication::ReplicationManager> manager);
// Getter: returns nullptr if not configured
auto get_replication_manager() -> replication::ReplicationManager*;
Callers must check for null before use. This is not a burden but a feature — it makes the optionality explicit in the code rather than hiding it behind a singleton that silently returns a no-op implementation.
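Put together, the setter, the nullable getter, and a caller-side check might look like this. The `ReplicationManager` here is a hypothetical stand-in with a single flag, and `should_replicate()` is an illustrative helper rather than a function from the codebase.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for replication::ReplicationManager.
class ReplicationManager {
public:
  explicit ReplicationManager(bool leader) : leader_{leader} {}
  auto is_leader() const -> bool { return leader_; }

private:
  bool leader_;
};

class ServerContext {
public:
  // Setter: called conditionally during initialization.
  void set_replication_manager(std::unique_ptr<ReplicationManager> manager) {
    replication_manager_ = std::move(manager);
  }

  // Getter: returns nullptr when replication was never configured.
  auto get_replication_manager() -> ReplicationManager* {
    return replication_manager_.get();
  }

private:
  std::unique_ptr<ReplicationManager> replication_manager_;
};

// Callers branch on presence explicitly; absence means standalone mode.
inline auto should_replicate(ServerContext* ctx) -> bool {
  assert(ctx != nullptr);
  auto* manager = ctx->get_replication_manager();
  return manager != nullptr && manager->is_leader();
}
```

The helper combines both patterns from the null-safety section: the context itself is required (assert-and-fail), while the manager is optional (guard-and-skip).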
34.6.4 Process-Global Facilities
Two categories of state remain process-global by design:
Diagnostic infrastructure. The logger is used pervasively through macros (LOGGER_INFO, LOGGER_WARN). Requiring a context parameter for every log call would impose unacceptable syntactic overhead on code that has nothing to do with logging.
Pre-context configuration. RuntimeOptions must be available before ServerContext is constructed. Test harnesses set is_unit_test = true during static initialization, before main() runs. Moving this into ServerContext would create a circular dependency.
These exceptions are documented and justified. They are not escape hatches for avoiding the work of threading context — they represent genuine architectural constraints where process-global access is the correct design.
34.6.5 Construction with Private Constructors
Several former singleton classes use private constructors to prevent unauthorized instantiation. When these classes declare friend SQLContext, the standard std::make_unique<T>() still fails to compile because make_unique is a separate function template, not a member of the friend class.
The solution requires no factory indirection:
// Inside SQLContext constructor (which is a friend)
session_registry_ = std::unique_ptr<SessionRegistry>(
    new SessionRegistry {});
This is a well-known C++ idiom. The new expression is evaluated in the friend's scope (where the private constructor is accessible), and the resulting pointer is immediately captured by unique_ptr. No raw pointer escapes.
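A complete, compilable version of the idiom is sketched below. The `session_count()` accessor and its backing field are hypothetical additions so that the example has something to observe; the friend declaration and the new-expression follow the text.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

class SQLContext;

class SessionRegistry {
public:
  auto session_count() const -> int { return session_count_; }

private:
  friend SQLContext;  // only SQLContext may construct a registry
  SessionRegistry() = default;

  int session_count_ = 0;
};

// Code outside the friend cannot construct a registry at all.
static_assert(!std::is_default_constructible_v<SessionRegistry>);

class SQLContext {
public:
  SQLContext()
      // std::make_unique<SessionRegistry>() would be rejected here: the
      // construction it performs happens inside the standard library,
      // which is not a friend. A direct new-expression is evaluated with
      // SQLContext's access rights, and unique_ptr takes ownership at once.
      : session_registry_{new SessionRegistry{}} {}

  auto get_session_registry() -> SessionRegistry* {
    return session_registry_.get();
  }

private:
  std::unique_ptr<SessionRegistry> session_registry_;
};
```

The destructor of `SessionRegistry` stays public, so `std::unique_ptr` can delete the object with its default deleter. Only construction is restricted.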
34.7 Summary
The context-isolated architecture transforms Cognica from a singleton-dependent system to one where every subsystem's lifetime is explicitly managed through ownership:
- ServerContext is the root owner. It holds every subsystem through std::unique_ptr, and RAII guarantees correct destruction order.
- Seven context types partition the system state: ConfigContext (configuration), StorageEngineContext (RocksDB and thread pools), DocumentDB (collections and statistics), SQLContext (ten SQL-layer components), SchedulerContext (task groups), ReplicationManager (optional Raft consensus), and KeyspaceManager (collection-to-keyspace mapping).
- The bridge removal strategy enabled incremental migration: free-function APIs delegated to ServerContext::get_default() during transition, then were deleted once all callers received context through parameters.
- Signature cascading is the unavoidable cost of explicit dependency threading. Adding a context parameter to a leaf function propagates through every caller in the chain. The benefit is a visible, auditable dependency graph.
- Four bridges are retained by design: config::get_runtime_options() (pre-context configuration), logger::get() (diagnostic facility), deopt stats (JIT calling convention constraint), and FAISS I/O (callback API constraint).
- Null-safety patterns handle optional components and lightweight test fixtures. Guard-and-skip provides graceful degradation; assert-and-fail catches programming errors.
- Multi-instance capability is the architectural payoff. Two ServerContext instances can coexist in the same process with fully independent state, enabling multi-tenant deployments, parallel test execution, and safe hot-restart.
The refactoring touched over 150 source files across 20 sub-phases, migrated approximately 80 test and benchmark files, and eliminated over 30 singleton bridges. The result is a codebase where dependencies are explicit, lifetimes are deterministic, and the system is structurally prepared for multi-instance deployment.