Chapter 1: Understanding Cognica's Architecture

Before writing your first query, it helps tremendously to understand what happens behind the scenes when you execute SQL. This knowledge will help you write better queries and troubleshoot performance issues.

1.1 The Journey of a SQL Query

When you type a SQL query and press Enter, that simple text goes through a remarkable transformation before returning your results. Let's trace this journey step by step.

Loading diagram...

Step 1: Parsing - Understanding Your Intent

The first thing Cognica does is read your SQL text and figure out what you're asking for. This is called parsing, and Cognica uses the exact same parser that PostgreSQL uses (through a library called libpg_query). This means if your query works in PostgreSQL, it will almost certainly parse correctly in Cognica.

For example, when you write:

SELECT name, email FROM users WHERE active = true ORDER BY name

The parser breaks this down into a structured representation:

  • "This is a SELECT statement"
  • "The user wants two columns: name and email"
  • "The data comes from the users table"
  • "Only include rows where active equals true"
  • "Sort the results by name"

Why This Matters to You: If you ever get a syntax error, it's coming from this parsing stage. The error message will tell you exactly where the parser got confused. Common parsing errors include missing commas, unmatched parentheses, or misspelled keywords.

Pro Tip: You can see exactly how Cognica parses any query using the --show-tree command:

bin/cognica db query sql "SELECT name FROM users WHERE id = 1" --show-tree

This shows you the parse tree in JSON format - incredibly useful for debugging complex queries or understanding how SQL syntax is interpreted.

You can also see the compiled CVM bytecode using the --show-bytecode command:

bin/cognica db query sql "SELECT name FROM users WHERE id = 1" --show-bytecode

This displays the physical execution plan and disassembled bytecode that the CVM (Cognica Virtual Machine) will execute:

---------- Physical Plan ----------
Seq Scan on users  (cost=0.00..110.00 rows=1000 width=32)
      Filter: (id = 1)

---------- CVM Bytecode Disassembly ----------
Query: SELECT name FROM users WHERE id = 1

; CVM Bytecode Disassembly
; ========================
;
; Version:       1.0
; Entry Point:   0x0000
; Code Size:     56 bytes
; Max GPRs:      5
; Max FPRs:      0
; Stack Depth:   0

.constants:
  [  0] string: "users"
  [  1] field_ref: "id"
  [  2] int64: 1
  [  3] field_ref: "name"

.code:
 0000: F0000000                   CURSOR_OPEN   slot[0], [0]  ; open cursor on "users"
 0004: 7300000B                   JUMP_NULL     R0, @0030     ; offset 11
 0008: 90200001                   GET_FIELD     R2, R0, [1]   ; get field "id"
 000C: 07300002                   LOAD_CONSTANT R3, [2]       ; int64: 1
 0010: 5E423000                   CMP_EQ_POLY   R4, R2, R3    ; equal
 0014: 72400005                   JUMP_FALSE    R4, @0028     ; offset 5
 0018: FE0B1000                   DOCUMENT_NEW  R1            ; create empty document
 001C: 90200003                   GET_FIELD     R2, R0, [3]   ; get field "name"
 0020: 92120003                   SET_FIELD     R1, R2, [3]   ; set field "name"
 0024: F5010000                   EMIT_ROW      R1            ; emit row to output
 0028: F1000000                   CURSOR_NEXT   R0, slot[0]   ; fetch next row
 002C: 7000FFF6                   JUMP          @0004         ; offset -10
 0030: F2000000                   CURSOR_CLOSE  slot[0]       ; close cursor
 0034: 79000000                   HALT

---------- Compilation Info ----------
Output columns: [name]
Cursor slots used: 1
Registers used: 5
Compilation time: 259 us

This is useful for understanding execution flow, debugging performance issues, or learning how SQL queries translate to low-level operations.

Step 2: Semantic Analysis - Checking Your Work

After parsing, Cognica needs to verify that your query actually makes sense. This is like a teacher checking your homework:

  • Does the users table actually exist?
  • Does that table have columns called name and email?
  • Is active really a boolean column that can be compared to true?
  • Do you have permission to read from this table?

If anything is wrong, you'll get a clear error message. For example:

SELECT naem FROM users;  -- Typo in column name
-- Error: column "naem" does not exist
-- Hint: Perhaps you meant "name"?

Step 3: Query Planning - Finding the Best Path

This is where Cognica's intelligence really shines. There are often many ways to execute the same query, and the query planner's job is to find the fastest way.

Consider this query:

SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped'

The planner might consider:

  • Should we scan the entire table and check each row?
  • Is there an index on customer_id we could use?
  • Is there an index on status?
  • Should we use both indexes and combine the results?

Cognica's planner is specially optimized for LSM-tree storage. Cognica uses a customized version of RocksDB, modified and optimized specifically for Cognica's workloads. This is important because LSM-trees have different performance characteristics than traditional B-tree databases. What's fast in MySQL might not be fast in Cognica, and vice versa. The planner knows these differences and chooses accordingly.

Step 4: Execution - Getting Your Results

Finally, Cognica executes the plan. The execution engine (called CVM - Cognica Virtual Machine) reads data from storage, applies your filters, performs any calculations, sorts the results, and sends them back to you.

1.2 Understanding Storage: Why It Affects Your Queries

Cognica stores data using an LSM-tree (Log-Structured Merge-tree) architecture through a customized version of RocksDB. This is not vanilla RocksDB - Cognica maintains its own fork with modifications optimized for database workloads, including custom compaction strategies and memory management. You don't need to understand all the details, but knowing the basics helps you write better queries.

Writes Are Fast: When you INSERT data, it goes into memory first and is written to disk later in batches. This makes INSERT operations very fast - much faster than traditional databases that must update on-disk structures immediately.

Reads May Check Multiple Places: When you SELECT data, Cognica might need to check several places: the in-memory buffer, recent disk files, and older compacted files. The query planner accounts for this when deciding how to execute your query.

Indexes Help a Lot: Like all databases, Cognica performs much better when it can use indexes. But because of the LSM-tree architecture, the benefit of indexes can be even more pronounced in Cognica.

Practical Implication: If you notice a query is slow, the first thing to check is whether there's an appropriate index. We'll cover indexes in detail in Chapter 10.


Copyright (c) 2023-2026 Cognica, Inc.