Overview

External Virtual Tables represent one of the most powerful features in Cognica Database, fundamentally changing how organizations can interact with their data ecosystem. Rather than requiring all data to be imported into Cognica before it can be queried, External Virtual Tables allow you to query data exactly where it lives - whether that is in a PostgreSQL database, a MySQL server, an S3-hosted Delta Lake table, or a collection of Parquet files on Azure Blob Storage.

This capability addresses a critical challenge in modern data architectures: data is rarely centralized in a single system. Organizations typically have transactional data in relational databases, historical data in data lakes, log files on cloud storage, and analytics results in various formats. Traditional approaches require building ETL pipelines to move this data into a single query engine, which introduces latency, increases storage costs, and creates synchronization challenges.

External Virtual Tables eliminate these problems by bringing the query to the data rather than the data to the query. When you execute a SQL statement against an External Virtual Table, Cognica translates your query into the appropriate format for the target system, executes it remotely, and streams the results back - all transparently, as if you were querying a native Cognica collection.

Why External Virtual Tables Matter

The value proposition of External Virtual Tables extends beyond simple convenience. Consider a typical enterprise scenario: your customer data lives in PostgreSQL, your clickstream events are stored in Delta Lake on S3, and your product catalog is managed in MySQL. Without External Virtual Tables, answering a question like "Which premium customers viewed products but did not purchase in the last 30 days?" would require:

  1. Exporting data from each source system
  2. Loading it into a common data warehouse
  3. Running the analytical query
  4. Maintaining synchronization as source data changes

With External Virtual Tables, this becomes a single SQL query that joins data from all three sources in real-time, always reflecting the current state of each system.

Key Capabilities

External Virtual Tables provide a rich set of capabilities designed for production use:

Federated Query Execution enables you to write SQL queries that seamlessly join data from multiple external sources with each other and with native Cognica collections. The query optimizer understands the capabilities of each source and generates efficient execution plans that minimize data movement.

Predicate Pushdown ensures that filter conditions in your WHERE clauses are pushed down to the external data source whenever possible. This is critical for performance - rather than fetching an entire table and filtering locally, Cognica instructs the remote system to perform the filtering, dramatically reducing network transfer and processing time.

Projection Pushdown works similarly for column selection. When your query only references specific columns, Cognica requests only those columns from the external source, further reducing data transfer overhead.

Automatic Schema Inference eliminates the need to manually define table schemas. When you create an External Virtual Table, Cognica connects to the source, inspects its schema, and automatically maps the source types to Cognica types. This schema is cached locally for performance but can be refreshed when the source schema changes.

Extension Auto-Installation handles the complexity of DuckDB extension management transparently. When you first query a PostgreSQL-backed virtual table, Cognica automatically downloads, installs, and loads the required postgres extension. This happens once and is cached for subsequent queries.

Connection Pooling and Caching optimize repeated access to external sources. Database connections are pooled and reused across queries, and database attachments are cached to avoid redundant authentication handshakes.


Copyright (c) 2023-2026 Cognica, Inc.