Automated Financial Statement Extraction from PDFs Using LLMs
November 18, 2025
We walk through building a system that uses Large Language Models (LLMs) to automatically extract and normalize financial statements from PDFs in a variety of formats. The post covers data model design with Structured Output and Pydantic, extraction through the Google Gemini API, and post-processing methods suited to real-world documents, all in about 200 lines of code.
Read Post
The Monic Framework: How Cognica is Reshaping API Architecture
October 24, 2025
The Monic framework addresses REST's endpoint explosion problem by letting clients express their intent as computational expressions sent to a single /compute endpoint, redefining the API as a "Computational Interface" integrated with the database.
Read Post
Why NOT Operations are Difficult in Vector Search
February 3, 2025
We discuss why NOT operations are difficult in vector search.
Read Post
Explains the characteristics and limitations of vector embeddings and covers the improvements made to how they are stored.
Read Post
Searching Case Law Data with Natural Language
July 4, 2024
Explains how to build a natural language search service by adding vector search to a case law search demo built on full-text search (FTS).
Read Post
Making Case Law Data Quickly Searchable
June 21, 2024
Explains the process of downloading case law data and building a case law search service in just one day using Cognica.
Read Post
Applying Natural Language Search to Product Search
June 12, 2024
We explain data collection and processing, search, and service development for product search using Cognica. Learn how to index a mix of structured and unstructured data, and how to use an LLM to transform queries for search.