Engineering | 12 min read
We walk through building a system that automatically extracts and normalizes financial statements from PDFs of varying formats using Large Language Models (LLMs). We cover data model design with Structured Output and Pydantic, extraction through the Google Gemini API, and post-processing methods applicable to real-world scenarios, all in about 200 lines of code.

Read Post
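The Structured Output + Pydantic approach described in this post can be sketched roughly as follows. The schema fields, the `normalize` helper, and the model name in the commented-out API call are illustrative assumptions, not the post's actual code; only the overall shape (a Pydantic target schema, a Gemini structured-output call, then post-processing) comes from the summary above.

```python
from pydantic import BaseModel, Field


class LineItem(BaseModel):
    """A single line item from a financial statement (hypothetical schema)."""
    name: str = Field(description="Account name, e.g. 'Revenue'")
    amount: float = Field(description="Amount in the statement's base currency")


class FinancialStatement(BaseModel):
    """Normalized target schema the LLM is asked to fill in."""
    company: str
    fiscal_year: int
    currency: str
    line_items: list[LineItem]


def normalize(statement: FinancialStatement, unit: float = 1.0) -> FinancialStatement:
    """Post-processing step: rescale amounts reported in thousands or
    millions back to base units so statements in different formats agree."""
    for item in statement.line_items:
        item.amount *= unit
    return statement


# The extraction call itself (requires an API key and a PDF part; shown
# commented out for illustration — model name is an assumption):
# from google import genai
# client = genai.Client()
# response = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents=[pdf_part, "Extract the income statement."],
#     config={"response_mime_type": "application/json",
#             "response_schema": FinancialStatement},
# )
# statement = FinancialStatement.model_validate_json(response.text)
```

Because the LLM returns JSON matching the schema, validation and normalization stay deterministic even when the source PDFs differ in layout.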

Research | 4 min read
We propose a new approach to LLM usage that reconstructs the context moment by moment.

Read Post

Insights | 5 min read

Why Did OpenAI Acquire Rockset?

by Tim Yang | July 11, 2024
On June 21, 2024, OpenAI announced its acquisition of database startup Rockset. According to OpenAI, the acquisition aims to improve its search infrastructure to make AI more useful. What specific advantages led OpenAI to acquire Rockset?

Read Post

Engineering | 20 min read
We walk through data collection and processing, search, and service development for product search with Cognica. Learn how to index a mix of structured and unstructured data, and how to transform search queries using an LLM.

Read Post
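The LLM-based query transformation mentioned in this post might look like the following sketch. The `transform_query` function and the filter shape are hypothetical, not Cognica's actual API; a rule-based stand-in replaces the LLM rewriting step purely to illustrate the target output: structured filters extracted from free text, plus a remaining full-text portion.

```python
import re


def transform_query(user_query: str) -> dict:
    """Turn a free-text product query into a structured search request.

    In the real pipeline an LLM performs this rewriting; this simple
    rule-based stand-in only illustrates the target shape — structured
    filters (here, price) alongside the residual keyword text.
    """
    filters = {}
    text = user_query
    # Pull "under $N"-style price constraints out into a structured filter.
    m = re.search(r"under \$?(\d+)", text, re.IGNORECASE)
    if m:
        filters["price"] = {"$lte": int(m.group(1))}
        text = text[:m.start()] + text[m.end():]
    return {"filter": filters, "text": text.strip()}
```

Splitting the query this way lets the structured part hit indexed fields while the text part goes to full-text search, which is the core trick when structured and unstructured data are mixed.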

Insights | 5 min read
You can easily build RAG (Retrieval-Augmented Generation) with just one AI database, without complex infrastructure setup.

Read Post

Case Studies | 9 min read
Methods that overcome the limitations of Large Language Models (LLMs) by pairing them with Vector Databases (VectorDBs) are gaining attention. To answer accurately about specialized information that is not in the training data, such as a law firm's case precedents or a company's communication records, we can use a vector database that converts, stores, and searches all kinds of data as vector embeddings, serving as long-term memory for the LLM. To illustrate this, we examine a concrete case of how a vector database complements an LLM, walking through data preprocessing, vectorization, storage, and search in a Q&A system built on Wikipedia.

Read Post
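The vectorization → storage → search flow described in this post can be illustrated with a minimal in-memory sketch. The `VectorStore` class and the toy bag-of-words embedding below are stand-ins for a real vector database and neural embedding model; only the pipeline shape matches the case study.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-count vector. A real system
    would call a neural embedding model; the pipeline shape is the same."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStore:
    """Minimal in-memory vector store acting as long-term memory for an LLM."""

    def __init__(self):
        self.docs = []  # list of (text, vector) pairs

    def add(self, text: str):
        """Vectorize and store a preprocessed passage."""
        self.docs.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored passages most similar to the query."""
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, store: VectorStore) -> str:
    """Retrieve relevant passages and prepend them to the question,
    grounding the LLM's answer in domain data it was never trained on."""
    context = "\n".join(store.search(question, k=1))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Swapping `embed` for a real embedding model and `VectorStore` for an actual VectorDB yields the Wikipedia Q&A setup the case study describes: retrieved passages are injected into the prompt so the LLM answers from stored knowledge rather than its training data alone.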