Engineering | 12 min read
We walk through building a system that automatically extracts and normalizes financial statements from PDFs of varying formats using Large Language Models (LLMs). We cover data model design with Structured Output and Pydantic, extraction through the Google Gemini API, and post-processing methods applicable to real-world scenarios, all in about 200 lines of code.

Read Post
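The Structured Output + Pydantic approach described in this post can be sketched roughly as follows. The schema fields, the `normalize` helper, and the model name in the commented-out API call are illustrative assumptions, not the post's actual code; only the overall shape (a Pydantic target schema, a Gemini structured-output call, then post-processing) comes from the summary above.

```python
from pydantic import BaseModel, Field


class LineItem(BaseModel):
    """A single line item from a financial statement (hypothetical schema)."""
    name: str = Field(description="Account name, e.g. 'Revenue'")
    amount: float = Field(description="Amount in the statement's base currency")


class FinancialStatement(BaseModel):
    """Normalized target schema the LLM is asked to fill in."""
    company: str
    fiscal_year: int
    currency: str
    line_items: list[LineItem]


def normalize(statement: FinancialStatement, unit: float = 1.0) -> FinancialStatement:
    """Post-processing step: rescale amounts reported in thousands or
    millions back to base units so statements in different formats agree."""
    for item in statement.line_items:
        item.amount *= unit
    return statement


# The extraction call itself (requires an API key and a PDF part; shown
# commented out for illustration — model name is an assumption):
# from google import genai
# client = genai.Client()
# response = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents=[pdf_part, "Extract the income statement."],
#     config={"response_mime_type": "application/json",
#             "response_schema": FinancialStatement},
# )
# statement = FinancialStatement.model_validate_json(response.text)
```

Because the LLM returns JSON matching the schema, validation and normalization stay deterministic even when the source PDFs differ in layout.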

Research | 4 min read
We propose a new approach to LLM usage that reconstructs the context moment by moment.

Read Post

Insights | 5 min read

Why Did OpenAI Acquire Rockset?

by Tim Yang | July 11, 2024
On June 21, 2024, OpenAI announced its acquisition of database startup Rockset. According to OpenAI, the acquisition aims to improve its search infrastructure to make AI more useful. What specific advantages led OpenAI to acquire Rockset?

Read Post

Engineering | 20 min read
We walk through data collection and processing, search, and service development for product search with Cognica. Learn how to index a mix of structured and unstructured data, and how to transform search queries using an LLM.

Read Post
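The LLM-based query transformation mentioned in this post might look like the following sketch. The `transform_query` function and the filter shape are hypothetical, not Cognica's actual API; a rule-based stand-in replaces the LLM rewriting step purely to illustrate the target output: structured filters extracted from free text, plus a remaining full-text portion.

```python
import re


def transform_query(user_query: str) -> dict:
    """Turn a free-text product query into a structured search request.

    In the real pipeline an LLM performs this rewriting; this simple
    rule-based stand-in only illustrates the target shape — structured
    filters (here, price) alongside the residual keyword text.
    """
    filters = {}
    text = user_query
    # Pull "under $N"-style price constraints out into a structured filter.
    m = re.search(r"under \$?(\d+)", text, re.IGNORECASE)
    if m:
        filters["price"] = {"$lte": int(m.group(1))}
        text = text[:m.start()] + text[m.end():]
    return {"filter": filters, "text": text.strip()}
```

Splitting the query this way lets the structured part hit indexed fields while the text part goes to full-text search, which is the core trick when structured and unstructured data are mixed.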

Insights | 5 min read
You can easily build RAG (Retrieval-Augmented Generation) with just one AI database, without complex infrastructure setup.

Read Post

Case Studies | 9 min read
Methods that overcome the limitations of Large Language Models (LLMs) by pairing them with Vector Databases (VectorDBs) are gaining attention. To answer accurately about specialized information that is not in the training data, such as a law firm's case precedents or a company's communication records, we can use a vector database that converts, stores, and searches all kinds of data as vector embeddings, serving as long-term memory for the LLM. To illustrate this, we examine a concrete case of how a vector database complements an LLM, walking through data preprocessing, vectorization, storage, and search in a Q&A system built on Wikipedia.

Read Post
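The vectorization → storage → search flow described in this post can be illustrated with a minimal in-memory sketch. The `VectorStore` class and the toy bag-of-words embedding below are stand-ins for a real vector database and neural embedding model; only the pipeline shape matches the case study.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-count vector. A real system
    would call a neural embedding model; the pipeline shape is the same."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStore:
    """Minimal in-memory vector store acting as long-term memory for an LLM."""

    def __init__(self):
        self.docs = []  # list of (text, vector) pairs

    def add(self, text: str):
        """Vectorize and store a preprocessed passage."""
        self.docs.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored passages most similar to the query."""
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, store: VectorStore) -> str:
    """Retrieve relevant passages and prepend them to the question,
    grounding the LLM's answer in domain data it was never trained on."""
    context = "\n".join(store.search(question, k=1))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Swapping `embed` for a real embedding model and `VectorStore` for an actual VectorDB yields the Wikipedia Q&A setup the case study describes: retrieved passages are injected into the prompt so the LLM answers from stored knowledge rather than its training data alone.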