Thursday, January 22, 2026

OceanBase Releases seekdb: An Open Source AI-Native Hybrid Search Database for Multi-model RAG and AI Agents


AI applications rarely deal with one clean table. They mix user profiles, chat logs, JSON metadata, embeddings, and sometimes spatial data. Most teams handle this with a patchwork of an OLTP database, a vector store, and a search engine. OceanBase has released seekdb, an open source, AI-focused database under the Apache 2.0 license. seekdb is described as an AI-native search database that unifies relational data, vector data, text, JSON, and GIS in one engine, and it exposes hybrid search and in-database AI workflows.

What is seekdb?

seekdb is positioned as the lightweight, embedded version of the OceanBase engine. It targets AI applications rather than general-purpose distributed deployments. It runs as a single-node database, supports both embedded mode and client/server mode, and remains compatible with MySQL drivers and SQL syntax.

In the capability matrix, seekdb is marked as:

  • Embedded database: supported
  • Standalone database: supported
  • Distributed database: not supported

The distributed case is covered by the full OceanBase product.

From a data model perspective, seekdb supports:

  • Relational data with standard SQL
  • Vector search
  • Full-text search
  • JSON data
  • Spatial (GIS) data

all within one storage and indexing layer.

Hybrid search as the core feature

Hybrid search is the main feature OceanBase is promoting. It combines three kinds of retrieval in one query with a single ranking step: vector-based semantic retrieval, full-text keyword retrieval, and scalar filters.

seekdb implements hybrid search through a system package named DBMS_HYBRID_SEARCH, which has two entry points:

  • DBMS_HYBRID_SEARCH.SEARCH, which returns results as JSON sorted by relevance
  • DBMS_HYBRID_SEARCH.GET_SQL, which returns the concrete SQL string used for execution

The hybrid search path can run:

  • pure vector search
  • pure full-text search
  • combined hybrid search

It can also push relational filters and joins down into storage. For reranking, it supports strategies such as weighted scores and reciprocal rank fusion, and it can plug in rerankers based on large language models.
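The two reranking strategies named above are standard techniques, and a minimal sketch of each shows how they differ. This is plain Python, not seekdb's implementation. The function names are hypothetical. The RRF constant k=60 is the value commonly used in the literature and is an assumption here. The weighted variant assumes each retriever's scores are min-max normalized before weighting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs (illustrative sketch).

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Only rank positions matter, not raw scores.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def weighted_score_fusion(
    score_maps: list[dict[str, float]], weights: list[float]
) -> list[tuple[str, float]]:
    """Combine per-retriever scores with fixed weights (illustrative sketch).

    Each retriever's scores are min-max normalized to [0, 1] first, so that
    BM25 scores and vector similarities live on a comparable scale.
    """
    fused: dict[str, float] = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc_id, score in scores.items():
            fused[doc_id] += weight * (score - lo) / span
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    vector_hits = ["doc_a", "doc_b", "doc_c"]   # ranked by embedding similarity
    keyword_hits = ["doc_c", "doc_a", "doc_d"]  # ranked by keyword relevance
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

RRF needs no calibration between keyword scores and vector distances, because it only looks at rank positions. Weighted fusion keeps score magnitudes, but the result depends on how each score is normalized and on the chosen weights.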

For retrieval-augmented generation (RAG) and agent memory, this means a single SQL query can combine three kinds of matching:

  • semantic matching on embeddings
  • exact matching on product codes or proper nouns
  • relational filtering on user or tenant scopes
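A toy, in-memory Python analogue makes those three signals concrete. It is not how seekdb works internally. The row fields (tenant_id, product_code, embedding) are hypothetical. The ranking rule is also an assumption: an exact product-code hit always outranks a purely semantic match, and cosine similarity orders everything else. seekdb evaluates this inside its SQL engine, and its actual scoring may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical row layout, for illustration only.
    doc_id: str
    tenant_id: str
    product_code: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_query(
    rows: list[Row],
    tenant_id: str,
    query_embedding: list[float],
    query_terms: set[str],
    top_k: int = 3,
) -> list[str]:
    ranked = []
    for row in rows:
        # 1. Relational filter: skip rows outside the caller's tenant scope.
        if row.tenant_id != tenant_id:
            continue
        # 2. Exact match on identifiers such as product codes.
        exact = row.product_code in query_terms
        # 3. Semantic match on embeddings.
        semantic = cosine_similarity(query_embedding, row.embedding)
        ranked.append((exact, semantic, row.doc_id))
    # Exact hits first (not True sorts after not False), then by similarity,
    # with doc_id as a deterministic tie-breaker.
    ranked.sort(key=lambda t: (not t[0], -t[1], t[2]))
    return [doc_id for _, _, doc_id in ranked[:top_k]]
```

The hard filter runs before any scoring, so rows from another tenant can never appear, however similar they are.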

Vector and full-text engine details

At its core, seekdb exposes a modern vector and full-text stack.

For vectors, seekdb:

  • supports dense and sparse vectors
  • supports Manhattan, Euclidean, inner product, and cosine distance metrics
  • provides in-memory index types such as HNSW, HNSW SQ, and HNSW BQ
  • provides disk-based index types including IVF and IVF PQ
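For reference, each of the four distance metrics listed above takes only a few lines. This sketch uses plain Python lists for dense vectors. For sparse vectors it uses a dict mapping dimension index to value, which is an assumption for illustration and not seekdb's storage format. Inner product is a similarity (larger means closer), so an engine that ranks by distance typically orders by its negative.

```python
import math


def manhattan(a: list[float], b: list[float]) -> float:
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: list[float], b: list[float]) -> float:
    """L2 distance: straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: list[float], b: list[float]) -> float:
    """Dot product; a similarity, so larger means closer."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; ignores vector magnitude."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - inner_product(a, b) / norm


def sparse_inner_product(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of sparse vectors stored as {dimension: value}.

    Iterates over the smaller vector, so cost scales with the number of
    non-zero entries rather than with the full dimensionality.
    """
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b.get(dim, 0.0) for dim, value in a.items())
```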

With hybrid vector indexes, you store raw text and seekdb calls an embedding model automatically. The system then maintains the corresponding vector index itself, so no separate preprocessing pipeline is needed.
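The idea behind such an index can be sketched as a small Python class. Callers write only raw text. The index calls an embedding function itself on every insert or update and again on every query, so vectors cannot drift out of sync with the text. Everything here is hypothetical. `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and the class and method names are not seekdb APIs.

```python
import hashlib
import math
from typing import Callable


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        slot = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[slot] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class AutoEmbeddingIndex:
    """Toy index that derives vectors from raw text on every write."""

    def __init__(self, embed: Callable[[str], list[float]] = toy_embed):
        self._embed = embed
        self._rows: dict[str, tuple[str, list[float]]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # The caller supplies only text; the vector is recomputed here,
        # so an updated document never keeps a stale embedding.
        self._rows[doc_id] = (text, self._embed(text))

    def delete(self, doc_id: str) -> None:
        self._rows.pop(doc_id, None)

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # The query goes through the same embedding function as the documents.
        q = self._embed(query)
        scored = [
            (sum(x * y for x, y in zip(q, vec)), doc_id)  # dot product = cosine here
            for doc_id, (_, vec) in self._rows.items()
        ]
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [doc_id for _, doc_id in scored[:top_k]]
```

Because every vector is L2-normalized, the dot product in `search` equals cosine similarity. This is also why the query must go through the same embedding function as the documents.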

For text, seekdb offers full-text search with:

  • keyword, phrase, and Boolean queries
  • BM25 ranking for relevance
  • multiple tokenizer modes
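BM25, named above as the relevance model, can be sketched in a few lines. This follows the standard Okapi BM25 formula with the common defaults k1 = 1.2 and b = 0.75, plus a Lucene-style IDF that never goes negative. seekdb's exact parameters and IDF variant are not stated here. Lowercase whitespace splitting stands in for its tokenizer modes.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (illustrative sketch)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df: Counter[str] = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger IDF; the +1 keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = freq + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores
```

The k1 term makes repeated occurrences of a word count for progressively less. The b term penalizes long documents that match only because they contain more words.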

The key point is that full-text and vector indexes are first-class. They are integrated into the same query planner as scalar and GIS indexes, so hybrid search does not need external orchestration.

AI functions inside the database

seekdb includes built-in AI function expressions that let you call models directly from SQL. No separate application service needs to sit between the database and each model call. The main functions are:

  • AI_EMBED to convert text into embeddings
  • AI_COMPLETE for text generation using a chat or completion model
  • AI_RERANK to rerank a list of candidates
  • AI_PROMPT to assemble prompt templates and dynamic values into a JSON object for AI_COMPLETE

Model metadata and endpoints are managed by the DBMS_AI_SERVICE package. It lets you register external providers, set URLs, and configure keys, all on the database side.

Multimodal data and workloads

seekdb is built to handle multiple data modalities on a single node. Its multimodal data and indexing layer covers vectors, text, JSON, and GIS. A multi-model compute layer on top runs hybrid workloads that mix vector, full-text, and scalar conditions.

It also provides JSON indexes for metadata queries and GIS indexes for spatial conditions. This allows queries that:

  • find semantically similar documents
  • filter by JSON metadata such as tenant, region, or category
  • constrain results by spatial range or polygon

all without leaving the engine.
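A plain-Python analogue of that combined query works in three steps. It parses each row's JSON metadata and applies equality filters. It keeps only rows whose point lies inside a polygon, using a ray-casting test. It then ranks the survivors by cosine similarity. The row layout and function names are hypothetical, and the sketch assumes planar x/y coordinates. A real GIS engine typically also deals with spatial reference systems, and seekdb would use its JSON and GIS indexes rather than scanning every row.

```python
import json
import math


def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray-casting test: count edge crossings of a ray cast to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def filtered_semantic_search(rows, query_vec, metadata_filter, polygon, top_k=5):
    hits = []
    for row in rows:
        meta = json.loads(row["metadata"])  # metadata stored as a JSON string
        if any(meta.get(key) != value for key, value in metadata_filter.items()):
            continue  # JSON metadata filter
        x, y = row["location"]
        if not point_in_polygon(x, y, polygon):
            continue  # spatial filter
        hits.append((cosine_similarity(query_vec, row["embedding"]), row["id"]))
    hits.sort(key=lambda t: (-t[0], t[1]))  # most similar first
    return [doc_id for _, doc_id in hits[:top_k]]
```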

Because seekdb is derived from the OceanBase engine, it inherits ACID transactions, hybrid row-column storage, and vectorized execution. Large-scale distributed deployments remain a job for the full OceanBase database.

Comparison Table

Key Takeaways

  1. AI-native hybrid search: seekdb unifies vector search, full-text search, and relational filtering behind SQL and the DBMS_HYBRID_SEARCH interface. RAG and agent workloads can therefore run multi-signal retrieval in one query instead of stitching together multiple engines.
  2. Multimodal data in one engine: seekdb stores and indexes relational data, vectors, text, JSON, and GIS in the same engine. AI applications can keep documents, embeddings, and metadata consistent without maintaining separate databases.
  3. In-database AI functions for RAG: with AI_EMBED, AI_COMPLETE, AI_RERANK, and AI_PROMPT, seekdb can call embedding models, LLMs, and rerankers directly from SQL. This simplifies RAG pipelines and moves more orchestration logic into the database layer.
  4. Single-node, embedded-friendly design: seekdb is a single-node, MySQL-compatible engine that supports embedded and standalone modes. Distributed, large-scale deployments remain the role of full OceanBase, which makes seekdb a fit for local, edge, and service-embedded AI workloads.
  5. Open source and tool ecosystem: seekdb is open sourced under Apache 2.0 and integrates with a growing ecosystem of AI tools and frameworks. It offers Python support via pyseekdb and MCP-based integration for coding assistants and agents, so it can act as a unified data plane for AI applications.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.
