Integration: Oracle
Use Oracle AI Vector Search as a Document Store with Haystack
Table of Contents
Overview
Oracle AI Vector Search is a feature of Oracle Database 23ai and 26ai that provides native vector storage and similarity search using the VECTOR data type โ no extensions or plugins required.
This integration provides an OracleDocumentStore and an OracleEmbeddingRetriever that you can use in Haystack pipelines. It supports HNSW indexing for fast approximate search, metadata filtering with the full Haystack filter grammar, wallet-based TLS connections to Oracle Autonomous Database, and async variants for all public methods.
For a detailed overview of all available methods and settings, visit the Haystack API Reference.
Installation
pip install oracle-haystack
No separate Oracle Client install is required โ the underlying python-oracledb driver runs in thin mode by default.
Usage
Once installed, initialize an OracleDocumentStore. For a local Oracle Database:
from haystack.utils import Secret
from haystack_integrations.document_stores.oracle import OracleDocumentStore, OracleConnectionConfig
document_store = OracleDocumentStore(
connection_config=OracleConnectionConfig(
user="scott",
password=Secret.from_env_var("ORACLE_PASSWORD"),
dsn="localhost:1521/freepdb1",
),
embedding_dim=768,
)
For Oracle Autonomous Database on OCI, provide the wallet location for TLS authentication:
document_store = OracleDocumentStore(
connection_config=OracleConnectionConfig(
user="admin",
password=Secret.from_env_var("ORACLE_PASSWORD"),
dsn="mydb_low",
wallet_location="/path/to/wallet",
wallet_password=Secret.from_env_var("ORACLE_WALLET_PASSWORD"),
),
embedding_dim=768,
distance_metric="COSINE",
)
Writing Documents to OracleDocumentStore
To write documents to your OracleDocumentStore, create an indexing pipeline with a
DocumentWriter, or use the write_documents() function. You can use the available
Converters and
PreProcessors, as well as other
Integrations that might help you fetch data from other resources.
Indexing Pipeline
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", TextFileToDocument())
indexing_pipeline.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=2))
indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store))
indexing_pipeline.connect("converter", "splitter")
indexing_pipeline.connect("splitter", "embedder")
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({"converter": {"sources": ["filename.txt"]}})
For faster approximate nearest-neighbor search on large collections, create an HNSW index once after the first batch of writes โ Oracle maintains it incrementally as new documents are added, so you don’t need to rebuild it after each ingestion:
document_store.create_hnsw_index()
The call is idempotent (CREATE VECTOR INDEX IF NOT EXISTS), so re-running it is safe. You can also pass create_index=True to OracleDocumentStore(...) to have the index created automatically on initialization.
Using Oracle in a RAG Pipeline
Once you have documents in your OracleDocumentStore, you can use the OracleEmbeddingRetriever in any Haystack pipeline. Below is a pipeline that retrieves relevant documents and generates an answer using an LLM.
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.components.retrievers.oracle import OracleEmbeddingRetriever
prompt_template = """Answer the following query based on the provided context. If the context does
not include an answer, reply with 'I don't know'.\n
Query: {{query}}
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Answer:
"""
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", OracleEmbeddingRetriever(document_store=document_store, top_k=5))
query_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
query_pipeline.add_component("generator", OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"), model="gpt-4o"))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "generator")
query = "What is Oracle AI Vector Search?"
results = query_pipeline.run(
{
"text_embedder": {"text": query},
"prompt_builder": {"query": query},
}
)
License
oracle-haystack is distributed under the terms of the
Apache-2.0 license.
