RouteDocuments

The RouteDocuments node splits documents by content_type or by a metadata field. It takes a list of documents as input and segregates them by either their content_type or the value of a metadata field.

This node is handy if you have different types of data, for example tables and text. You can then use it to route each document type to a Reader trained on it.

Usage

By default, RouteDocuments splits documents by content type, separating documents that contain text from documents that contain tables. To initialize RouteDocuments this way, run:

from haystack.nodes import RouteDocuments
route_documents = RouteDocuments()
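
To make the default behavior concrete, here is a minimal, library-independent sketch of what splitting by content type amounts to. The `Doc` class and the `split_by_content_type` function are hypothetical stand-ins written for illustration; they are not Haystack's API, and the real node operates on Haystack document objects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    # Hypothetical stand-in for a document; not a Haystack class.
    content: str
    content_type: str = "text"


def split_by_content_type(docs: List[Doc]) -> Dict[str, List[Doc]]:
    """Group documents into a text bucket and a table bucket (illustrative only)."""
    buckets: Dict[str, List[Doc]] = {"text": [], "table": []}
    for doc in docs:
        # Documents with any other content type are ignored in this sketch.
        if doc.content_type in buckets:
            buckets[doc.content_type].append(doc)
    return buckets


docs = [
    Doc("Berlin is the capital of Germany."),
    Doc("city,population\nBerlin,3.6M", content_type="table"),
]
routed = split_by_content_type(docs)
```

Each bucket can then be handed to the component suited to it, for example the text bucket to a text Reader and the table bucket to a table Reader.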

You can also initialize RouteDocuments to split documents based on a metadata field. To do this, specify the metadata field and its values when initializing the node. For example, if your documents contain a metadata field called language and you want to split them into German, English, and Spanish documents, here's how you initialize RouteDocuments:

route_documents = RouteDocuments(split_by="language", metadata_values=["de", "en", "es"])
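
The idea behind splitting on a metadata field can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the `Doc` class and the `split_by_meta` function are hypothetical names, not part of Haystack, and dropping documents whose value is not in the list is a choice made in this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    # Hypothetical stand-in for a document; not a Haystack class.
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def split_by_meta(docs: List[Doc], key: str, values: List[str]) -> Dict[str, List[Doc]]:
    """Group documents by the value stored under a metadata key (illustrative only)."""
    buckets: Dict[str, List[Doc]] = {value: [] for value in values}
    for doc in docs:
        value = doc.meta.get(key)
        # Documents without the key, or with an unlisted value, are dropped here.
        if value in buckets:
            buckets[value].append(doc)
    return buckets


docs = [
    Doc("Guten Morgen", meta={"language": "de"}),
    Doc("Good morning", meta={"language": "en"}),
    Doc("Buenos días", meta={"language": "es"}),
    Doc("Bonjour", meta={"language": "fr"}),
]
routed = split_by_meta(docs, key="language", values=["de", "en", "es"])
```

In this sketch, `routed` holds one list per listed language, in the same order as the values passed in.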