RouteDocuments
The RouteDocuments node splits documents by content_type or by a metadata field. It takes a list of documents as input and segregates them by either content_type or a meta value.
This node is handy if you have different types of data, for example tables and text. You can then use it to route each document type to a Reader trained on it.
Usage
You can initialize RouteDocuments to split documents by content type, which is the default method. This means that documents are split into documents containing text and documents containing tables. To initialize RouteDocuments this way, run:
from haystack.nodes import RouteDocuments
route_documents = RouteDocuments()
You can also initialize RouteDocuments to split documents based on a metadata field. To do this, specify the metadata field and its values when initializing the node. For example, if your documents contain a metadata field called language and you want to split your documents into German, English, and Spanish documents, here's how you initialize RouteDocuments:
route_documents = RouteDocuments(split_by="language", metadata_values=["de", "en", "es"])
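To make the idea of splitting by a metadata field concrete, here is a minimal plain-Python sketch of the grouping. It is not the node's implementation: the route_by_meta helper, the dictionary-shaped documents, and the choice to drop documents whose value is not listed are all assumptions made for illustration.

```python
from typing import Dict, List


def route_by_meta(
    documents: List[dict], split_by: str, metadata_values: List[str]
) -> Dict[str, List[dict]]:
    """Hypothetical helper: group documents into one bucket per requested metadata value."""
    # One bucket per value, kept in the order the values were given.
    buckets: Dict[str, List[dict]] = {value: [] for value in metadata_values}
    for doc in documents:
        value = doc.get("meta", {}).get(split_by)
        # Assumption of this sketch: documents with an unlisted or missing value are dropped.
        if value in buckets:
            buckets[value].append(doc)
    return buckets


docs = [
    {"content": "Guten Tag", "meta": {"language": "de"}},
    {"content": "Good day", "meta": {"language": "en"}},
    {"content": "Buenos días", "meta": {"language": "es"}},
    {"content": "Bonjour", "meta": {"language": "fr"}},
]
routed = route_by_meta(docs, split_by="language", metadata_values=["de", "en", "es"])
```

Each bucket in routed can then be handed to the component suited to that language, which mirrors how the node's outputs are meant to be routed to different downstream nodes.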