
Module file_type

FileTypeClassifier

class FileTypeClassifier(BaseComponent)

Route files in an Indexing Pipeline to corresponding file converters.

__init__

def __init__(supported_types: List[str] = DEFAULT_TYPES)

Node that sends out files on a different output edge depending on their extension.

Arguments:

  • supported_types: the file types that this node can distinguish. The node is limited to a maximum of 10 outgoing edges, each of which corresponds to one file extension. By default, the extensions are txt, pdf, md, docx, and html. Lists containing more than 10 elements are not allowed, and lists with duplicate elements are rejected.
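
The constraints on supported_types can be illustrated with a minimal standalone sketch. It does not import Haystack and is not the library's implementation: validate_supported_types, MAX_EDGES, and the error messages are hypothetical names chosen for illustration, and only the default list and the two rejection rules come from the description above.

```python
from typing import List

# Defaults and edge limit as described in the argument documentation.
DEFAULT_TYPES = ["txt", "pdf", "md", "docx", "html"]
MAX_EDGES = 10  # hypothetical constant name for the documented limit


def validate_supported_types(supported_types: List[str]) -> List[str]:
    """Hypothetical helper: apply the two documented rejection rules."""
    if len(supported_types) > MAX_EDGES:
        raise ValueError(
            f"At most {MAX_EDGES} file types are allowed, got {len(supported_types)}."
        )
    if len(set(supported_types)) != len(supported_types):
        raise ValueError("supported_types must not contain duplicates.")
    # Return a copy so later changes to the caller's list have no effect.
    return list(supported_types)
```

Under these assumptions, a custom list such as ["csv", "json"] would pass, while ["txt", "txt"] or any list of 11 entries would raise ValueError.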

run

def run(file_paths: Union[Path, List[Path], str, List[str], List[Union[Path, str]]])

Sends out files on a different output edge depending on their extension.

Arguments:

  • file_paths: a single path or a list of paths (Path or str) to route to the output edge that matches their extension.
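
The routing idea can be sketched in plain Python without importing Haystack. This is an illustration under stated assumptions, not the node's actual code: route_by_extension is a hypothetical function, the edge names output_1, output_2, ... (numbered by position in the supported types list) are assumed for the example, and raising ValueError for an unsupported extension is likewise an assumption.

```python
from pathlib import Path
from typing import Dict, List, Union

# Default extensions from the __init__ documentation.
SUPPORTED_TYPES = ["txt", "pdf", "md", "docx", "html"]

PathLike = Union[Path, str]


def route_by_extension(
    file_paths: Union[PathLike, List[PathLike]]
) -> Dict[str, List[Path]]:
    """Hypothetical sketch: group paths by an assumed per-extension edge name."""
    # Accept a single path as well as a list, mirroring the run() signature.
    if isinstance(file_paths, (str, Path)):
        file_paths = [file_paths]

    edges: Dict[str, List[Path]] = {}
    for raw_path in file_paths:
        path = Path(raw_path)
        extension = path.suffix.lstrip(".")
        if extension not in SUPPORTED_TYPES:
            # Assumed behavior: unsupported extensions are an error.
            raise ValueError(f"Unsupported file type: {path}")
        # Assumed naming: the edge index is the 1-based position in the list.
        edge = f"output_{SUPPORTED_TYPES.index(extension) + 1}"
        edges.setdefault(edge, []).append(path)
    return edges
```

With the default list, route_by_extension("report.pdf") places the file on the second edge in this sketch, because pdf is the second supported type.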