Integration: FunASR
Transcribe audio files to Documents using FunASR, an open-source, self-hosted speech recognition toolkit supporting 50+ languages.
Overview
FunASR is an open-source speech recognition toolkit from Alibaba DAMO Academy that runs entirely locally, with no API key required. It supports 50+ languages, speaker diarization, and timestamp extraction.
This integration provides:
- FunASRTranscriber: Transcribes audio files to Haystack Document objects. Accepts file paths, Path objects, and ByteStream inputs.
Installation
pip install funasr-haystack
Usage
FunASRTranscriber
FunASRTranscriber transcribes audio files to Haystack Document objects using FunASR models. Models are downloaded from ModelScope on first use and cached in ~/.cache/modelscope.
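Because models are cached on disk, it can be useful to check how much space they take. The following is a minimal sketch, assuming the default cache location mentioned above; the helper name `modelscope_cache_size` is hypothetical and is not part of funasr-haystack or FunASR.

```python
from pathlib import Path


def modelscope_cache_size(
    cache_dir: Path = Path.home() / ".cache" / "modelscope",
) -> int:
    """Return the total size in bytes of all files under cache_dir.

    Returns 0 if the directory does not exist (for example, before the
    first model download).
    """
    if not cache_dir.is_dir():
        return 0
    return sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())


size_mb = modelscope_cache_size() / 1_000_000
print(f"ModelScope cache: {size_mb:.1f} MB")
```

Deleting the cache directory should only force a fresh download on the next use, though that behavior is an assumption based on the caching described above.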
Basic Example
from haystack_integrations.components.audio.funasr import FunASRTranscriber
transcriber = FunASRTranscriber()
result = transcriber.run(sources=["speech.wav", "interview.mp3"])
for doc in result["documents"]:
    print(doc.content)
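When transcribing a whole folder rather than a couple of named files, the list of sources can be built with the standard library. This is a sketch only: `collect_audio_files` and `AUDIO_SUFFIXES` are hypothetical names introduced for illustration, and the suffix set is an assumption that should be adjusted to the formats you actually use.

```python
from pathlib import Path

# Assumption: only the two extensions used in the example above.
AUDIO_SUFFIXES = {".wav", ".mp3"}


def collect_audio_files(folder: Path) -> list[Path]:
    """Return audio files found under folder (recursively), sorted by path."""
    return sorted(
        p
        for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
```

Since the transcriber accepts Path objects, the returned list could be passed directly, for example `transcriber.run(sources=collect_audio_files(Path("recordings")))`, where `recordings` is a placeholder directory name.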
In a Pipeline
from haystack import Pipeline
from haystack.components.fetchers import LinkContentFetcher
from haystack_integrations.components.audio.funasr import FunASRTranscriber
pipe = Pipeline()
pipe.add_component("fetcher", LinkContentFetcher())
pipe.add_component("transcriber", FunASRTranscriber())
pipe.connect("fetcher", "transcriber")
result = pipe.run(
    data={
        "fetcher": {
            "urls": ["https://example.com/speech.wav"],
        },
    }
)
print(result["transcriber"]["documents"][0].content)
Speaker Diarization
from haystack.utils import ComponentDevice
from haystack_integrations.components.audio.funasr import FunASRTranscriber
transcriber = FunASRTranscriber(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",
    device=ComponentDevice.from_str("cuda"),
)
result = transcriber.run(sources=["meeting.wav"])
doc = result["documents"][0]
print(doc.content)
print("Speakers:", doc.meta.get("speakers"))
License
funasr-haystack is distributed under the terms of the Apache-2.0 license.
