
Tracing

This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it.

Traces document the flow of requests through your application and are vital for monitoring applications in production. They help you understand the execution order of your Pipeline components and identify where your Pipeline spends the most time.
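
To make the idea concrete, here is a minimal sketch of what a trace records, written with the Python standard library only. It is not how Haystack or any tracing backend is implemented; the `span` helper, the `finished_spans` list, and the operation names are all hypothetical and exist only for illustration.

```python
import contextlib
import time
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical in-memory store; a real tracing backend would export these records.
finished_spans: List[Dict[str, Any]] = []
_active_names: List[str] = []


@contextlib.contextmanager
def span(operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[Dict[str, Any]]:
    """Record the name, parent, tags, and duration of one unit of work."""
    record: Dict[str, Any] = {
        "name": operation_name,
        "parent": _active_names[-1] if _active_names else None,
        "tags": dict(tags or {}),
    }
    _active_names.append(operation_name)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _active_names.pop()
        finished_spans.append(record)


# Simulate a pipeline run with two components (names are illustrative).
with span("pipeline.run"):
    with span("retriever", {"top_k": 5}):
        time.sleep(0.01)
    with span("generator"):
        time.sleep(0.02)

for s in finished_spans:
    print(f'{s["name"]:<14} parent={s["parent"]} {s["duration_s"] * 1000:.1f} ms')
```

Each record carries its parent and duration, which is the information a tracing UI uses to show execution order and where time is spent.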

Configuring a Tracing Backend

Instrumented applications typically send traces to a trace collector or a tracing backend. Haystack provides out-of-the-box support for OpenTelemetry and Datadog. You can also quickly implement support for additional providers of your choosing.

OpenTelemetry

To use OpenTelemetry as your tracing backend, follow these steps:

  1. Install the OpenTelemetry SDK:

    pip install opentelemetry-sdk
    pip install opentelemetry-exporter-otlp
    
  2. To add traces to even deeper levels of your Pipelines, we recommend you check out the OpenTelemetry integrations available for the libraries your application uses.

  3. There are two options for how to hook Haystack to the OpenTelemetry SDK.

    • Run your Haystack applications using OpenTelemetry’s automated instrumentation. Haystack will automatically detect the configured tracing backend and use it to send traces.

      First, install the OpenTelemetry CLI:

      pip install opentelemetry-distro
      

      Then, run your Haystack application with the opentelemetry-instrument command:

      opentelemetry-instrument \
          --traces_exporter console \
          --metrics_exporter console \
          --logs_exporter console \
          --service_name my-haystack-app \
          <command to run your Haystack pipeline>
      

    or

    • Configure the tracing backend in your Python code:

      from haystack import tracing
      from haystack.tracing import OpenTelemetryTracer
      from opentelemetry import trace
      from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
      from opentelemetry.sdk.resources import SERVICE_NAME, Resource
      from opentelemetry.sdk.trace import TracerProvider
      from opentelemetry.sdk.trace.export import BatchSpanProcessor
      
      # Service name is required for most backends
      resource = Resource(attributes={
          SERVICE_NAME: "haystack"
      })
      
      traceProvider = TracerProvider(resource=resource)
      processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
      traceProvider.add_span_processor(processor)
      trace.set_tracer_provider(traceProvider)
      
      # Option A: tell Haystack to auto-detect the configured tracer
      tracing.auto_enable_tracing()
      
      # Option B: explicitly tell Haystack to use your tracer
      tracer = traceProvider.get_tracer("my_application")
      tracing.enable_tracing(OpenTelemetryTracer(tracer))
      

Datadog

To use Datadog as your tracing backend, follow these steps:

  1. Install Datadog’s tracing library, ddtrace:

    pip install ddtrace
    
  2. There are two options for how to hook Haystack to ddtrace.

    • Run your Haystack application using ddtrace-run:

      ddtrace-run <command to run your Haystack pipeline>
      

    or

    • Configure the Datadog tracing backend in your Python code:

      from haystack.tracing.datadog import DatadogTracer
      from haystack import tracing
      import ddtrace
      
      tracer = ddtrace.tracer
      tracing.enable_tracing(DatadogTracer(tracer))
      

Custom Tracing Backend

To use your custom tracing backend with Haystack, follow these steps:

  1. Implement the Tracer interface. The following code snippet provides an example using the OpenTelemetry package:

    import contextlib
    from typing import Any, Dict, Iterator, Optional
    
    import opentelemetry.trace
    from opentelemetry import trace
    from opentelemetry.trace import NonRecordingSpan
    
    from haystack.tracing import Span, Tracer
    from haystack.tracing import utils as tracing_utils
    
    class OpenTelemetrySpan(Span):
        def __init__(self, span: opentelemetry.trace.Span) -> None:
            self._span = span
    
        def set_tag(self, key: str, value: Any) -> None:
            # Tracing backends usually don't support arbitrary tag values.
            # `coerce_tag_value` forces the value to either be a Python
            # primitive (int, float, boolean, str) or tries to dump it as string.
            coerced_value = tracing_utils.coerce_tag_value(value)
            self._span.set_attribute(key, coerced_value)
    
    class OpenTelemetryTracer(Tracer):
        def __init__(self, tracer: opentelemetry.trace.Tracer) -> None:
            self._tracer = tracer
    
        @contextlib.contextmanager
        def trace(self, operation_name: str, tags: Optional[Dict[str, Any]] = None) -> Iterator[Span]:
            with self._tracer.start_as_current_span(operation_name) as raw_span:
                # Wrap the backend span in the Haystack Span interface
                span = OpenTelemetrySpan(raw_span)
                if tags:
                    span.set_tags(tags)
    
                yield span
    
        def current_span(self) -> Optional[Span]:
            current_span = trace.get_current_span()
            # A NonRecordingSpan means no span is currently active
            if isinstance(current_span, NonRecordingSpan):
                return None
    
            return OpenTelemetrySpan(current_span)
    
  2. Tell Haystack to use your custom tracer:

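    from haystack import tracing
    from opentelemetry import trace
    
    # Obtain a tracer from the OpenTelemetry API and wrap it in your custom class
    tracer = trace.get_tracer("my_application")
    haystack_tracer = OpenTelemetryTracer(tracer)
    tracing.enable_tracing(haystack_tracer)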
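
The same interface can also be backed by something much simpler than OpenTelemetry. The following is a minimal sketch of an in-memory tracer that could be handy in unit tests. `InMemorySpan` and `InMemoryTracer` are hypothetical names, not part of Haystack. The `parent_span` parameter is an assumption about newer Haystack versions and is ignored here, and the import fallback exists only so the sketch also runs where Haystack is not installed.

```python
import contextlib
from typing import Any, Dict, Iterator, List, Optional

try:
    from haystack.tracing import Span, Tracer
except ImportError:  # fallback so the sketch also runs without Haystack installed
    Span = Tracer = object  # type: ignore[assignment,misc]


class InMemorySpan(Span):
    """Hypothetical span that keeps its tags in a plain dictionary."""

    def __init__(self, operation_name: str) -> None:
        self.operation_name = operation_name
        self.tags: Dict[str, Any] = {}

    def set_tag(self, key: str, value: Any) -> None:
        self.tags[key] = value


class InMemoryTracer(Tracer):
    """Hypothetical tracer that collects every span in a list."""

    def __init__(self) -> None:
        self.spans: List[InMemorySpan] = []
        self._active: List[InMemorySpan] = []

    @contextlib.contextmanager
    def trace(
        self,
        operation_name: str,
        tags: Optional[Dict[str, Any]] = None,
        parent_span: Optional[Any] = None,  # assumed optional argument; unused in this sketch
    ) -> Iterator[InMemorySpan]:
        span = InMemorySpan(operation_name)
        for key, value in (tags or {}).items():
            span.set_tag(key, value)
        self.spans.append(span)
        self._active.append(span)
        try:
            yield span
        finally:
            self._active.pop()

    def current_span(self) -> Optional[InMemorySpan]:
        return self._active[-1] if self._active else None


tracer = InMemoryTracer()
with tracer.trace("my.operation", tags={"component": "demo"}) as span:
    assert tracer.current_span() is span

print([(s.operation_name, s.tags) for s in tracer.spans])
```

With Haystack installed, an instance of such a class would be passed to `tracing.enable_tracing(...)` in the same way as the tracer above.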
    

Disabling Auto Tracing

Haystack automatically detects and enables tracing under the following circumstances:

  • If opentelemetry-sdk is installed and configured for OpenTelemetry.
  • If ddtrace is installed for Datadog.

To disable this behavior, there are two options:

  • Set the environment variable HAYSTACK_AUTO_TRACE_ENABLED to false when running your Haystack application

or

  • Disable tracing in Python:

    from haystack.tracing import disable_tracing
    
    disable_tracing()
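
In notebooks or scripts where you cannot easily control the shell environment, the variable can also be set from Python. This is a minimal sketch that assumes Haystack reads HAYSTACK_AUTO_TRACE_ENABLED when it is first imported, so the assignment has to happen before the import.

```python
import os

# Assumption: Haystack checks this variable when it is first imported,
# so set it before any `import haystack` statement runs.
os.environ["HAYSTACK_AUTO_TRACE_ENABLED"] = "false"

# import haystack  # import Haystack only after the variable is set

print(os.environ["HAYSTACK_AUTO_TRACE_ENABLED"])
```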
    

Content Tracing

Haystack also allows you to trace your Pipeline components' input and output values. This is useful for investigating your Pipeline execution step by step.

By default, this behavior is disabled to prevent sensitive user information from being sent to your tracing backend.

To enable content tracing, there are two options:

  • Set the environment variable HAYSTACK_CONTENT_TRACING_ENABLED to true when running your Haystack application

or

  • Explicitly enable content tracing in Python:

    from haystack import tracing
    
    tracing.tracer.is_content_tracing_enabled = True
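
Because content tracing can expose sensitive data, you may want to enable it only around a single debugging run and restore the previous setting afterwards. The helper below is a minimal sketch of that idea. `content_tracing` is a hypothetical name, not a Haystack function, and the `SimpleNamespace` object stands in for `haystack.tracing.tracer` so the example runs without Haystack installed.

```python
import contextlib
from types import SimpleNamespace
from typing import Any, Iterator


@contextlib.contextmanager
def content_tracing(tracer: Any, enabled: bool = True) -> Iterator[None]:
    """Temporarily set `is_content_tracing_enabled`, then restore the old value."""
    previous = tracer.is_content_tracing_enabled
    tracer.is_content_tracing_enabled = enabled
    try:
        yield
    finally:
        tracer.is_content_tracing_enabled = previous


# Stand-in for `haystack.tracing.tracer` (illustration only).
fake_tracer = SimpleNamespace(is_content_tracing_enabled=False)

with content_tracing(fake_tracer):
    inside = fake_tracer.is_content_tracing_enabled  # a pipeline.run(...) call would go here

print(inside, fake_tracer.is_content_tracing_enabled)  # prints: True False
```

In a real application you would pass `tracing.tracer` (from `from haystack import tracing`) instead of the stand-in object.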
    

Visualizing Traces During Development

Use Jaeger as a lightweight tracing backend for local Pipeline development. This allows you to experiment with tracing without the need for a complex tracing backend.

Figure: An illustrative screenshot of the Jaeger UI.

  1. Run the Jaeger container. This creates a tracing backend as well as a UI to visualize the traces:

    docker run --rm -d --name jaeger \
      -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
      -p 6831:6831/udp \
      -p 6832:6832/udp \
      -p 5778:5778 \
      -p 16686:16686 \
      -p 4317:4317 \
      -p 4318:4318 \
      -p 14250:14250 \
      -p 14268:14268 \
      -p 14269:14269 \
      -p 9411:9411 \
      jaegertracing/all-in-one:1
    
  2. Install the OpenTelemetry SDK:

    pip install opentelemetry-sdk
    pip install opentelemetry-exporter-otlp
    
  3. Configure OpenTelemetry to use the Jaeger backend:

    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    
    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    
    # Service name is required for most backends
    resource = Resource(attributes={
        SERVICE_NAME: "haystack"
    })
    
    traceProvider = TracerProvider(resource=resource)
    processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
    traceProvider.add_span_processor(processor)
    trace.set_tracer_provider(traceProvider)
    
  4. Tell Haystack to use OpenTelemetry for tracing:

    import haystack.tracing
    
    haystack.tracing.auto_enable_tracing()
    
  5. Run your pipeline:

    ...
    pipeline.run(...)
    ...
    
  6. Inspect the traces in the UI provided by Jaeger at http://localhost:16686.
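
If no traces show up in the UI, a quick first check is whether the ports published by the container are reachable from your machine. The following is a minimal troubleshooting sketch that uses only the Python standard library. `is_port_open` is a hypothetical helper, and the ports are the ones mapped in the `docker run` command above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 4318 receives OTLP traces over HTTP; 16686 serves the Jaeger UI.
for name, port in {"OTLP over HTTP": 4318, "Jaeger UI": 16686}.items():
    status = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} (port {port}): {status}")
```

A reachable port only shows that something is listening there; it does not confirm that spans are being exported.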