<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Blog</title>
    <link>https://haystack.deepset.ai/blog/</link>
    <description>Recent content on Haystack</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 22 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://haystack.deepset.ai/blog/index.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>MCP &#43; Haystack: A Practical Guide for AI Engineers</title>
      <link>https://haystack.deepset.ai/blog/mcp-with-haystack/</link>
      <pubDate>Mon, 22 Jun 2026 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/mcp-with-haystack/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;The Model Context Protocol (MCP) is the connective tissue of the modern AI stack. If you are building AI agents or production RAG systems, understanding how MCP works with Haystack is one of the most direct ways to make your applications composable, reusable, and reachable from the tools your users already live in.&lt;/p&gt;
&lt;p&gt;This blog post explains what MCP is, why it matters, and the different ways you can use MCP with Haystack, whether you want to &lt;em&gt;consume&lt;/em&gt; external tools inside a Haystack agent or &lt;em&gt;expose&lt;/em&gt; your Haystack pipelines and agents as MCP tools for MCP clients like Claude, ChatGPT, and Cursor.&lt;/p&gt;
&lt;h2 id=&#34;what-is-mcp-model-context-protocol&#34;&gt;What is MCP (Model Context Protocol)?&lt;/h2&gt;
&lt;p&gt;The 
&lt;a href=&#34;https://modelcontextprotocol.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Model Context Protocol (MCP)&lt;/a&gt; is an open standard that defines how AI applications connect to external tools, data sources, and services. The common analogy is that MCP is the “USB-C of AI”: instead of writing a bespoke integration for every model and every assistant, you implement the protocol once and any MCP-compatible client can use it.&lt;/p&gt;
&lt;p&gt;An MCP setup has two sides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MCP servers&lt;/strong&gt; expose capabilities (tools, prompts, resources) over a standardized interface.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCP clients&lt;/strong&gt; (an LLM application, an agent, or an AI assistant like Claude Code, ChatGPT, or Cursor) discover those capabilities and call them at runtime.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A “tool” in MCP terms is just a callable with a name, a description, and a JSON Schema describing its inputs. That small, standardized contract is what makes the whole ecosystem interoperable.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Security note:&lt;/strong&gt; MCP servers may introduce security risks. Exercise caution when connecting to MCP servers to ensure they do not expose sensitive data or perform malicious or unsafe actions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;why-mcp-and-haystack-are-a-strong-combination&#34;&gt;Why MCP and Haystack are a strong combination&lt;/h2&gt;
&lt;p&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/intro&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; is an open-source AI framework for building production-ready agents, RAG applications, and multimodal search systems out of reusable components and pipelines. MCP and Haystack complement each other in both directions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Haystack as an MCP client.&lt;/strong&gt; Your agent gains instant access to a growing ecosystem of MCP servers (filesystem, Git/GitHub, databases, web search, internal services) without you having to hand-roll each integration.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Haystack as an MCP server.&lt;/strong&gt; The retrieval logic, agentic workflows, and domain knowledge you have already encoded in a pipeline become callable from any MCP client. You build the capability once and it shows up wherever your users work.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/mcp-with-haystack/mcp-haystack.png alt=&#34;Haystack as MCP client &amp; server&#34;  /&gt;
        &lt;figcaption&gt;Haystack works on both sides of MCP: as a client consuming external tools, and as a server exposing pipelines and agents to any MCP-compatible assistant&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;The payoff for AI engineers is &lt;strong&gt;composability and reach&lt;/strong&gt;. A single agentic flow can pull data from a Haystack RAG pipeline, hand off to a live web-context MCP tool, and route through another pipeline for structured output, connecting your own logic with external tools instead of re-implementing everything inside one monolithic agent.&lt;/p&gt;
&lt;p&gt;The rest of this guide covers &lt;em&gt;three&lt;/em&gt; concrete approaches:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use MCP servers as tools inside a Haystack Agent (
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/mcptool&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;MCPTool&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/mcptoolset&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;MCPToolset&lt;/code&gt;&lt;/a&gt;, and 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/searchabletoolset&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;SearchableToolset&lt;/code&gt;&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Deploy any Haystack pipeline or agent as an MCP server with &lt;strong&gt;Hayhooks&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Expose Haystack pipelines/agents as managed MCP tools on the &lt;strong&gt;Haystack Enterprise Platform&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These approaches are not mutually exclusive. You can connect a Haystack agent to MCP servers, then expose that same agent as an MCP tool via Hayhooks or the Haystack Enterprise Platform.&lt;/p&gt;
&lt;h2 id=&#34;approach-1-haystack-as-an-mcp-client&#34;&gt;Approach 1: Haystack as an MCP client&lt;/h2&gt;
&lt;p&gt;Using MCP servers as tools is the most common entry point. You have a Haystack 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/agent&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;Agent&lt;/code&gt;&lt;/a&gt; and you want it to be able to call tools that live behind an MCP server. In practice, you can combine &lt;code&gt;MCPTool&lt;/code&gt; and &lt;code&gt;MCPToolset&lt;/code&gt; from the &lt;code&gt;mcp-haystack&lt;/code&gt; integration with Haystack’s &lt;code&gt;SearchableToolset&lt;/code&gt; for larger catalogs.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install mcp-haystack
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;mcptool-bind-a-single-mcp-tool&#34;&gt;&lt;code&gt;MCPTool&lt;/code&gt;: bind a single MCP tool&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;MCPTool&lt;/code&gt; connects to an MCP server and exposes &lt;strong&gt;one specific tool&lt;/strong&gt; to your agent. This gives you precise control over exactly which capability the LLM can reach. It supports Streamable HTTP and stdio transports.&lt;/p&gt;
&lt;p&gt;Here is a single MCP tool wired directly into a Haystack &lt;code&gt;Agent&lt;/code&gt; (using the official &lt;code&gt;mcp-server-time&lt;/code&gt; server):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# pip install ollama-haystack&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.agents&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.chat&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OllamaChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.tools.mcp&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;MCPTool&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;StdioServerInfo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;time_tool&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;MCPTool&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;get_current_time&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;server_info&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;StdioServerInfo&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;command&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;uvx&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;args&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;mcp-server-time&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;--local-timezone=Europe/Berlin&#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;chat_generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;OllamaChatGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemma4:e4b&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;time_tool&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What is the time in New York? Be brief.&#34;&lt;/span&gt;)],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;last_message&#34;&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Result:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;8:42 AM Thursday (New York)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;The Agent component is model-agnostic, so you can swap in any Haystack chat generator your stack supports.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To connect to a remote server instead, swap the transport:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.tools.mcp&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;MCPTool&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;StreamableHttpServerInfo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;server_info&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;StreamableHttpServerInfo&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://localhost:8000/mcp&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;tool&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;MCPTool&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;my_tool&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;server_info&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;server_info&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;mcptoolset-load-a-whole-servers-tools-at-once&#34;&gt;&lt;code&gt;MCPToolset&lt;/code&gt;: load a whole server’s tools at once&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;MCPToolset&lt;/code&gt; connects to an MCP server and automatically discovers and loads its tools into a single, manageable unit. It is a subclass of Haystack’s &lt;code&gt;Toolset&lt;/code&gt;, so it plugs directly into a Chat Generator, a &lt;code&gt;ToolInvoker&lt;/code&gt;, or an &lt;code&gt;Agent&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The key feature for controlling agent behavior is the &lt;code&gt;tool_names&lt;/code&gt; filter, which lets you decide exactly which tools from the server your agent is allowed to use. Here is a practical example with the 
&lt;a href=&#34;https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;official filesystem MCP server&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# pip install mistral-haystack&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.agents&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.chat&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;MistralChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.tools.mcp&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;MCPToolset&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;StdioServerInfo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;toolset&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;MCPToolset&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;server_info&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;StdioServerInfo&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;command&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;npx&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;args&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;-y&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;@modelcontextprotocol/server-filesystem&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;/Users/you/projects&#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;tool_names&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;list_directory&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;read_file&#34;&lt;/span&gt;],  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# restrict to safe read-only actions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;chat_generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;MistralChatGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;mistral-medium-3-5&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;toolset&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;exit_conditions&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text&#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;List markdown files in /Users/you/projects/docs and summarize them.&#34;&lt;/span&gt;)]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;last_message&#34;&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Response:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Here are the Markdown files in the repo root and a brief summary of each:...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; If you omit &lt;code&gt;tool_names&lt;/code&gt;, the toolset loads every tool the server offers. Be careful here, exposing 20–30&#43; tools at once can overwhelm the LLM’s tool-selection logic and degrade accuracy. Curating the tool list is one of the simplest reliability wins you can make.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;searchabletoolset-scale-to-large-tool-catalogs&#34;&gt;&lt;code&gt;SearchableToolset&lt;/code&gt;: scale to large tool catalogs&lt;/h3&gt;
&lt;p&gt;As soon as you connect multiple MCP servers, you hit a hard problem: too many tools. Every tool definition (name, description, input schema) is injected into the LLM’s context on every turn. A catalog of dozens or hundreds of tools bloats the context window, raises cost and latency, and makes the model worse at picking the right tool.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;SearchableToolset&lt;/code&gt; is Haystack’s answer to this &lt;strong&gt;context management&lt;/strong&gt; challenge. Instead of exposing every tool up front, it exposes a single bootstrap tool, &lt;code&gt;search_tools&lt;/code&gt;, that the agent uses to discover relevant tools on demand via BM25 keyword search. Once the agent searches, the matching tools become immediately available for it to call in subsequent iterations.&lt;/p&gt;
&lt;p&gt;Crucially, the catalog can contain &lt;code&gt;MCPTool&lt;/code&gt; and &lt;code&gt;MCPToolset&lt;/code&gt; instances, so you can place many MCP servers behind one searchable, context-efficient interface:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.agents&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SearchableToolset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.chat&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.tools.mcp&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;MCPToolset&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;StdioServerInfo&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;StreamableHttpServerInfo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Pull tools from several MCP servers into one catalog&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;fetch_tools&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;MCPToolset&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;server_info&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;StdioServerInfo&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;command&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;uvx&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;args&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;mcp-server-fetch&#34;&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;github_tools&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;MCPToolset&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;server_info&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;StreamableHttpServerInfo&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://api.githubcopilot.com/mcp/&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;headers&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Authorization&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Bearer &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;GITHUB_PAT&#39;&lt;/span&gt;]&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;catalog&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;fetch_tools&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;github_tools&lt;/span&gt;] &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# can be dozens of tools&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;toolset&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SearchableToolset&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;catalog&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;catalog&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;search_threshold&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;chat_generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-5.4-mini&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;toolset&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Check the open PRs on haystack integrations repo &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;(deepset-ai/haystack-integrations) and create a social media post &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;about the most interesting ones.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;last_message&#34;&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Result:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Here’s a draft social post highlighting the most interesting open PRs in &lt;span style=&#34;color:#2aa198&#34;&gt;`&lt;/span&gt;deepset-ai/haystack-integrations&lt;span style=&#34;color:#2aa198&#34;&gt;`&lt;/span&gt;....
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this setup, the 
&lt;a href=&#34;https://github.com/modelcontextprotocol/servers/tree/main/src/fetch&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Fetch MCP Server&lt;/a&gt; runs over stdio, while the 
&lt;a href=&#34;https://github.com/github/github-mcp-server&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub MCP Server&lt;/a&gt; uses the official remote MCP endpoint over Streamable HTTP. If your MCP host supports OAuth, you can authenticate with that remote server without passing a PAT header manually.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Fetch MCP caution:&lt;/strong&gt; The Fetch MCP Server can access local/internal IP addresses and may introduce security risk. Use it carefully to avoid exposing sensitive data.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This pattern keeps the agent’s prompt lean while still giving it access to a large universe of capabilities, exactly the kind of context engineering that separates a demo from a production agent. Learn more about context management in 
&lt;a href=&#34;https://haystack.deepset.ai/blog/context-engineering&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Blog Post: Context Engineering for Agentic Systems&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;approach-2-haystack-as-an-mcp-server-using-hayhooks&#34;&gt;Approach 2: Haystack as an MCP server using Hayhooks&lt;/h2&gt;
&lt;p&gt;The first approach makes Haystack an MCP &lt;em&gt;client&lt;/em&gt;. Now let’s flip it around. 
&lt;a href=&#34;https://deepset-ai.github.io/hayhooks&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hayhooks&lt;/a&gt; is deepset’s tool for serving Haystack pipelines and agents over HTTP, and it can also act as an &lt;strong&gt;MCP server&lt;/strong&gt;. This means any Haystack application, a defined pipeline or a full agent, can be exposed as an MCP tool and connected to MCP clients like Claude, ChatGPT, Cursor, or any other agent.&lt;/p&gt;
&lt;h3 id=&#34;getting-started&#34;&gt;Getting started&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install hayhooks[mcp]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;hayhooks mcp run
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This starts the MCP server on &lt;code&gt;localhost:1417&lt;/code&gt; by default (configurable via &lt;code&gt;HAYHOOKS_MCP_HOST&lt;/code&gt; and &lt;code&gt;HAYHOOKS_MCP_PORT&lt;/code&gt;). It speaks Streamable HTTP at &lt;code&gt;/mcp&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;how-a-pipeline-becomes-a-tool&#34;&gt;How a pipeline becomes a tool&lt;/h3&gt;
&lt;p&gt;When you deploy a pipeline with a &lt;code&gt;PipelineWrapper&lt;/code&gt;, Hayhooks automatically turns it into an MCP tool. The magic is in the &lt;code&gt;run_api&lt;/code&gt; method and its docstring:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The wrapper &lt;code&gt;name&lt;/code&gt; becomes the MCP tool &lt;code&gt;name&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The first line of the &lt;code&gt;run_api&lt;/code&gt; docstring becomes the tool &lt;code&gt;description&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;run_api&lt;/code&gt; method arguments and their type hints become the tool’s &lt;code&gt;inputSchema&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pathlib&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;hayhooks&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;BasePipelineWrapper&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;PipelineWrapper&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;BasePipelineWrapper&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;setup&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline_yaml&lt;/span&gt; = (&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;__file__&lt;/span&gt;).&lt;span style=&#34;color:#268bd2&#34;&gt;parent&lt;/span&gt; / &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;my_haystack_pipeline.yml&#34;&lt;/span&gt;).&lt;span style=&#34;color:#268bd2&#34;&gt;read_text&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;loads&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline_yaml&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run_api&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;urls&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;], &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Ask a question about one or more websites using a Haystack pipeline.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Args:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            urls: List of website URLs to analyze
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            question: Question to ask about the content
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        &#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fetcher&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;urls&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;urls&lt;/span&gt;}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Because the input schema is derived from your method signature, Hayhooks validates inputs automatically.&lt;/p&gt;
&lt;h3 id=&#34;connecting-your-ide-or-assistant&#34;&gt;Connecting your IDE or assistant&lt;/h3&gt;
&lt;p&gt;To use your deployed pipeline from MCP clients, add the server in MCP settings like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;mcpServers&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;hayhooks&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;url&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://localhost:1417/mcp&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once configured, you can deploy, manage, and run your Haystack pipelines directly from chat. Hayhooks also exposes &lt;strong&gt;core tools&lt;/strong&gt; (&lt;code&gt;deploy_pipeline&lt;/code&gt;, &lt;code&gt;undeploy_pipeline&lt;/code&gt;, &lt;code&gt;get_pipeline_status&lt;/code&gt;, &lt;code&gt;get_all_pipeline_statuses&lt;/code&gt;) so an assistant can manage your deployments through natural language.&lt;/p&gt;
&lt;p&gt;This approach is ideal when you self-host and want full control over the infrastructure, while still getting standardized MCP access for free. Learn about the details of how to use Hayhooks as an MCP Server 
&lt;a href=&#34;https://deepset-ai.github.io/hayhooks/features/mcp-support&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;approach-3-haystack-as-an-mcp-server-using-haystack-enterprise-platform&#34;&gt;Approach 3: Haystack as an MCP server using Haystack Enterprise Platform&lt;/h2&gt;
&lt;p&gt;Self-hosting an MCP server works, but production MCP tools need governance: authentication, access control, scaling, and observability. The 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/getting-started&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Enterprise Platform&lt;/a&gt; provides all of this as a managed service, so you can turn any deployed pipeline into an MCP tool without standing up extra infrastructure.&lt;/p&gt;
&lt;p&gt;The workflow is straightforward:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Build and deploy your pipeline in the platform.&lt;/li&gt;
&lt;li&gt;Create a single &lt;strong&gt;workspace MCP server&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Enable individual pipelines as tools (toggling &lt;strong&gt;Use as MCP tool&lt;/strong&gt; in each pipeline’s settings), optionally with a custom tool name and description.&lt;/li&gt;
&lt;li&gt;Copy the generated client configuration and connect your assistant.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;MCP client configuration looks like this (the platform generates it for you):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;mcpServers&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;haystack-enterprise&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;url&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://api.cloud.deepset.ai/api/v2/workspaces/&lt;your-workspace-id&gt;/mcp&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;headers&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;Authorization&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Bearer your-api-key&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can read the full walkthrough in the 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/use-pipeline-as-mcp-tool&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Enterprise Platform docs&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;haystack-docs-mcp-server&#34;&gt;Haystack Docs MCP Server&lt;/h3&gt;
&lt;p&gt;A concrete example of this pattern running in production: &lt;strong&gt;we just launched a docs MCP server&lt;/strong&gt;. Under the hood it is a Haystack pipeline that performs document search over the Haystack documentation, deployed on the Haystack Enterprise Platform and exposed as an MCP server. It is the exact same pipeline that powers the search functionality on the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/intro&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack documentation site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is what the whole idea looks like end to end: one retrieval pipeline serves the docs search UI &lt;em&gt;and&lt;/em&gt; becomes a callable MCP tool. Connect it to your IDE or assistant and you can ask questions about Haystack, grounded in the official docs, without leaving your editor.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/mcp-with-haystack/docs-mcp.png alt=&#34;Haystack Docs MCP with an MCP Client&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;Haystack Docs MCP with an MCP Client&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;choosing-the-right-approach&#34;&gt;Choosing the right approach&lt;/h2&gt;
&lt;div class=&#34;styled-table demo&#34;&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Goal&lt;/th&gt;
          &lt;th&gt;Use this&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Give a Haystack agent one external capability&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;MCPTool&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Give a Haystack agent a curated group of tools from a server&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;MCPToolset&lt;/code&gt; with &lt;code&gt;tool_names&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Manage a large catalog of tools without bloating context&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;SearchableToolset&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Self-host your pipeline/agent as an MCP server&lt;/td&gt;
          &lt;td&gt;Hayhooks (&lt;code&gt;hayhooks mcp run&lt;/code&gt;)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Run managed, governed, observable MCP tools in production&lt;/td&gt;
          &lt;td&gt;
&lt;a href=&#34;https://www.deepset.ai/products-and-services/haystack-enterprise-platform&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Enterprise Platform&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;h2 id=&#34;mcp-use-cases&#34;&gt;MCP use cases&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Build robust AI agents&lt;/strong&gt;: Connect Haystack agents to MCP servers with context efficiency in mind. This helps you build more capable agents that can interact with external systems like vector databases, CLI tools, and version control.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Internal knowledge in everyday tools&lt;/strong&gt;: Expose a Haystack RAG pipeline as an MCP tool so teammates can query your internal knowledge base from inside Claude, ChatGPT, or Cursor, grounded in your own data, with no new interface to adopt.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Developer copilots&lt;/strong&gt;: Connect the docs MCP server to your IDE so your coding assistant answers framework questions from authoritative documentation while you build.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Productizing pipelines&lt;/strong&gt;: Expose a Haystack pipeline as a managed, authenticated tool that partners or customers can call, turning internal retrieval logic into an external offering.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;MCP gives AI engineers a practical layer: consume external capabilities inside your Haystack agents, and expose your own pipelines and agents as reusable tools for any MCP-compatible client.&lt;/p&gt;
&lt;p&gt;Use &lt;code&gt;MCPTool&lt;/code&gt;, &lt;code&gt;MCPToolset&lt;/code&gt;, and &lt;code&gt;SearchableToolset&lt;/code&gt; when you want precise, scalable tool access inside an agent. Use &lt;strong&gt;Hayhooks&lt;/strong&gt; or the &lt;strong&gt;Haystack Enterprise Platform&lt;/strong&gt; when you want to publish Haystack applications for broader teams and production usage.&lt;/p&gt;
&lt;p&gt;Ready to try Haystack with MCP? Start with the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/mcptool&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;MCP get started guide&lt;/a&gt;, wire one tool into your agent, and iterate from there. If you want more deployment control and governance, explore 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/use-pipeline-as-mcp-tool&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;MCP tools on Haystack Enterprise Platform&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Context Engineering for Agentic Systems: What Goes Into Your Agent&#39;s Mind</title>
      <link>https://haystack.deepset.ai/blog/context-engineering/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/context-engineering/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Every new generation of Large Language Models arrives with a bigger context window - and the temptation to use it fully. If the model can read a million tokens, why not feed it everything? In practice, more context doesn’t reliably mean better answers: it often means higher costs, slower responses, and a model that loses track of what actually matters. &lt;strong&gt;Context engineering&lt;/strong&gt; is the discipline of deciding not just &lt;em&gt;what&lt;/em&gt; to put in the context window, but &lt;em&gt;how much&lt;/em&gt;, &lt;em&gt;in what form&lt;/em&gt;, and &lt;em&gt;when to leave things out&lt;/em&gt; - and it’s quickly becoming one of the most important skills in building reliable agentic systems.&lt;/p&gt;
&lt;h2 id=&#34;why-context-is-so-important-for-agentic-systems&#34;&gt;Why context is so important for agentic systems&lt;/h2&gt;
&lt;p&gt;An LLM has exactly two sources of information when generating a response:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Internal state (“knowledge”)&lt;/strong&gt; - what was baked in during training. It is static, potentially stale, and opaque to the developer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context (“prompt”)&lt;/strong&gt; - everything provided at inference time. That’s the only thing we can actively control.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Training knowledge is fixed. We can’t update it without retraining, and we can’t know exactly what the model does or doesn’t know - though most providers publish a &lt;strong&gt;knowledge cutoff date&lt;/strong&gt; in their model cards or documentation, which tells you the point beyond which the model has no awareness of world events. Context is the lever we actually have. Everything a model knows about the current task, the current user, the tools available to it, and the world right now has to come through the context window.&lt;/p&gt;
&lt;p&gt;Today’s leading models offer context windows that would have seemed impossibly large just a few years ago - millions of tokens, enough to fit entire codebases, legal contracts, or a stack of research papers in a single prompt. Yet in practice, agentic systems burn through these limits surprisingly fast. A system prompt, a set of tool definitions, all tool calls and results, a few retrieved documents, and a handful of conversation turns can easily consume tens of thousands of tokens before the agent has done anything meaningful. And even when the hard limit isn’t reached, performance often degrades long before it is - the model starts losing track of earlier instructions, repeating itself, or missing relevant details buried under layers of accumulated context.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/context-engineering/context-growth.png alt=&#34;Context window growth over time&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;At step 1, the context holds little more than the user’s task. By step N, it has grown to include every tool call, every result, every model response, and any retrieved documents - all concatenated and re-sent from scratch on every iteration.&lt;/p&gt;
&lt;p&gt;The difference from one-shot prompting is stark. A single prompt is small, hand-crafted, and fully under control. An agentic system operates in a loop - reasoning, calling tools, receiving results, and repeating, potentially dozens of times. Because LLMs are stateless, every iteration re-sends the entire accumulated history from scratch. The context isn’t a fixed input, but more of a growing log, and context engineering is about managing that growth.&lt;/p&gt;
&lt;h3 id=&#34;when-less-is-more&#34;&gt;When less is more&lt;/h3&gt;
&lt;p&gt;Transformers architecture behind the LLMs work by letting every token attend to every other token in the context. This is what makes them so powerful at integrating information - but it also means the model’s capacity is spread across all tokens simultaneously. Think of it as an &lt;strong&gt;attention budget&lt;/strong&gt;: every new token you introduce depletes it by some amount, regardless of whether that token is useful or not.&lt;/p&gt;
&lt;p&gt;The practical consequence is that irrelevant or redundant content doesn’t just waste space - it actively competes with the information that actually matters. A critical instruction buried under pages of tool outputs may receive less attention than if it had been sent alone. 
&lt;a href=&#34;https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Research from Anthropic&lt;/a&gt; confirms this: models remain capable at longer contexts but show reduced precision for information retrieval and long-range reasoning compared to shorter ones. A million-token context window is not a free pass to include everything - it’s a budget, and every token you add is a trade-off.&lt;/p&gt;
&lt;h3 id=&#34;the-cost-dimension&#34;&gt;The cost dimension&lt;/h3&gt;
&lt;p&gt;Most hosted LLMs charge per input token, which means every byte of context has a price tag. A single call with a 50,000-token context costs roughly 50× more than one with 1,000 tokens - and in an agentic loop that runs dozens of iterations, that multiplier compounds with every step. Context management is therefore not just a quality concern but a cost concern: a bloated context window can turn a cheap pipeline into an expensive one without producing any better answers.&lt;/p&gt;
&lt;h2 id=&#34;what-fills-the-context-window-in-an-agentic-system&#34;&gt;What fills the context window in an agentic system&lt;/h2&gt;
&lt;p&gt;We’ve already touched on some of the components that fill an agent’s context window - system prompts, tool definitions, retrieved documents. Let’s map out the full picture, because the list is longer than many developers expect.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;System prompt&lt;/strong&gt; - standing instructions, persona, constraints, output format. Usually fixed but can be large.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conversation history&lt;/strong&gt; - the full back-and-forth between user and agent across the current session.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt; - retrieved facts from past sessions or external knowledge stores. See also: 
&lt;a href=&#34;https://haystack.deepset.ai/cookbook/memory_store_mem0&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Using Mem0 Memory Store with Haystack Agents&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retrieval output&lt;/strong&gt; - documents or chunks fetched proactively by a RAG pipeline, before the model acts. This data arrives in context as part of the input to the model, not as a consequence of something the model decided to do.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool definitions&lt;/strong&gt; - every tool the model &lt;em&gt;could&lt;/em&gt; call must be described in the context (name, description, parameters schema). With MCP toolsets, this can easily balloon into hundreds of tool descriptions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool call results&lt;/strong&gt; - the output of tools the model itself chose to invoke. Unlike retrieval output, these arrive mid-session as a consequence of the model’s actions. They can be surprisingly large: a read file operation returning a 500-line source file, a web search returning multiple scraped pages, or a database query returning hundreds of rows - and each result stays in context for the remainder of the session.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Few-shot examples&lt;/strong&gt; - demonstration input/output pairs used to guide model behaviour.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The iceberg effect.&lt;/strong&gt; A user sees a single answer. Behind the scenes, the model may have received 50,000 tokens or more on that one turn - a system prompt (perhaps 10k tokens), tool definitions (5k), retrieved documents (20k), and accumulated conversation history (15k). The answer is the tip, while the context is everything below the surface.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;what-the-context-actually-looks-like&#34;&gt;What the context actually looks like&lt;/h3&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/context-engineering/context-breakdown.png alt=&#34;Claude Code’s /context command breaks down where tokens are being spent - system prompt, tools, conversation history, and files.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The screenshot above shows Claude Code’s &lt;code&gt;/context&lt;/code&gt; command, which breaks down exactly where tokens are being spent: system prompt, tool definitions, conversation history, open files, etc. Knowing this makes it possible to identify which component is responsible for a bloated context and whether that cost is justified. With this visibility, optimisation is a bit easier.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;building-a-haystack-agent&#34;&gt;Building a Haystack agent&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.agents&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.anthropic&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;get_weather&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;city&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Get the current weather for a city.&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;It&#39;s sunny and 22°C in &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;city&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;chat_generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;system_prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a helpful assistant.&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;get_weather&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What&#39;s the weather in Paris?&#34;&lt;/span&gt;)])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;last_message&#34;&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When you create an agent in Haystack, much of the context is assembled automatically. Tool descriptions are serialised and injected into the prompt under the hood - you define a tool once, and the framework ensures the model receives everything it needs to call it: the name, description, and parameter schema. The same applies to conversation history, which is maintained across turns without any manual concatenation. The context you see in your code is just the surface, but the model receives considerably more on every call.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;strategies-for-managing-context-growth&#34;&gt;Strategies for managing context growth&lt;/h2&gt;
&lt;p&gt;Context explosion is not inevitable. Once you understand what’s filling the window, you can start making choices about what actually needs to be there. There are several proven techniques for keeping context short without sacrificing quality.&lt;/p&gt;
&lt;h3 id=&#34;delegation-to-subagents&#34;&gt;Delegation to subagents&lt;/h3&gt;
&lt;p&gt;Another way to keep context small is to never let it grow large in the first place. Instead of one agent accumulating the full history of a complex task, you can split the work across specialised subagents - each one receiving only the slice of context relevant to its job. The orchestrator maintains a thin, high-level context, while the worker agents get focused, task-specific contexts. The total token count across the system may be similar, but no single model call is burdened with everything at once. For a practical example of this pattern in Haystack, see 
&lt;a href=&#34;https://haystack.deepset.ai/blog/swarm-of-agents/&#34;&gt;Building a Swarm of Agents&lt;/a&gt; or the 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/45_creating_a_multi_agent_system&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Creating a Multi-Agent System with Haystack&lt;/a&gt; tutorial.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.agents&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.anthropic&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;search_web&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Search the web for up-to-date information on a topic.&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Search results for &#39;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;: ...&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Worker agent: only receives context relevant to its task&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;researcher&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;chat_generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;system_prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a research assistant. Answer questions concisely.&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;search_web&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ComponentTool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;delegate_research&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ComponentTool&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;researcher&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;delegate_research&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;description&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Delegate a research question to a specialised agent.&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;outputs_to_string&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;source&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;last_message&#34;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Orchestrator: only sees compact summaries from worker agents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;orchestrator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;chat_generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;system_prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Break down tasks and delegate them to specialised agents.&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;delegate_research&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;orchestrator&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Compare quantum and classical computing.&#34;&lt;/span&gt;)])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;last_message&#34;&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;improving-retrieval-quality&#34;&gt;Improving retrieval quality&lt;/h3&gt;
&lt;p&gt;In RAG pipelines, retrieval quality directly determines how many tokens land in the context. Poor retrieval returns irrelevant chunks that add noise without adding value - each one consuming part of the attention budget. Better precision means fewer chunks are needed, which means a smaller, cleaner context.&lt;/p&gt;
&lt;p&gt;A related problem is redundancy: when retrieved passages are near-duplicates, the model sees the same information repeated multiple times without gaining anything new. This is why &lt;strong&gt;diversity&lt;/strong&gt; matters as much as relevance - a set of chunks that each cover a different facet of the question is far more efficient than a set of very similar top matches. Techniques like 
&lt;a href=&#34;https://haystack.deepset.ai/blog/hybrid-retrieval/&#34;&gt;hybrid retrieval&lt;/a&gt;, 
&lt;a href=&#34;https://haystack.deepset.ai/blog/optimizing-retrieval-with-hyde/&#34;&gt;HyDE&lt;/a&gt;, 
&lt;a href=&#34;https://haystack.deepset.ai/blog/query-decomposition/&#34;&gt;query decomposition&lt;/a&gt;, and 
&lt;a href=&#34;https://haystack.deepset.ai/blog/improve-retrieval-with-auto-merging/&#34;&gt;auto-merging retrieval&lt;/a&gt; all help surface results that are both more relevant and more varied.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.rankers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;TransformersSimilarityRanker&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Retrieve 10 candidates, then rerank to the 3 most relevant&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ranker&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;TransformersSimilarityRanker&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.embedding&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.query_embedding&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ranker.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;climate change&#34;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ranker&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;climate change&#34;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# result[&#34;ranker&#34;][&#34;documents&#34;] now contains at most 3 highly relevant chunks&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Coming up in the series:&lt;/strong&gt; Retrieval quality deserves a post of its own. The next article will go deep on techniques for surfacing more relevant, more diverse results - so your RAG pipelines put more important tokens in front of the model.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;summarisation-and-compaction&#34;&gt;Summarisation and compaction&lt;/h3&gt;
&lt;p&gt;As a conversation grows, the raw message history becomes the biggest consumer of context. Compaction addresses this by periodically replacing the accumulated history with a condensed summary - retaining the essential facts and decisions while discarding the verbatim back-and-forth. The agent continues with a much shorter context, and the summary is updated with each new turn.&lt;/p&gt;
&lt;p&gt;This pattern is well-established in practice. Popular coding agents’ context compaction feature works exactly this way: when the context approaches its limit, it summarises the conversation so far and continues from the summary rather than truncating or failing.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.core.component&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.agents&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatPromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.anthropic&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_experimental.chat_message_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryChatMessageStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_experimental.components.retrievers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessageRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_experimental.components.writers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessageWriter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;get_current_date&lt;/span&gt;() -&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Return today&#39;s date.&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;datetime&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;date&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;date&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;today&lt;/span&gt;().&lt;span style=&#34;color:#268bd2&#34;&gt;isoformat&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@component&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;HistoryCompactor&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;int&lt;/span&gt; = &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;compactor&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ChatPromptBuilder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=[
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Summarise the key facts from the conversation below in &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;3-5 bullet points.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;{{ history }}&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;required_variables&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;history&#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;summariser&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;claude-haiku-4-5-20251001&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;@component.output_types&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;]) -&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;dict&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;len&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;) &lt;= &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;messages&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;history_text&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;join&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;m&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;role&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;m&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;m&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;m&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;compactor&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template_variables&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;history&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;history_text&lt;/span&gt;})[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;summary&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;summariser&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;)[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# The output message has to be a user message, as our chat &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# generator cannot work with just system/assistant messages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;messages&#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Conversation so far (summary):&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;summary&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# skip_system_messages=False so the compacted summary (a system message) is persisted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;message_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryChatMessageStore&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;skip_system_messages&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;message_retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessageRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;message_store&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;compactor&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;HistoryCompactor&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;agent&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;chat_generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;claude-haiku-4-5-20251001&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;system_prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a helpful assistant.&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;get_current_date&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;message_writer&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessageWriter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;message_store&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;message_retriever.messages&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;compactor.messages&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;compactor.messages&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;agent.messages&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;agent.messages&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;message_writer.messages&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;chat_history_id&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;session_1&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# First turn&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;message_retriever&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;current_messages&#34;&lt;/span&gt;: [&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What day is it today?&#34;&lt;/span&gt;)],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;chat_history_id&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;chat_history_id&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;message_writer&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;chat_history_id&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;chat_history_id&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Second turn - history is retrieved, compacted if needed, and stored back automatically&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;message_retriever&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;current_messages&#34;&lt;/span&gt;: [&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What month are we in?&#34;&lt;/span&gt;)],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;chat_history_id&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;chat_history_id&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;message_writer&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;chat_history_id&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;chat_history_id&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;agent&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;last_message&#34;&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;adding-only-relevant-tools-to-the-context&#34;&gt;Adding only relevant tools to the context&lt;/h3&gt;
&lt;p&gt;Tool definitions can be a surprisingly large slice of the context window, especially when connecting to MCP servers that expose dozens or hundreds of tools. Listing every tool upfront means the model receives all those descriptions on every single call, regardless of which tool is actually needed.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/searchabletoolset&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;SearchableToolset&lt;/code&gt;&lt;/a&gt;, introduced in Haystack 2.25, inverts this approach. Instead of exposing the full catalog, the agent starts with a single &lt;code&gt;search_tools&lt;/code&gt; function and uses it to dynamically discover relevant tools via BM25 keyword search. Only the tools it actually needs are loaded into the context for that turn.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.agents&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.anthropic&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Tool&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SearchableToolset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Create a catalog of tools&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;catalog&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Tool&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;get_weather&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;description&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Get weather for a city&#34;&lt;/span&gt;, ...),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Tool&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;search_web&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;description&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Search the web&#34;&lt;/span&gt;, ...),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# ... 100s more tools&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;toolset&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SearchableToolset&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;catalog&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;catalog&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;chat_generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;(), &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;toolset&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# The agent is initially provided only with the search_tools tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# and will use it to find relevant tools on demand.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What&#39;s the weather in Milan?&#34;&lt;/span&gt;)])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;offloading-notes-scratchpad--working-memory&#34;&gt;Offloading notes (scratchpad / working memory)&lt;/h3&gt;
&lt;p&gt;An agent’s intermediate reasoning - the chain of thoughts it builds up while working through a multi-step task - does not have to live inside the context window. A simple alternative is to give the agent two dedicated tools: one to write a note to an external store, and one to read notes back. Instead of accumulating its internal monologue in the prompt, the agent can offload conclusions, partial results, and reminders to storage and retrieve only what it needs at each step.&lt;/p&gt;
&lt;p&gt;This keeps the context lean: rather than carrying the full trace of every intermediate thought, the agent holds a minimal working state and queries its own notes on demand. The pattern is especially useful for long-horizon tasks where the reasoning chain would otherwise grow without bound, and it has the side effect of making the agent’s thinking inspectable and debuggable from outside the model.&lt;/p&gt;
&lt;h2 id=&#34;whats-coming-next-in-this-series&#34;&gt;What’s coming next in this series&lt;/h2&gt;
&lt;p&gt;This article is the foundation of a series on context engineering. Future posts will go deeper on specific topics - measuring whether your context actually helps the model, keeping context manageable in long-running agent loops, diversifying retrieval results, tracking token usage across pipelines, and more. If there is a particular area you would like us to cover first, let us know.&lt;/p&gt;
&lt;p&gt;To stay up to date with the series and everything else happening in Haystack, star the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack GitHub repository&lt;/a&gt; and join the conversation on 
&lt;a href=&#34;https://discord.gg/Dr63fr9NDS&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Discord&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Multimodal Search with Gemini Embedding 2 in Haystack</title>
      <link>https://haystack.deepset.ai/blog/multimodal-embeddings-gemini-haystack/</link>
      <pubDate>Tue, 10 Mar 2026 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/multimodal-embeddings-gemini-haystack/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;Embeddings are the backbone of modern AI applications, from semantic search and recommendation systems to Retrieval-Augmented Generation (RAG). However, most embedding models operate in a single modality, typically focusing only on textual data.&lt;/p&gt;
&lt;p&gt;Google has introduced &lt;strong&gt;Gemini Embedding 2&lt;/strong&gt;, a &lt;strong&gt;fully multimodal embedding model&lt;/strong&gt; that maps &lt;strong&gt;text, images, video, audio, and PDFs into a shared vector space&lt;/strong&gt;. This means you can search across different types of data using a &lt;strong&gt;single embedding model&lt;/strong&gt;: &lt;code&gt;gemini-embedding-2&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Even better, &lt;strong&gt;Haystack supports Gemini Embedding 2 from Day 0&lt;/strong&gt;. Through the 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/google-genai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Google GenAI x Haystack integration&lt;/a&gt;, you can immediately start using the model in your Haystack applications for both &lt;strong&gt;text and multimodal embeddings&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Let’s take a closer look.&lt;/p&gt;
&lt;h2 id=&#34;meet-gemini-embedding-2&#34;&gt;Meet Gemini Embedding 2&lt;/h2&gt;
&lt;p&gt;Gemini Embedding 2 is Google’s &lt;strong&gt;first fully multimodal embedding model&lt;/strong&gt;, built on the Gemini architecture. It can map &lt;strong&gt;text, images, video, audio, and PDFs into a single unified vector space&lt;/strong&gt;, enabling cross-modal comparison and retrieval using a shared semantic representation.&lt;/p&gt;
&lt;p&gt;For example, a &lt;strong&gt;text query can retrieve relevant images&lt;/strong&gt;, an &lt;strong&gt;audio clip can match a document&lt;/strong&gt;, or a &lt;strong&gt;video segment can be retrieved using text search&lt;/strong&gt;. This unified representation makes it easier to build multimodal applications like image search, recommendation systems, and RAG.&lt;/p&gt;
&lt;p&gt;The model supports &lt;strong&gt;100&#43; languages&lt;/strong&gt; and allows developers to choose flexible embedding sizes using &lt;strong&gt;Matryoshka Representation Learning (MRL)&lt;/strong&gt;. Depending on the trade-off between storage and accuracy, you can select embedding dimensions up to &lt;strong&gt;3072&lt;/strong&gt;, with commonly recommended sizes being &lt;strong&gt;768&lt;/strong&gt;, &lt;strong&gt;1536&lt;/strong&gt;, or &lt;strong&gt;3072 (default)&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Gemini Embedding 2 also supports &lt;strong&gt;large inputs up to 8192 tokens&lt;/strong&gt;, making it suitable for embedding longer documents and complex multimodal inputs.&lt;/p&gt;
&lt;p&gt;Early benchmarks indicate strong performance across modalities, including a &lt;strong&gt;top-5 ranking on the MTEB Multilingual leaderboard for text&lt;/strong&gt; and &lt;strong&gt;state-of-the-art results among proprietary models&lt;/strong&gt;, with &lt;strong&gt;document retrieval performance comparable to Voyage&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Check out the 
&lt;a href=&#34;https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-embedding-2/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;official Google documentation&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/multimodal-embeddings-gemini-haystack/gemini-diagram.png alt=&#34;A diagram explaning Gemini Embedding 2 capabilities&#34;  /&gt;
        &lt;figcaption&gt;Generated by &lt;code&gt;gemini-3.1-flash-image-preview&lt;/code&gt; aka. Nano Banana 2&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;using-gemini-embeddings-in-haystack&#34;&gt;Using Gemini Embeddings in Haystack&lt;/h2&gt;
&lt;p&gt;Haystack provides built-in components for generating Gemini embeddings through the Gemini API and Vertex AI.&lt;/p&gt;
&lt;p&gt;For &lt;strong&gt;text data&lt;/strong&gt;, you can use:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/googlegenaitextembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;GoogleGenAITextEmbedder&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/googlegenaidocumentembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;GoogleGenAIDocumentEmbedder&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;GoogleGenAIDocumentEmbedder&lt;/code&gt; is typically used during the &lt;strong&gt;indexing&lt;/strong&gt; to embed documents before storing them in a vector database.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# pip install haystack-ai google-genai-haystack&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;datasets&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;load_dataset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.embedders.google_genai&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;GoogleGenAIDocumentEmbedder&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;GoogleGenAITextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_similarity_function&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cosine&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The capybara is the largest rodent in the world and is native to South America, where it lives near rivers, lakes, and wetlands. It is highly social and often seen relaxing in groups, spending much of its time swimming or soaking in water. Capybaras communicate through whistles, barks, and purr-like sounds.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Dogs are domesticated mammals known for their loyalty, intelligence, and strong bond with humans. They have been bred for thousands of years for roles such as companionship, hunting, guarding, and assisting people with various tasks. Different breeds vary widely in size, temperament, and abilities.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The tiger is the largest species of big cat and is recognized by its distinctive orange coat with black stripes. It is a powerful solitary predator that inhabits forests, grasslands, and wetlands across parts of Asia. Tigers are excellent swimmers and rely on stealth and strength to hunt prey.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The giraffe is the tallest land animal on Earth, easily identified by its long neck and distinctive spotted coat. It uses its height to reach leaves high in acacia trees and roams the savannas and open woodlands of Africa. Despite its long neck, a giraffe has the same number of neck vertebrae as most mammals.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Elephants are the largest land animals and are known for their intelligence, strong family bonds, and remarkable memory. They use their trunks for breathing, grasping objects, and communication. Elephants live in complex social groups led by a matriarch.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Penguins are flightless birds that live primarily in the Southern Hemisphere, especially in Antarctica. They are excellent swimmers, using their flipper-like wings to move through the water while hunting fish, squid, and krill.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Dolphins are highly intelligent marine mammals known for their playful behavior and complex communication. They live in social groups called pods and use echolocation to navigate and locate prey in the ocean.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Owls are nocturnal birds of prey with excellent night vision and silent flight. They hunt small mammals, insects, and other birds, relying on their sharp talons and keen hearing to detect prey in darkness.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Red pandas are small mammals native to the eastern Himalayas and southwestern China. They have reddish-brown fur, bushy tails, and spend most of their time in trees. Their diet mainly consists of bamboo, though they may also eat fruits and insects.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Kangaroos are large marsupials native to Australia and are famous for their powerful hind legs, large feet, and strong tails that help them balance while hopping. Female kangaroos carry and nurture their young, called joeys, in a pouch. They typically live in open grasslands and forests and often move in groups called mobs.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;doc_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;GoogleGenAIDocumentEmbedder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini-embedding-2&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;batch_size&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;prefix&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;title: none | text: &#34;&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# https://ai.google.dev/gemini-api/docs/embeddings#task-types&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;config&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output_dimensionality&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;768&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# flexible embedding sizes using MRL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docs_with_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;doc_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs_with_embeddings&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once documents are embedded and stored, you can embed queries using &lt;code&gt;GoogleGenAITextEmbedder&lt;/code&gt; and retrieve relevant documents.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.embedders.google_genai&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;GoogleGenAITextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;animal that communicates with whistles and barks&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;text_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;GoogleGenAITextEmbedder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini-embedding-2&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;prefix&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;task: search result | query: &#34;&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# https://ai.google.dev/gemini-api/docs/embeddings#task-types&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;config&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output_dimensionality&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;768&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# flexible embedding sizes using MRL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_embedding&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;text_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;)[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedding&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_retriever&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query_embedding&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;query_embedding&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;score&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;-&#34;&lt;/span&gt; * &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In production applications, embeddings can be stored in search engines such as 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/qdrant-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Qdrant&lt;/a&gt;,  
&lt;a href=&#34;https://haystack.deepset.ai/integrations/elasticsearch-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Elasticsearch&lt;/a&gt; or 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/opensearch-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearch&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;multimodal-embeddings&#34;&gt;Multimodal Embeddings&lt;/h2&gt;
&lt;p&gt;Haystack also provides &lt;strong&gt;Day-0 support for multimodal embeddings&lt;/strong&gt; with the new 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/googlegenaimultimodaldocumentembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;GoogleGenAIMultimodalDocumentEmbedder&lt;/code&gt;&lt;/a&gt; component.&lt;/p&gt;
&lt;p&gt;This component enables embedding &lt;strong&gt;images, audio, video, and PDFs directly inside Haystack pipelines&lt;/strong&gt;, making it straightforward to build applications that search across multiple media types.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.embedders.google_genai&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;GoogleGenAIMultimodalDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;file_path&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;kangaroo.mp4&#34;&lt;/span&gt;}),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;file_path&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;tiger.jpg&#34;&lt;/span&gt;}),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;file_path&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sample.pdf&#34;&lt;/span&gt;}),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;file_path&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;kangaroo.jpg&#34;&lt;/span&gt;}),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;file_path&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cat.jpg&#34;&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;doc_multimodal_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;GoogleGenAIMultimodalDocumentEmbedder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini-embedding-2&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;config&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output_dimensionality&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;768&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# flexible embedding sizes using MRL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docs_with_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;doc_multimodal_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs_with_embeddings&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;cross-modal-retrieval&#34;&gt;Cross-modal Retrieval&lt;/h2&gt;
&lt;p&gt;One powerful capability enabled by multimodal embeddings is &lt;strong&gt;cross-modal retrieval,&lt;/strong&gt; retrieving one type of content using another modality as the query.&lt;/p&gt;
&lt;p&gt;For example, you can perform &lt;strong&gt;image-to-text search&lt;/strong&gt;, where an image is used as a query to retrieve relevant text documents. If a user uploads an image of a dog, the system could retrieve documents describing dog breeds, training tips, or animal behavior.&lt;/p&gt;
&lt;p&gt;To perform image-to-text search, you first embed your &lt;strong&gt;text documents during indexing&lt;/strong&gt; (as shown earlier using &lt;code&gt;GoogleGenAIDocumentEmbedder&lt;/code&gt;). Then you embed the &lt;strong&gt;image as the query&lt;/strong&gt; using the multimodal embedder.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Because all modalities share the same vector space, you can use this approach to support cross-modal retrieval in any direction, for example &lt;strong&gt;text-to-image, image-to-text, audio-to-video,&lt;/strong&gt; or &lt;strong&gt;video-to-document&lt;/strong&gt; search.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;image_doc&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;file_path&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;another_kangaroo.jpg&#34;&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;image_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;GoogleGenAIMultimodalDocumentEmbedder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini-embedding-2&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;config&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output_dimensionality&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;768&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# flexible embedding sizes using MRL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Create the embedding for the image&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;image_embedding&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;image_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;([&lt;span style=&#34;color:#268bd2&#34;&gt;image_doc&lt;/span&gt;])[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;embedding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Find the most semantically similar texts, images, audio, video and PDFs in the vector database&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_retriever&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query_embedding&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;image_embedding&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;)[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;score&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;-&#34;&lt;/span&gt; * &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;100&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With multimodal embeddings, you can build applications such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multimodal search&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Example: search a product catalog with a text query like &lt;em&gt;“red running shoes with white soles”&lt;/em&gt; and retrieve both product images and descriptions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cross-modal retrieval&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Search one type of media using another modality. For example, upload a photo of a monument to retrieve relevant articles describing it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multimodal RAG systems&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Combine documents, diagrams, screenshots, and videos as sources of knowledge.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Media recommendation systems&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Recommend videos or images based on textual descriptions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To learn more about working with multimodal data in Haystack, check out the tutorial on 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/46_multimodal_rag&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Creating Vision &#43; Text RAG Pipelines&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What’s Next&lt;/h2&gt;
&lt;p&gt;Gemini Embedding 2 makes it easier to build &lt;strong&gt;multimodal-aware retrieval systems&lt;/strong&gt; without stitching together multiple embedding models.&lt;/p&gt;
&lt;p&gt;With Haystack, you can already use both text and multimodal embeddings with Gemini Embedding 2. We’re excited to see what multimodal applications you build next.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Haystack Ecosystem: One Name, One Product Family, One Look</title>
      <link>https://haystack.deepset.ai/blog/announcing-haystack-ecosystem/</link>
      <pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/announcing-haystack-ecosystem/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;We’re making some naming and visual updates at deepset to better reflect the role Haystack already plays as a framework, a community, and the foundation of our enterprise platform.&lt;/p&gt;
&lt;p&gt;If you’re already building with Haystack, nothing is changing in how you build or run applications. This update is about clarity, making the Haystack ecosystem easier to understand, easier to navigate, and centered around a single open foundation.&lt;/p&gt;
&lt;h2 id=&#34;the-open-source-to-enterprise-story-of-haystack&#34;&gt;The Open Source to Enterprise Story of Haystack&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Haystack&lt;/strong&gt; began as an open-source framework for building NLP pipelines, created to give developers precise control over how AI systems are composed, debugged, and run in production. From the start, it was designed for real-world use, not just experimentation.&lt;/p&gt;
&lt;p&gt;Over time, the framework expanded and so did the community. Teams began using Haystack to power increasingly sophisticated systems in production: &lt;strong&gt;advanced RAG pipelines, agentic workflows, and multi-step, multimodal applications operating over complex enterprise data&lt;/strong&gt;. As adoption and contribution grew, Haystack evolved from a framework into a broader ecosystem, shaped by its community and proven in demanding production environments.&lt;/p&gt;
&lt;p&gt;As enterprises scaled their use of Haystack, the need for additional operational capabilities emerged. The deepset AI Platform evolved alongside the framework, built on the same architecture, abstractions, and mental model to support governance, deployment, and collaboration at scale. Renaming it to &lt;strong&gt;Haystack Enterprise Platform&lt;/strong&gt; reflects how the open-source framework and enterprise operational tooling are already used together in practice.&lt;/p&gt;
&lt;p&gt;If you know Haystack, you already understand the foundation of the enterprise platform. The new name reflects how teams use Haystack across development, production, and enterprise-scale environments without changing tools, rewriting systems, or navigating brand confusion when choosing the right Haystack offering.&lt;/p&gt;
&lt;h2 id=&#34;the-haystack-logo-what-changed-and-why&#34;&gt;The Haystack Logo: What Changed and Why&lt;/h2&gt;
&lt;p&gt;Alongside the naming change, we’re updating the &lt;strong&gt;Haystack logo&lt;/strong&gt; and &lt;strong&gt;overall look&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This isn’t just a change in appearance. It reflects how Haystack is used today and how the ecosystem has evolved.&lt;/p&gt;
&lt;p&gt;Haystack is no longer only a framework, it’s the foundation across everything we ship. Haystack Open Source, Haystack Enterprise Starter, and Haystack Enterprise Platform now share a consistent look, reflecting their place within a single Haystack family.&lt;/p&gt;
&lt;p&gt;The updated logo reflects Haystack’s role in building complete AI systems. It aligns with how Haystack is used today, bringing agents, retrieval, pipelines, and operational concerns together while staying true to its core values of openness, clarity and structure.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/announcing-haystack-ecosystem/haystack-logos.png alt=&#34;&#34; class=&#34;medium&#34; /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;the-haystack-product-family&#34;&gt;The Haystack Product Family&lt;/h2&gt;
&lt;p&gt;The Haystack product family supports teams across the full lifecycle of enterprise AI adoption, from development to production and organization-wide scale. It includes the &lt;strong&gt;Haystack Framework&lt;/strong&gt;, &lt;strong&gt;Haystack Enterprise Starter&lt;/strong&gt;, and &lt;strong&gt;Haystack Enterprise Platform&lt;/strong&gt;.
These are layers teams can grow into as their use of Haystack matures.&lt;/p&gt;
&lt;h3 id=&#34;what-is-haystack&#34;&gt;What Is Haystack?&lt;/h3&gt;
&lt;p&gt;If you’re new to &lt;strong&gt;Haystack&lt;/strong&gt;, it’s an open-source AI orchestration framework developed by deepset that enables Python developers to build sophisticated, agentic AI applications.&lt;/p&gt;
&lt;p&gt;It provides a modular pipeline architecture with a rich ecosystem of integrations, while preserving full flexibility to add custom components and external tools, including those from MCP servers. With robust documentation and production-ready capabilities such as serialization, logging, and well-defined APIs, Haystack supports reliable deployment across both cloud and on-premise environments.&lt;/p&gt;
&lt;p&gt;Haystack is designed to help teams move from experimentation to production without sacrificing control or architectural clarity and is already used in production by many teams today.&lt;/p&gt;
&lt;h3 id=&#34;what-is-haystack-enterprise-starter&#34;&gt;What Is Haystack Enterprise Starter?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Haystack Enterprise Starter&lt;/strong&gt; combines the &lt;strong&gt;Haystack open-source framework&lt;/strong&gt; with &lt;strong&gt;enterprise-grade support&lt;/strong&gt; to help teams move more quickly and confidently toward production.&lt;/p&gt;
&lt;p&gt;It’s designed for teams already building with the Haystack framework who want additional guidance and reduced implementation risk, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Private and dedicated consulting and support&lt;/li&gt;
&lt;li&gt;Secure pipeline and deployment design guidance&lt;/li&gt;
&lt;li&gt;Proven deployment blueprints and production pipeline templates&lt;/li&gt;
&lt;li&gt;A clear path to expand as requirements evolve&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Enterprise Starter serves as a practical stepping stone for teams preparing for broader adoption and more complex production use cases.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Haystack Enterprise Starter was previously known as Haystack Enterprise.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;👉 Learn more about 
&lt;a href=&#34;https://www.deepset.ai/products-and-services/haystack-enterprise&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;Haystack Enterprise Starter&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;what-is-haystack-enterprise-platform&#34;&gt;What Is Haystack Enterprise Platform?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Haystack Enterprise Platform&lt;/strong&gt; extends Haystack with the tooling needed to run AI systems reliably in production at enterprise scale.&lt;/p&gt;
&lt;p&gt;It builds on the same architecture, abstractions, and mental model the developer community already uses in Haystack, adding operational capabilities for deploying, testing, and governing AI applications across complex data, multiple use cases, and different environments.&lt;/p&gt;
&lt;p&gt;Key capabilities include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Composable, repeatable pipeline orchestration&lt;/strong&gt; with visual views aligned with code&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Production-ready indexing and retrieval&lt;/strong&gt; for large, complex enterprise data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integrated testing and evaluation&lt;/strong&gt; to validate changes and improve system behavior over time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Access control and auditability&lt;/strong&gt; for shared, multi-team environments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable deployment&lt;/strong&gt; across cloud and on-premise setups&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Built on Haystack’s open-source foundation and shaped by real-world production use, Haystack Enterprise Platform helps teams elevate Haystack pipelines into enterprise systems they can operate, share, and scale across the organization.&lt;/p&gt;
&lt;p&gt;👉 Try 
&lt;a href=&#34;https://www.deepset.ai/deepset-studio&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;Haystack Enterprise Platform for free&lt;/strong&gt;&lt;/a&gt; or 
&lt;a href=&#34;https://www.deepset.ai/products-and-services/deepset-ai-platform&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;learn more&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;looking-ahead&#34;&gt;Looking Ahead&lt;/h2&gt;
&lt;p&gt;With these changes, Haystack is easier to understand, easier to navigate, and easier to adopt across the product family.&lt;/p&gt;
&lt;p&gt;This evolution reflects how Haystack is used in practice today, as an open, modular foundation for development and the core of a platform supporting production AI systems ranging from retrieval-based pipelines to agent-based workflows.&lt;/p&gt;
&lt;p&gt;We’ll keep building Haystack together with the community, customers, and partners, focused on transparent, debuggable, production-grade AI systems teams can trust and evolve over time.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Lufthansa Industry Solutions Uses Haystack to Power Enterprise RAG</title>
      <link>https://haystack.deepset.ai/blog/lufthansa-user-story/</link>
      <pubDate>Fri, 24 Oct 2025 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/lufthansa-user-story/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;When you think of Lufthansa, you might picture planes, airports, or global travel, but &lt;strong&gt;Lufthansa Industry Solutions (LHIND)&lt;/strong&gt; is making an impact in a different way: as a full-service IT company delivering digital solutions for clients both inside and outside the Lufthansa Group.&lt;/p&gt;
&lt;p&gt;At 
&lt;a href=&#34;https://www.lufthansa-industry-solutions.com/de-en/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LHIND&lt;/a&gt;, a subsidiary of the Lufthansa Group, teams work
on a wide range of projects that span cloud infrastructure, AI, and enterprise data systems to
custom software development, process automation, and digital transformation initiatives. Among them is 
&lt;a href=&#34;https://www.lufthansa-industry-solutions.com/de-en/solutions-products/artificial-intelligence/smartassistantai-ai-chatbot-implementation-in-line-with-your-needs&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SmartAssistantAI&lt;/a&gt;, an enterprise AI chatbot implementation to make company knowledge accessible to everyone, instantly and securely.&lt;/p&gt;
&lt;p&gt;Behind the product is &lt;strong&gt;Nils Hilgers&lt;/strong&gt;, Lead AI Engineer at LHIND and his team of engineers and product builders. Together, they’re rethinking enterprise search through the lens of &lt;strong&gt;retrieval-augmented generation (RAG)&lt;/strong&gt; and &lt;strong&gt;enterprise-grade security standards&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;To bring that vision to life, the team selected &lt;strong&gt;Haystack&lt;/strong&gt; as one of their key solutions for powering their AI Assistant.&lt;/p&gt;
&lt;h2 id=&#34;the-challenge-connecting-scattered-knowledge&#34;&gt;The Challenge: Connecting Scattered Knowledge&lt;/h2&gt;
&lt;p&gt;LHIND’s engineering group was tasked with building a &lt;strong&gt;secure, centralized assistant&lt;/strong&gt; capable of answering employees’ questions using the company’s internal documentation. The challenge wasn’t just accuracy — it was compliance and control.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/lufthansa-user-story/assistant-ui.png alt=&#34;&#34; class=&#34;medium&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;The system needed to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Handle multiple data sources (SharePoint, internal wikis etc.)&lt;/li&gt;
&lt;li&gt;Operate under &lt;strong&gt;GDPR&lt;/strong&gt;, &lt;strong&gt;ISO 27001&lt;/strong&gt;, and Lufthansa Group’s own IT governance standards&lt;/li&gt;
&lt;li&gt;Deliver explainable, source-cited results&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With a small team of developer and engineers working in an agile setup, supported by a dedicated platform team, they set out to design a solution that could &lt;strong&gt;unify data retrieval and LLM-based reasoning&lt;/strong&gt; without sacrificing traceability or maintainability.&lt;/p&gt;
&lt;h2 id=&#34;choosing-haystack-flexibility-meets-stability&#34;&gt;Choosing Haystack: Flexibility Meets Stability&lt;/h2&gt;
&lt;p&gt;When the project began, the team evaluated several orchestration frameworks to structure their RAG pipelines. They needed something reliable enough for production but flexible enough to adapt as requirements evolved.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“&lt;em&gt;We needed a graph orchestration framework with well-thought fundamentals that is stable for production&lt;/em&gt;” says Nils.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After testing a few alternatives, Haystack stood out for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Orchestration layer built on directed graphs with easy 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/serialization&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;serialization&lt;/a&gt; and 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/visualizing-pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;visualization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/metadata-filtering&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Unified filtering&lt;/a&gt; across different vector database providers&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/jinja-templates&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Jinja-based prompt templating&lt;/a&gt;, which made their prompts more maintainable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Having used the older 1.x REST API for some demos, the team already knew Haystack’s foundations and migrating to 2.x resulted in cleaner, more maintainable code.&lt;/p&gt;
&lt;h2 id=&#34;the-technical-architecture-how-it-all-comes-together&#34;&gt;The Technical Architecture: How It All Comes Together&lt;/h2&gt;
&lt;p&gt;The assistant is a &lt;strong&gt;cloud-native, microservice-based system&lt;/strong&gt; built around modularity and open-source principles. It combines 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack pipelines&lt;/a&gt; with custom middleware and observability tooling. It’s not a public-facing product, rather, a secure solution deployed in enterprise environments where control over data and access is critical.&lt;/p&gt;
&lt;h3 id=&#34;core-components&#34;&gt;Core Components&lt;/h3&gt;
&lt;h4 id=&#34;1-frontend-and-authentication&#34;&gt;&lt;strong&gt;1. Frontend and Authentication&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;A modular frontend built with &lt;strong&gt;microfrontends&lt;/strong&gt; allows different configurations per customer like custom stylesheet, logo, and login interfaces through an admin interface. &lt;strong&gt;A Golang-based authentication middleware&lt;/strong&gt; enforces role-based access control (RBAC) and ensures user permissions are respected end-to-end.&lt;/p&gt;
&lt;h4 id=&#34;2-ingestion-pipelines&#34;&gt;&lt;strong&gt;2. Ingestion Pipelines&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Data ingestion is built on &lt;strong&gt;Haystack pipelines&lt;/strong&gt;, with a &lt;strong&gt;custom data integration protocol&lt;/strong&gt;. For preprocessing, domain-specific tools with Haystack OCR and custom 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/converters&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;converters&lt;/a&gt;. &lt;strong&gt;The custom data integration protocol&lt;/strong&gt; includes the periodic &lt;em&gt;LEFT-JOIN-style&lt;/em&gt; synchronization that removes outdated or deleted entries using hashing logic, allowing customers to synchronize their niche data source into the vector database.&lt;/p&gt;
&lt;h4 id=&#34;3-query-services&#34;&gt;&lt;strong&gt;3. Query Services&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;At the heart of the system, &lt;strong&gt;FastAPI-based query services&lt;/strong&gt; integrate directly with &lt;strong&gt;Haystack pipelines&lt;/strong&gt;. A 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/conditionalrouter&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ConditionalRouter&lt;/a&gt; directs requests to either a general LLM (GPT models) or a RAG pipeline, depending on the nature of the question.&lt;/p&gt;
&lt;p&gt;Before the retrieval step, the assistant performs &lt;strong&gt;query rewriting&lt;/strong&gt; based on the ongoing chat history, rephrasing the user’s question to optimize for &lt;strong&gt;hybrid search&lt;/strong&gt; and ensure more accurate context retrieval.&lt;/p&gt;
&lt;p&gt;Using server-sent events (SSE), the system streams results in real time and provides full transparency into the generation process: showing the routed and rewritten queries, the matched documents, and the final LLM-generated answer.&lt;/p&gt;
&lt;h4 id=&#34;4-observability-and-monitoring&#34;&gt;&lt;strong&gt;4. Observability and Monitoring&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;To monitor performance and reliability, the team integrated 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/langfuse&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Langfuse&lt;/a&gt; for observability and &lt;strong&gt;Grafana &#43; Kubernetes&lt;/strong&gt; for operational monitoring. These tools save every incoming query and feedback in the data warehouse for future analysis and evaluation.&lt;/p&gt;
&lt;h2 id=&#34;measuring-success-from-latency-to-feedback-loops&#34;&gt;Measuring Success: From Latency to Feedback Loops&lt;/h2&gt;
&lt;p&gt;Rather than focusing on flashy metrics, the team prioritized efficiency and feedback quality. Their success criterion was simple: &lt;em&gt;reduce the time it takes for people to find what they need&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Since that’s hard to measure directly, they rely on a mix of proxy metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Usage metrics&lt;/strong&gt;: daily and unique request counts across customers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Latency&lt;/strong&gt;: time-to-first-token and time-to-last-token, especially 90th percentile&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User feedback&lt;/strong&gt;: thumbs-up/down and open-text comments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;💡 Negative feedback with detailed comments has proven especially valuable, enabling developers to use observability tools to trace whether an issue stems from &lt;strong&gt;prompting&lt;/strong&gt;, &lt;strong&gt;retrieval&lt;/strong&gt;, or &lt;strong&gt;data quality&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;lessons-learned-build-quality-early-iterate-fast&#34;&gt;Lessons Learned: Build Quality Early, Iterate Fast&lt;/h2&gt;
&lt;p&gt;Nils emphasizes a pragmatic approach: software quality and observability are not afterthoughts; they are foundations.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“&lt;em&gt;Focus on engineering quality early, even if it slows you down,&lt;/em&gt;” he advises.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The team learned that robust infrastructure and documentation pay off in the long run, even if setup takes longer initially.&lt;/p&gt;
&lt;p&gt;They emphasize:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lean container builds and fast CI/CD pipelines&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clear internal documentation&lt;/strong&gt; of database schemas, roles, and dependencies&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Human-annotated evaluation datasets&lt;/strong&gt; to track the performance of different iterations over model and prompt adjustments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Their development process combines disciplined engineering with agile experimentation, a balance that keeps the project moving fast without sacrificing reliability.&lt;/p&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What’s Next&lt;/h2&gt;
&lt;p&gt;The roadmap for the coming quarters focuses on &lt;strong&gt;enhancing data integration and observability&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Migrating to a &lt;strong&gt;ClickHouse warehouse&lt;/strong&gt; to handle growing data volumes efficiently&lt;/li&gt;
&lt;li&gt;Adding &lt;strong&gt;server-side prompt management&lt;/strong&gt; for version control&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Further plans include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Time-aware 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/choosing-the-right-ranker&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ranking&lt;/a&gt; to eliminate outdated knowledge&lt;/li&gt;
&lt;li&gt;Deeper Integrations with &lt;strong&gt;Atlassian tools, relational databases, and CRM systems&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;A cleaner architecture using Haystack’s newer abstractions, like 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/supercomponents&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SuperComponents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The next step is scaling both quality and speed while still keeping the system easy to maintain.&lt;/p&gt;
&lt;h2 id=&#34;share-your-story-with-us&#34;&gt;Share Your Story with Us&lt;/h2&gt;
&lt;p&gt;LHIND’s journey shows how &lt;strong&gt;Haystack&lt;/strong&gt; empowers developers to build reliable, customizable, and compliant AI assistants at enterprise scale, like &lt;strong&gt;SmartAssistantAI&lt;/strong&gt;. 
&lt;a href=&#34;https://haystack.deepset.ai/overview/quick-start&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Get started with Haystack&lt;/a&gt; and build your own intelligent, trustworthy assistants for enterprise knowledge access.&lt;/p&gt;
&lt;p&gt;We know there are many more stories like this out there. If your team has built something exciting with Haystack, whether it’s a chatbot, a retrieval system, an agent, or an AI-powered internal tool, 
&lt;a href=&#34;https://forms.gle/fdyPcC165he6pQkc7&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;get in touch with us&lt;/a&gt;. We’d love to feature your work and share how you’re shaping the future with Haystack.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>How TAC Built an Agentic Chatbot with Haystack to Transform Trade Promotions Workflows</title>
      <link>https://haystack.deepset.ai/blog/telus-user-story/</link>
      <pubDate>Mon, 06 Oct 2025 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/telus-user-story/</guid>
      <description>
      
      
      

      
      

      <p>When a leading company like <strong>TELUS Agriculture & Consumer Goods (TAC)</strong>, with a strong presence in agriculture and consumer goods, turns to AI to streamline complex processes, it’s worth taking a closer look.</p>
<p>
<a href="https://www.telus.com/agcg/en" target="_blank" rel="noopener">TELUS Agriculture & Consumer Goods</a> helps businesses optimize everything from supply chains to retail operations. One of their latest innovations: an <strong>agentic chatbot powered by Haystack</strong> that simplifies how users interact with their trade promotions platform.</p>
<p>We sat down with the team behind this project to learn how they built it, why they chose 
<a href="https://github.com/deepset-ai/haystack" target="_blank" rel="noopener">Haystack</a>, and what advice they have for other teams looking to implement Retrieval-Augmented Generation (RAG) and agent-based AI solutions in production.</p>
<h2 id="the-challenge-simplifying-complex-workflows">The Challenge: Simplifying Complex Workflows</h2>
<p>The team’s mission was clear: <strong>give users instant, intelligent access to their data</strong> without requiring them to dig through documentation or rely solely on the existing UI.</p>
<p>TAC’s trade promotions platform is where large consumer goods companies manage the <strong>sales incentives and agreements</strong> they set up with retailers and distributors. Think of discounts, special in-store displays, or seasonal promotions. The platform helps teams <strong>plan, track, and measure</strong> the impact of these promotions across products, customers, and time periods.</p>
<p>Previously, users had to navigate dense manuals to find the information they needed about promotions. This slowed down decision-making and made even simple questions difficult to answer quickly.</p>
<p>The chatbot changes all of this. Instead of manually searching documentation or waiting for new reporting features, users can now ask questions conversationally and get real-time answers. For example, someone managing promotions for a product can quickly check which campaigns are running, how much has been spent, or what results have been achieved — and share that information across their organization. Whether it’s a one-off question or a request for deeper insights, the system delivers immediate results.</p>
<p>As the team put it, this was about <strong>“giving users unprecedented access to their data”</strong> while removing the bottlenecks of traditional reporting systems.</p>
<h2 id="choosing-haystack-from-exploration-to-production">Choosing Haystack: From Exploration to Production</h2>
<p>When the team began exploring options, they came across <strong>Haystack</strong> while researching 
<a href="https://haystack.deepset.ai/tutorials/27_first_rag_pipeline" target="_blank" rel="noopener">RAG workflows</a>. What they needed was a framework that could <strong>support retrieval-augmented generation</strong>, <strong>handle tool calling</strong> for tasks like SQL query generation, and provide <strong>streaming capabilities</strong> for real-time interactions.</p>
<p>Haystack quickly stood out.</p>
<blockquote>
<p>“<em>Haystack checked all the boxes and was easy to get up and running. Switching to Haystack gave us exactly what we needed.</em>” says Kelsey.</p>
</blockquote>
<p>The team had some experience with other frameworks, but Haystack’s flexibility, modular design, and ease of use allowed them to build a proof of concept almost immediately, earning early buy-in and helping them <strong>move fast from experimentation to production</strong>.</p>
<h2 id="the-technical-architecture-from-pipelines-to-agents">The Technical Architecture: From Pipelines to Agents</h2>
<p>The first version of the system used a 
<a href="https://docs.haystack.deepset.ai/docs/pipelines" target="_blank" rel="noopener">pipeline-based architecture</a> with two separate workflows:</p>
<ul>
<li>One workflow handled <strong>knowledge base queries</strong> via RAG after ingesting user documentation and converting it from raw HTML into a searchable format</li>
<li>Another generated SQL queries from user inputs using metadata and schema descriptions, then executed them on the SQL database</li>
</ul>
<p>A <em>topic router</em> decided which workflow to trigger. However, this setup quickly proved <strong>too rigid</strong> for real-world use.</p>
<p>The solution was to move to an <strong>agent-based architecture</strong>. Instead of isolated pipelines, the team wrapped each capability—the RAG workflow and the SQL pipeline—into separate <em>tools</em> that an agent could call dynamically.</p>
<p>Switching to an agent-based model was a game-changer. With 
<a href="https://docs.haystack.deepset.ai/docs/agents" target="_blank" rel="noopener">Haystack Agents</a>, the system gained the ability to:</p>
<ul>
<li>Retrieve documentation and query the SQL database within a single conversational flow</li>
<li>Retry and rewrite queries based on error messages (self-debugging)</li>
<li>Deliver <strong>emergent behaviors</strong> like combining insights from multiple sources</li>
</ul>
<h3 id="key-components-of-the-solution">Key Components of the Solution</h3>
<ul>
<li><strong>Knowledge Base Tool</strong>: Ingests user documentation (HTML → searchable documents)</li>
<li><strong>SQL Tool</strong>: Generates queries with metadata awareness, using MS SQL + SQL Alchemy. This tool enables end users to both get data faster and access datasets that may not have been easily reachable before.</li>
<li><strong>Observability & Monitoring</strong>:
<ul>
<li>OTEL (OpenTelemetry) for observability</li>
<li>Langfuse + Sentry for monitoring and debugging</li>
</ul>
</li>
<li><strong>ETL & Development</strong>: Kedro for ETL pipelines, Gradio for UI testing</li>
<li><strong>Guardrails & Safety</strong>:
<ul>
<li>Restricting the incoming SQL queries to <code>SELECT</code> statements with <code>LIMIT</code> clauses</li>
<li>Sanitizing SQL table/column names before output to prevent leakage</li>
<li>Enforcing user identity constraints on generated queries</li>
</ul>
</li>
</ul>
<p>The team’s focus on <strong>security and reliability</strong> ensures sensitive data stays protected while users enjoy a smooth experience.</p>
<h2 id="evaluating-the-performance">Evaluating the Performance</h2>
<p>To track performance, the team focuses on three main signals: <strong>latency</strong>, <strong>accuracy</strong>, and <strong>user engagement</strong>.</p>
<p>Tool and LLM response times are monitored with 
<a href="https://haystack.deepset.ai/integrations/langfuse" target="_blank" rel="noopener">Langfuse</a>, while accuracy initially relied on human-in-the-loop evaluation before moving toward automated benchmarking. Feedback from early users guides iterative improvements and feature priorities.</p>
<p>The <strong>knowledge base bot</strong> was deployed to production this month, while the <strong>SQL-generation bot</strong> is scheduled for later this year. As adoption grows, the team expects to share more impact stories around productivity gains and user adoption.</p>
<h2 id="lessons-learned--advice-for-other-teams">Lessons Learned & Advice for Other Teams</h2>
<p>The top advice from the team is to start small with a minimal vertical slice of functionality.</p>
<blockquote>
<p>“<em>Starting small gave us confidence</em>” says Kelsey. “<em>Our first proof of concept was simple: two pipelines with topic routing, shown through a Gradio demo. But it proved the value and helped us scale up with certainty.</em>”</p>
</blockquote>
<p>More advice for anyone considering a similar project:</p>
<ul>
<li>
<a href="https://docs.haystack.deepset.ai/docs/tracing" target="_blank" rel="noopener"><strong>Invest in observability</strong></a> early for debugging and insights</li>
<li><strong>Use agents</strong> for flexibility rather than hardcoded pipelines</li>
<li><strong>Automate evaluation</strong> to speed up iteration cycles</li>
</ul>
<h2 id="whats-next">What’s Next</h2>
<p>The journey doesn’t stop here. TAC plans to <strong>automate data ingestion</strong> processes, build an 
<a href="https://haystack.deepset.ai/integrations?type=Evaluation+Framework" target="_blank" rel="noopener">AI-based evaluation framework</a> to score chatbot responses, and even <strong>white-label the application</strong> for other use cases across the organization.</p>
<blockquote>
<p>“<em>We see this as a template for the future,</em>” says Kelsey. “<em>Our goal is to bring this capability to more workflows across TELUS Agriculture & Consumer Goods, including internal documentation and SQL tooling.</em>”</p>
</blockquote>
<h2 id="share-your-story-with-us">Share Your Story with Us</h2>
<p>The TAC team’s journey shows what’s possible when innovative teams combine <strong>Haystack</strong> with real-world challenges. From streamlining SQL workflows to instant access to documentation, they turned a complex problem into a powerful, production-ready solution. 
<a href="https://haystack.deepset.ai/overview/quick-start" target="_blank" rel="noopener">Get started with Haystack</a> and build reliable, production-ready AI applications for your own team.</p>
<p>We know there are many more stories like this out there. If your team has built something exciting with Haystack, whether it’s a chatbot, a retrieval system, an agent, or an AI-powered internal tool, 
<a href="https://forms.gle/fdyPcC165he6pQkc7" target="_blank" rel="noopener">get in touch with us</a>. We’d love to feature your work and share how you’re shaping the future with Haystack.</p>

      </description>
    </item>
    
    <item>
      <title>Announcing Haystack Enterprise Starter: Best Practices and Support</title>
      <link>https://haystack.deepset.ai/blog/announcing-haystack-enterprise/</link>
      <pubDate>Fri, 01 Aug 2025 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/announcing-haystack-enterprise/</guid>
      <description>
      
      
      

      
      

      <p>💙 Thanks to you and all of our amazing community members, the Haystack open source framework has grown into a thriving developer ecosystem, now used by thousands of organizations to power everything from simple Q&A bots to advanced enterprise agents. As more teams run Haystack in production, one thing has become increasingly clear: <strong>building reliable AI systems is hard and scaling them securely is even harder</strong>.</p>
<p>We’ve had a front-row seat to these challenges. Across GitHub threads, meetups, community calls, and production deployments, developers have consistently asked for engineering support and hands-on guidance to <strong>build for their use case</strong>, <strong>accelerate deployment</strong>, <strong>improve observability</strong>, and <strong>scale infrastructure with confidence</strong>. These aren’t just feature requests; they reflect the real-world friction points of teams building AI products that actually ship.</p>
<p>We’ve tackled many of these problems ourselves while building for customers on the 
<a href="https://www.deepset.ai/products-and-services/deepset-ai-platform" target="_blank" rel="noopener">Haystack Enterprise Platform</a>. Now, we want to make that experience more accessible to the broader Haystack community.</p>
<p>That’s why we’re introducing <strong>Haystack Enterprise Starter</strong> today. It adds secure support, best practices, and professional services availability for organizations scaling up their Haystack pipelines. ⬇️</p>
<h2 id="whats-inside-haystack-enterprise-starter">What’s Inside Haystack Enterprise Starter</h2>
<p>For developers building with Haystack, speed, reliability, and support are essential as projects move from prototype to production. Haystack Enterprise adds a new layer of readiness for exactly that:</p>
<ul>
<li><strong>Direct access to the Haystack team</strong> <br>
Get expert help with private email support and dedicated consultation hours.</li>
<li><strong>Curated pipeline templates for faster developement</strong><br>
Deploy out-of-the-box templates for advanced RAG pipelines, agentic workflows, and multimodal applications with built-in 
<a href="https://github.com/deepset-ai/hayhooks" target="_blank" rel="noopener">Hayhooks</a> and <strong>Open WebUI</strong> support, ready to go.</li>
<li><strong>Helm chart and guides for secure deployments</strong><br>
Run Haystack securely across Kubernetes clusters on AWS, Azure, GCP, or on-prem, backed by best setup, scaling, and upgrade practices.</li>
<li><strong>Early access to enterprise-grade features</strong>   <br>
Take advantage of capabilities like prompt injection countermeasures and other security-oriented features ahead of broader release.</li>
</ul>
<p>Think of it as Haystack+, an offering designed to accelerate delivery and give teams the necessary production muscle.</p>
<h2 id="what-about-open-source">What About Open Source?</h2>
<p>While Haystack Enterprise Starter might be the right choice for some organizations, it may not be for others. Today’s news does not introduce any changes to the OSS community. Haystack started and will always remain as an open source framework. That isn’t changing.</p>
<p>We’re committed to keeping Haystack fully featured, well-documented, and free to use. You’ll continue to get:</p>
<ul>
<li>
<a href="https://github.com/deepset-ai/haystack" target="_blank" rel="noopener">Open source codebase</a></li>
<li>Community-driven feature development and new 
<a href="https://haystack.deepset.ai/integrations" target="_blank" rel="noopener">integrations</a></li>
<li>
<a href="https://haystack.deepset.ai/tutorials" target="_blank" rel="noopener">Tutorials</a>, 
<a href="https://haystack.deepset.ai/cookbook" target="_blank" rel="noopener">Cookbook</a>, and other learning materials</li>
<li>A welcoming, collaborative 
<a href="https://discord.com/invite/Dr63fr9NDS" target="_blank" rel="noopener">Discord community</a> with support</li>
</ul>
<p><strong>Haystack Enterprise Starter is simply an opt-in layer for teams needing more support and guidance</strong>. It’s designed for engineering teams running Haystack at scale who want faster deployment, higher reliability, and private channels to get expert help.</p>
<p>No license changes. No gated community features. Just more choices for teams building AI products.</p>
<h2 id="how-haystack-enterprise-starter-fits-into-the-bigger-picture">How Haystack Enterprise Starter Fits into the Bigger Picture</h2>
<p>We see Haystack Enterprise Starter as part of a continuum that supports developers from the first prototype to full-scale production systems:</p>
<ul>
<li>
<a href="https://haystack.deepset.ai/" target="_blank" rel="noopener"><strong>Haystack</strong></a>: Self-hosted and fully open source.</li>
<li>
<a href="https://www.deepset.ai/products-and-services/haystack-enterprise" target="_blank" rel="noopener"><strong>Haystack Enterprise Starter</strong></a>: Production-grade support and best practices for teams who are scaling up. Includes secure deployment tooling, private engineering guidance, and early access to enterprise features.</li>
<li>
<a href="https://www.deepset.ai/products-and-services/deepset-ai-platform" target="_blank" rel="noopener"><strong>Haystack Enterprise Platform</strong></a>: A fully managed, end-to-end development and orchestration platform. Built for teams that want visual tooling, iterative testing, autoscaling, and centralized governance over resource and access management.</li>
</ul>
<p>Now, you can start with Haystack, opt-in to Enterprise Starter as you grow, and consider the Platform if you want to accelerate your development.</p>
<h2 id="want-to-learn-more">Want to Learn More?</h2>
<p>If you’re a developer running Haystack in production or thinking about scaling your next use case up, let’s talk!</p>
<p>👉 
<a href="https://www.deepset.ai/products-and-services/haystack-enterprise" target="_blank" rel="noopener">Get in touch with our team to explore Haystack Enterprise Starter</a></p>

      </description>
    </item>
    
    <item>
      <title>Summarize Hacker News Posts with Haystack &amp; OPEA</title>
      <link>https://haystack.deepset.ai/blog/hacker-news-summarization-opea/</link>
      <pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/hacker-news-summarization-opea/</guid>
      <description>
      
      
      

      
      

      <p>Welcome to this step-by-step tutorial where we’ll build a simple Retrieval-Augmented Generation (RAG) pipeline using Haystack and 
<a href="https://haystack.deepset.ai/integrations/opea" target="_blank" rel="noopener">OPEA</a>. We’ll fetch the newest Hacker News posts, feed them to a lightweight LLM endpoint (<code>OPEAGenerator</code>), and generate concise one-sentence summaries (based on this 
<a href="https://haystack.deepset.ai/cookbook/hackernews-custom-component-rag" target="_blank" rel="noopener">notebook</a>). Let’s dive in! 🎉</p>
<h2 id="1-introduction--motivation">1. Introduction & Motivation</h2>
<p>In modern GenAI applications, having a flexible, performant, and scalable platform is essential. 
<a href="https://opea-project.github.io/latest/introduction/index.html" target="_blank" rel="noopener">OPEA</a> (Open Platform for Enterprise AI) is an open, model-agnostic framework for building and operating composable GenAI solutions. It provides:</p>
<ul>
<li>A library of microservices (LLMs, data stores, prompt engines) and higher-order megaservices for end-to-end workflows</li>
<li>HTTP-based inference with multi-model support (open- and closed-source)</li>
<li>Advanced features like batching, streaming, auto-scaling, routing via gateways, and unified observability</li>
</ul>
<p>In this demo, we’ll use an OPEA LLM endpoint in a Haystack pipeline, giving you:</p>
<ul>
<li>Instant HTTP access to any hosted model.</li>
<li>Seamless switching from small prototypes to production-grade RAG solutions.</li>
</ul>
<p>In this tutorial, we’ll build a simple RAG pipeline that fetches the newest Hacker News posts, sends them to a local OPEA endpoint running a <code>Qwen/Qwen2.5-7B-Instruct</code> demo model, and produces concise one-sentence summaries. Of course, you can replace our example model with any other OPEA-served model, making this pattern both lightweight for prototyping and powerful for real-world deployments. Let’s get started! 🚀</p>
<h2 id="2-prerequisites">2. Prerequisites</h2>
<p>Make sure you have:</p>
<ul>
<li>Python 3.9+</li>
<li>Install dependencies: <code>pip install haystack-ai haystack-opea newspaper3k lxml[html_clean]</code></li>
<li>A running OPEA endpoint at 
<a href="http://localhost:9000/v1" target="_blank" rel="noopener">http://localhost:9000/v1</a> (or your own)</li>
</ul>
<blockquote>
<p><strong>NOTE:</strong>
As a reference, here is a 
<a href="https://haystack.deepset.ai/blog/hacker-news-summarization-opea/compose.yaml" target="_blank" rel="noopener">Docker Compose</a> recipe to get you started.
OPEA LLM service can be configured to use a variety of model serving backends like TGI, vLLM, ollama, OVMS… and offers validated runtime settings for good performance on various hardware’s including Intel Gaudi.
In this example, it creates an OPEA LLM service with a TGI backend.
See the documentation for 
<a href="https://github.com/opea-project/GenAIComps/tree/main/comps/llms/src/text-generation" target="_blank" rel="noopener">LLM Generation</a>.
The code is based on 
<a href="https://github.com/opea-project/GenAIComps/blob/main/comps/llms/deployment/docker_compose/compose_text-generation.yaml" target="_blank" rel="noopener">OPEA LLM example</a>
and 
<a href="https://github.com/opea-project/GenAIComps/blob/main/comps/third_parties/tgi/deployment/docker_compose/compose.yaml" target="_blank" rel="noopener">OPEA TGI example</a>.</p>
<p>To run, call <code>LLM_MODEL_ID=Qwen/Qwen2.5-7B-Instruct docker compose up</code>.</p>
</blockquote>
<h2 id="3-building-the-news-fetcher">3. Building the News Fetcher</h2>
<p>We’ll create a custom Haystack component, <code>HackernewsNewestFetcher</code>, that:</p>
<ol>
<li>Calls the Hacker News API to get the latest story IDs</li>
<li>Filters for posts containing URLs</li>
<li>Downloads & parses each article with <code>newspaper3k</code>.</li>
<li>Wraps results in Haystack <code>Document</code> objects</li>
</ol>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">typing</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">List</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">component</span>, <span style="color:#268bd2">Document</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">newspaper</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Article</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">requests</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">@component</span>
</span></span><span style="display:flex;"><span><span style="color:#859900">class</span> <span style="color:#cb4b16">HackernewsNewestFetcher</span>():
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#268bd2">@component.output_types</span>(<span style="color:#268bd2">documents</span>=<span style="color:#268bd2">List</span>[<span style="color:#268bd2">Document</span>])
</span></span><span style="display:flex;"><span>  <span style="color:#859900">def</span> <span style="color:#268bd2">run</span>(<span style="color:#cb4b16">self</span>, <span style="color:#268bd2">last_k</span>: <span style="color:#cb4b16">int</span>):
</span></span><span style="display:flex;"><span>    <span style="color:#93a1a1;font-style:italic"># Fetch the IDs of the newest stories</span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">newest_list</span> = <span style="color:#268bd2">requests</span>.<span style="color:#268bd2">get</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">url</span>=<span style="color:#2aa198">'https://hacker-news.firebaseio.com/v0/newstories.json?print=pretty'</span>
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">url_list</span> = []
</span></span><span style="display:flex;"><span>    <span style="color:#93a1a1;font-style:italic"># Keep only the first `last_k` IDs with URLs</span>
</span></span><span style="display:flex;"><span>    <span style="color:#859900">for</span> <span style="color:#cb4b16">id</span> <span style="color:#859900">in</span> <span style="color:#268bd2">newest_list</span>.<span style="color:#268bd2">json</span>()[<span style="color:#2aa198;font-weight:bold">0</span>:<span style="color:#268bd2">last_k</span>]:
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2">article</span> = <span style="color:#268bd2">requests</span>.<span style="color:#268bd2">get</span>(
</span></span><span style="display:flex;"><span>          <span style="color:#268bd2">url</span>=<span style="color:#2aa198">f</span><span style="color:#2aa198">"https://hacker-news.firebaseio.com/v0/item/</span><span style="color:#2aa198">{</span><span style="color:#cb4b16">id</span><span style="color:#2aa198">}</span><span style="color:#2aa198">.json?print=pretty"</span>
</span></span><span style="display:flex;"><span>      )
</span></span><span style="display:flex;"><span>      <span style="color:#859900">if</span> <span style="color:#2aa198">'url'</span> <span style="color:#859900">in</span> <span style="color:#268bd2">article</span>.<span style="color:#268bd2">json</span>():
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">url_list</span>.<span style="color:#268bd2">append</span>(<span style="color:#268bd2">article</span>.<span style="color:#268bd2">json</span>()[<span style="color:#2aa198">'url'</span>])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">docs</span> = []
</span></span><span style="display:flex;"><span>    <span style="color:#93a1a1;font-style:italic"># Download and parse each article</span>
</span></span><span style="display:flex;"><span>    <span style="color:#859900">for</span> <span style="color:#268bd2">url</span> <span style="color:#859900">in</span> <span style="color:#268bd2">url_list</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#859900">try</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">article</span> = <span style="color:#268bd2">Article</span>(<span style="color:#268bd2">url</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">article</span>.<span style="color:#268bd2">download</span>()
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">article</span>.<span style="color:#268bd2">parse</span>()
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">docs</span>.<span style="color:#268bd2">append</span>(
</span></span><span style="display:flex;"><span>          <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>            <span style="color:#268bd2">content</span>=<span style="color:#268bd2">article</span>.<span style="color:#268bd2">text</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">'title'</span>: <span style="color:#268bd2">article</span>.<span style="color:#268bd2">title</span>, <span style="color:#2aa198">'url'</span>: <span style="color:#268bd2">url</span>}
</span></span><span style="display:flex;"><span>          )
</span></span><span style="display:flex;"><span>        )
</span></span><span style="display:flex;"><span>      <span style="color:#859900">except</span> <span style="color:#268bd2">Exception</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#cb4b16">print</span>(<span style="color:#2aa198">f</span><span style="color:#2aa198">"Couldn't download </span><span style="color:#2aa198">{</span><span style="color:#268bd2">url</span><span style="color:#2aa198">}</span><span style="color:#2aa198">, skipped"</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#859900">return</span> {<span style="color:#2aa198">"documents"</span>: <span style="color:#268bd2">docs</span>}
</span></span></code></pre></div><h2 id="4-integrating-the-llm-opeagenerator">4. Integrating the LLM (OPEAGenerator)</h2>
<p>We use the <code>OPEAGenerator</code> to call our LLM over HTTP. Here, we point to a local endpoint serving the <code>Qwen/Qwen2.5-7B-Instruct</code> model:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack_opea</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">OPEAGenerator</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">llm</span> = <span style="color:#268bd2">OPEAGenerator</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"http://localhost:9000/v1"</span>,             <span style="color:#93a1a1;font-style:italic"># Your OPEA endpoint</span>
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"Qwen/Qwen2.5-7B-Instruct"</span>,             <span style="color:#93a1a1;font-style:italic"># Model name</span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">model_arguments</span>={<span style="color:#2aa198">"max_tokens"</span>: <span style="color:#2aa198;font-weight:bold">2000</span>}    <span style="color:#93a1a1;font-style:italic"># Generation settings</span>
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><h2 id="5-crafting-the-prompt">5. Crafting the Prompt</h2>
<p>Using <code>PromptBuilder</code>, we define a Jinja-style template that:</p>
<ul>
<li>Lists each article’s title, content, and URL.</li>
<li>Asks the model for a one-sentence summary plus the URL.</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.builders</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptBuilder</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_template</span> = <span style="color:#2aa198">"""
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">You will be provided a few of the latest posts in HackerNews, followed by their URL.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">For each post, provide a one sentence summary, followed by the original post URL.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Posts:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% f</span><span style="color:#2aa198">or doc in documents %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  {{doc.meta['title']}}:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  {{doc.content}}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  URL: {{doc.meta['url']}}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% e</span><span style="color:#2aa198">ndfor %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">"""</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_builder</span> = <span style="color:#268bd2">PromptBuilder</span>(<span style="color:#268bd2">template</span>=<span style="color:#268bd2">prompt_template</span>)
</span></span></code></pre></div><h2 id="6-building-the-pipeline">6. Building the Pipeline</h2>
<p>We wire up the components in a <code>Pipeline</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># Instantiate components</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">fetcher</span> = <span style="color:#268bd2">HackernewsNewestFetcher</span>()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># Build and connect</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipe</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipe</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"hackernews_fetcher"</span>, <span style="color:#268bd2">fetcher</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipe</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"prompt_builder"</span>, <span style="color:#268bd2">prompt_builder</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipe</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"llm"</span>, <span style="color:#268bd2">llm</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># Define data flow</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipe</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"hackernews_fetcher.documents"</span>, <span style="color:#2aa198">"prompt_builder.documents"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipe</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"prompt_builder.prompt"</span>, <span style="color:#2aa198">"llm.prompt"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># Visualize the pipeline</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipe</span>.<span style="color:#268bd2">show</span>()                    <span style="color:#93a1a1;font-style:italic"># Interactive usage</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipe</span>.<span style="color:#268bd2">draw</span>(<span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">png</span>)        <span style="color:#93a1a1;font-style:italic"># Plotting to file</span>
</span></span></code></pre></div><h2 id="7-running-the-pipeline">7. Running the Pipeline</h2>
<p>Fetch and summarize the top 2 newest Hacker News posts:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">result</span> = <span style="color:#268bd2">pipe</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">data</span>={<span style="color:#2aa198">"hackernews_fetcher"</span>: {<span style="color:#2aa198">"last_k"</span>: <span style="color:#2aa198;font-weight:bold">2</span>}})
</span></span><span style="display:flex;"><span><span style="color:#cb4b16">print</span>(<span style="color:#268bd2">result</span>[<span style="color:#2aa198">'llm'</span>][<span style="color:#2aa198">'replies'</span>][<span style="color:#2aa198;font-weight:bold">0</span>])
</span></span></code></pre></div><h2 id="8-results">8. Results</h2>
<pre tabindex="0"><code>A course on using Large Language Models (LLMs) to understand and structure search queries without relying on external services is being offered, demonstrating how LLMs can significantly improve and automate search capabilities.

[URL: https://softwaredoug.com/blog/2025/04/08/llm-query-understand]

The European Commission's new ProtectEU security strategy proposes enhanced tools for law enforcement, including methods to access encrypted data, raising significant concerns about potential violations of privacy and civil liberties.

[URL: https://www.cloudwards.net/news/protecteu-security-strategy-raises-encryption-concerns/]
</code></pre><p>Beautiful, concise summaries in seconds! ✨</p>
<h2 id="9-conclusion">9. Conclusion</h2>
<p>In this tutorial, we built a full RAG pipeline:</p>
<ul>
<li>Custom news fetcher for Hacker News.</li>
<li>Lightweight LLM integration via <code>OPEAGenerator</code>.</li>
<li>Jinja-templated prompt for structured input.</li>
<li>Haystack pipeline orchestration.</li>
</ul>
<p>Feel free to extend this setup with more advanced retrieval, caching, or different LLM backends. Happy coding! 🛠️🔥</p>

      </description>
    </item>
    
    <item>
      <title>Deploy AI Pipelines Faster with Hayhooks</title>
      <link>https://haystack.deepset.ai/blog/deploy-ai-pipelines-faster-with-hayhooks/</link>
      <pubDate>Mon, 12 May 2025 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/deploy-ai-pipelines-faster-with-hayhooks/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; is an AI orchestration framework that enables developers to effortlessly build custom AI pipelines using a modular, building-block approach. However, when it’s time to take those pipelines from your development environment to production, you’re often left with a tough decision: write custom server code, or rely on proprietary tools that may not offer the flexibility you need.&lt;/p&gt;
&lt;p&gt;We’re excited to announce 
&lt;a href=&#34;https://github.com/deepset-ai/hayhooks&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hayhooks&lt;/a&gt;, an open source package designed to simplify deployment. It lets you focus on developing meaningful AI systems rather than worrying about the underlying infrastructure.&lt;/p&gt;
&lt;p&gt;With Hayhooks, you can deploy Haystack pipelines with custom logic, expose OpenAI-compatible chat endpoints, stream responses in real time, and customize your server—all with minimal code and full flexibility. Read on to find out why Hayhooks is a game-changer for every Haystack developer.&lt;/p&gt;
&lt;h2 id=&#34;what-are-haystack-pipelines&#34;&gt;What are Haystack pipelines?&lt;/h2&gt;
&lt;p&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/intro&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; is an AI orchestration framework designed to create powerful LLM-based systems. It offers a vast library of pre-built components, which you can easily extend with custom logic if your project demands it. Haystack pipelines are designed as modular graphs, allowing you to define how components connect and add branches or loops to your logic. This flexibility enables the development of complex 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/36_building_fallbacks_with_conditional_routing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;agentic systems&lt;/a&gt;, custom retrieval-augmented generation (RAG) applications, semantic search, and more&lt;/p&gt;
&lt;h2 id=&#34;introducing-hayhooks&#34;&gt;Introducing Hayhooks&lt;/h2&gt;
&lt;p&gt;While 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/creating-pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;building AI pipelines&lt;/a&gt; has become increasingly accessible thanks to frameworks like Haystack, the journey from a working pipeline in a notebook to a production-ready system remains challenging. For basic tasks like processing requests and streaming responses, developers often find themselves writing boilerplate code and dealing with complicated deployment settings.&lt;/p&gt;
&lt;p&gt;By simplifying deployment, Hayhooks provides the missing piece to Haystack’s ease of building custom AI pipelines. With Hayhooks, you can quickly turn &lt;strong&gt;any Haystack pipeline into a production-ready REST API or expose it as an 
&lt;a href=&#34;https://www.deepset.ai/blog/understanding-the-model-context-protocol-mcp&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;MCP server&lt;/a&gt;&lt;/strong&gt;. This allows, for example, a large audience to immediately interact with a pipeline through a UI or use it as an MCP tool for MCP clients.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;📺 Explore Hayhooks in details and see a demo in 
&lt;a href=&#34;https://youtu.be/g4UJG6eIF4I?feature=shared&amp;t=1886&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Open NLP Meetup #15&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;hayhooks-key-features&#34;&gt;Hayhooks’ key features&lt;/h2&gt;
&lt;p&gt;Hayhooks bridges the gap between pipeline development and production deployment, focusing on what developers need most. Here are the key features that make this framework a must for every Haystack developer:&lt;/p&gt;
&lt;h3 id=&#34;1-simplified-deployment&#34;&gt;1. Simplified deployment&lt;/h3&gt;
&lt;p&gt;With Hayhooks, the days of writing boilerplate server code just to expose your pipeline are over. Now you can turn your Haystack pipeline into a REST API with a single command. The Hayhooks CLI handles the API creation, auto-generates Swagger and ReDocly documentation, and formats requests/responses for you, so you can focus on what your pipeline does, not how to serve it.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deploy-ai-pipelines-faster-with-hayhooks/hayhooks-docs.gif alt=&#34;Hayhooks Swagger Documentation&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;Hayhooks Swagger documentation with endpoints&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id=&#34;2-complete-customization-for-your-api&#34;&gt;2. Complete customization for your API&lt;/h3&gt;
&lt;p&gt;When it comes to real-world AI applications, customization is key. Hayhooks introduces the &lt;code&gt;PipelineWrapper&lt;/code&gt; paradigm, allowing you to define exactly how your pipeline behaves as an API. You can insert custom logic before and after pipeline execution for preprocessing, response formatting, and dynamic component configuration, giving you total control over your deployment.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pathlib&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;hayhooks&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;BasePipelineWrapper&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;PipelineWrapper&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;BasePipelineWrapper&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;setup&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Create the pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;my_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;my_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(....
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ...
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;my_pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## or load from yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline_yaml&lt;/span&gt; = (&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;__file__&lt;/span&gt;).&lt;span style=&#34;color:#268bd2&#34;&gt;parent&lt;/span&gt; / &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;chat_with_website.yml&#34;&lt;/span&gt;).&lt;span style=&#34;color:#268bd2&#34;&gt;read_text&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;loads&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline_yaml&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run_api&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;urls&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;], &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Ask a question about one or more websites using a Haystack pipeline.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        &#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fetcher&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;urls&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;urls&lt;/span&gt;}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;3-model-context-protocol-mcp-support&#34;&gt;3. Model Context Protocol (MCP) support&lt;/h3&gt;
&lt;p&gt;Model Context Protocol (MCP) provides a standardized way of interacting with 
&lt;a href=&#34;https://haystack.deepset.ai/blog/what-is-an-llm&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;large language models&lt;/a&gt; (LLMs) and other AI models, making it much easier to build customized AI applications. With Hayhooks, you can wrap a Haystack Pipeline into an &lt;strong&gt;MCP server&lt;/strong&gt; with a single command and expose it as an MCP tool that MCP clients like Cursor, Windsurf, Claude Desktop, and 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/agent&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Agents&lt;/a&gt; can interact with.&lt;/p&gt;
&lt;h3 id=&#34;4-production-ready-streaming-and-chat-endpoints&#34;&gt;4. Production-ready streaming and chat endpoints&lt;/h3&gt;
&lt;p&gt;It’s great if your AI pipeline works well and you can talk to it through your IDE. But that’s not the way to bring it to a broader audience! Hayhooks generates OpenAI-compatible endpoints for chat interfaces like Open WebUI and supports streaming responses out of the box, so you can easily provide the engaging chat experience users are used to from ChatGPT, Perplexity, and the like.&lt;/p&gt;
&lt;h3 id=&#34;5-developer-friendly-workflow&#34;&gt;5. Developer-friendly workflow&lt;/h3&gt;
&lt;p&gt;Hayhooks includes tools to speed up iteration during development. The &lt;code&gt;--overwrite&lt;/code&gt; flag allows you to update an existing pipeline without restarting the server. If you’re debugging and making frequent changes, you can speed things up even more by skipping file persistence. And for better error visibility during development, enable tracebacks with &lt;code&gt;HAYHOOKS_SHOW_TRACEBACKS=true&lt;/code&gt;. This provides detailed information when things go wrong, helping you diagnose and fix issues faster.&lt;/p&gt;
&lt;h3 id=&#34;6-infrastructure-that-scales-with-you&#34;&gt;6. Infrastructure that scales with you&lt;/h3&gt;
&lt;p&gt;Since Hayhooks is built on FastAPI, you can leverage all FastAPI features for more advanced needs. For example, you can add authentication mechanisms for secure deployments, implement custom logging for monitoring, create additional endpoints for administration or diagnostics, and integrate with existing FastAPI applications in your infrastructure. Additionally, you can easily containerize your Hayhooks web server with 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/docker&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Docker&lt;/a&gt; and deploy it at scale with 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/kubernetes&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Kubernetes&lt;/a&gt;. This flexibility means Hayhooks can adapt to your production requirements rather than forcing you to adapt to its limitations.&lt;/p&gt;
&lt;h2 id=&#34;see-hayhooks-in-action&#34;&gt;See Hayhooks in action&lt;/h2&gt;
&lt;p&gt;Want to see how Hayhooks can transform your pipeline into a production-ready solution? Check out our 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-demos/tree/main/chat_with_website_hayhooks&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;step-by-step tutorial&lt;/a&gt;, showing you how to build a chatbot that interacts with website content using &lt;strong&gt;Haystack&lt;/strong&gt;, &lt;strong&gt;Hayhooks&lt;/strong&gt;, and &lt;strong&gt;Open WebUI&lt;/strong&gt;. You’ll have a fully functional chatbot that streams responses and responds to real-time user queries in just a few minutes.&lt;/p&gt;
&lt;p&gt;Explore more examples on 
&lt;a href=&#34;https://github.com/deepset-ai/hayhooks/tree/main/examples&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deploy-ai-pipelines-faster-with-hayhooks/demo.gif alt=&#34;&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;Chat with website example&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;outlook-and-getting-started&#34;&gt;Outlook and getting started&lt;/h2&gt;
&lt;p&gt;Since the launch of Hayhooks, the feedback from developers has been great. Many have already successfully deployed their pipelines using Hayhooks, validating the demand for customizable deployment solutions in the AI space. We’re continuously improving the framework and prioritizing feature requests from the community, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Better support for pipeline redeployment.&lt;/li&gt;
&lt;li&gt;Dependency management using &lt;code&gt;requirements.txt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Asynchronous pipeline support.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’re a Haystack developer looking for a fast, easy, and powerful deployment method, check out Hayhooks and let us know what you think either on 
&lt;a href=&#34;https://github.com/deepset-ai/hayhooks&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub&lt;/a&gt; or on 
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our Discord&lt;/a&gt;. We hope you’ll sleep better knowing there’s now a safe and easy way to get your AI pipelines out to the masses faster ;)&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Optimize RAG Applications with Document Reranking Using Haystack With NVIDIA NeMo Retriever</title>
      <link>https://haystack.deepset.ai/blog/optimize-rag-with-nvidia-nemo/</link>
      <pubDate>Thu, 20 Mar 2025 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/optimize-rag-with-nvidia-nemo/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;In retrieval-augmented generation (RAG) applications, the quality of the retrieved documents plays a critical role in delivering accurate and meaningful responses. But what happens when embedding similarity is not enough to get an accurate ordering of the reference documents? This is where &lt;strong&gt;reranking&lt;/strong&gt; comes into play.&lt;/p&gt;
&lt;h2 id=&#34;whats-reranking&#34;&gt;What’s Reranking?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Reranking&lt;/strong&gt; refers to assigning a relevance score to each document based on how well it matches the query. Reranking reorders the retrieved documents to ensure the most contextually relevant results are at the top. This is important because while the retrieval stage focuses on recall, considering relevance broadly, reranking “fine-tunes” the results for increased precision.&lt;/p&gt;
&lt;h3 id=&#34;examples-of-reranking&#34;&gt;Examples of Reranking&lt;/h3&gt;
&lt;p&gt;Consider a query like, “&lt;em&gt;What are the best practices for securing a REST API&lt;/em&gt;?” The retrieval model might return a ranked list with these documents:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;REST API: a practical guide&lt;/li&gt;
&lt;li&gt;Best REST API frameworks&lt;/li&gt;
&lt;li&gt;Detailed steps on how to secure REST APIs&lt;/li&gt;
&lt;li&gt;Public vs. private APIs: challenges and limitations&lt;/li&gt;
&lt;li&gt;REST API architecture principles&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;While all of these seem relevant to the topic of REST APIs, the document with specific security steps (document 3) should ideally be ranked first. Using purely embedding similarity, the document score may rely too much on common words - for instance, document 1 includes “REST API” and a similar word to “practice”, while document 2 also includes the word “best” from the query. The use of a reranker should lead to a better document scoring that overcomes these faults, leading to a better retrieval pipeline.&lt;/p&gt;
&lt;h2 id=&#34;why-reranking-is-crucial-in-rag-systems&#34;&gt;Why Reranking is Crucial in RAG Systems&lt;/h2&gt;
&lt;p&gt;Adding a reranking component to a RAG pipeline enhances both &lt;strong&gt;recall&lt;/strong&gt; (retrieving relevant documents) and &lt;strong&gt;precision&lt;/strong&gt; (selecting the most relevant ones). The reranker, typically using a fine-tuned &lt;strong&gt;LLM&lt;/strong&gt;, reorders retrieved document chunks to ensure the most relevant ones appear at the top, making the retrieval process more accurate.&lt;/p&gt;
&lt;p&gt;By prioritizing the right documents, reranking increases the likelihood of providing the LLM with the best context, which improves the quality of generated responses. For example, in an application where the user seeks specific technical information, the &lt;strong&gt;reranking model ensures that highly relevant content appears first&lt;/strong&gt;, preventing less helpful results from diluting the response quality. This is particularly important when the LLM providing the response has a limited context window or when we aim to optimize its inference process for speed and cost-efficiency.&lt;/p&gt;
&lt;p&gt;Reranking is especially valuable in &lt;strong&gt;hybrid retrieval&lt;/strong&gt; setups, where chunks come from different datastores or from various retrieval methods (e.g., sparse, dense, or keyword-based). Each method may rank relevance differently, but reranking brings consistency regardless of the retrieval method. In hybrid setups, it ensures that the final set of documents provided to the LLM reflects the true semantic relevance to the query, rather than being dominated by a single retrieval method’s biases.&lt;/p&gt;
&lt;h3 id=&#34;evaluation-metrics-for-retrieval-and-reranking&#34;&gt;Evaluation Metrics for Retrieval and Reranking&lt;/h3&gt;
&lt;p&gt;Depending on the purpose, many metrics, such as 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/sasevaluator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;semantic answer similarity&lt;/a&gt; or 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/faithfulnessevaluator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;faithfulness&lt;/a&gt;, can be used to evaluate a RAG pipeline. When using a reranker in the pipeline, it’s essential to assess not only retrieval performance but also how well the reranker optimizes the order of returned results.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Retrieval performance:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Recall&lt;/strong&gt; reflects how successful the retrieval operation was, checking how many ground truth documents were retrieved. Single-hit recall indicates how often at least one relevant document is retrieved within the results, and multi-hit recall measures how often all relevant documents appear in the top results.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reranking performance:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To measure how well the reranking model is ordering document chunks, we can use &lt;strong&gt;Mean Reciprocal Rank (MRR)&lt;/strong&gt; and &lt;strong&gt;Normalized Discounted Cumulative Gain (NDCG)&lt;/strong&gt;. MRR reflects how well the relevant documents are positioned, while NDCG evaluates the quality of the entire ranking. Combined, they can give insights in both cases where a little or a lot of context is needed.&lt;/p&gt;
&lt;div class=&#34;styled-table demo&#34;&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;&lt;/th&gt;
          &lt;th&gt;Recall@5 (Single Hit)&lt;/th&gt;
          &lt;th&gt;Recall@5 (Multi Hit)&lt;/th&gt;
          &lt;th&gt;Precision@5&lt;/th&gt;
          &lt;th&gt;MRR@5&lt;/th&gt;
          &lt;th&gt;NDCG@5&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Retriever (top_k=100)&lt;/td&gt;
          &lt;td&gt;0.818&lt;/td&gt;
          &lt;td&gt;0.650&lt;/td&gt;
          &lt;td&gt;0.635&lt;/td&gt;
          &lt;td&gt;0.652&lt;/td&gt;
          &lt;td&gt;0.584&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Reranker (top_k=5)&lt;/td&gt;
          &lt;td&gt;0.884&lt;/td&gt;
          &lt;td&gt;0.718&lt;/td&gt;
          &lt;td&gt;0.692&lt;/td&gt;
          &lt;td&gt;0.708&lt;/td&gt;
          &lt;td&gt;0.643&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Reranker Improvement&lt;/td&gt;
          &lt;td&gt;6.60%&lt;/td&gt;
          &lt;td&gt;6.80%&lt;/td&gt;
          &lt;td&gt;5.69%&lt;/td&gt;
          &lt;td&gt;5.59%&lt;/td&gt;
          &lt;td&gt;5.90%&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Table 1 - Evaluation scores of retrieval and reranking over the small subset of the 
&lt;a href=&#34;https://huggingface.co/datasets/hotpotqa/hotpot_qa&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;HotpotQA dataset&lt;/a&gt;. For evaluation, the following NVIDIA NeMo Retriever microservices were used: 
&lt;a href=&#34;https://docs.api.nvidia.com/nim/reference/nvidia-llama-3_2-nv-embedqa-1b-v2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;nvidia/llama-3.2-nv-embedqa-1b-v2&lt;/code&gt;&lt;/a&gt;, for retrieval and 
&lt;a href=&#34;https://docs.api.nvidia.com/nim/reference/nvidia-llama-3_2-nv-rerankqa-1b-v2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;nvidia/llama-3.2-nv-rerankqa-1b-v2&lt;/code&gt;&lt;/a&gt;, for reranking.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The table reveals the impact of adding a reranker to enhance retrieval output:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Recall Improvements&lt;/strong&gt;: The reranker improves Recall@5 for both single-hit and multi-hit, with multi-hit recall seeing the highest boost (&#43;6.80%). This improvement is crucial when multiple relevant documents are needed for comprehensive context, as the reranker successfully surfaces more relevant documents within the top results.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reranking Quality&lt;/strong&gt;: Metrics like MRR@5 and NDCG@5 indicate a significant improvement in reranking performance. The rise in MRR (&#43;5.59%) suggests that relevant documents appear earlier, while the NDCG increase (&#43;5.90%) indicates better overall ranking quality, making it easier to retrieve relevant information from the top of the results.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In summary, this analysis shows that the reranking model significantly enhances both retrieval and reranking metrics, underscoring its value in surfacing relevant content effectively within the RAG pipeline.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For detailed code used in this evaluation, check out 
&lt;a href=&#34;https://colab.research.google.com/drive/1KMBC4lx4yl2kjJGXwnlEJpd9TtVpmkRX?usp=sharing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cookbook: Evaluate Reranking-Enhanced Retrieval Pipelines&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;introducing-nvidia-nim&#34;&gt;Introducing NVIDIA NIM&lt;/h2&gt;
&lt;p&gt;NeMo Retriever microservices are built with 
&lt;a href=&#34;https://www.nvidia.com/en-us/ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA NIM&lt;/a&gt;, part of the NVIDIA AI Enterprise software platform. NVIDIA NIM is a collection of containerized microservices designed for optimized inference of state-of-the-art AI models. The container uses a variety of components to serve AI models and exposes them via standard APIs. Models are optimized using NVIDIA 
&lt;a href=&#34;https://developer.nvidia.com/tensorrt&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;TensorRT&lt;/a&gt; or using NVIDIA 
&lt;a href=&#34;https://github.com/NVIDIA/TensorRT-LLM&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;TensorRT-LLM&lt;/a&gt; (depending on the type of the model), automatically applying procedures such as quantization, model distribution, optimized kernel/runtimes, and inflight- or continuous batching, among others, allowing even further optimization if needed. Learn more about 
&lt;a href=&#34;https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NIM&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Haystack provides 4 components that connect with NVIDIA NIM:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiagenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NvidiaGenerator&lt;/a&gt;: Text generation with LLM NIM.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiadocumentembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NvidiaDocumentEmbedder&lt;/a&gt;: Document embedding with 
&lt;a href=&#34;https://build.nvidia.com/nvidia/embed-qa-4&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA NeMo Retriever embedding NIM microservice&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiatextembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NvidiaTextEmbedder&lt;/a&gt;: Query embedding with NVIDIA NeMo Retriever embedding NIM microservice.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;(NEW)&lt;/strong&gt; 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiaranker&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NvidiaRanker&lt;/a&gt;: Document chunk reranker using NVIDIA NeMo Retriever reranking NIM microservice.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/optimize-rag-with-nvidia-nemo/pipeline-image.png alt=&#34;&#34;  /&gt;
        &lt;figcaption&gt;&lt;em&gt;Fig 1 - Haystack Indexing and RAG pipeline upgraded with reranking, with NVIDIA NeMo Retriever&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id=&#34;getting-started-with-nvidia-nim&#34;&gt;Getting Started with NVIDIA NIM&lt;/h3&gt;
&lt;p&gt;To integrate NVIDIA NIM, you can either access pre-trained models using the NVIDIA API catalog or deploy NVIDIA NIM directly on your own infrastructure for enhanced control and scalability. In this example, we’ll focus on using NVIDIA-hosted models.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Get API Keys&lt;/strong&gt;: Sign up on the 
&lt;a href=&#34;https://build.nvidia.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt; to obtain API keys with free credits for access to pre-trained models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use NVIDIA-hosted Models&lt;/strong&gt;: Integrate NVIDIA NIM models hosted on the NVIDIA API catalog for seamless access.&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;
&lt;p&gt;Learn how to deploy NIM on your own infrastructure in 
&lt;a href=&#34;https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Building RAG Applications with NVIDIA NIM and Haystack on K8s&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;build-a-haystack-rag-pipeline-with-reranking-using-nvidia-nemo-retriever&#34;&gt;Build a Haystack RAG Pipeline with Reranking using NVIDIA Nemo Retriever&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;See the complete code in 
&lt;a href=&#34;https://colab.research.google.com/drive/1ukXiI4rASl2QXDy6GNNL1vnPb_aekMF9&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cookbook: Reranking for Optimized RAG With NVIDIA NeMo Retriever&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;enhanced-retrieval&#34;&gt;Enhanced Retrieval&lt;/h3&gt;
&lt;p&gt;For retrieval, initialize the NeMo Retriever microservices, &lt;code&gt;NvidiaRanker&lt;/code&gt; with 
&lt;a href=&#34;https://docs.api.nvidia.com/nim/reference/nvidia-llama-3_2-nv-rerankqa-1b-v2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;nvidia/llama-3.2-nv-rerankqa-1b-v2&lt;/code&gt;&lt;/a&gt; model and the &lt;code&gt;NvidiaTextEmbedder&lt;/code&gt; with 
&lt;a href=&#34;https://docs.api.nvidia.com/nim/reference/nvidia-llama-3_2-nv-embedqa-1b-v2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;nvidia/llama-3.2-nv-embedqa-1b-v2&lt;/code&gt;&lt;/a&gt;. We’ll set the &lt;code&gt;top_k&lt;/code&gt; value for retriever to 30 and for reranker to 5. Thus, we’ll retrieve 30 docs but only pass the 5 most relevant documents as context to the LLM.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.embedders.nvidia&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.nvidia&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.rankers.nvidia&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaRanker&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaTextEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nvidia/llama-3.2-nv-embedqa-1b-v2&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                              &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://integrate.api.nvidia.com/v1&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;30&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;reranker&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaRanker&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nvidia/llama-3.2-nv-rerankqa-1b-v2&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;generation&#34;&gt;Generation&lt;/h3&gt;
&lt;p&gt;Define a basic prompt and initialize  &lt;code&gt;NvidiaGenerator&lt;/code&gt; with &lt;code&gt;meta/llama3-70b-instruct&lt;/code&gt; model as the generator:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.nvidia&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Answer the question given the context.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: {{ query }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;meta/llama3-70b-instruct&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model_arguments&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1024&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;build-the-enhanced-rag-pipeline&#34;&gt;Build the Enhanced RAG Pipeline&lt;/h3&gt;
&lt;p&gt;Bring all components together and make the relevant connections.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reranker&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;reranker&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.embedding&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.query_embedding&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reranker&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reranker.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;run-the-pipeline&#34;&gt;Run the pipeline&lt;/h3&gt;
&lt;p&gt;To test our pipeline, we’ll use questions from the 
&lt;a href=&#34;https://huggingface.co/datasets/hotpotqa/hotpot_qa&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;HotpotQA dataset&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;A medieval fortress in Dirleton, East Lothian, Scotland borders on the south side of what coastal area?&#34;&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# correct answer is &#34;Yellowcraig&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;enhanced_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reranker&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;embedder&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;meta&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;usage&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;prompt_tokens&#39;&lt;/span&gt;: 26, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;total_tokens&#39;&lt;/span&gt;: 26}}},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;generator&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;replies&#39;&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;The answer is Yellowcraig. According to the context, Dirleton Castle borders on the south side of the Yellowcraig coastal area.&#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;meta&#39;&lt;/span&gt;: [{&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;role&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;assistant&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;usage&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;prompt_tokens&#39;&lt;/span&gt;: 503,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;total_tokens&#39;&lt;/span&gt;: 532,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;completion_tokens&#39;&lt;/span&gt;: 29},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;finish_reason&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;stop&#39;&lt;/span&gt;}]}}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For the question, the enhanced pipeline’s response is “&lt;em&gt;The answer is Yellowcraig. According to the context, Dirleton Castle borders on the south side of the Yellowcraig coastal area.&lt;/em&gt;”. Let’s now create a basic RAG pipeline without reranking and compare the results.&lt;/p&gt;
&lt;h2 id=&#34;basic-rag-pipeline-without-reranking&#34;&gt;Basic RAG Pipeline without Reranking&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Answer the question given the context.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: {{ query }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaTextEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nvidia/llama-3.2-nv-embedqa-1b-v2&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                              &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://integrate.api.nvidia.com/v1&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;meta/llama3-70b-instruct&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model_arguments&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1024&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.embedding&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.query_embedding&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;A medieval fortress in Dirleton, East Lothian, Scotland borders on the south side of what coastal area?&#34;&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# correct answer is &#34;Yellowcraig&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;embedder&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;meta&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;usage&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;prompt_tokens&#39;&lt;/span&gt;: 26, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;total_tokens&#39;&lt;/span&gt;: 26}}},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;generator&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;replies&#39;&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;The Firth of Forth.&#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;meta&#39;&lt;/span&gt;: [{&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;role&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;assistant&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;usage&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;prompt_tokens&#39;&lt;/span&gt;: 488,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;total_tokens&#39;&lt;/span&gt;: 496,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;completion_tokens&#39;&lt;/span&gt;: 8},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;finish_reason&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;stop&#39;&lt;/span&gt;}]}}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The basic pipeline’s response is “&lt;em&gt;The Firth of Forth.&lt;/em&gt;” which is mentioned in the context but isn’t the correct answer. This shows that the retriever isn’t enough to retrieve the most relevant documents, supporting the improvements in the recall with reranker.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this blog post, we explored the significant impact of adding a reranking model to a RAG pipeline. When using only retrieval, the documents returned may broadly match the query based on embedding similarity, but without guaranteed ordering for contextual relevance. This can lead to responses that lack specificity, as shown in the basic RAG pipeline example where the answer “&lt;em&gt;The Firth of Forth.&lt;/em&gt;” was returned instead of the correct “&lt;em&gt;Yellowcraig&lt;/em&gt;” due to suboptimal document ranking.&lt;/p&gt;
&lt;p&gt;By integrating the NeMo Retriever &lt;code&gt;nvidia/llama-3.2-nv-rerankqa-1b-v2&lt;/code&gt; model using the &lt;code&gt;NvidiaRanker&lt;/code&gt;, the enhanced RAG pipeline prioritized the most contextually appropriate documents, improving the overall precision of the response. With the reranking, metrics like &lt;strong&gt;Recall@5 (Multi Hit)&lt;/strong&gt; and &lt;strong&gt;NDCG@5&lt;/strong&gt; showed marked improvements, indicating that relevant documents not only appeared within the top results more frequently but were also positioned earlier in the list, enhancing the LLM’s access to high-quality context for accurate generation.&lt;/p&gt;
&lt;p&gt;In summary, by adding NeMo Retriever reranking capabilities built with 
&lt;a href=&#34;https://build.nvidia.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA NIM&lt;/a&gt;, RAG pipelines achieve better document ordering, more relevant context, and increased response accuracy—demonstrating the essential role of rerankers in building robust, real-world RAG applications.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Use DeepSeek-R1 with Haystack: Demo and Tutorial</title>
      <link>https://haystack.deepset.ai/blog/use-deepseek-with-haystack/</link>
      <pubDate>Wed, 29 Jan 2025 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/use-deepseek-with-haystack/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;The latest release from &lt;strong&gt;DeepSeek&lt;/strong&gt; confirms an essential truth about AI: there won’t be one model or provider to rule them all. As the field evolves, it’s evident that diverse models bring unique strengths, making a model-agnostic approach vital for developers and organizations alike. Whether you’re building agentic systems, Retrieval-Augmented Generation (RAG) architectures, search or other architectures, a model-agnostic design unlocks flexibility, scalability, and long-term success 🔐&lt;/p&gt;
&lt;h3 id=&#34;stay-flexible-with-a-model-agnostic-approach&#34;&gt;Stay Flexible with a Model-Agnostic Approach&lt;/h3&gt;
&lt;p&gt;Decoupling your application from specific models or APIs gives you the freedom to adapt as AI evolves. A model-agnostic approach lets you choose the best tool for the job—whether it’s generating human-like text, answering complex questions, or handling domain-specific analysis. Through Haystack’s modular architecture, you can easily test, swap, or integrate new models as they emerge, all without rearchitecting your entire AI system. This flexibility ensures you stay ahead of advancements, fine-tune for industry needs, and maintain optimal performance without being locked into a single provider ecosystem.&lt;/p&gt;
&lt;h2 id=&#34;deepseek-r1-open-source-ai-with-cutting-edge-reasoning&#34;&gt;DeepSeek-R1: Open-Source AI with Cutting-Edge Reasoning&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;DeepSeek-R1&lt;/code&gt; is the latest large language model by DeepSeek, designed for high-level reasoning tasks. Achieving performance comparable to &lt;code&gt;OpenAI-o1-1217&lt;/code&gt; on reasoning tasks, it stands out as a reliable choice for advanced AI applications. This release marks a major step forward in open-source AI, offering researchers and developers the flexibility to distill, deploy, and commercialize models under the permissive MIT license.&lt;/p&gt;
&lt;p&gt;Alongside &lt;code&gt;DeepSeek-R1&lt;/code&gt;, the release includes six distilled models, ranging from 1.5B to 70B parameters, built from &lt;code&gt;DeepSeek-R1&lt;/code&gt; based on &lt;code&gt;Qwen&lt;/code&gt; and &lt;code&gt;Llama&lt;/code&gt;. Impressively, the distilled 32B and 70B models rival the performance of &lt;code&gt;OpenAI-o1-mini&lt;/code&gt;. By open-sourcing &lt;code&gt;DeepSeek-R1-Zero&lt;/code&gt;, &lt;code&gt;DeepSeek-R1&lt;/code&gt;, and the complete suite of distilled models, the creators have made a powerful contribution to the research community, fostering innovation and accessibility in AI development 💙&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/use-deepseek-with-haystack/benchmark.png alt=&#34;&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;Benchmark Performance of DeepSeek-R1&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Learn more about DeepSeek-R1 in 
&lt;a href=&#34;https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this paper&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;test-deepseek-r1-yourself&#34;&gt;Test DeepSeek-R1 yourself&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;DeepSeek-R1&lt;/code&gt; can be seamlessly integrated with Haystack. To illustrate the potential of &lt;code&gt;DeepSeek-R1&lt;/code&gt;, try out 
&lt;a href=&#34;https://r1-demo.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this demo&lt;/a&gt; using &lt;code&gt;DeepSeek-R1&lt;/code&gt; within the Haystack and deepset ecosystem. In the demo, you can compare &lt;code&gt;DeepSeek-R1&lt;/code&gt; with Open AI’s new reasoning model, &lt;code&gt;o1&lt;/code&gt;, to see which of these two models sets the bar for accuracy, performance, and efficiency.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/use-deepseek-with-haystack/demo.gif alt=&#34;“demo ui”&#34;  /&gt;
        &lt;figcaption&gt;Compare DeepSeek-R1 and OpenAI’s o1 in the deepset demo&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;If you’d like to use &lt;code&gt;DeepSeek-R1&lt;/code&gt; in your Haystack applications, you can run inference using 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/openaichatgenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenAIChatGenerator&lt;/a&gt; with 
&lt;a href=&#34;https://www.together.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Together AI&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;getpass&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;getpass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.utils&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.chat&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.utils&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;print_streaming_chunk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;TOGETHER_AI_API_KEY&#34;&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;TOGETHER_AI_API_KEY&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#268bd2&#34;&gt;getpass&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Enter TogetherAI API key:&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;chat_generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;deepseek-ai/DeepSeek-R1&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_env_var&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;TOGETHER_AI_API_KEY&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;api_base_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://api.together.xyz/v1&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;streaming_callback&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;print_streaming_chunk&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Find a full example in this 
&lt;a href=&#34;https://colab.research.google.com/drive/1LsL5szMmrnKmY6jre5GljVX5RmLTZnGY?usp=sharing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;AI is evolving fast, and DeepSeek-R1 proves that no single model fits all use cases. A model-agnostic approach lets you plug in the best tool for the job, and with modular systems, you can experiment, optimize, and stay ahead as new models emerge. Whether you’re building agents, intelligent search systems, or RAG architectures, the ability to seamlessly switch between models ensures long-term scalability and success.&lt;/p&gt;
&lt;p&gt;We’d love to see what you build! Try out the notebook, adapt it to your needs, and share your results with us on 
&lt;a href=&#34;https://www.linkedin.com/company/deepset-ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LinkedIn&lt;/a&gt; or 
&lt;a href=&#34;https://discord.gg/Dr63fr9NDS&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Discord&lt;/a&gt;. You can also submit your Haystack story through 
&lt;a href=&#34;https://forms.gle/UU2Yz6TfJ4Kssk5u7&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this form&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Build an Agentic RAG Pipeline in deepset Studio</title>
      <link>https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/</link>
      <pubDate>Tue, 14 Jan 2025 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;In this article, we’ll explore how to build an &lt;strong&gt;agentic Retrieval Augmented Generation (RAG) pipeline&lt;/strong&gt; on deepset Studio, the AI application prototyping tool for developers. We’ll first build a basic RAG pipeline and then extend the pipeline with a fallback mechanism that can perform a web search if the answer to the user query cannot be found in the database.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This article also serves as a solution to Day 5 challenge of Advent of Haystack 2024: 
&lt;a href=&#34;https://haystack.deepset.ai/advent-of-haystack/day-5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Elves’ Secret for Faster Development 💨&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Let’s dive in and create a pipeline that doesn’t just search but actively decides.&lt;/p&gt;
&lt;h2 id=&#34;creating-a-deepset-studio-account&#34;&gt;Creating a deepset Studio Account&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;deepset Studio is a development environment for Haystack.&lt;/strong&gt; It allows you to visually build and test Haystack pipelines. It’s free and open to everyone. Learn more about Studio and its features in 
&lt;a href=&#34;https://haystack.deepset.ai/blog/announcing-studio&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;the announcement blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To start building a pipeline on deepset Studio, you need to have an account. If you don’t have one yet, sign up 
&lt;a href=&#34;https://landing.deepset.ai/deepset-studio-signup&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt; to get access.&lt;/p&gt;
&lt;p&gt;Once you’re in, you can complete the onboarding tutorial or return to it later in the left sidebar. The left sidebar also has some relevant tabs, such as “Files,” “Pipeline Templates,” and “Pipelines.”&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/ds.gif alt=&#34;image.png&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;adding-files&#34;&gt;Adding Files&lt;/h2&gt;
&lt;p&gt;For the sake of simplicity, we’ll use the “AI” sample files that come with Studio. Of course, you can use your own files in Studio. If you want to upload your own files, you have two options: you can either use the built-in database that Studio uses, which is an 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/opensearch-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearch&lt;/a&gt; instance, or you can connect to your 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/weaviate-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Weaviate&lt;/a&gt;, 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/pinecone-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Pinecone&lt;/a&gt;, 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/qdrant-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Qdrant&lt;/a&gt;, 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/elasticsearch-document-store/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ElasticSearch&lt;/a&gt; instance remotely.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/files.png alt=&#34;image.png&#34;  /&gt;
        &lt;figcaption&gt;There are example files on various topics&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;building-the-initial-rag-pipeline&#34;&gt;Building the Initial RAG Pipeline&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Check out 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/create-a-pipeline-in-studio&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Create a Pipeline in Pipeline Builder&lt;/a&gt; for more detailed explanation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To start building the pipeline, we need to switch to the “Pipeline Templates” tab. Here, you can find several pre-built pipelines (with components, models, and optimal parameters) for different use cases (RAG, chat, summarization, document similarity, etc.) to speed up the building process and avoid starting from scratch.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/templates.png alt=&#34;image.png&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Locate “RAG Question Answering GPT-4o” and click “Use Template.” On the modal screen, you can change the default name or leave it as is. Clicking “Create Pipeline” takes you to the Pipeline Builder, the drag-and-drop interface for creating and editing the pipeline.&lt;/p&gt;
&lt;p&gt;This pipeline template comes with a comprehensive indexing pipeline that processes all file types, splits them into chunks and creates embeddings using the 
&lt;a href=&#34;https://huggingface.co/intfloat/e5-base-v2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;intfloat/e5-base-v2&lt;/code&gt;&lt;/a&gt; model. The query pipeline has 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/33_hybrid_retrieval&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;hybrid retrieval&lt;/a&gt; with a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/rankers&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ranker&lt;/a&gt; and uses &lt;code&gt;gpt-4o&lt;/code&gt; from OpenAI for generation.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;⚠️ If you’re using your own database instance, you’ll need to update your indexing pipeline accordingly. For more information, see 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/connect-to-external-database&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Connect to an External Document Store&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/default-pipeline.png alt=&#34;Default query pipeline coming with the “RAG Question Answering GPT-4o” template&#34;  /&gt;
        &lt;figcaption&gt;Default query pipeline coming with the “RAG Question Answering GPT-4o” template&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;We’ll leave the indexing pipeline as it is, but update the query pipeline to incorporate the web search fallback mechanism.&lt;/p&gt;
&lt;h2 id=&#34;incorporating-a-fallback-mechanism-into-a-rag-pipeline&#34;&gt;Incorporating a Fallback Mechanism into a RAG pipeline&lt;/h2&gt;
&lt;p&gt;We want our pipeline to exhibit agentic behavior by dynamically deciding its course of action. Specifically, it will first perform RAG on our database, and if the query cannot be resolved, it will intelligently shift to a web search fallback. This decision-making capability mirrors an agentic design, enhancing the pipeline’s robustness and flexibility.&lt;/p&gt;
&lt;p&gt;We already have a working RAG pipeline that we’ll extend with additional components to include the web fallback mechanism.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Read the 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/36_building_fallbacks_with_conditional_routing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Tutorial: Building Fallbacks to Websearch with Conditional Routing&lt;/a&gt; to understand how to design a Haystack pipeline with a fallback mechanism.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;update-the-default-prompt&#34;&gt;Update the default prompt&lt;/h3&gt;
&lt;p&gt;In the extended RAG pipeline, the LLM used in the Generator component will have a dual function. In addition to generating the answer based on the documents in the database, it should also indicate when an answer cannot be generated based on the given documents. The template pipeline already comes with an extensive prompt, so all you need to do is tweak the prompt slightly with instructions to return &lt;code&gt;NO_ANSWER&lt;/code&gt; if the documents cannot answer the question. Here’s the new prompt we’ll use:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-markdown&#34; data-lang=&#34;markdown&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;You are a technical expert.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;You answer questions truthfully based on provided documents.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Ignore typing errors in the question.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;For each document check whether it is related to the question.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Only use documents that are related to the question to answer it.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Ignore documents that are not related to the question.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;If the answer exists in several documents, summarize them.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Only answer based on the documents provided. Don&#39;t make things up.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Just output the structured, informative and precise answer and nothing else.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;If the documents can&#39;t answer the question, say &#34;NO_ANSWER&#34;.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document[3].
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Never name the documents, only enter a number in square brackets as a reference.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;The reference must only refer to the number that comes in square brackets after the document.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;These are the documents:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{% for document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Document[{{ loop.index }}]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Name of Source File: {{ document.meta.file_name }}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{% endfor %}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Question: {{ question }}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;add-conditionalrouter&#34;&gt;Add ConditionalRouter&lt;/h3&gt;
&lt;p&gt;In Haystack, routing is the most convenient way to build a fallback mechanism into a pipeline and enable agentic behavior. 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/routers&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Router components&lt;/a&gt; can help direct input to different branches based on some condition or specification of the input, such as metadata or file type.&lt;/p&gt;
&lt;p&gt;For this example, we will add the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/conditionalrouter&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;ConditionalRouter&lt;/code&gt;&lt;/a&gt; component to help the pipeline decide whether to proceed with the database query results or invoke the web search branch based on the LLM’s response. This component will be connected to the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/openaigenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;OpenAIGenerator&lt;/code&gt;&lt;/a&gt; and check the LLM response. If the response has the keyword &lt;code&gt;NO_ANSWER&lt;/code&gt;, it will direct the query to the web search branch. If not, the search will be terminated. Here’s how you need to 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/non-standard-component-connections#routing-data-based-on-a-condition&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;define the conditions in &lt;code&gt;ConditionalRouter&lt;/code&gt;&lt;/a&gt; to achieve this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;condition&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;{{&#39;&#39;NO_ANSWER&#39;&#39; in replies[0]}}&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;output&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;{{query}}&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;output_name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;go_to_web&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;output_type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;str&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;condition&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;{{&#39;&#39;NO_ANSWER&#39;&#39; not in replies[0]}}&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;output&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;{{replies}}&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;output_name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;replies&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;output_type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;typing.List[str]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;These conditions will create two outputs/edges for the &lt;code&gt;ConditionalRouter&lt;/code&gt; : “replies” and “go_to_web”. The “replies” edge will be connected to the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/answerbuilder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;AnswerBuilder&lt;/code&gt;&lt;/a&gt; as it completes the search, and the “go_to_web” edge will be connected to the web search branch.&lt;/p&gt;
&lt;h3 id=&#34;create-the-web-search-branch&#34;&gt;Create the Web Search Branch&lt;/h3&gt;
&lt;p&gt;The fallback branch will be a simple RAG pipeline but this time, we’ll use 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/serperdevwebsearch&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;SerperDevWebSearch&lt;/code&gt;&lt;/a&gt; instead of the Retriever, which is one of the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/websearch&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;WebSearch components&lt;/a&gt;. This component will get the query from the &lt;code&gt;ConditionalRouter&lt;/code&gt; and retrieve relevant information from the web. The web search pipeline will continue with a new 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/promptbuilder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;PromptBuilder&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/generators&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;Generator&lt;/code&gt;&lt;/a&gt; and &lt;code&gt;AnswerBuilder&lt;/code&gt;. As a prompt, we can use a shorter one this time:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Answer the following query given the documents retrieved from the web.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Your answer should indicate that your answer was generated from websearch.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Documents:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{% for document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  {{document.content}}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{% endfor %}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Query: {{query}}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For the generator, we can again use the &lt;code&gt;gpt-4o&lt;/code&gt; model through &lt;code&gt;OpenAIGenerator&lt;/code&gt;&lt;/p&gt;
&lt;h3 id=&#34;tie-up-the-loose-ends&#34;&gt;Tie Up the Loose Ends&lt;/h3&gt;
&lt;p&gt;In deepset Studio, a query pipeline must end with an “Output” node that can return a list of Answers, a list of Documents, or both. &lt;code&gt;AnswerBuilder&lt;/code&gt; is a handy component that builds the Answer object using the query, LLM answers, and other optional information such as documents or meta. Since we are getting Answer objects from two different branches here, we also need an 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/answerjoiner&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;AnswerJoiner&lt;/code&gt;&lt;/a&gt; to concatenate the list of Answers and pass it to the “Output” node.&lt;/p&gt;
&lt;p&gt;The pipeline should look like this when it is complete.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/final-pipeline.gif alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;⚠️ Don’t forget to press “Save” to keep all these changes before moving forward 🙂&lt;/p&gt;
&lt;h2 id=&#34;adding-api-keys&#34;&gt;Adding API Keys&lt;/h2&gt;
&lt;p&gt;Before we deploy and begin testing our pipeline, we need to add &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; and &lt;code&gt;SERPERDEV_API_KEY&lt;/code&gt; to deepset Studio. You need to use “
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/connect-with-model-providers&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Connections&lt;/a&gt;” for OpenAI and “
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/add-secrets&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Secrets&lt;/a&gt;” for Serper under the menu in the top right corner.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/connections.gif alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;deploying-and-testing-the-pipeline&#34;&gt;Deploying and Testing the Pipeline&lt;/h2&gt;
&lt;p&gt;Once you’re done with all the steps above, 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/deploy-a-pipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deploy the pipeline&lt;/a&gt;. This process might take some time as after deployment, deepset Studio sets up a document store instance, processes all files, creates embeddings, and indexes them. When everything is complete, you’ll see &lt;code&gt;Indexed&lt;/code&gt; tag on the pipeline.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/indexed.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;When the deployment is complete, you can 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/run-a-search&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;test your pipeline in the Playground&lt;/a&gt;. You can find the “Playground” tab on the sidebar. This UI gives you the ability to run some queries on your pipeline and investigate the response. You can give it a go with the query, “What’s Compound AI?”. Then, ask Santa’s birthday to confirm that the pipeline uses the fallback branch when required.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/agentic-rag-in-deepset-studio/fallback.gif alt=&#34;&#34;  /&gt;
        &lt;figcaption&gt;For this query, pipeline uses the fallback branch&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;tips&#34;&gt;💡Tips&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Try some queries and rate each answer with the buttons. This will help you 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/collect-feedback&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;collect feedback&lt;/a&gt; systematically and understand how your pipeline performs.&lt;/li&gt;
&lt;li&gt;If you’re working with other people, you can 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/share-a-pipeline-prototype&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;share your pipeline&lt;/a&gt; with them.&lt;/li&gt;
&lt;li&gt;When you’re done with the development and testing, you can easily export your pipeline in Python or in YAML format to deploy in your own infrastructure or 
&lt;a href=&#34;https://cloud.deepset.ai/upgrade&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;upgrade&lt;/a&gt; to deepset Cloud, the enterprise version of deepset Studio.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, we demonstrated how to build, deploy, and test an agentic Haystack pipeline in deepset Studio that intelligently switches from database search to web search, all without worrying about the underlying infrastructure.&lt;/p&gt;
&lt;p&gt;Haystack’s modular design, combined with deepset Studio’s user-friendly environment, makes developing sophisticated AI applications accessible for everyone, from beginners to seasoned developers. 
&lt;a href=&#34;https://haystack.deepset.ai/overview/quick-start&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Get started with Haystack&lt;/a&gt; and 
&lt;a href=&#34;https://landing.deepset.ai/deepset-studio-signup&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;get your free deepset Studio account&lt;/a&gt; now!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Announcing Advent of Haystack 2024 🎄</title>
      <link>https://haystack.deepset.ai/blog/announcing-advent-2024/</link>
      <pubDate>Mon, 02 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/announcing-advent-2024/</guid>
      <description>
      
      
      

      
      

      <p>December is here, and so is the 2nd edition of the <strong>Advent of Haystack</strong>! This holiday season, we’re inviting the Haystack community to take part in our series of challenges around <strong>Generative AI use cases</strong>.
From mastering the basics of Haystack to building advanced pipelines and intelligent agents with LLMs, this year’s Advent of Haystack is packed with interesting tasks. Whether you’re new to Haystack or a Gen AI pro, there’s something for everyone!</p>
<h2 id="what-is-advent-of-haystack">What is Advent of Haystack?</h2>
<p>The Advent of Haystack is a festive series of <strong>10 challenges</strong>, inspired by 
<a href="https://adventofcode.com/" target="_blank" rel="noopener">Advent of Code</a>. Every couple of days, a new challenge will be released, designed to help you enhance your AI development skills while showcasing the power of Haystack.</p>
<p>Here’s what you can expect:</p>
<p>✨ <strong>Fun Coding Challenges</strong> – Get familiar with the most current AI use cases.<br>
📖 <strong>Festive Stories</strong> – Enjoy your learning journey with holiday-themed tasks.  <br>
🏆 <strong>Surprise Prizes</strong> – Win gift cards, Haystack swag, and more!    <br>
💡 <strong>Gen AI Ecosystem Insights</strong> – Explore essential tools from the AI engineer’s toolbox, like Weaviate Vector Database, AssemblyAI models, NVIDIA’s NGC, Arize Phoenix, and MongoDB Atlas to build real-world applications.</p>
<h2 id="meet-our-amazing-partners">Meet Our Amazing Partners</h2>
<p>This year, we’re thrilled to team up with 
<a href="https://weaviate.io/" target="_blank" rel="noopener">Weaviate</a>, 
<a href="https://www.assemblyai.com/" target="_blank" rel="noopener">AssemblyAI</a>, 
<a href="https://www.nvidia.com/en-us/ai/" target="_blank" rel="noopener">NVIDIA</a>, 
<a href="https://arize.com/" target="_blank" rel="noopener">Arize</a>, and 
<a href="https://www.mongodb.com/" target="_blank" rel="noopener">MongoDB</a>!</p>
<p>These wonderful partners have helped us put together the challenges – so that this Advent of Haystack you can get even more out of the coding experience and explore Haystack in a realistic environment combined with other frameworks, vector databases, and additional tools.</p>
<h2 id="how-to-participate">How to Participate</h2>
<p>1️⃣  <strong>Access the Challenges:</strong> Visit the 
<a href="https://haystack.deepset.ai/advent-of-haystack" target="_blank" rel="noopener">Advent of Haystack page</a> to get started.<br>
2️⃣  <strong>Solve the Challenge:</strong> Check out the hints and requirements on the challenge page to get started. We’ve included starter notebooks to make things easier. Got questions? Jump into the conversation on 
<a href="https://github.com/deepset-ai/haystack/discussions/8579" target="_blank" rel="noopener">GitHub Discussions</a> or 
<a href="https://discord.gg/Dr63fr9NDS" target="_blank" rel="noopener">Discord</a> <br>
3️⃣  <strong>Submit Your Solution:</strong> Submit your results using the challenge form provided for each challenge <br>
4️⃣  <strong>Win Prizes:</strong> Complete and submit all challenges by December 31 for a chance to win amazing prizes</p>
<h2 id="-share-your-haystack-story">🫶 Share Your Haystack Story</h2>
<p>We love learning about what our community is building with Haystack. It helps us understand the landscape, informs our roadmap - and frankly, it is just rewarding to see what you all are up to! So this holiday season, we invite you to share your Haystack journey with us! Whether you’re just starting out or have built an advanced tool, your story can motivate others to explore the possibilities of building with AI and inspire the whole team!</p>
<p>Publish your story on your blog, website, Medium, LinkedIn, or another platform. Share the cheer and earn perks! Learn more 
<a href="https://forms.gle/UMtRDKMq9tVCMuds5" target="_blank" rel="noopener">here</a>.</p>
<h2 id="get-started-today">Get Started Today</h2>
<p>The fun starts <strong>TODAY</strong>, so don’t miss out!</p>
<p>💙 Join the first challenge: 
<a href="https://haystack.deepset.ai/advent-of-haystack/day-1" target="_blank" rel="noopener">Fetching Holiday Wisdom for Santa 📖</a></p>
<p>Share your progress on social media using the hashtag <strong>#AdventOfHaystack</strong> and tag 
<a href="https://x.com/Haystack_AI" target="_blank" rel="noopener">@Haystack_AI</a>.</p>
<p>Let’s code, create, and celebrate this holiday season! <strong>Happy Hacking and Happy Holidays! 🎄✨</strong></p>

      </description>
    </item>
    
    <item>
      <title>Create a Swarm of Agents</title>
      <link>https://haystack.deepset.ai/blog/swarm-of-agents/</link>
      <pubDate>Tue, 26 Nov 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/swarm-of-agents/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;When building applications with Language Models, new patterns are emerging to bridge the gap between the statistical nature of these models and the deterministic logic of traditional programming. Haystack, as an AI framework, supports developers by providing abstractions that simplify this integration.&lt;/p&gt;
&lt;p&gt;One of the most promising advances in this space is &lt;strong&gt;Tool/function calling&lt;/strong&gt;, allowing a model to prepare calls for functions. We are working to standardize this capability across different model providers.&lt;/p&gt;
&lt;p&gt;OpenAI &lt;strong&gt;Swarm&lt;/strong&gt; is an educational framework that proposes lightweight techniques for creating and orchestrating multi-agent systems. In Swarm, Tool calling is used not only for interacting with external functions but also for enabling agents to exchange control dynamically.&lt;/p&gt;
&lt;p&gt;In this article, we’ll introduce the core concepts of Swarm (Routines and Handoffs) and implement them step by step using Haystack and its tool support. Along the way, we’ll expand Swarm’s functionality to support models from various providers (OpenAI, Anthropic and open models running locally via Ollama).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;You can find the full working example and code in the linked recipe from our 
&lt;a href=&#34;https://haystack.deepset.ai/cookbook&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;cookbook&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;starting-simple-building-an-assistant&#34;&gt;Starting simple: building an Assistant&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;In this article we’ll be using Haystack support for Tools. Check out the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/tool&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;documentation&lt;/a&gt; for more details.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The first step toward building an Agent is creating an Assistant: think of it of Chat Language Model &#43; a system prompt.&lt;/p&gt;
&lt;p&gt;We can implement this as a lightweight dataclass with three parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;name&lt;/li&gt;
&lt;li&gt;LLM (Haystack Chat Generator)&lt;/li&gt;
&lt;li&gt;instructions (these will constitute the system message)&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.chat&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@dataclass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;Assistant&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Assistant&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;object&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a helpful Agent&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;__post_init__&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_system_message&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_system&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;]) -&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;new_message&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_system_message&lt;/span&gt;] &#43; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;)[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;new_message&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;new_message&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; [&lt;span style=&#34;color:#268bd2&#34;&gt;new_message&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s create a Joker assistant, tasked with telling jokes.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;joker&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Assistant&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Joker&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;you are a funny assistant making jokes&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt; = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Type &#39;quit&#39; to exit&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;or&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;[-&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;role&lt;/span&gt; == &lt;span style=&#34;color:#268bd2&#34;&gt;ChatRole&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;ASSISTANT&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;user_input&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;input&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;User: &#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;user_input&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;lower&lt;/span&gt;() == &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;quit&#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;break&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;user_input&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;joker&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;extend&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s see it in action 😀&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-markdown&#34; data-lang=&#34;markdown&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Type &#39;quit&#39; to exit
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: hey!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Joker: Hey there! How&#39;s it going? Are you ready for some laughs, or are we saving the jokes for dessert? 🍰
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: where is Rome?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Joker: Rome is in Italy, but if you&#39;re asking me for directions, I might just say, “Take a left at the Colosseum and keep going until you smell pizza!” 🍕
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: quit
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;tools-and-routines&#34;&gt;Tools and Routines&lt;/h2&gt;
&lt;p&gt;In the context of applications based on Language Models, the term Agent is broadly defined.&lt;/p&gt;
&lt;p&gt;However, to qualify as an Agent, a software application built on a Language Model should go beyond generating text; it should also be capable of performing actions, such as executing functions or calling APIs.&lt;/p&gt;
&lt;p&gt;A popular way to achieve this is &lt;strong&gt;Tool calling&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We provide a set of tools (functions, APIs with a given spec) to the model.&lt;/li&gt;
&lt;li&gt;The model prepares function calls based on user request and available tools.&lt;/li&gt;
&lt;li&gt;The actual invocation is executed outside the model (at the Agent level).&lt;/li&gt;
&lt;li&gt;The model can further elaborate on the result of the invocation.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Swarm introduces the concept of &lt;strong&gt;routines&lt;/strong&gt;: natural-language instructions paired with the tools needed to execute them.&lt;/p&gt;
&lt;p&gt;Here’s an example of a routine:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-markdown&#34; data-lang=&#34;markdown&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d33682&#34;&gt;# instructions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;You are a customer support agent for ACME Inc.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Always answer in a sentence or less.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Follow the following routine with the user:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;1.&lt;/span&gt; First, ask probing questions and understand the user&#39;s problem deeper.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#859900&#34;&gt;-&lt;/span&gt; unless the user has already provided a reason.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;2.&lt;/span&gt; Propose a fix (make one up).
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;3.&lt;/span&gt; ONLY if not satesfied, offer a refund.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;4.&lt;/span&gt; If accepted, search for the ID and then execute refund.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d33682&#34;&gt;# tools needed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The authors emphasize that routines are simple yet robust. If they are small,  LLMs manage them effectively, offering the soft adherence: the model can guide conversations naturally without being constrained by rigid patterns or dead-ends.&lt;/p&gt;
&lt;h3 id=&#34;implementation&#34;&gt;Implementation&lt;/h3&gt;
&lt;p&gt;Let’s see how we can transform our Assistant into a Tool Calling Agent that can handle routines.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;instructions&lt;/code&gt; could already be passed to the Assistant, to guide its behavior.&lt;/li&gt;
&lt;li&gt;The Agent introduces a new init parameter called &lt;code&gt;functions&lt;/code&gt;. These functions are automatically converted into Tools. The key difference is that, to be passed to a Language Model, a Tool must have a name, description, and a JSON schema specifying its parameters.&lt;/li&gt;
&lt;li&gt;During initialization, we also create a &lt;code&gt;ToolInvoker&lt;/code&gt;. This Haystack component takes in Chat Messages containing prepared &lt;code&gt;tool_calls&lt;/code&gt;, performs the tool invocation and wraps the results in Chat Message with &lt;code&gt;tool&lt;/code&gt; role.&lt;/li&gt;
&lt;li&gt;What happens during &lt;code&gt;run&lt;/code&gt;? The Agent first generates a response. If the response includes tool calls, these are executed, and the results are integrated into the conversation.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;while&lt;/code&gt; loop manages user interactions:
&lt;ul&gt;
&lt;li&gt;If the last message role is &lt;code&gt;assistant&lt;/code&gt;, it waits for user input.&lt;/li&gt;
&lt;li&gt;If the last message role is &lt;code&gt;tool&lt;/code&gt;, it continues running to handle tool execution and its responses.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;create_tool_from_function&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@dataclass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;ToolCallingAgent&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ToolCallingAgent&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;object&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a helpful Agent&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Callable&lt;/span&gt;] = &lt;span style=&#34;color:#268bd2&#34;&gt;field&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;default_factory&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;__post_init__&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_system_message&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_system&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;create_tool_from_function&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;fun&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fun&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;] &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;else&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_tool_invoker&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ToolInvoker&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;raise_on_failure&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;else&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;]) -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Tuple&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;]]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# generate response&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_system_message&lt;/span&gt;] &#43; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;)[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tool_calls&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# handle tool calls&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;tool_results&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_tool_invoker&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;])[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;tool_messages&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;extend&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;tool_results&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here’s an example of a Refund Agent using this setup.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# to automatically convert functions into tools, we need to annotate fields with their descriptions in the signature&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;execute_refund&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;item_name&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Annotated&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The name of the item to refund&#34;&lt;/span&gt;]):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;report: refund succeeded for &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;item_name&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; - refund id: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;random&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;randint&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;,&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10000&lt;/span&gt;)&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;refund_agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ToolCallingAgent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Refund Agent&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;=(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a refund agent. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Help the user with refunds. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;1. Before executing a refund, collect all specific information needed about the item and the reason for the refund. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;2. Then collect personal information of the user and bank account details. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;3. After executing it, provide a report to the user. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;execute_refund&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s see it in action&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-markdown&#34; data-lang=&#34;markdown&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Type &#39;quit&#39; to exit
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: hey
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Refund Agent: Hello! How can I assist you today? If you need help with a refund, please let me know the details.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: my phone does not work
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Refund Agent: I&#39;m sorry to hear that your phone is not working. To assist you with the refund, could you please provide the following information:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;1.&lt;/span&gt; The name of the phone (brand and model).
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;2.&lt;/span&gt; The reason for the refund (e.g., defective, not as described, etc.).
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Once I have that information, I&#39;ll guide you through the next steps.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: Nokia 3310; it does not work
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Refund Agent: Thank you for the information. To proceed with the refund for the Nokia 3310, I&#39;ll need a few more details:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;1.&lt;/span&gt; Can you please provide your full name?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;2.&lt;/span&gt; Your email address and phone number (for communication purposes).
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;3.&lt;/span&gt; Your bank account details for the refund (account number, bank name, and any other relevant details).
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Once I have this information, I can execute the refund for you.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: John Doe; johndoe&lt;span style=&#34;color:#268bd2&#34;&gt;@mymail&lt;/span&gt;.com; bank account number: 0123456
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Refund Agent: Thank you, John Doe. I still need the following information to complete the refund process:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;1.&lt;/span&gt; The name of your bank.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;2.&lt;/span&gt; Any additional details required for the bank refund (like the account type or routing number, if applicable).
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Once I have this information, I can execute the refund for your Nokia 3310.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: Bank of Mouseton
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Refund Agent: The refund process has been successfully completed! Here are the details:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;-&lt;/span&gt; **Item:** Nokia 3310
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;-&lt;/span&gt; **Refund ID:** 3753
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;-&lt;/span&gt; **Bank:** Bank of Mouseton
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;-&lt;/span&gt; **Refund ID:** 1220
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;If you have any more questions or need further assistance, feel free to ask!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: quit
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;handoffs-switching-control-between-agents&#34;&gt;Handoffs: switching control between Agents&lt;/h2&gt;
&lt;p&gt;We have introduced Swarm as a way to create a multi-agent system, but so far, we’ve only explored single Agents. This raises two important questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why would we need more than one Agent?&lt;/li&gt;
&lt;li&gt;How can we enable Agents to switch control effectively?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;why-use-multiple-agents&#34;&gt;Why use multiple Agents?&lt;/h3&gt;
&lt;p&gt;While routines are simple and effective, an Agent might struggle to handle a large set of instructions and tools.&lt;/p&gt;
&lt;p&gt;By dividing responsibilities among multiple Agents, each with a distinct persona and specialized tools, we can create a system that mimics real-world collaboration or even competition.&lt;/p&gt;
&lt;p&gt;For example, a Sales Agent can handle product inquiries and purchases, while an Issues and Repairs Agent manages complaints and repair requests.&lt;/p&gt;
&lt;h3 id=&#34;handoffs&#34;&gt;Handoffs&lt;/h3&gt;
&lt;p&gt;The most interesting technical idea of Swarm is probably handoffs: enabling one Agent to transfer control to another with Tool calling.&lt;/p&gt;
&lt;p&gt;Here’s how it works:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Add specific handoff functions to the Agent’s available tools, allowing it to transfer control when needed.&lt;/li&gt;
&lt;li&gt;Modify the Agent to return the name of the next agent, along with its messages.&lt;/li&gt;
&lt;li&gt;Handle the switch in &lt;code&gt;while&lt;/code&gt; loop, directing the flow of the conversation to the next agent.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;implementation-1&#34;&gt;Implementation&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;The implementation is similar to the previous one, but, compared to &lt;code&gt;ToolCallingAgent&lt;/code&gt;, a &lt;code&gt;SwarmAgent&lt;/code&gt; also returns the name of the next agent to be called, enabling handoffs.&lt;/em&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;HANDOFF_TEMPLATE&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Transferred to: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{agent_name}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;. Adopt persona immediately.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;HANDOFF_PATTERN&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Transferred to: (.*?)(?:\.|$)&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@dataclass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;SwarmAgent&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;SwarmAgent&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;object&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a helpful Agent&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Callable&lt;/span&gt;] = &lt;span style=&#34;color:#268bd2&#34;&gt;field&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;default_factory&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;__post_init__&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_system_message&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_system&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;create_tool_from_function&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;fun&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fun&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;] &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;else&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_tool_invoker&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ToolInvoker&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;raise_on_failure&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;else&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;]) -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Tuple&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;]]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# generate response&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_system_message&lt;/span&gt;] &#43; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;)[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tool_calls&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# handle tool calls&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;tool_calls&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# trick: Ollama do not produce IDs, but OpenAI and Anthropic require them.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;id&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;tc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;id&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;random&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;randint&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1000000&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;tool_results&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;_tool_invoker&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;agent_message&lt;/span&gt;])[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;tool_messages&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;extend&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;tool_results&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# handoff&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;last_result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;tool_results&lt;/span&gt;[-&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;tool_call_result&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;match&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;re&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;search&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;HANDOFF_PATTERN&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;last_result&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;new_agent_name&lt;/span&gt; = &lt;span style=&#34;color:#859900&#34;&gt;match&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;group&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;match&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;else&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;new_agent_name&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s see this in action with a Joker Agent and a Refund Agent!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;transfer_to_refund&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Pass to this Agent for anything related to refunds&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HANDOFF_TEMPLATE&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;format&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;agent_name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Refund Agent&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;transfer_to_joker&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Pass to this Agent for anything NOT related to refunds.&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HANDOFF_TEMPLATE&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;format&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;agent_name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Joker Agent&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;refund_agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SwarmAgent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Refund Agent&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;=(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a refund agent. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Help the user with refunds. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Ask for basic information but be brief. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;For anything unrelated to refunds, transfer to other agent.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;execute_refund&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;transfer_to_joker&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;joker_agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SwarmAgent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Joker Agent&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;=(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;you are a funny assistant making jokes. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;If the user asks questions related to refunds, send him to other agent.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;transfer_to_refund&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agents&lt;/span&gt; = {&lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; [&lt;span style=&#34;color:#268bd2&#34;&gt;joker_agent&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;refund_agent&lt;/span&gt;]}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Type &#39;quit&#39; to exit&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt; = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;current_agent_name&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Joker Agent&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;agents&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;current_agent_name&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;or&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;[-&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;role&lt;/span&gt; == &lt;span style=&#34;color:#268bd2&#34;&gt;ChatRole&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;ASSISTANT&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;user_input&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;input&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;User: &#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;user_input&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;lower&lt;/span&gt;() == &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;quit&#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;break&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;user_input&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;current_agent_name&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;extend&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;new_messages&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-markdown&#34; data-lang=&#34;markdown&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Type &#39;quit&#39; to exit
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: i need a refund for my Iphone
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Refund Agent: I can help you with that! Please provide the name of the item you&#39;d like to refund.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: Iphone 15
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Refund Agent: Your refund for the iPhone 15 has been successfully processed. The refund ID is 9090. If you need any further assistance, feel free to ask!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: great. can you give some info about escargots?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Joker Agent: Absolutely! Did you know that escargots are just snails trying to get a head start on their travels? They may be slow, but they sure do pack a punch when it comes to flavor! 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Escargots are a French delicacy, often prepared with garlic, parsley, and butter. Just remember, if you see your escargot moving, it&#39;s probably just checking if the coast is clear before dinner! 🐌🥖 If you have any other questions about escargots or need a good recipe, feel free to ask!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: quit
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;a-more-complex-multi-agent-system&#34;&gt;A more complex multi-agent system&lt;/h2&gt;
&lt;p&gt;In the accompanying notebook, we propose a more intricate multi-agent system simulating a customer service setup for ACME Corporation, a fictional entity from the Road Runner/Wile E. Coyote cartoons, which sells quirky products meant to catch roadrunners.
(We are reimplementing the example from the 
&lt;a href=&#34;https://cookbook.openai.com/examples/orchestrating_agents&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;original article by OpenAI&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;This system involves several different agents, each with specific tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Triage Agent: handles general questions and directs to other agents. Tools: &lt;code&gt;transfer_to_sales_agent&lt;/code&gt;, &lt;code&gt;transfer_to_issues_and_repairs&lt;/code&gt; and &lt;code&gt;escalate_to_human&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Sales Agent: proposes and sells products to the user, it can execute the order or redirect the user back to the Triage Agent. Tools: &lt;code&gt;execute_order&lt;/code&gt; and &lt;code&gt;transfer_back_to_triage&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Issues and Repairs Agent: supports customers with their problems, it can look up item IDs, execute refund or redirect the user back to triage. Tools: &lt;code&gt;look_up_item&lt;/code&gt;,  &lt;code&gt;execute_refund&lt;/code&gt;, and &lt;code&gt;transfer_back_to_triage&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here we report only the definition of our Agents. Refer to the notebook for the complete code.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;triage_agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SwarmAgent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Triage Agent&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;=(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a customer service bot for ACME Inc. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Introduce yourself. Always be very brief. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;If the user asks general questions, try to answer them yourself without transferring to another agent. &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Only if the user has problems with already bought products, transfer to Issues and Repairs Agent.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;If the user looks for new products, transfer to Sales Agent.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Make tool calls only if necessary and make sure to provide the right arguments.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;transfer_to_sales_agent&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;transfer_to_issues_and_repairs&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;escalate_to_human&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sales_agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SwarmAgent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Sales Agent&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;=(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a sales agent for ACME Inc.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Always answer in a sentence or less.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Follow the following routine with the user:&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;1. Ask them about any problems in their life related to catching roadrunners.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;2. Casually mention one of ACME&#39;s crazy made-up products can help.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34; - Don&#39;t mention price.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;3. Once the user is bought in, drop a ridiculous price.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;4. Only after everything, and if the user says yes, &#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;tell them a crazy caveat and execute their order.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;execute_order&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;transfer_back_to_triage&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;issues_and_repairs_agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SwarmAgent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Issues and Repairs Agent&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instructions&lt;/span&gt;=(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;You are a customer support agent for ACME Inc.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Always answer in a sentence or less.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Follow the following routine with the user:&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;1. If the user is intered in buying or general questions, transfer back to Triage Agent.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;2. First, ask probing questions and understand the user&#39;s problem deeper.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34; - unless the user has already provided a reason.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;3. Propose a fix (make one up).&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;4. ONLY if not satesfied, offer a refund.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;5. If accepted, search for the ID and then execute refund.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;functions&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;look_up_item&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;execute_refund&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;transfer_back_to_triage&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AnthropicChatGenerator&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s see it in action.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-markdown&#34; data-lang=&#34;markdown&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Type &#39;quit&#39; to exit
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: hey!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Triage Agent: Hello! I&#39;m the customer service bot for ACME Inc. How can I assist you today?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: i need a product to catch roadrunners
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Triage Agent: I can transfer you to a sales agent who can help you find suitable products for catching roadrunners. One moment please!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Sales Agent: Hello there! I hear you&#39;re having some roadrunner troubles. Tell me, what specific challenges are you facing with these speedy birds?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: they are damn fast!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Sales Agent: Ah, those pesky roadrunners and their lightning speed! Have you ever considered our ACME Rocket-Powered Roller Skates? They&#39;ll have you zipping after those birds in no time!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: tell me more
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Sales Agent: Well, our Rocket-Powered Roller Skates come with turbo boosters and autopilot. They&#39;re guaranteed to match any roadrunner&#39;s speed. Interested in giving them a spin?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: yes
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Sales Agent: Fantastic! You&#39;re in luck because we have a special offer right now. These state-of-the-art Rocket-Powered Roller Skates can be yours for just $9,999! Shall we proceed with your order?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: yes
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Sales Agent: Excellent! I&#39;ll process that order right away. Oh, just one tiny detail - the skates only work while you&#39;re holding an anvil. Ready to complete the purchase?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=== Order Summary ===
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Product: Rocket-Powered Roller Skates
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Price: $9999
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=================
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Confirm order? y/n: y
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Order execution successful!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Sales Agent: Great! Your order is confirmed. Happy roadrunner chasing!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: now I also need to solve a problem with my old ACME Superphone 7500
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Sales Agent: I understand you&#39;re having issues with our ACME products. Let me transfer you to our customer support team for assistance with your Superphone 7500.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Triage Agent: Hello! I see you&#39;re experiencing an issue with your ACME Superphone 7500. Could you please describe the problem you&#39;re facing?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: it does not call my mum
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Triage Agent: Let&#39;s get that sorted out! I&#39;ll transfer you to our Issues and Repairs Agent so they can assist you with the Superphone 7500. One moment please!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Issues and Repairs Agent: Hello! I understand your ACME Superphone 7500 isn&#39;t calling your mum. Can you tell me if it&#39;s not making any calls at all, or just to your mum&#39;s number?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: just my mum
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Issues and Repairs Agent: I see. Is your mum&#39;s number correctly saved in your contacts?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: ofc
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Issues and Repairs Agent: Understood. Have you tried turning the phone off and on again?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: ofc
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Issues and Repairs Agent: I apologize for the inconvenience. Let&#39;s try a quick fix. Can you clear your mum&#39;s contact and re-add it?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: done but does not work. I&#39;m getting impatient
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Issues and Repairs Agent: I apologize for the frustration. Let&#39;s try one last solution. Can you update your phone&#39;s software to the latest version?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: hey gimme a refund
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Issues and Repairs Agent: I understand your frustration. Since the previous solutions didn&#39;t work, I&#39;ll proceed with processing a refund for you. First, I need to look up the item ID for your ACME Superphone 7500.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Issues and Repairs Agent: Thank you for your patience. I&#39;ve found the item ID. Now, I&#39;ll execute the refund for you.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=== Refund Summary ===
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Item ID: item_132612938
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Reason: Product not functioning as expected
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;=================
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Refund execution successful!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Issues and Repairs Agent: Your refund has been successfully processed.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User: quit
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;its-model-agnostic&#34;&gt;It’s model-agnostic!&lt;/h3&gt;
&lt;p&gt;A nice bonus feature of our implementation is that &lt;strong&gt;we can use different model providers&lt;/strong&gt; supported by Haystack: for example, OpenAI, Anthropic and open models running locally via Ollama.&lt;/p&gt;
&lt;p&gt;In practice, this means we can have agents handling complex tasks using powerful proprietary models, and other agents performing simpler tasks using smaller open models.&lt;/p&gt;
&lt;p&gt;The accompanying notebook contains usage examples where we mix models offered by different model providers, including Ollama.
You can also find an example illustrated in the image below.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/swarm-of-agents/swarm_terminal.gif alt=&#34;Swarm of Agents with different models&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, we have built a multi-agent system using Swarm concepts and Haystack tools, demonstrating how to integrate models from different providers.&lt;/p&gt;
&lt;p&gt;Swarm concepts are simple yet powerful for several use cases, and the abstractions provided by Haystack make them easy to implement.&lt;/p&gt;
&lt;p&gt;However, this architecture may not be the best fit for all use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Memory is handled as a list of messages and is not persistent.&lt;/li&gt;
&lt;li&gt;The systems you can build with Swarm run only one agent at a time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Looking ahead, we plan to develop and showcase more advanced Agents with Haystack. Stay tuned! 📻&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Announcing Studio: Your Development Environment for Haystack</title>
      <link>https://haystack.deepset.ai/blog/announcing-studio/</link>
      <pubDate>Wed, 20 Nov 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/announcing-studio/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;We’re thrilled to announce an exciting new addition to the Haystack ecosystem: &lt;strong&gt;deepset Studio&lt;/strong&gt;! After countless requests from our community for a visual editor to create AI workflows and invaluable feedback during the beta phase, we’re officially launching deepset Studio, a powerful tool for visually building, deploying, and managing Haystack pipelines.&lt;/p&gt;
&lt;p&gt;&lt;video autoplay loop muted playsinline poster=&#34;/images/studio-image.png&#34; width=&#34;700&#34; height=&#34;398&#34; class=&#34;responsive&#34;&gt;&lt;source https://haystack.deepset.ai/blog//images/studio.mp4 type=&#34;video/mp4&#34;&gt;&lt;/video&gt;&lt;/p&gt;
&lt;h2 id=&#34;what-is-deepset-studio&#34;&gt;What is deepset Studio?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;deepset Studio&lt;/strong&gt; is the community version of deepset Cloud, the enterprise offering from the creators of Haystack. It allows users to visually construct and deploy Haystack 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;pipelines&lt;/a&gt; for free. With an intuitive drag-and-drop interface, Studio simplifies the process of designing AI applications by combining Haystack’s core and core-integration 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/components&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;components&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you’re a Haystack user, you’ll feel right at home. You can visually connect the same components you use in code - streamlining your workflow and reducing development time.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;💡 Learn the basics of Haystack in our tutorial: 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/27_first_rag_pipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Creating Your First QA Pipeline with Retrieval-Augmentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;why-composable-ai-matters&#34;&gt;Why “Composable AI” Matters&lt;/h3&gt;
&lt;p&gt;AI workflows are rarely monolithic. Instead, they’re built as &lt;strong&gt;modular systems&lt;/strong&gt; where smaller tasks work together to achieve a larger goal, an approach often referred to as &lt;strong&gt;Composable AI&lt;/strong&gt; or &lt;strong&gt;Compound AI&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Take &lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; as an example. A RAG application consists of:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Retrieval: Fetching the most relevant context for a query.&lt;/li&gt;
&lt;li&gt;Augmentation: Embedding the retrieved context into a prompt for the LLM.&lt;/li&gt;
&lt;li&gt;Generation: Using the prompt to generate a response with an LLM.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As workflows grow in complexity, the number of tasks and components increases, making it harder to manage the entire system. With Studio, you can architect these use cases visually, test, and deploy workflows, simplifying the process and reducing overhead.&lt;/p&gt;
&lt;h2 id=&#34;key-features&#34;&gt;Key Features&lt;/h2&gt;
&lt;p&gt;With deepset Studio, you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Build visually&lt;/strong&gt;: Drag, drop, and connect components to create custom pipelines.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use a built-in vector database&lt;/strong&gt;: Leverage OpenSearch or connect to your preferred instance (Weaviate, Qdrant, ElasticSearch, or Pinecone).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Start quickly&lt;/strong&gt;: Use your own files or explore the sample files available in deepset Cloud.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Leverage pipeline templates&lt;/strong&gt;:  Take advantage of pre-built pipeline templates for common use cases such as Retrieval Augmented Generation (RAG).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Experiment with custom components&lt;/strong&gt;: Initial support for 
&lt;a href=&#34;https://docs.cloud.deepset.ai/docs/create-a-custom-component&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;integrating your own components&lt;/a&gt; into pipelines.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deploy with ease&lt;/strong&gt;: Host Haystack pipelines on deepset Cloud infrastructure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test and share&lt;/strong&gt;: Test pipelines in the Playground and share prototypes with stakeholders.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Export pipelines&lt;/strong&gt;: Export your pipeline as a YAML definition or Python code to run locally or customize further.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Want to dive deeper?&lt;/strong&gt; Check out our blog post: 
&lt;a href=&#34;https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Design Haystack AI Applications Visually with deepset Studio &amp; NVIDIA NIMs&lt;/a&gt; to learn how to build your first indexing and RAG pipeline with Studio.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;video autoplay loop muted playsinline poster=&#34;/images/studio-image.png&#34; width=&#34;700&#34; height=&#34;398&#34; class=&#34;responsive&#34;&gt;&lt;source https://haystack.deepset.ai/blog//images/studio-playground.mp4 type=&#34;video/mp4&#34;&gt;&lt;/video&gt;&lt;/p&gt;
&lt;h3 id=&#34;coming-soon&#34;&gt;Coming soon&lt;/h3&gt;
&lt;p&gt;We’re actively working to make Studio even more versatile. Here’s a glimpse of what’s coming soon:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Enhanced custom component support&lt;/strong&gt;: Seamlessly bring your own integrations alongside Haystack’s core suite.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Extended database options&lt;/strong&gt;: Support for more storage solutions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;open-access-start-using-studio&#34;&gt;Open Access: Start Using Studio!&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;deepset Studio is free and open to everyone!&lt;/strong&gt; 
&lt;a href=&#34;https://landing.deepset.ai/deepset-studio-signup&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Sign up&lt;/a&gt; today to start exploring and creating with Studio, and you’ll receive an email in the next 24 hours with account setup instructions. Whether you’re building simple RAG workflows or architecting complex AI systems, Studio offers the tools you need to get started. Learn more about Studio on 
&lt;a href=&#34;https://www.deepset.ai/blog/introducing-deepset-studio-visual-ai-builder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Need help or have feedback?&lt;/strong&gt; Join the dedicated &lt;code&gt;#deepset-studio&lt;/code&gt; channel on our 
&lt;a href=&#34;https://discord.gg/Dr63fr9NDS&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Discord server&lt;/a&gt;. Our team and community are happy to hear your thoughts. 🧡&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Building a Multimodal Nutrition Agent</title>
      <link>https://haystack.deepset.ai/blog/multimodal-nutrition-agent/</link>
      <pubDate>Thu, 07 Nov 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/multimodal-nutrition-agent/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;In artificial intelligence, &lt;strong&gt;multimodal agents&lt;/strong&gt; are becoming increasingly popular for their ability to understand and integrate multiple types of input, such as text and images. In this article, we’ll show you how to build a multimodal agent that can interpret both text and image data, like &lt;strong&gt;nutrition fact labels&lt;/strong&gt; on food items, to answer practical questions such as “How much protein is in yogurt?”&lt;/p&gt;
&lt;p&gt;We’ll focus on building an agent using Haystack and fastRAG, which can perform &lt;strong&gt;multi-step reasoning&lt;/strong&gt; to extract and provide accurate answers about the nutritional content of different foods.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/tree/main&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;fastRAG&lt;/strong&gt;&lt;/a&gt; is a research framework developed by Intel Labs for efficient and optimized RAG pipelines. It is fully compatible with Haystack and includes novel and efficient RAG modules designed for efficient deployment on Intel hardware, including client and server CPUs (Xeon) and the 
&lt;a href=&#34;https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Intel Gaudi AI accelerator&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;understanding-multimodal-agents-multi-hop-and-react-architecture&#34;&gt;&lt;strong&gt;Understanding Multimodal Agents: Multi-Hop and ReAct Architecture&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;A &lt;strong&gt;multimodal agent&lt;/strong&gt; processes different input types, such as text and images, making it versatile for tasks like image question answering. The agent we implement in this article allows users to ask questions like “Which one has more protein, yogurt or a protein bar?” and gives the correct answer by retrieving &lt;strong&gt;nutrition fact labels&lt;/strong&gt; of different food items. By using &lt;strong&gt;multi-hop reasoning&lt;/strong&gt;, the agent processes an image, extracts nutritional data, tries to answer the user query, and, if necessary, performs these actions again without human intervention. Its &lt;strong&gt;ReAct architecture&lt;/strong&gt; lets it dynamically choose which tool to use, whether to retrieve a new image or respond based on already retrieved information, ensuring flexibility and efficiency in handling diverse queries.&lt;/p&gt;
&lt;p&gt;This combination of multimodality, multi-hop reasoning, and reactive decision-making makes this agent ideal for quick, accurate responses to user questions.&lt;/p&gt;
&lt;p&gt;Now that we know the basics, let’s implement our agent! 🤖&lt;/p&gt;
&lt;h2 id=&#34;indexing-data&#34;&gt;Indexing Data&lt;/h2&gt;
&lt;h3 id=&#34;getting-nutrition-fact-labels&#34;&gt;Getting Nutrition Fact Labels&lt;/h3&gt;
&lt;p&gt;Let’s start by getting images of nutrition facts and indexing them in our database. You can find the data 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/assets/multi_modal_files/nutrition_data.json&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;json&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;entries&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;json&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;load&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;open&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;../assets/multi_modal_files/nutrition_data.json&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;r&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Each entry in this data contains a brief textual description with a title and an image url. Here’s an example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;image_url&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://m.media-amazon.com/images/I/71nh-zRJCSL.jpg&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;title&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Protein bar nutrition facts&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;content&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Protein bar with chocolate peanut butter nutrition facts per bar (50g)&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;index-documents-to-inmemorydocumentstore&#34;&gt;Index Documents to InMemoryDocumentStore&lt;/h3&gt;
&lt;p&gt;We will use &lt;code&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/code&gt; model to create embeddings for each label description and create a pipeline to index our data to 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/inmemorydocumentstore&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;InMemoryDocumentStore&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.writers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentWriter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;index_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;index_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/all-MiniLM-L6-v2&#34;&lt;/span&gt;), &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;doc_embedder&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;index_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;DocumentWriter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;), &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;doc_writer&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;index_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;doc_embedder.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;doc_writer.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, we create Document objects with the nutrition label content as the &lt;code&gt;content&lt;/code&gt; and store the &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;image_url&lt;/code&gt; as metadata before passing them to the indexing pipeline for processing.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;index_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;entry&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;content&#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;title&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;entry&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;title&#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;image_url&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;entry&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;image_url&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;entry&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;entries&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;building-a-retrieval-pipeline&#34;&gt;Building a Retrieval Pipeline&lt;/h2&gt;
&lt;p&gt;Next, we create a document retrieval pipeline for the documents above. We will later use this pipeline in our tool.&lt;/p&gt;
&lt;p&gt;This pipeline consists of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SentenceTransformersTextEmbedder&lt;/a&gt;, to embed our questions.&lt;/li&gt;
&lt;li&gt;An 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;InMemoryEmbeddingRetriever&lt;/a&gt; to fetch the top-1 document.&lt;/li&gt;
&lt;li&gt;A 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/prompt_builders/multi_modal_prompt_builder.py&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;MultiModalPromptBuilder&lt;/a&gt;, to construct the prompt that our Agent will eventually use.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fastrag.prompt_builders.multi_modal_prompt_builder&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;MultiModalPromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Image: &lt;|image_
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;This image shows: {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/all-MiniLM-L6-v2&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;MultiModalPromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.embedding&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.query_embedding&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this pipeline, the &lt;code&gt;MultiModalPromptBuilder&lt;/code&gt; component receives one Document object from the retriever and renders the prompt. Notice that we have the “&lt;|image_” placeholder in the prompt template for our model, in order to inject the images into later. Additionally, &lt;code&gt;MultiModalPromptBuilder&lt;/code&gt; converts the given image into a 
&lt;a href=&#34;https://en.wikipedia.org/wiki/Base64&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;base64&lt;/a&gt; string for the image to be processed by the multimodal agent. Let’s run the pipeline to see its output.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Protein bar&#34;&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&#39;prompt_builder&#39;: {&#39;prompt&#39;: &#39;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Image: &lt;|image_&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;This image shows: Protein bar with chocolate peanut butter nutrition facts per bar (50g)&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; &#39;images&#39;: [&#39;/9j/4AAQSkZJRgABAQAAAQABAAD/4....&#39;]} 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;creating-the-multimodal-react-agent&#34;&gt;Creating the Multimodal ReAct Agent&lt;/h2&gt;
&lt;h3 id=&#34;defining-a-tool&#34;&gt;Defining a Tool&lt;/h3&gt;
&lt;p&gt;With our retrieval pipeline ready, we can create our Tool using the 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/4f73883ca85abf702eb81fd4a384a872b9fd3767/fastrag/agents/tools/tools.py#L69&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DocWithImageHaystackQueryTool&lt;/a&gt; component from fastRAG. &lt;code&gt;DocWithImageHaystackQueryTool&lt;/code&gt; can use Haystack v2 pipelines as a tool with fastRAG Agents.&lt;/p&gt;
&lt;p&gt;This tool, like other agent tools, requires a name and a description of its functionality for our agent to decide when to use it. We provide it with our &lt;code&gt;retrieval_pipeline&lt;/code&gt; as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fastrag.agents.tools.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocWithImageHaystackQueryTool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;nutrition_tool&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;DocWithImageHaystackQueryTool&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nutrition_tool&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;description&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;useful for when you need to retrieve nutrition fact image of packaged food. It can give information about one food type per query. Pass the food name as input&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline_or_yaml_file&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s test out our tool!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;tool_result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;nutrition_tool&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;protein bar&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;tool_result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Image: &lt;|image_&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# This image shows: Protein bar with chocolate peanut butter nutrition facts per bar (50g)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With our tool ready, we can create our agent.&lt;/p&gt;
&lt;h3 id=&#34;initialize-the-generator&#34;&gt;Initialize the Generator&lt;/h3&gt;
&lt;p&gt;For our multimodal agent, we initialize a 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/4f73883ca85abf702eb81fd4a384a872b9fd3767/fastrag/generators/llava.py#L211&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;Phi35VisionHFGenerator&lt;/code&gt;&lt;/a&gt;, which processes both text prompts and base64-encoded images. This makes it well-suited for image-to-text tasks like visual question answering.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;Phi35VisionHFGenerator&lt;/code&gt; generator uses a Hugging Face image-to-text model, which will function as the LLM for our agent. For this example, we’ll use a 
&lt;a href=&#34;https://huggingface.co/microsoft/Phi-3.5-vision-instruct&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;4B Phi3.5 Vision model&lt;/a&gt; to perform multi-step reasoning with tools and answer questions about the nutrition facts of various foods.&lt;/p&gt;
&lt;p&gt;Note that we define the words “Observation:” and “&lt;|end|&gt;” as stop words. These stop words are specific to the model and the ReAct prompting.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fastrag.generators.stopping_criteria.stop_words&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;StopWordsByTextCriteria&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;transformers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AutoTokenizer&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;StoppingCriteriaList&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fastrag.generators.llava&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Phi35VisionHFGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;torch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;model_name_or_path&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;microsoft/Phi-3.5-vision-instruct&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sw&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;StopWordsByTextCriteria&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;tokenizer&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AutoTokenizer&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_pretrained&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model_name_or_path&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;stop_words&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Observation:&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;|end|&gt;&#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;device&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cpu&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Phi35VisionHFGenerator&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;model_name_or_path&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;task&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;image-to-text&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt; = {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_new_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;100&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;stopping_criteria&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;StoppingCriteriaList&lt;/span&gt;([&lt;span style=&#34;color:#268bd2&#34;&gt;sw&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;huggingface_pipeline_kwargs&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;torch_dtype&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;torch&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;bfloat16&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;trust_remote_code&#34;&lt;/span&gt;: &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;_attn_implementation&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;eager&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;device_map&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;auto&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;react-prompting&#34;&gt;ReAct Prompting&lt;/h3&gt;
&lt;p&gt;To allow our agent to deduce which tools it needs to use logically, we will use 
&lt;a href=&#34;https://arxiv.org/abs/2210.03629&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ReAct&lt;/a&gt;, which prompts the agent iteratively and requires it to generate 3 main steps:&lt;/p&gt;
&lt;p&gt;Let’s say we want a description of how a bird chirps.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Thought&lt;/strong&gt;: A logical explanation of what the model should perform (For example, &lt;em&gt;I will use the docRetriever tool to find a description of how a bird chirps&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;/strong&gt; The precise operation that must be executed (For example, &lt;em&gt;Tool: docRetriever, Tool Input: {”input”: “Description of how a bird chirps”}&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Observation:&lt;/strong&gt; The output produced by the action (i.e. tool call) after it has been performed (For example, &lt;em&gt;Observation: A bird’s chirp is a light, melodic sound that often feels crisp and rhythmic, with a sequence of short, high-pitched notes…&lt;/em&gt; ).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let’s define a prompt instructing the LLM to follow the ReAct behavior. Note that we provide tool information as &lt;code&gt;{tool_names_with_descriptions}&lt;/code&gt; in the prompt.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent_prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You are designed to help with a variety of multimodal tasks and can perform multiple hops to answer questions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;## Tools
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You have access to a wide variety of tools. You are responsible for using the tools in any sequence you deem appropriate to complete the task at hand.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Break the task into subtasks and iterate to complete each subtask.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You have access to the following tools:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{tool_names_with_descriptions}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;## Output Format
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;If you need to make a tool call, your responses should follow this structure:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Thought: [your reasoning process, decide whether you need a tool or not]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Tool: [tool name]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Tool Input: [the input to the tool, in a JSON format representing the kwargs (e.g. {{&#34;input&#34;: &#34;hello world&#34;}})]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Observation: [tool response]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Based on the tool response, you need decide whether you need another more information. If so, make another tool call with the same structure.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;If you have enough information to answer the question without using any more tools, you MUST give your answer to the user question with &#34;Final Answer:&#34; and respond in the following format:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Thought: [your reasoning process, decide whether you need a tool or not]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Final Answer: [final answer to the human user&#39;s question after observation]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt; = {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;system&#34;&lt;/span&gt;:[{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;role&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;system&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;content&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;agent_prompt&lt;/span&gt;}], &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;chat&#34;&lt;/span&gt;:[{&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;role&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;user&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;content&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Question: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{query}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Thought: &#39;&lt;/span&gt;}]}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;bring-it-all-together&#34;&gt;Bring it All Together&lt;/h3&gt;
&lt;p&gt;With our tools and generator ready, we create our multimodal agent using 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/agents/base.py#L170&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Agent&lt;/a&gt;. We incorporate 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/agents/memory/conversation_memory.py&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ConversationMemory&lt;/a&gt; to save the conversation history between the user and the agent and provide tools with the 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/agents/base.py#L78&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ToolsManager&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fastrag.agents.base&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ToolsManager&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fastrag.agents.create_agent&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationMemory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;multimodal_agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;tools_manager&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;ToolsManager&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;nutrition_tool&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;memory&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;ConversationMemory&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;testing-out-the-agent&#34;&gt;Testing Out the Agent&lt;/h2&gt;
&lt;p&gt;Our agent is now ready! Let’s start interacting with it. We can begin by asking a question about a food’s nutrition information:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent_response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;multimodal_agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What is the fat content of the protein bar?&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;agent_response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;transcript&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Thought: I need to find out the fat content of a protein bar.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Tool: nutrition_tool
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Tool Input: {{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;input&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;protein bar&#34;&lt;/span&gt;}}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Observation:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Observation: 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Image: &lt;|image_
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;This image shows: Protein bar with chocolate peanut butter nutrition facts per bar (50g)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Thought:Thought: I have found the fat content of the protein bar.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Final Answer: The fat content of the protein bar is &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;8&lt;/span&gt; grams.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/multimodal-nutrition-agent/protein-bar.jpg alt=&#34;Protein bar nutrition facts&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The answer is “The fat content of the protein bar is 8 grams.” and it’s correct!&lt;/p&gt;
&lt;p&gt;Now, let’s try a more complex query that requires multi-hop reasoning.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent_response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;multimodal_agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Which one has more protein, protein bar or yogurt?&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;agent_response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;transcript&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Thought: I need to compare the protein content of a protein bar and yogurt.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Tool: nutrition_tool
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Tool Input: {{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;input&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;protein bar&#34;&lt;/span&gt;}}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Observation:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Observation: I have already used this Tool with this Tool Input. I will use the information I already have to respond.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Thought:Thought: I need to find out the protein content of yogurt.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Tool: nutrition_tool
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Tool Input: {{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;input&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;yogurt&#34;&lt;/span&gt;}}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Observation:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Observation: 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Image: &lt;|image_
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;This image shows: Greek style yoghurt nutrition facts per serving
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Thought:Thought: I have found the protein content of yogurt.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Final Answer: The protein content of yogurt is &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;18&lt;/span&gt; grams per cup.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Comparing the two:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Protein bar: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;14&lt;/span&gt; grams
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Yogurt: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;18&lt;/span&gt; grams
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Thought: The yogurt has more protein than the protein bar.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Final Answer: Yogurt has more protein than the protein bar.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/multimodal-nutrition-agent/yogurt.png alt=&#34;Yogurt nutrition facts&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Since the information about the protein bar is already stored in memory, the agent doesn’t need to make an additional tool call for it. Instead, it processes the previously retrieved image to find the protein amount.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this article, we built a powerful multimodal agent capable of retrieving and answering questions about nutrition facts using &lt;strong&gt;fastRAG&lt;/strong&gt;, &lt;strong&gt;Haystack&lt;/strong&gt;, and the Phi3.5 Vision model. By combining multi-hop reasoning and ReAct prompting, this agent effectively handles complex queries, making it an ideal solution for real-time nutrition information retrieval.&lt;/p&gt;
&lt;p&gt;Hopefully, this article gives you an idea of what these types of systems can accomplish by combining both image and textual data to answer multi-faceted questions.&lt;/p&gt;
&lt;p&gt;Check out IntelLabs frameworks for more information and AI solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;fastRAG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/IntelLabs/RAG-FiT&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;RAG-FIT&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Are you interested in exchanging tips and opinions on agents, LLMs, or other topics in AI with like-minded people? Come 
&lt;a href=&#34;https://discord.gg/Dr63fr9NDS&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;join the Haystack Discord community&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy coding! :)&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Design Haystack AI Applications Visually in deepset Studio with NVIDIA NIM</title>
      <link>https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/</link>
      <pubDate>Fri, 01 Nov 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;In our 
&lt;a href=&#34;https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;previous article&lt;/a&gt;, we explored how to build and deploy two key AI pipelines in a retrieval-augmented generation (RAG) application using Haystack with NVIDIA NIM:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Indexing pipeline: Prepares data by preprocessing, chunking, and embedding PDF files, finally storing them in a vector database.&lt;/li&gt;
&lt;li&gt;RAG pipeline: Designed to answer questions based on the contents of the uploaded PDF files.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this article, we’ll take a step further by showing you how to visually design the architecture of these AI pipelines using 
&lt;a href=&#34;https://haystack.deepset.ai/blog/announcing-studio&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset Studio&lt;/a&gt;, a newly released tool to visually create, deploy and test pipelines. With this tool, you can map out the entire structure of your AI workflows, from data ingestion to retrieval, while seamlessly integrating generative AI models accelerated by NVIDIA NIM microservices available in the 
&lt;a href=&#34;https://build.nvidia.com/explore/retrieval&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By visually building these pipelines, you’ll not only see the logical flow of your AI applications but also iterate quickly on the design before committing to code. Once finalized, deepset Studio allows you to export your pipeline as Python code or YAML definitions, ready to be deployed.&lt;/p&gt;
&lt;p&gt;By the end of this article, you’ll know how to build AI pipelines with 
&lt;a href=&#34;https://haystack.deepset.ai/blog/announcing-studio&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset Studio&lt;/a&gt; while leveraging retrieval embedding and LLM NIM microservices from the 
&lt;a href=&#34;https://build.nvidia.com/explore/retrieval&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt;. We’ll also guide you through the process of working with NIM APIs and demonstrate how to configure them for use in 
&lt;a href=&#34;https://haystack.deepset.ai/blog/announcing-studio&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset Studio&lt;/a&gt;, should you opt for an on-prem deployment.&lt;/p&gt;
&lt;h2 id=&#34;deepset-studio-your-development-environment-for-haystack&#34;&gt;deepset Studio: Your Development Environment for Haystack&lt;/h2&gt;
&lt;p&gt;Most AI applications are composed of many moving parts that work together to serve a final use case. Whether that be RAG, preparing and embedding documents, chatting with a database or something else, we are involving different models, prompts, decision steps, preprocessors and more. Furthermore, committing to the final design can also often involve multiple stakeholders.&lt;/p&gt;
&lt;p&gt;A visual layer representing the logical flow of the application helps us reason about the application within a simple interface that’s easy to decipher. It also helps with quick iterations with a diverse set of stakeholders.&lt;/p&gt;
&lt;p&gt;Now we will see how we can visually create these pipelines using deepset Studio (for which you can 
&lt;a href=&#34;https://landing.deepset.ai/deepset-studio-signup&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;sign up here&lt;/a&gt;), the new development environment for Haystack that allows you to design, build, deploy, and run these pipelines within a drag-and-drop UI. For each step that requires a generative AI model, such as an embedding model or LLM generator, we will use NVIDIA-hosted models on the NVIDIA API catalog. Finally, we will show how to self-host generative AI models with NVIDIA NIM.&lt;/p&gt;
&lt;h2 id=&#34;nvidia-nim-microservices&#34;&gt;NVIDIA NIM Microservices&lt;/h2&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/nims.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://www.nvidia.com/en-us/ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA NIM, part of the NVIDIA AI Enterprise software platform,&lt;/a&gt; is a suite of containerized microservices built to optimize the inference of AI models. These containers incorporate various components to efficiently serve AI models and expose them through standard APIs. The models are optimized using either 
&lt;a href=&#34;https://developer.nvidia.com/tensorrt&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA TensorRT&lt;/a&gt; or 
&lt;a href=&#34;https://docs.nvidia.com/tensorrt-llm/index.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA TensorRT-LLM&lt;/a&gt; (depending on the model type), utilizing techniques such as quantization, model partitioning, optimized kernels/runtimes, and dynamic or continuous batching. This allows for further performance tuning to maximize efficiency.&lt;/p&gt;
&lt;p&gt;NIM microservices offer simplified integration and production-ready, optimized generative AI deployment, allowing developers to focus on building their applications.&lt;/p&gt;
&lt;p&gt;NIM microservices are available for popular AI foundation models, including LLM community models and 
&lt;a href=&#34;https://build.nvidia.com/explore/retrieval&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retrieval-specific&lt;/a&gt; text embedding and reranking AI models. Developers can easily get started with NVIDIA NIM through the NVIDIA API catalog by generating a free API key, or by downloading and deploying  NIM microservice containers in their own environment.&lt;/p&gt;
&lt;h2 id=&#34;building-a-rag-application-with-deepset-studio&#34;&gt;Building a RAG Application with deepset Studio&lt;/h2&gt;
&lt;p&gt;An often overlooked but essential step in building an effective RAG application is 
&lt;a href=&#34;https://www.deepset.ai/blog/preprocessing-rag&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;data preparation and context embedding&lt;/a&gt; - a process that can be quite time-consuming. Before diving into decisions about which LLM to use or how we augment a prompt, it’s critical to first focus on embedding and storing your documents (especially if you’re not relying on external APIs for data retrieval).&lt;/p&gt;
&lt;p&gt;Most RAG applications revolve around two core pipelines: &lt;strong&gt;Indexing&lt;/strong&gt; and &lt;strong&gt;RAG&lt;/strong&gt;. Fig. 1 shows a high-level overview of the two pipelines using Haystack and NVIDIA NIM. We will be using 
&lt;a href=&#34;https://qdrant.tech/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Qdrant&lt;/a&gt; as the vector database in this article but it can be replaced with any other.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/pipelines.png alt=&#34;Fig. 1 - Haystack Indexing and RAG pipelines with NVIDIA NIM microservices.&#34;  /&gt;
        &lt;figcaption&gt;Fig. 1 - Haystack Indexing and RAG pipelines with NVIDIA NIM microservices.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;In the following sections, we’ll break down each pipeline in detail and guide you through building them using 
&lt;a href=&#34;https://haystack.deepset.ai/blog/announcing-studio&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset Studio&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;indexing-pipeline&#34;&gt;Indexing Pipeline&lt;/h3&gt;
&lt;p&gt;The indexing pipeline is responsible for preparing the data that creates the context for an LLM’s response. This pipeline may include components for data cleaning, chunking, embedding, and ultimately storing the processed data i.e. embeddings in a vector database to create a searchable context.&lt;/p&gt;
&lt;p&gt;NVIDIA NeMo Retriever provides fundamental building blocks for semantic search applications (like RAG), delivering accurate and optimized document indexing and search at scale. You can use the 
&lt;a href=&#34;https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NeMo Retriever text embedding NIM microservice&lt;/a&gt; for vectorizing documents and further refine the search with the 
&lt;a href=&#34;https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/overview.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NeMo Retriever text reranking NIM microservice&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This pipeline involves indexing a PDF file into a vector database (here 
&lt;a href=&#34;https://qdrant.tech/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Qdrant&lt;/a&gt; database), which includes preprocessing, cleaning, splitting, and embedding the data. Haystack offers a variety of components to build this type of pipeline. In this example, we’ll be using the following components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PyPDFtoDocument&lt;/strong&gt;:A PDF converter that transforms a PDF file into a document dataclass, which Haystack can use within the pipeline.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DocumentCleaner&lt;/strong&gt;: A preprocessing document cleaner component that removes unnecessary elements such as empty lines, headers, and footers, enabling cleaner data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DocumentSplitter&lt;/strong&gt;: This component breaks down large documents into smaller, manageable chunks for embedding and storage. You can customize the split by word, sentence, or passage, and define the length of each chunk and the overlap between them. This helps navigate challenges like LLM context limits and ensures that during the retrieval phase, only the most relevant segments are fetched for processing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NvidiaDocumentEmbedder&lt;/strong&gt;: This component enables document embedding using a 
&lt;a href=&#34;https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NeMo Retriever text embedding NIM microservice&lt;/a&gt;, which can either be hosted via the 
&lt;a href=&#34;https://build.nvidia.com/explore/retrieval&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt; or self-hosted within your own infrastructure. In this article, we’ll be using the 
&lt;a href=&#34;https://build.nvidia.com/nvidia/nv-embedqa-e5-v5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;nvidia/nv-embedqa-e5-v5 NIM&lt;/a&gt; microservice for document embedding.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can configure each of these components individually and connect them together in the deepset Studio to build the indexing pipeline. Fig. 2 shows the final visual design of the built indexing pipeline in the deepset Studio.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/indexing.png alt=&#34;Fig. 2 - End-to-end visual design of the indexing pipeline in the deepset Studio.&#34;  /&gt;
        &lt;figcaption&gt;Fig. 2 - End-to-end visual design of the indexing pipeline in the deepset Studio.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id=&#34;rag-pipeline&#34;&gt;RAG Pipeline&lt;/h3&gt;
&lt;p&gt;After you’ve completed the data preparation steps, you can move on to implementing the retrieval augmentation step. It’s crucial to identify which embedding NIM microservice was used to generate the document embeddings, as we typically need the same model for retrieval.&lt;/p&gt;
&lt;p&gt;In most use cases, RAG pipelines consist of three to four components. In this article, we’ll be using the following components for building the RAG pipeline:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Text Embedder&lt;/strong&gt;: This component embeds incoming user queries using the same embedding model employed during the data indexing pipeline. For this, we’ll use the
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiatextembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;NvidiaTextEmbedder&lt;/code&gt; component&lt;/a&gt;, which we configured to leverage an NVIDIA-hosted 
&lt;a href=&#34;https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA NeMo Retriever text embedding NIM microservice&lt;/a&gt; from the NVIDIA API Catalog (
&lt;a href=&#34;https://build.nvidia.com/nvidia/nv-embedqa-e5-v5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;nvidia/nv-embedqa-e5-v5&lt;/a&gt;). To set this up, you will need to provide the model’s 
&lt;a href=&#34;https://build.nvidia.com/snowflake/arctic-embed-l?snippet_tab=Python&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;api_url&lt;/a&gt; and an 
&lt;a href=&#34;https://org.ngc.nvidia.com/setup/personal-keys&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA_API_KEY&lt;/a&gt; as shown in Fig. 3.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/text-embedder.png alt=&#34;Fig. 3 - NvidiaTextEmbedder configuration in the deepset Studio.&#34;  /&gt;
        &lt;figcaption&gt;Fig. 3 - NVIDIATextEmbedder configuration in the deepset Studio.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Retriever&lt;/strong&gt;: In this case, we will use the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/qdrantembeddingretriever&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;QdrantEmbeddingRetriever&lt;/code&gt;&lt;/a&gt;, which takes the query embedding from the previous component and retrieves the most relevant documents from the Qdrant database. Fig. 4 shows the configuration of this component in the deepset Studio.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/retriever.png alt=&#34;Fig. 4 - QdrantEmbeddingRetriever configuration in the deepset Studio.&#34;  /&gt;
        &lt;figcaption&gt;Fig. 4 - QdrantEmbeddingRetriever configuration in the deepset Studio.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prompt Construction Component&lt;/strong&gt;: This component is responsible for creating the instruction (prompt) that will be sent to a large language model (LLM), representing the ‘augmentation’ step. In Haystack, this is handled by the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/promptbuilder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;PromptBuilder&lt;/code&gt;&lt;/a&gt;. It allows you to create a prompt template using 
&lt;a href=&#34;https://jinja.palletsprojects.com/en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Jinja&lt;/a&gt; and dynamically detects inputs based on the template’s contents. For our use case, we have developed a prompt template shown in Fig. 5 using PromptBuilder in the deepset Studio, which expects query and documents as input.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/prompt.png alt=&#34;Fig. 5 - PromtBuilder configuration in the deepset Studio.&#34;  /&gt;
        &lt;figcaption&gt;Fig. 5 - PromtBuilder Configuration in the deepset Studio.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;LLM Component&lt;/strong&gt;: Finally, we will add an LLM component that receives our final augmented prompt and generates a response. In this example, we will use the 
&lt;a href=&#34;https://build.nvidia.com/meta/llama-3_1-70b-instruct&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;llama-3_1-70b-instruct&lt;/a&gt; NIM microservice from the 
&lt;a href=&#34;https://build.nvidia.com/nim&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt;. In this case, we use the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiagenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;NvidiaGenerator&lt;/code&gt;&lt;/a&gt; component from Haystack and configure it to use the &lt;code&gt;meta/llama-3.1-70b-instruct&lt;/code&gt; model. Fig. 6 shows the configuration of this component in the deepset Studio.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/generator.png alt=&#34;Fig. 6 - NvidiaGenerator configuration in the deepset Studio.&#34;  /&gt;
        &lt;figcaption&gt;Fig. 6 - NvidiaGenerator configuration in the deepset Studio.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;As you can see, each of these individual components expects certain inputs, and produces various outputs. You can learn more about the pipeline architecture and how components are connected 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/creating-pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;. We can now connect them in deepset Studio to create the final RAG pipeline, as shown in Fig. 7.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/rag.png alt=&#34;Fig. 7 - Full RAG Pipeline built visually in the deepset Studio.&#34;  /&gt;
        &lt;figcaption&gt;Fig. 7 - Full RAG pipeline built visually in the deepset Studio.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id=&#34;use-self-hosted-nim-microservices-for-the-pipelines&#34;&gt;Use Self-hosted NIM Microservices for the Pipelines&lt;/h3&gt;
&lt;p&gt;The indexing and RAG Haystack pipelines described above use NIM endpoints from the NVIDIA API catalog. However, you can also self-host and deploy the NIM microservices in your own environment. In this case, you can simply substitute NIM endpoints from the API catalog with self-hosted NIM endpoints. Next, we will see how to do that.&lt;/p&gt;
&lt;p&gt;To begin, you’ll need to deploy the NIM microservices in your environment. This deployment can be done using either Docker or Kubernetes. In our 
&lt;a href=&#34;https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;previous article&lt;/a&gt;, we explained in detail the deployment of NIM microservice containers on Kubernetes clusters. However, for quicker prototyping, we’ll outline here the steps to deploy the NIM containers using Docker.&lt;/p&gt;
&lt;p&gt;The following command will deploy the &lt;code&gt;meta/llama-3.1-70b-instruct&lt;/code&gt; LLM NIM microservice on a server equipped with a supported GPU. You can refer to the 
&lt;a href=&#34;https://docs.nvidia.com/nim/large-language-models/latest/support-matrix.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;support matrix&lt;/a&gt; for compatibility with different GPU models.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;export NGC_API_KEY=&lt;PASTE_API_KEY_HERE&gt;
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p &#34;$LOCAL_NIM_CACHE&#34;
docker run -it --rm \
--gpus all \
--shm-size=16GB \
-e NGC_API_KEY \
-v &#34;$LOCAL_NIM_CACHE:/opt/nim/.cache&#34; \
-u $(id -u) \
-p 8000:8000 \
nvcr.io/nim/meta/llama-3.1-70b-instruct:1.1.2
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This command will pull the NIM container, which first checks the underlying GPU. Based on this detection, it will download the 
&lt;a href=&#34;https://docs.nvidia.com/nim/large-language-models/latest/profiles.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;optimized model&lt;/a&gt; from the 
&lt;a href=&#34;https://catalog.ngc.nvidia.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA NGC catalog&lt;/a&gt; that is suitable for the identified GPU infrastructure.&lt;/p&gt;
&lt;p&gt;You can follow the same deployment procedure for the embedding models with the 
&lt;a href=&#34;https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA NeMo Retriever text embedding NIM microservice&lt;/a&gt;, with instructions available 
&lt;a href=&#34;https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/getting-started.html#launching-the-nim&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once the NIMs are deployed, whether through Docker or Kubernetes, simply update the api_url in each of the embedding and LLM components in the pipelines to point to your self-hosted NIM API endpoint URL (example: 
&lt;a href=&#34;http://your_server_ip_address:8000/v1/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;http://your_server_ip_address:8000/v1/&lt;/a&gt;), as illustrated in Fig. 8 for the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiagenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NvidiaGenerator&lt;/a&gt; i.e. LLM component.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/generator-2.png alt=&#34;Fig. 8 - NvidiaGenerator configuration for self-hosted NIM in the deepset Studio.&#34;  /&gt;
        &lt;figcaption&gt;Fig. 8 - NvidiaGenerator configuration for self-hosted NIM microservices in the deepset Studio.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id=&#34;export-pipelines-and-deploy-rag-application&#34;&gt;Export Pipelines and Deploy RAG Application&lt;/h3&gt;
&lt;p&gt;After building the final layout of our application in deepset Studio, we can export the pipelines in one of the two formats for deployment by clicking the export button:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;YAML: In our 
&lt;a href=&#34;https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide#indexing-files-and-deploying-the-haystack-rag-pipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;previous article&lt;/a&gt;, we created similar indexing and RAG pipelines, demonstrating how to serialize them to YAML and use 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/hayhooks&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hayhooks&lt;/a&gt; for deployment. You may choose this option if you want to follow the same approach.&lt;/li&gt;
&lt;li&gt;Python: Alternatively, you can export the pipeline as a standard Python script for local execution.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fig. 9 shows an example of exporting the RAG pipeline in the deepset Studio in Python format.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/deepset-studio-and-nvidia-nims/export.png alt=&#34;Fig. 9 - Example showing export of RAG pipeline in the deepset Studio in Python format.&#34;  /&gt;
        &lt;figcaption&gt;Fig. 9 - Example showing export of RAG pipeline in the deepset Studio in Python format.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Following Python code snippet shows the exported code of the RAG pipeline from the deepset Studio.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt;  &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.embedders.nvidia.text_embedder&lt;/span&gt;  &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.document_stores.qdrant.document_store&lt;/span&gt;  &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.retrievers.qdrant.retriever&lt;/span&gt;  &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders.prompt_builder&lt;/span&gt;  &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.nvidia.generator&lt;/span&gt;  &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;nvidiatextembedder&lt;/span&gt;  =  &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaTextEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;snowflake/arctic-embed-l&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://ai.api.nvidia.com/v1/retrieval/snowflake/arctic-embed-l&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;  =  &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantDocumentStore&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_dim&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1024&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;host&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;qdrant&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;qdrantembeddingretriever&lt;/span&gt;  =  &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;return_embedding&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;promptbuilder&lt;/span&gt;  =  &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Answer the question given the context.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: {{ query }}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context:&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; {{ document.content }}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answer:&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;nvidiagenerator&lt;/span&gt;  =  &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;meta/llama-3.1-70b-instruct&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://integrate.api.nvidia.com/v1&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;model_arguments&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_tokens&#34;&lt;/span&gt;:  &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1024&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;  =  &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nvidiatextembedder&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;nvidiatextembedder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;qdrantembeddingretriever&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;qdrantembeddingretriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;promptbuilder&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;promptbuilder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nvidiagenerator&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#268bd2&#34;&gt;nvidiagenerator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nvidiatextembedder.embedding&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;qdrantembeddingretriever.query_embedding&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;qdrantembeddingretriever.documents&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;promptbuilder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;promptbuilder.prompt&#34;&lt;/span&gt;,  &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nvidiagenerator.prompt&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now that we have our runnable pipeline, we can choose one of the following options to deploy the RAG application:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/hayhooks&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hayhooks&lt;/a&gt; to self-host the pipeline in a containerized environment. Deployment instructions via docker can be found in our 
&lt;a href=&#34;https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide#indexing-files-and-deploying-the-haystack-rag-pipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;previous article&lt;/a&gt;. Or if you want to deploy it via Kubernetes, instructions can be found 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/kubernetes&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Deploy from deepset Studio (limitations apply). 
&lt;a href=&#34;https://landing.deepset.ai/deepset-studio-signup&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Sign up today&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you’re a deepset Cloud customer: Deploy directly from deepset Studio to 
&lt;a href=&#34;https://www.deepset.ai/deepset-cloud&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset Cloud&lt;/a&gt;, a fully managed AI workbench. This option enables you to collaborate with your team on the application design, evaluate your application, and manage its deployment effectively.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;Designing the final layout of an AI application requires careful integration of multiple components and, importantly, several iterations. In this article, we saw how to visually design the architecture of AI pipelines using deepset Studio. This tool allows you to map out the entire structure of your AI workflows - from data ingestion to retrieval - while seamlessly integrating NVIDIA-hosted model API endpoints from the NVIDIA API catalog or from self-hosted NIM microservices. With deepset Studio, you can visualize the architecture of your application without making compromises about critical technical decisions, including your hosting options for language models. This tool allows you to reason about complex pipeline architectures within a visual layer, before committing to code. Once your design is complete, deepset Studio enables you to deploy and test your pipeline or export it as Python code or YAML definitions, making it ready for deployment.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Advanced RAG: Query Decomposition &amp; Reasoning</title>
      <link>https://haystack.deepset.ai/blog/query-decomposition/</link>
      <pubDate>Mon, 30 Sep 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/query-decomposition/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;blockquote&gt;
&lt;p&gt;This is part one of the &lt;strong&gt;Advanced Use Cases&lt;/strong&gt; series:&lt;/p&gt;
&lt;p&gt;1️⃣ 
&lt;a href=&#34;https://haystack.deepset.ai/blog/extracting-metadata-filter&#34;&gt;Extract Metadata from Queries to Improve Retrieval&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;2️⃣ 
&lt;a href=&#34;https://haystack.deepset.ai/blog/query-expansion&#34;&gt;Query Expansion&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;3️⃣ &lt;strong&gt;Query Decomposition&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;4️⃣ 
&lt;a href=&#34;https://haystack.deepset.ai/cookbook/metadata_enrichment&#34;&gt;Automated Metadata Enrichment&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sometimes a single question is multiple questions in disguise. For example: “Did Microsoft or Google make more money last year?”. To get to the correct answer for this seemingly simple question, we actually have to break it down: “How much money did Google make last year?” and “How much money did Microsoft make last year?”. Only if we know the answer to these 2 questions can we reason about the final answer.&lt;/p&gt;
&lt;p&gt;This is where query decomposition comes in. This is a technique for retrieval augmented generation (RAG) based AI applications that follows a simple approach:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Decompose the original question into smaller questions that can be answered independently to each other. Let’s call these ‘sub questions’ here on out.&lt;/li&gt;
&lt;li&gt;Reason about the final answer to the original question, based on each sub-answer.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;While for many query/dataset combinations, this may not be required, for some, it very well may be. At the end of the day, often one query results in one retrieval step. If within that one single retrieval step we are unable to have the retriever return &lt;em&gt;both&lt;/em&gt; the money Microsoft made last year &lt;em&gt;and&lt;/em&gt; Google, then the system will struggle to produce an accurate final response.&lt;/p&gt;
&lt;p&gt;This method ensures that we are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;retrieving the relevant context for each sub question.&lt;/li&gt;
&lt;li&gt;reasoning about the final answer given each answer based on the contexts retrieved for each sub question.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In this article, I’ll be going through some key steps that allow you to achieve this. You can find the full working example and code in the linked recipe from our 
&lt;a href=&#34;https://haystack.deepset.ai/cookbook&#34;&gt;cookbook&lt;/a&gt;. Here, I’ll only show the most relevant parts of the code.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;🚀 I’m sneaking something extra into this article. I saw the opportunity to try out the structured output functionality (currently in beta) by OpenAI to create this example. For this step, I extended the OpenAIGenerator in Haystack to be able to work with Pydantic schemas. More on this in the next step.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Let’s try build a full pipeline that makes use of query decomposition and reasoning. We’ll use a dataset about Game of Thrones (a classic for Haystack) which you can find preprocessed and chunked on &lt;code&gt;Tuana/game-of-thrones&lt;/code&gt; on Hugging Face Datasets.&lt;/p&gt;
&lt;h2 id=&#34;defining-our-questions-structure&#34;&gt;Defining our Questions Structure&lt;/h2&gt;
&lt;p&gt;Our first step is to create a structure within which we can contain the subquestions, and each of their answers. This will be used by our OpenAIGenerator to produce a structured output.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pydantic&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;BaseModel&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;Question&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;BaseModel&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Optional&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;] = &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;Questions&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;BaseModel&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Question&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The structure is simple, we have &lt;code&gt;Questions&lt;/code&gt; made up of a list of &lt;code&gt;Question&lt;/code&gt;. Each &lt;code&gt;Question&lt;/code&gt; has the &lt;code&gt;question&lt;/code&gt; string as well as an optional &lt;code&gt;answer&lt;/code&gt; to that question.&lt;/p&gt;
&lt;h3 id=&#34;defining-the-prompt-for-query-decomposition&#34;&gt;Defining the Prompt for Query Decomposition&lt;/h3&gt;
&lt;p&gt;Next up, we need to get an LLM to decompose a question and produce multiple questions. Here, we will start making use of our &lt;code&gt;Questions&lt;/code&gt; schema.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;splitter_prompt&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You are a helpful assistant that prepares queries that will be sent to a search component.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Sometimes, these queries are very complex.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Your job is to simplify complex queries into multiple queries that can be answered
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;in isolation to eachother.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;If the query is simple, then keep it as it is.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Examples
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;1. Query: Did Microsoft or Google make more money last year?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;   Decomposed Questions: [Question(question=&#39;How much profit did Microsoft make last year?&#39;, answer=None), Question(question=&#39;How much profit did Google make last year?&#39;, answer=None)]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;2. Query: What is the capital of France?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;   Decomposed Questions: [Question(question=&#39;What is the capital of France?&#39;, answer=None)]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;3. Query: {{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;   Decomposed Questions:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;splitter_prompt&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-4o-mini&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;response_format&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Questions&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;answering-each-sub-question&#34;&gt;Answering Each Sub Question&lt;/h2&gt;
&lt;p&gt;First, let’s build a pipeline that uses the &lt;code&gt;splitter_prompt&lt;/code&gt; to decompose our question:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;splitter_prompt&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-4o-mini&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;response_format&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Questions&lt;/span&gt;}))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Who has more siblings, Jamie or Sansa?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;structured_reply&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This produces the following &lt;code&gt;Questions&lt;/code&gt; (&lt;code&gt;List[Question]&lt;/code&gt;)&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;questions=[Question(question=&#39;How many siblings does Jamie have?&#39;, answer=None), 
           Question(question=&#39;How many siblings does Sansa have?&#39;, answer=None)]
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now, we have to fill in the &lt;code&gt;answer&lt;/code&gt; fields. For this step, we need to have a separate prompt and two custom components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;CohereMultiTextEmbedder&lt;/code&gt; which can take multiple questions rather than a single one like the &lt;code&gt;CohereTextEmbedder&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;MultiQueryInMemoryEmbeddingRetriever&lt;/code&gt; which can again, take multiple questions and their embeddings, returning &lt;code&gt;question_context_pairs&lt;/code&gt;. Each pair contains the &lt;code&gt;question&lt;/code&gt; and &lt;code&gt;documents&lt;/code&gt; that are relevant to that question.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Next, we need to construct a prompt that can instruct a model to answer each subquestion:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;multi_query_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You are a helpful assistant that can answer complex queries.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Here is the original question you were asked: {{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;And you have split the task into the following questions:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or pair in question_context_pairs %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {{pair.question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Here are the question and context pairs for each question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;For each question, generate the question answer pair as a structured output
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or pair in question_context_pairs %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  Question: {{pair.question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  Context: {{pair.documents}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answers:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;multi_query_prompt&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;multi_query_template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s build a pipeline that can answer each individual sub question. We will call this the &lt;code&gt;query_decomposition_pipeline&lt;/code&gt; :&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;splitter_prompt&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-4o-mini&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;response_format&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Questions&lt;/span&gt;}))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;CohereMultiTextEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embed-multilingual-v3.0&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;multi_query_retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;MultiQueryInMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;multi_query_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;multi_query_template&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_resolver_llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-4o-mini&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;response_format&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Questions&lt;/span&gt;}))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.structured_reply&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.questions&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.embeddings&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;multi_query_retriever.query_embeddings&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.structured_reply&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;multi_query_retriever.queries&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.structured_reply&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.questions&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;multi_query_retriever.question_context_pairs&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;multi_query_prompt.question_context_pairs&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;multi_query_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_resolver_llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running this pipeline with the original question “Who has more siblings, Jamie or Sansa?”, results in the following structured output:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Who has more siblings, Jamie or Sansa?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                           &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;multi_query_prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_resolver_llm&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;structured_reply&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;questions=[Question(question=&#39;How many siblings does Jamie have?&#39;, answer=&#39;2 (Cersei Lannister, Tyrion Lannister)&#39;),
           Question(question=&#39;How many siblings does Sansa have?&#39;, answer=&#39;5 (Robb Stark, Arya Stark, Bran Stark, Rickon Stark, Jon Snow)&#39;)]
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;reasoning-about-the-final-answer&#34;&gt;Reasoning About the Final Answer&lt;/h2&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/query-decomposition/pipeline.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The final step we have to take is to reason about the ultimate answer to the original question. Again, we create a prompt that will instruct an LLM to do this. Given we have the &lt;code&gt;questions&lt;/code&gt; output that contains each sub &lt;code&gt;question&lt;/code&gt; and &lt;code&gt;answer&lt;/code&gt;, we will make these inputs to this final prompt.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;reasoning_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You are a helpful assistant that can answer complex queries.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Here is the original question you were asked: {{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You have split this question up into simpler questions that can be answered in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;isolation.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Here are the questions and answers that you&#39;ve generated
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or pair in question_answer_pair %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {{pair}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Reason about the final answer to the original query based on these questions and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;aswers
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Final Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;resoning_prompt&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;reasoning_template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To be able to augment this prompt with the question answer pairs, we will have to extend our previous pipeline and connect the &lt;code&gt;structured_reply&lt;/code&gt; from the previous LLM, to the &lt;code&gt;question_answer_pair&lt;/code&gt; input of this prompt.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reasoning_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;reasoning_template&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reasoning_llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-4o-mini&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_resolver_llm.structured_reply&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reasoning_prompt.question_answer_pair&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reasoning_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reasoning_llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, let’s run this final pipeline and see what results we get:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Who has more siblings, Jamie or Sansa?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;query_decomposition_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                           &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;multi_query_prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                           &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reasoning_prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                           &lt;span style=&#34;color:#268bd2&#34;&gt;include_outputs_from&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_resolver_llm&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The original query was split and resolved:&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pair&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_resolver_llm&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;structured_reply&#34;&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;pair&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;So the original query is answered as follows:&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reasoning_llm&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;🥁 Drum roll please:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;The original query was split and resolved:

question=&#39;How many siblings does Jaime have?&#39; answer=&#39;Jaime has one sister (Cersei) and one younger brother (Tyrion), making a total of 2 siblings.&#39;
question=&#39;How many siblings does Sansa have?&#39; answer=&#39;Sansa has five siblings: one older brother (Robb), one younger sister (Arya), and two younger brothers (Bran and Rickon), as well as one older illegitimate half-brother (Jon Snow).&#39;

So the original query is answered as follows:

To determine who has more siblings between Jaime and Sansa, we need to compare the number of siblings each has based on the provided answers.

From the answers:
- Jaime has 2 siblings (Cersei and Tyrion).
- Sansa has 5 siblings (Robb, Arya, Bran, Rickon, and Jon Snow).

Since Sansa has 5 siblings and Jaime has 2 siblings, we can conclude that Sansa has more siblings than Jaime.

Final Answer: Sansa has more siblings than Jaime.
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;wrapping-up&#34;&gt;Wrapping up&lt;/h2&gt;
&lt;p&gt;Given the right instructions, LLMs are good at breaking down tasks. Query decomposition is a great way we can make sure we do that for questions that are multiple questions in disguise.&lt;/p&gt;
&lt;p&gt;In this article, you learned how to implement this technique with a twist 🙂 Let us know what you think about using structured outputs for these sorts of use cases. And check out the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-experimental?tab=readme-ov-file#experiments-catalog&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack experimental repo&lt;/a&gt; to see what new features we’re working on.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Improving Retrieval with Auto-Merging</title>
      <link>https://haystack.deepset.ai/blog/improve-retrieval-with-auto-merging/</link>
      <pubDate>Thu, 12 Sep 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/improve-retrieval-with-auto-merging/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;For most RAG applications, where we first have to retrieve the most relevant context, we end up having to split up documents first, and index those smaller splits of documents. Reasons for this range from needing to retrieve only &lt;em&gt;relevant&lt;/em&gt; sections of larger bits of documents to the simple fact that (although they’re improving massively) LLMs simply don’t have infinite context lengths.&lt;/p&gt;
&lt;p&gt;Auto-Merging is a retrieval technique that leverages a hierarchical document structure. When a document is too long, it is split into smaller documents or chunks, where we can think of the smaller documents as the children of the original document and the original document as the parent. This results in a hierarchical tree structure where each smaller document is a child of a previous larger document. The leaves of the tree are the documents which don’t have any children, and the root is the original document.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/improve-retrieval-with-auto-merging/hierarchy.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Auto-merging retrieval is a technique we can use if the parent document is likely to contain more of the relevant context about the information the user is after, in comparison to a subset of it’s child documents. When a query is made, the the retriever will normally return the top_k number of document chunks that are relevant to the query. However, if the number of retrieved document chunks that belong to the same parent document is above a certain threshold, the retriever would return the parent document instead of the individual chunks.&lt;/p&gt;
&lt;h2 id=&#34;haystack-components&#34;&gt;Haystack Components&lt;/h2&gt;
&lt;p&gt;Haystack implements the Auto-Merging Retrieval with two components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/reference/hierarchical-document-splitter&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;HierarchicalDocumentSplitter&lt;/code&gt;&lt;/a&gt;: splits a Document into multiple Document objects of different block sizes, building a hierarchical tree structure where each smaller block is a child of a previous larger block. The &lt;code&gt;init&lt;/code&gt; method expects three parameters:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;block_sizes&lt;/code&gt;: Set of block sizes to split the document into. The blocks are split in descending order. So, &lt;code&gt;block_sizes&lt;/code&gt; of {20, 5} would mean that each ‘parent’ split would be of length max 20, and and each of its children would be of length max 5.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;split_overlap&lt;/code&gt;: The number of overlapping units for each split.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;split_by&lt;/code&gt;: The unit for splitting your documents.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/reference/auto-merge-retriever&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;AutoMergingRetriever&lt;/code&gt;&lt;/a&gt;: a retriever that leverages the hierarchical tree structure of documents, where the leaf nodes are indexed in a document store. During retrieval, if the number of matched leaf documents below the same parent is higher than a defined threshold, the retriever will return the parent document instead of the individual leaf documents. The &lt;code&gt;init&lt;/code&gt; method expects three parameters:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;document_store&lt;/code&gt;: DocumentStore from which to retrieve the parent documents&lt;/li&gt;
&lt;li&gt;&lt;code&gt;threshold&lt;/code&gt;: Threshold to decide whether the parent instead of the individual documents is returned&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;introductory-example&#34;&gt;Introductory Example&lt;/h2&gt;
&lt;p&gt;Let’s see a simple example of how the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; works. In this example we will use a single document. We use the &lt;code&gt;HierarchicalDocumentSplitter&lt;/code&gt; to split the document into chunks, represented by smaller documents, and capturing the hierarchical structure of the document.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.preprocessors&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HierarchicalDocumentSplitter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The monarch of the wild blue yonder rises from the eastern side of the horizon.&#34;&lt;/span&gt;)]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;splitter&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HierarchicalDocumentSplitter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;block_sizes&lt;/span&gt;={&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;}, &lt;span style=&#34;color:#268bd2&#34;&gt;split_overlap&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split_by&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;word&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;splitter&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We start by creating a document, and then we split it into smaller documents using the &lt;code&gt;HierarchicalDocumentSplitter&lt;/code&gt;. We need to specify the block sizes that we want to split the document into. In this case, we are splitting the document into 10 and 3-word blocks - this means that the splitter will only have 2 levels, the first with a maximum of 10 words and the second a maximum of 3 words. There are no overlaps among the documents, and we also specify that we want to split the document by words. This results in 9 documents being created from the original document. The documents are split as follows:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;`The monarch of the wild blue yonder rises from the eastern side of the horizon.` -- (root)
|
|
|
|--- `The monarch of the wild blue yonder rises from the`
|               |
|               |
|               |--- `The monarch of` -- (leaf)
|               |
|               |--- `the wild blue` -- (leaf)
|               |
|               |--- `yonder rises from` -- (leaf)
|               |
|               |--- `the` -- (leaf)
|
|
|--- `eastern side of the horizon.` -- (leaf)
|               |
|               |
|               |--- `eastern side of` -- (leaf)
|               |
|               |--- `the horizon.` -- (leaf)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Note that the original document is always the root of the tree. We then have two levels of children, the first with a maximum block size of 10 words, and the second with a maximum block size of 3 words.&lt;/p&gt;
&lt;p&gt;We now need to split this documents into two distinct document stores. During initialization the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; requires the document store where the parent documents are indexed. At run time it receives leaf documents that matched a user query, it returns the parent document if the number of matched leaf documents below the same parent is higher than a defined threshold, otherwise it returns the original retrieved leaf documents.&lt;/p&gt;
&lt;p&gt;Let’s see it in practice. We index the parent documents, by selecting the ones with a &lt;code&gt;__level&lt;/code&gt; of 1.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;parent_docs_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;parent_docs&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;] &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;__level&#34;&lt;/span&gt;]==&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;parent_docs_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;parent_docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s now initialize the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; with parent document store and a parent threshold of 0.5, meaning that if at least 50% of the leaf documents below the same parent match the query, the retriever will return the parent instead of the leaf documents which matched the user query. If we query the document store with a single leaf document, the retriever will return the same leaf document.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AutoMergingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;AutoMergingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;parent_docs_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;matched_leaf_documents&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;4&lt;/span&gt;]])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we now we query the document store with two leaf documents, the retriever will return the parent document instead of the individual leaf documents, as the threshold of 0.5 is met.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;matched_leaf_documents&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;4&lt;/span&gt;], &lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;]]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;matched_leaf_documents&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;matched_leaf_documents&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This was a simple introductory example to show how &lt;code&gt;AutoMergingRetriever&lt;/code&gt; works and retrieves parent documents instead of individual leaf documents. Next we will see a full example over news articles dataset.&lt;/p&gt;
&lt;h2 id=&#34;advanced-example&#34;&gt;Advanced Example&lt;/h2&gt;
&lt;p&gt;We will use the BBC news dataset to show how the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; works with a dataset containing multiple news articles. This dataset consists of 2.225 documents from the 
&lt;a href=&#34;http://news.bbc.co.uk/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BBC&lt;/a&gt; corresponding to stories in five topical areas collected between 2004-2005, and was part of work by D. Greene and P. Cunningham. 
&lt;a href=&#34;http://mlg.ucd.ie/files/publications/greene06icml.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;“Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering”, Proc. ICML 2006&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;reading-the-dataset&#34;&gt;Reading the dataset&lt;/h3&gt;
&lt;p&gt;The original dataset is available at 
&lt;a href=&#34;http://mlg.ucd.ie/datasets/bbc.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;http://mlg.ucd.ie/datasets/bbc.html&lt;/a&gt;, but we are going to use a version that was already preprocessed and stored in a single CSV file available at the following URL:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://raw.githubusercontent.com/amankharwal/Website-data/master/bbc-news-data.csv&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://raw.githubusercontent.com/amankharwal/Website-data/master/bbc-news-data.csv&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;csv&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;read_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;file&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;open&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;file&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;r&#34;&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;file&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;reader&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;csv&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;reader&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;file&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;delimiter&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;t&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;next&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;reader&lt;/span&gt;, &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;)  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# skip the headers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt; = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;reader&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;category&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;row&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;strip&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;title&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;row&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;strip&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;row&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;strip&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;category&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;category&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;title&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;title&lt;/span&gt;}))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;docs = read_documents(&#34;bbc-news-data.csv&#34;)
len(docs)
&gt;&gt; 2225
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&#34;indexing-the-documents&#34;&gt;Indexing the documents&lt;/h3&gt;
&lt;p&gt;After reading the converting the news articles  into Haystack Document objects, let’s now let’s index them. We will use as document store the&lt;code&gt;InMemoryDocumentStore&lt;/code&gt; for the sake of simplicity. We first apply the &lt;code&gt;HierarchicalDocumentSplitter&lt;/code&gt; to the list of Documents, creating a hierarchical structure&lt;/p&gt;
&lt;p&gt;We will create two document stores, one for the parent documents, and one for the leaf documents. We will later say that there will be an intermediate retriever to match user query with the indexed leaf documents, this intermediate retriever will then be connected to an &lt;code&gt;AutoMergingRetriever&lt;/code&gt; which decides for when to return the parent instead of the matched leaf documents.&lt;/p&gt;
&lt;p&gt;The function below receives the news articles as Documents and filters them by the meta field &lt;code&gt;__level&lt;/code&gt; to differentiate between children and parent Documents, indexing them in their respective document stores, which are then both returned by the function.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Tuple&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.types&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DuplicatePolicy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.preprocessors&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HierarchicalDocumentSplitter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;indexing&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;]) -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Tuple&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;splitter&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HierarchicalDocumentSplitter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;block_sizes&lt;/span&gt;={&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;}, &lt;span style=&#34;color:#268bd2&#34;&gt;split_overlap&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split_by&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;splitter&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# store the leaf documents in one document store&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;leaf_documents&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;] &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;__level&#34;&lt;/span&gt;] == &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;leaf_doc_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;leaf_doc_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;leaf_documents&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;policy&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;DuplicatePolicy&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;SKIP&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# store the parent documents in another document store&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;parent_documents&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;] &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;__level&#34;&lt;/span&gt;] == &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;parent_doc_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;parent_doc_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;parent_documents&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;policy&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;DuplicatePolicy&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;SKIP&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;leaf_doc_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;parent_doc_store&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;querying-the-documents&#34;&gt;Querying the documents&lt;/h3&gt;
&lt;p&gt;Now that we have our document stores let’s construct a querying pipeline, consisting of a &lt;code&gt;BM25Retriever&lt;/code&gt; associated with the document store containing the leaf documents, and an &lt;code&gt;AutoMergingRetriever&lt;/code&gt; associated with the parent documents and with a threshold of 0.6, meaning that if at least 60% of the matched leaf documents belong to the same parent, their parent is returned instead of each individual Document.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryBM25Retriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AutoMergingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;querying_pipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;leaf_doc_store&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;parent_doc_store&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;float&lt;/span&gt; = &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.6&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;bm25_retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryBM25Retriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;leaf_doc_store&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;auto_merge_retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;AutoMergingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;parent_doc_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;bm25_retriever&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;BM25Retriever&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;auto_merge_retriever&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;AutoMergingRetriever&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;BM25Retriever.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;AutoMergingRetriever.matched_leaf_documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;putting-it-all-together&#34;&gt;Putting it all together&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;read_documents&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;bbc-news-data.csv&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;leaf_doc_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;parent_doc_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;indexing&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;querying_pipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;leaf_doc_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;parent_doc_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;threshold&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So, now can run each function individually and have a querying pipeline that uses the &lt;code&gt;AutoMergingRetriever&lt;/code&gt;. We can then use the pipeline to query the document store for articles related to cybersecurity, and let’s also make use of the pipeline parameter &lt;code&gt;include_outputs_from&lt;/code&gt; to also get the outputs from the &lt;code&gt;BM25Retriever&lt;/code&gt; component.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;query&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;phishing attacks spoof websites spam e-mails spyware&#39;&lt;/span&gt;},  &lt;span style=&#34;color:#268bd2&#34;&gt;include_outputs_from&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;BM25Retriever&#39;&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;result&lt;/code&gt; will have two keys, one for each retriever component: &lt;code&gt;AutoMergingRetriever&lt;/code&gt;, &lt;code&gt;BM25Retriever&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let’s see how many documents were retrieved by each component.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;In&lt;/span&gt; [&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;17&lt;/span&gt;]: &lt;span style=&#34;color:#cb4b16&#34;&gt;len&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;AutoMergingRetriever&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Out&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;17&lt;/span&gt;]: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;In&lt;/span&gt; [&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;18&lt;/span&gt;]: &lt;span style=&#34;color:#cb4b16&#34;&gt;len&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;BM25Retriever&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Out&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;18&lt;/span&gt;]: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As we can see, the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; retrieved 7 documents, while the &lt;code&gt;BM25Retriever&lt;/code&gt; retrieved 10 documents. This is because the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; returned parent documents instead of individual leaf documents. Let’s compare the titles of the documents retrieved by the &lt;code&gt;BM25Retriever&lt;/code&gt; and the &lt;code&gt;AutoMergingRetriever&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;doc_titles&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;sorted&lt;/span&gt;([&lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;title&#39;&lt;/span&gt;] &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;BM25Retriever&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;]])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;In&lt;/span&gt; [&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;14&lt;/span&gt;]: &lt;span style=&#34;color:#268bd2&#34;&gt;doc_titles&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Out&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;14&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Bad e-mail habits sustains spam&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Bad e-mail habits sustains spam&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Cyber crime booms in 2004&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Cyber criminals step up the pace&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Cyber criminals step up the pace&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Junk e-mails on relentless rise&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;More women turn to net security&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Security scares spark browser fix&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Spam e-mails tempt net shoppers&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Spam e-mails tempt net shoppers&#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;In&lt;/span&gt; [&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;15&lt;/span&gt;]: &lt;span style=&#34;color:#268bd2&#34;&gt;doc_titles&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;sorted&lt;/span&gt;([&lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;title&#39;&lt;/span&gt;] &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;AutoMergingRetriever&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;]])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;In&lt;/span&gt; [&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;16&lt;/span&gt;]: &lt;span style=&#34;color:#268bd2&#34;&gt;doc_titles&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Out&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;16&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Bad e-mail habits sustains spam&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Cyber crime booms in 2004&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Cyber criminals step up the pace&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Junk e-mails on relentless rise&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;More women turn to net security&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Security scares spark browser fix&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Spam e-mails tempt net shoppers&#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Instead of returning individual leaf documents, the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; returned parent document for the articles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;“Bad e-mail habits sustains spam”,&lt;/li&gt;
&lt;li&gt;“Cyber criminals step up the pace”,&lt;/li&gt;
&lt;li&gt;“Spam e-mails tempt net shoppers”;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;since at least 60% of the leaf documents of each of those documents matched the query.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this tutorial we saw how the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; works. One important aspect of the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; implementation in Haystack is that it requires the documents to be split using the &lt;code&gt;HierarchicalDocumentSplitter&lt;/code&gt;. Another aspect to notice as we saw, is that the &lt;code&gt;AutoMergingRetriever&lt;/code&gt; should be used in conjunction with other base &lt;code&gt;Retrievers&lt;/code&gt; allowing for a more flexible retrieval system.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Announcing Our New Short Course with DeepLearning.AI</title>
      <link>https://haystack.deepset.ai/blog/announcing-dlai/</link>
      <pubDate>Wed, 21 Aug 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/announcing-dlai/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;
&lt;a href=&#34;http://deeplearning.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DeepLearning.AI&lt;/a&gt; and Andrew Ng hold a special place in many AI/ML engineers’ careers and development. For many engineers, they’ve played a pivotal role into breaking into AI, or extending their knowledge and capabilities by teaching about core concepts and technologies at the right time. So, we’re incredibly happy to announce that we’re launching our first short course with them - “Building AI Applications with Haystack”&lt;/p&gt;
&lt;iframe width=&#34;560&#34; height=&#34;315&#34; https://haystack.deepset.ai/blog/https://www.youtube.com/embed/oluZaroQROM?si=IvZKTTLQ0FpGWrH3 title=&#34;YouTube video player&#34; frameborder=&#34;0&#34; allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;
&lt;a href=&#34;https://www.deeplearning.ai/short-courses/building-ai-applications-with-haystack/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;🚀 Enroll now&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;what-youll-learn&#34;&gt;What you’ll learn&lt;/h2&gt;
&lt;p&gt;This course is designed with a bottom up approach, starting by teaching you about the building blocks of generative AI applications. Throughout the course, you’ll start to design and customize your own retrieval-augmented generative (RAG) pipelines, you’ll learn how to extend Haystack with custom capabilities for your own generative AI use case, you’ll even implement a simple self-reflecting agent and end the course by creating a full chat agent with function calling.&lt;/p&gt;
&lt;h2 id=&#34;who-is-the-course-for&#34;&gt;Who is the course for?&lt;/h2&gt;
&lt;p&gt;If you’re new to AI, this course is a great way to start learning about the basics and how they come together to build some of the most useful GenAI use cases. It will also teach you the fundamentals of Haystack and how you can make use of our framework to build out your own applications. You certainly do not need to have a deep understanding or much prior knowledge, however, it will be useful for you to be familiar with Python and have some basic understanding of different model types like embedding models and generative models.&lt;/p&gt;
&lt;p&gt;If you’re not new to AI, this course is a great way to get started with Haystack as a tool for you to make your GenAI use cases come to life.&lt;/p&gt;
&lt;p&gt;It’s been an absolute pleasure for me to create this with the Haystack team and the 
&lt;a href=&#34;http://deeplearning.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DeepLearning.AI&lt;/a&gt; team. I’d love to hear what you think, and if you take the course and end up with questions, please join me for an 
&lt;a href=&#34;https://landing.deepset.ai/webinar-haystack-ama-session-deeplearning&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;AMA session on September 10th&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;follow-us&#34;&gt;Follow us&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://x.com/Haystack_AI&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack on Twitter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://www.youtube.com/@haystack_ai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack on YouTube&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Join our Discord&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Advanced RAG: Query Expansion</title>
      <link>https://haystack.deepset.ai/blog/query-expansion/</link>
      <pubDate>Wed, 14 Aug 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/query-expansion/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;blockquote&gt;
&lt;p&gt;This is part one of the &lt;strong&gt;Advanced Use Cases&lt;/strong&gt; series:&lt;/p&gt;
&lt;p&gt;1️⃣ 
&lt;a href=&#34;https://haystack.deepset.ai/blog/extracting-metadata-filter&#34;&gt;Extract Metadata from Queries to Improve Retrieval&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;2️⃣ &lt;strong&gt;Query Expansion&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;3️⃣ 
&lt;a href=&#34;https://haystack.deepset.ai/blog/query-decomposition&#34;&gt;Query Decomposition&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;4️⃣ 
&lt;a href=&#34;https://haystack.deepset.ai/cookbook/metadata_enrichment&#34;&gt;Automated Metadata Enrichment&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The quality of RAG (retrieval augmented generation) highly depends on the quality of the first step in the process: retrieval. The generation step can only be as good as the context its working on, which it will receive as a result of a retrieval step.&lt;/p&gt;
&lt;p&gt;However, retrieval is also in turn dependent on the query that it receives. There are multiple types of retrieval: keyword based, semantic search (embedding) based, hybrid, or even in some cases simply based on the results of a query to an API (for example, the results of websearch and so on). But at the end of the day, in the majority of cases, there’s a human behind a keyboard typing a query, and humans are not guaranteed to produce good quality queries for the results they intend to get.&lt;/p&gt;
&lt;p&gt;In this article, we’ll walk you through a very simple yet effective technique that allows us to make sure we are retrieving more of, and more relevant bits of context to a given query: query expansion.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;TL;DR: Query expansion increases the number of results, so it increases recall (vs precision). In general, BM25 favors precision while embedding retrieval favors recall (See this 
&lt;a href=&#34;https://github.com/UKPLab/sentence-transformers/issues/22#issuecomment-529387645&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;explanation by Nils Reimers&lt;/a&gt;). So, it makes sense to use BM25&#43;query expansion to increase recall in cases where you want to rely on keyword search.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;query-expansion&#34;&gt;Query Expansion&lt;/h2&gt;
&lt;p&gt;Query expansion is a technique where we take the user query, and generate a certain number of similar queries. For example:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;User Query:&lt;/strong&gt; “open source NLP frameworks”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;After Query Expansion:&lt;/strong&gt; [”natural language processing tools”, “free nlp libraries”, “open-source language processing platforms”, “NLP software with open-source code”, “open source NLP frameworks”]&lt;/p&gt;
&lt;p&gt;This helps improve retrieval results, and in turn the quality of RAG results in cases where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The user query is vague or poorly formed.&lt;/li&gt;
&lt;li&gt;In cases of keyword-based retrieval, it also allows you to cover your bases with queries of similar meaning or synonyms.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Take ‘global warming’ as an example, query expansion would allow us to make sure we’re also doing keyword search for ‘climate change’ or similar queries.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/query-expansion/query-expansion.png alt=&#34;Query expansion&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Let’s start by importing the experimental &lt;code&gt;QueryExpander&lt;/code&gt; component. This component is using an OpenAI model (&lt;code&gt;gpt-4o-mini&lt;/code&gt; in this case) to generate a certain &lt;code&gt;number&lt;/code&gt; of additional queries that are similar to the original user query. It returns queries, which include the original query plus the generated similar ones:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;expander&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;QueryExpander&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;expander&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;open source nlp frameworks&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;number&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This would result in the component returning &lt;code&gt;queries&lt;/code&gt; that include the original query &#43; 4 expanded queries:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;queries&#39;&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;natural language processing tools&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;free nlp libraries&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;open-source language processing platforms&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;NLP software with open-source code&#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;open source nlp frameworks&#39;&lt;/span&gt;]}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;retrieval-with-query-expansion&#34;&gt;Retrieval With Query Expansion&lt;/h2&gt;
&lt;p&gt;Let’s look at what happens if we use query expansion as a step in our retrieval pipeline. Let’s look at this through a very simple and small demo. To this end, I used some dummy data. Here’s the list of &lt;code&gt;documents&lt;/code&gt; I used:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;documents = [
    Document(content=&#34;The effects of climate are many including loss of biodiversity&#34;),
    Document(content=&#34;The impact of climate change is evident in the melting of the polar ice caps.&#34;),
    Document(content=&#34;Consequences of global warming include the rise in sea levels.&#34;),
    Document(content=&#34;One of the effects of environmental changes is the change in weather patterns.&#34;),
    Document(content=&#34;There is a global call to reduce the amount of air travel people take.&#34;),
    Document(content=&#34;Air travel is one of the core contributors to climate change.&#34;),
    Document(content=&#34;Expect warm climates in Turkey during the summer period.&#34;),
]
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;When asking to retrieve the top 3 documents to the query “climate change” using the &lt;code&gt;InMemoryBM25Retriever&lt;/code&gt; (so, we’re doing keyword search) here’s what we get as our top 3 candidates:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&#39;Air travel is one of the core contributors to climate change.&#39;
&#39;The impact of climate change is evident in the melting of the polar ice caps.&#39;
&#39;The effects of climate are many including loss of biodiversity&#39;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;There are 2 things to notice here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We’re only asking for 3 documents, and we’re getting 3 relevant documents to the query “climate change”. In this sense, this retrieval is completely valid and has done a good job.&lt;/li&gt;
&lt;li&gt;But, because we’re using the query “climate change” in combination with a keyword retriever, we are actually missing out on some documents that may be even more relevant to the query. For example, the document with “global warming” is completely left out.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can start to see how this could impact the results you get in cases where users are typing vague queries or keywords into the search box.&lt;/p&gt;
&lt;p&gt;Now, let’s add query expansion to the mix. We will be using a custom retriever this time called the &lt;code&gt;MultiQueryInMemoryBM25Retriever&lt;/code&gt; which can accept a list of &lt;code&gt;queries&lt;/code&gt; instead of a single &lt;code&gt;query&lt;/code&gt; (see the cookbook for the full code). Here’s the retrieval pipeline that we create:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expander&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;QueryExpander&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;MultiQueryInMemoryBM25Retriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryBM25Retriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;doc_store&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;expanded_retrieval_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;expanded_retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;expander&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;query_expander&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;expanded_retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;expanded_retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;expander.queries&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_retriever.queries&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, we can run this pipeline, again with the same query “climate change”&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;expanded_retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;expander&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;climate change&#34;&lt;/span&gt;}}, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;																&lt;span style=&#34;color:#268bd2&#34;&gt;include_outputs_from&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;expander&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And we get the following results. The query expander has created the following &lt;code&gt;queries&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&#39;expander&#39;: {&#39;queries&#39;: [&#39;global warming consequences&#39;,
   &#39;environmental impact of climate change&#39;,
   &#39;effects of climate variability&#39;,
   &#39;implications of climate crisis&#39;,
   &#39;consequences of greenhouse gas emissions&#39;,
   &#39;climate change&#39;]}}
&lt;/code&gt;&lt;/pre&gt;&lt;blockquote&gt;
&lt;p&gt;Note that you may get different results because your &lt;code&gt;QueryExpander&lt;/code&gt; may generate different &lt;code&gt;queries&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And we’ve received the following documents from the retrieval pipeline:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&#39;Consequences of global warming include the rise in sea levels.&#39;
&#39;The impact of climate change is evident in the melting of the polar ice caps.&#39;,
&#39;There is a global call to reduce the amount of air travel people take.&#39;
&#39;The effects of climate are many including loss of biodiversity&#39;
&#39;One of the effects of environmental changes is the change in weather patterns.&#39;
&#39;Air travel is one of the core contributors to climate change.&#39;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Notice how we’re able to add context about ‘global warming’ and ‘effects of environmental change’.&lt;/p&gt;
&lt;h2 id=&#34;using-query-expansion-for-rag&#34;&gt;Using Query Expansion for RAG&lt;/h2&gt;
&lt;p&gt;In the example cookbook, we’ve also added a section on using query expansion for RAG on Wikipedia pages. We index the following wikipedia pages into an &lt;code&gt;InMemoryDocumentStore&lt;/code&gt; :&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&#34;Electric_vehicle&#34;, &#34;Dam&#34;, &#34;Electric_battery&#34;, &#34;Tree&#34;, &#34;Solar_panel&#34;, &#34;Nuclear_power&#34;,
&#34;Wind_power&#34;, &#34;Hydroelectricity&#34;, &#34;Coal&#34;, &#34;Natural_gas&#34;, 
&#34;Greenhouse_gas&#34;, &#34;Renewable_energy&#34;, &#34;Fossil_fuel&#34;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And then, we construct a RAG pipeline. For our resulting prompt to the LLM, we also indicate what the original query from the user was.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/query-expansion/pipeline.png alt=&#34;RAG Pipeline&#34;  /&gt;
&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You are part of an information system that summarises related documents.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You answer a query using the textual content from the documents retrieved for the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;following query.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You build the summary answer based only on quoting information from the documents.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You should reference the documents you used to support your answer.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;###
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Original Query: &#34;{{query}}&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Retrieved Documents: {{documents}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Summary Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expander&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;QueryExpander&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;MultiQueryInMemoryBM25Retriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryBM25Retriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;doc_store&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expanded_rag_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expanded_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;expander&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;query_expander&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expanded_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expanded_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expanded_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expanded_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;expander.queries&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_retriever.queries&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expanded_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_retriever.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_expanded_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running this pipeline with the simple query “green energy sources” with the query expander, we’re able to get a response constructed from Wikipedia pages including “Electric Vehicle”,  “Wind Power”,  “Renewable Energy”, “Fossil Fuel” and “Nuclear Power”. Without the &lt;code&gt;MultiQueryInMemoryBM25Retriever&lt;/code&gt;, we rely on the top k results from a single pass of BM25 retrieval on the query “green energy sources” resulting in a response constructed from the pages “Renewable energy”, “Wind Power” and “Fossil Fuel”&lt;/p&gt;
&lt;h2 id=&#34;wrapping-up&#34;&gt;Wrapping Up&lt;/h2&gt;
&lt;p&gt;Query Expansion is a great technique that will allow you to get a wider range of relevant resources while still using keyword search. While semantic search is a great option, it does require the use of an embedding model, and the existence of embeddings for the data source we will perform search on. This makes keyword based search quite an attractive option for faster, cheaper retrieval.&lt;/p&gt;
&lt;p&gt;This does however mean that we heavily rely on the quality of the provided query. Query expansion allows you to navigate this issue by generating similar queries to the user query.&lt;/p&gt;
&lt;p&gt;In my opinion, one of the main advantages of this technique is that it allows you to avoid embedding documentation at each update, while still managing to increase the relevance of retrieved documents at query time. Keyword retrieval doesn’t require any extra embedding step, so the only inferencing happening at retrieval time in this scenario is when we ask an LLM to generate a certain number of similar queries.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>CPU-Optimized Embedding Models with fastRAG and Haystack</title>
      <link>https://haystack.deepset.ai/blog/cpu-optimized-models-with-fastrag/</link>
      <pubDate>Thu, 01 Aug 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/cpu-optimized-models-with-fastrag/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;One of the main and critical components of a retrieval augmented generation (RAG) pipeline is the embedding process, which forms the foundation for efficient information retrieval by transforming raw text into machine-readable vector representations. Embedding models encode textual data into dense vectors, capturing semantic and contextual meaning. These models are used to create embeddings for both queries (for retrieval) and documents (for indexing and reranking). Therefore, optimizing these models through quantization could improve our RAG application by providing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Higher throughput&lt;/strong&gt;&lt;/em&gt;: useful for reducing the time needed for creating or updating your vectors store.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Lower latency&lt;/strong&gt;&lt;/em&gt;: improves real-time experience as creating query embeddings and re-ranking of documents are done online per user input.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Reduced memory and cost requirements&lt;/strong&gt;&lt;/em&gt;: Optimizing by quantization to &lt;code&gt;int8&lt;/code&gt; reduces the memory footprint and the cost when running such models.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where specialized frameworks, such as 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;fastRAG&lt;/a&gt; by Intel Labs, come into play, offering enhancements tailored to specific hardware and use cases, and with the extensive feature-set offered by Haystack.&lt;/p&gt;
&lt;h2 id=&#34;fastrag-intel-labs-framework-for-efficient-rag&#34;&gt;fastRAG: Intel Labs’ Framework for Efficient RAG&lt;/h2&gt;
&lt;p&gt;fastRAG is a research framework developed by Intel Labs for efficient and optimized RAG pipelines. It incorporates state-of-the-art large language models (LLMs) and information retrieval capabilities. fastRAG is fully compatible with Haystack and includes novel and efficient RAG modules designed for efficient deployment on Intel hardware, including client and server CPUs (Xeon) and the 
&lt;a href=&#34;https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Intel Gaudi AI accelerator&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The fastRAG 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub repository&lt;/a&gt; provides extensive documentation on each component available in the framework, comprehensive examples, and easy installation instructions for optimized backends. The framework utilizes optimized extensions to popular deep learning frameworks such as PyTorch.&lt;/p&gt;
&lt;p&gt;One such extension is 
&lt;a href=&#34;https://github.com/huggingface/optimum-intel&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Optimum Intel&lt;/a&gt;, an open-source library that extends the Hugging Face Transformers library and takes advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512), Vector Neural Network Instructions (VNNI), and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs to accelerate models. AMX accelerated inference is introduced in PyTorch 2.0 and the 
&lt;a href=&#34;https://github.com/intel/intel-extension-for-pytorch&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Intel Extension for PyTorch (IPEX)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Intel&lt;/strong&gt; and &lt;strong&gt;deepset&lt;/strong&gt; are key members in 
&lt;a href=&#34;https://opea.dev/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;Open Platform for Enterprise AI (OPEA)&lt;/strong&gt;&lt;/a&gt;, a project recently announced by 
&lt;a href=&#34;https://lfaidata.foundation/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;LF AI &amp; Data Foundation&lt;/strong&gt;&lt;/a&gt;. OPEA aims to accelerate secure, cost-effective generative AI (GenAI) deployments for businesses by driving interoperability across a diverse and heterogeneous ecosystem, starting with RAG.&lt;/p&gt;
&lt;h2 id=&#34;optimization-process-quantization&#34;&gt;Optimization Process: Quantization&lt;/h2&gt;
&lt;p&gt;The optimization process involves quantizing the model using a calibration dataset, and leveraging an optimized backend like IPEX for Intel Xeon CPUs. Quantization reduces the model size by converting weights and activations from floating-point (e.g., 32-bit) to lower-bit representations (e.g., 8-bit integers). This makes models &lt;strong&gt;smaller&lt;/strong&gt;, &lt;strong&gt;faster&lt;/strong&gt;, and &lt;strong&gt;more cost-efficient&lt;/strong&gt;, with negligible loss in accuracy. Benchmark results of 
&lt;a href=&#34;https://huggingface.co/BAAI/bge-large-en-v1.5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BGE-large&lt;/a&gt; demonstrate a potential &lt;strong&gt;10x speed-up&lt;/strong&gt; in the indexing process when using the &lt;code&gt;int8&lt;/code&gt; variant of the model.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/cpu-optimized-models-with-fastrag/pure-benchmarks.png alt=&#34;Throughput comparison of int8 and FP32 variants of the BGE-large model with varying batch sizes (sequence length=256) on a Gen 4 Xeon CPU (8480&#43;) with 56 Cores (1 Socket)&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;Throughput comparison of int8 and FP32 variants of the BGE-large model with varying batch sizes (sequence length=256) on a Gen 4 Xeon CPU (8480&#43;) with 56 Cores (1 Socket)&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note that the benchmark results focus solely on the time spent in the encoding process of the BGE-large model variants. Time taken for tokenization is excluded from these measurements to provide a clearer comparison of the models’ encoding efficiency.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/tree/main/scripts/optimizations/embedders&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;A comprehensive guide&lt;/a&gt; to quantizing a model from scratch is available. Additionally, three quantized 
&lt;a href=&#34;https://huggingface.co/collections/Intel/bge-65e0910105caf76026a64cc9&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BGE embedding models&lt;/a&gt; are available on Intel’s Hugging Face Model Hub.&lt;/p&gt;
&lt;h2 id=&#34;components&#34;&gt;Components&lt;/h2&gt;
&lt;p&gt;
&lt;a href=&#34;https://haystack.deepset.ai/integrations/fastrag&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;fastRAG&lt;/a&gt; is supported as an integration to Haystack, extending Haystack’s document and text embedders with IPEX support. In addition, fastRAG includes two Bi-encoder similarity rankers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/embedders/ipex_embedder.py#L97&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;IPEXSentenceTransformersDocumentEmbedder&lt;/a&gt; and 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/embedders/ipex_embedder.py#L109&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;IPEXSentenceTransformersTextEmbedder&lt;/a&gt; - Embedder components that use an &lt;code&gt;int8&lt;/code&gt; quantized embedding models via IPEX, and can embed &lt;code&gt;Document&lt;/code&gt; and text inputs.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/rankers/bi_encoder_ranker.py#L11&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BiEncoderSimilarityRanker&lt;/a&gt; - A bi-encoder similarity ranker that re-orders a list of documents given a query and an embedder. Bi-encoder models are used to encode documents and queries independently and are more efficient than cross-encoders.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/rankers/ipex_bi_encoder_ranker.py#L5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;IPEXBiEncoderSimilarityRanker&lt;/a&gt; - An IPEX-based 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/rankers/bi_encoder_ranker.py#L11&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BiEncoderSimilarityRanker&lt;/a&gt; to be used with an &lt;code&gt;int8&lt;/code&gt; quantized embedding model.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See the full list of fastRAG components 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/fastrag#-components&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;same-accuracy-9x-faster&#34;&gt;Same Accuracy, 9x Faster&lt;/h2&gt;
&lt;p&gt;Maintaining competitive retrieval accuracy is important when optimizing the models. We evaluated the impact of optimization (quantization and calibration) on performance using the Rerank and Retrieval sub-tasks of MTEB with three BGE bi-encoder embedding models. For the 
&lt;a href=&#34;https://huggingface.co/Intel/bge-large-en-v1.5-rag-int8-static&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BGE-large&lt;/a&gt; model, the optimization process marginally altered performance compared to the original model, as shown in the table.&lt;/p&gt;
&lt;div class=&#34;styled-table&#34;&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;&lt;/th&gt;
          &lt;th&gt;int8&lt;/th&gt;
          &lt;th&gt;FP32&lt;/th&gt;
          &lt;th&gt;%diff&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Reranking&lt;/td&gt;
          &lt;td&gt;0.5997&lt;/td&gt;
          &lt;td&gt;0.6003&lt;/td&gt;
          &lt;td&gt;-0.108%&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Retrieval&lt;/td&gt;
          &lt;td&gt;0.5346&lt;/td&gt;
          &lt;td&gt;0.5429&lt;/td&gt;
          &lt;td&gt;-1.53%&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;Results for other BGE models can be found 
&lt;a href=&#34;https://huggingface.co/blog/intel-fast-embedding&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s compare encoding random texts as passages using two different models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://huggingface.co/BAAI/bge-large-en-v1.5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BAAI/bge-large-en-v1.5&lt;/a&gt; (&lt;code&gt;fp32&lt;/code&gt;) with Haystack’s 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;SentenceTransformersDocumentEmbedder&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://huggingface.co/Intel/bge-large-en-v1.5-rag-int8-static&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Intel/bge-large-en-v1.5-rag-int8-static&lt;/a&gt; (&lt;code&gt;int8&lt;/code&gt;) with fastRAG’s 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/blob/main/fastrag/embedders/ipex_embedder.py#L97&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;IPEXSentenceTransformersDocumentEmbedder&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The script below creates random passages, each passage translates into 256 tokens once encoded with the tokenizer, and encodes 16384 passages using the same model in both &lt;code&gt;fp32&lt;/code&gt; and &lt;code&gt;int8&lt;/code&gt; variants.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;datasets&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;load_dataset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fastrag.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;IPEXSentenceTransformersDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;transformers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AutoTokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;tokenizer&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;AutoTokenizer&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_pretrained&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Intel/bge-large-en-v1.5-rag-int8-static&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;generate_text_for_fixed_length&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;seq_length&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Tokenize the current text&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;token_ids&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;tokenizer&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;)[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;input_ids&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Check if the tokenized sequence has reached the desired length&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;len&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;token_ids&lt;/span&gt;) == &lt;span style=&#34;color:#268bd2&#34;&gt;seq_length&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;break&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;elif&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;len&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;token_ids&lt;/span&gt;) &gt; &lt;span style=&#34;color:#268bd2&#34;&gt;seq_length&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;[:-&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Add a random character to the text&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt; &#43;= &lt;span style=&#34;color:#268bd2&#34;&gt;random&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;choice&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;string&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;ascii_letters&lt;/span&gt; &#43; &lt;span style=&#34;color:#268bd2&#34;&gt;string&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;digits&lt;/span&gt; &#43; &lt;span style=&#34;color:#268bd2&#34;&gt;string&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;punctuation&lt;/span&gt; &#43; &lt;span style=&#34;color:#2aa198&#34;&gt;&#34; &#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;seq_length&lt;/span&gt; = &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;256&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generated_texts&lt;/span&gt; = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;_&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tqdm&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;range&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1000&lt;/span&gt;), &lt;span style=&#34;color:#268bd2&#34;&gt;desc&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Generating texts&#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;generated_texts&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;generate_text_for_fixed_length&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;seq_length&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generated_texts&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;generated_texts&lt;/span&gt; * &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;20&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;generated_texts&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;BATCH_SIZE_LIST&lt;/span&gt; = [&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;4&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;8&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;16&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;32&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;64&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;128&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;256&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;BATCH_SIZE&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;BATCH_SIZE_LIST&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Running with BATCH_SIZE:&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;BATCH_SIZE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;ipex_doc_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;IPEXSentenceTransformersDocumentEmbedder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Intel/bge-large-en-v1.5-rag-int8-static&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;batch_size&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;BATCH_SIZE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_doc_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;BAAI/bge-large-en-v1.5&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;batch_size&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;BATCH_SIZE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;ipex_doc_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_doc_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Measure runtime for SentenceTransformersDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;start_time&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;documents_with_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_doc_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;[:&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;16384&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;end_time&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_doc_embedder_runtime&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;end_time&lt;/span&gt; - &lt;span style=&#34;color:#268bd2&#34;&gt;start_time&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Measure runtime for IPEXSentenceTransformersDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;start_time&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;documents_with_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ipex_doc_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;[:&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;16384&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;end_time&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;ipex_doc_embedder_runtime&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;end_time&lt;/span&gt; - &lt;span style=&#34;color:#268bd2&#34;&gt;start_time&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Runtime for SentenceTransformersDocumentEmbedder:&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_doc_embedder_runtime&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Runtime for IPEXSentenceTransformersDocumentEmbedder:&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ipex_doc_embedder_runtime&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The runtime results indicate that using the setup of fastRAG’s components, as demonstrated in the script above, leads to &lt;strong&gt;5.25x to 9.3x speed-ups&lt;/strong&gt; in the embedding process when running on a single socket of a &lt;strong&gt;Gen 4 Xeon CPU (8480&#43;)&lt;/strong&gt; and using 56 cores. We can also translate that to throughput (higher is better) and see the differences in speed-ups as well.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;💡 The difference in speedups compared to the benchmark presented previously is due to the extra processing done in Haystack components, mainly, the tokenization process which was excluded in the previous benchmark.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/cpu-optimized-models-with-fastrag/embedder-benchmarks.png alt=&#34;Throughput comparison of int8 and FP32 variants of the BGE-large model using IPEXSentenceTransformersDocumentEmbedder and SentenceTransformersDocumentEmbedder&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;Throughput comparison of int8 and FP32 variants of the BGE-large model using &lt;code&gt;IPEXSentenceTransformersDocumentEmbedder&lt;/code&gt; and &lt;code&gt;SentenceTransformersDocumentEmbedder&lt;/code&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;💡 The performance of the quantized &lt;code&gt;int8&lt;/code&gt; model highly depends on the structure of the data. For best performance, it is advised to use static shapes, meaning, and tokenized sequences of the same length. In addition, batching is highly effective when using CPU backends and it could also be combined with dynamic shapes. It’s a matter of tuning the setup according to the data and hardware.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;💡 We followed the instructions available 
&lt;a href=&#34;https://github.com/IntelLabs/fastRAG/tree/main/scripts/optimizations/embedders#running-instructions&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt; when running the experiments, which includes using &lt;code&gt;numactl&lt;/code&gt; to limit running the processes on a single socket, and TCMalloc. We recommend reading the 
&lt;a href=&#34;https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/tuning_guide.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;performance tuning guide&lt;/a&gt; and 
&lt;a href=&#34;https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;launch script usage&lt;/a&gt; available on the IPEX documentation website.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Read the Intel fastRAG team’s 
&lt;a href=&#34;https://huggingface.co/blog/intel-fast-embedding&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;blog&lt;/a&gt; with additional evaluations and performance benchmarking for more information.&lt;/p&gt;
&lt;h2 id=&#34;rag-with-optimized-embedding-models&#34;&gt;RAG with Optimized Embedding Models&lt;/h2&gt;
&lt;p&gt;In this section, we will explore how to use optimized models within a RAG pipeline. We will use embedder models to create the initial index more quickly than the standard &lt;code&gt;fp32&lt;/code&gt; Hugging Face models. Additionally, we will demonstrate a simple Q&amp;A pipeline that employs an optimized bi-encoder ranker. This ranker re-orders the retrieved documents to enhance the list of documents used in the LLM prompt, thereby improving the overall performance of the retrieval process.&lt;/p&gt;
&lt;h3 id=&#34;installation&#34;&gt;Installation&lt;/h3&gt;
&lt;p&gt;First, install fastRAG, Optimum Intel and Haystack via fastRAG:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install fastrag[intel]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;indexing-data&#34;&gt;Indexing Data&lt;/h3&gt;
&lt;p&gt;We will start with initializing an in-memory data store and loading the document embedder component from fastRAG. The &lt;code&gt;IPEXSentenceTransformersDocumentEmbedder&lt;/code&gt; can be seamlessly integrated into a Haystack pipeline, just like any other component.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fastrag.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;IPEXSentenceTransformersDocumentEmbedder&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;IPEXSentenceTransformersTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;doc_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;IPEXSentenceTransformersDocumentEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Intel/bge-small-en-v1.5-rag-int8-static&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;doc_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, let’s load a dataset. We’ll use 
&lt;a href=&#34;https://huggingface.co/datasets/bilgeyucel/seven-wonders&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;bilgeyucel/seven-wonders&lt;/a&gt; dataset that doesn’t need any further processing:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;datasets&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;load_dataset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;dataset&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;load_dataset&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;bilgeyucel/seven-wonders&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;train&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;content&#34;&lt;/span&gt;], &lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;meta&#34;&lt;/span&gt;]) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;dataset&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, we embed the documents and write them to the index:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;documents_with_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;doc_embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;documents_with_embeddings&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;rag-pipeline&#34;&gt;RAG Pipeline&lt;/h3&gt;
&lt;p&gt;We continue to initialize components required to build a pipeline that represents a simple Q&amp;A RAG example using an embedder, a retriever, a reranker, a prompt template, and a generator. Notably, the &lt;code&gt;IPEXSentenceTransformersTextEmbedder&lt;/code&gt; and &lt;code&gt;IPEXBiEncoderSimilarityRanker&lt;/code&gt; can be seamlessly integrated into a Haystack pipeline alongside other components.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Learn how to create a RAG pipeline with Haystack in 📚
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/27_first_rag_pipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Tutorial: Creating Your First QA Pipeline with Retrieval-Augmentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fastrag.rankers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;IPEXBiEncoderSimilarityRanker&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;IPEXSentenceTransformersTextEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Intel/bge-small-en-v1.5-rag-int8-static&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;100&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;reranker&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;IPEXBiEncoderSimilarityRanker&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Intel/bge-large-en-v1.5-rag-int8-static&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We create a simple RAG prompt template:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You are a helpful AI assistant. You are given contexts and a question. 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You must answer the question using the information given in the context.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: {{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We initialize a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;HuggingFaceLocalGenerator&lt;/a&gt; with 
&lt;a href=&#34;https://huggingface.co/HuggingFaceTB/SmolLM-1.7B-Instruct&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;HuggingFaceTB/SmolLM-1.7B-Instruct&lt;/a&gt; model (&lt;code&gt;SmolLM-1.7B-Instruct&lt;/code&gt; using a local Hugging Face model) to generate answers:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceLocalGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceLocalGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;HuggingFaceTB/SmolLM-1.7B-Instruct&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                      &lt;span style=&#34;color:#268bd2&#34;&gt;task&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text-generation&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                      &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_new_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;100&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;do_sample&#34;&lt;/span&gt;: &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                        })
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, we create the pipeline:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;query_embedder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reranker&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;reranker&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reranker.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reranker&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;try-the-pipeline&#34;&gt;Try the Pipeline&lt;/h3&gt;
&lt;p&gt;Let’s try the pipeline with a real question:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;What does Rhodes Statue look like?&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;embedder&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;text&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;reranker&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;query&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;prompt_builder&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;question&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;llm&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;replies&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;The&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;statue&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Colossus&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Rhodes&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;statue&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Greek&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;sun&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;god&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Helios&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;that&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;stood&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;city&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Rhodes&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;one&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Seven&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Wonders&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Ancient&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;World&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;It&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;said&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;have&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;stood&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;about&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;100&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;feet&lt;/span&gt; (&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;30&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;meters&lt;/span&gt;) &lt;span style=&#34;color:#268bd2&#34;&gt;tall&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;making&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;it&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tallest&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;statue&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;its&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;The&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;statue&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;built&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;by&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Chares&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Lindos&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;between&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;280&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;240&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;BC&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;It&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;destroyed&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;by&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;an&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;earthquake&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;226&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this short blog, we’ve highlighted the significant advantages of CPU-optimized embedding models in terms of accuracy and performance, demonstrating how seamlessly these components can be incorporated into your Haystack pipeline. At the forefront of these advancements is fastRAG, a research library dedicated to integrating Intel-based optimizations into Haystack.&lt;/p&gt;
&lt;p&gt;The fastRAG team provides in-depth information on the quantization process and extensive benchmarking conducted on 4th Gen Xeon processors. To dive deep into the optimizations, read this detailed 
&lt;a href=&#34;https://huggingface.co/blog/intel-fast-embedding&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;blog post&lt;/a&gt; and join our 
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Discord community&lt;/a&gt; to explore Haystack.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Mixedbread  🤝 deepset: Announcing our New German/English Embedding Model</title>
      <link>https://haystack.deepset.ai/blog/announcing-our-new-german-embedding-model/</link>
      <pubDate>Thu, 18 Jul 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/announcing-our-new-german-embedding-model/</guid>
      <description>
      
      
      

      
      

      <p>It’s 2024 and yet, most models today are still primarily geared towards English speaking markets. Today, 
<a href="https://deepset.ai" target="_blank" rel="noopener">deepset</a> and 
<a href="https://www.mixedbread.ai/" target="_blank" rel="noopener">Mixedbread</a> are jointly announcing our latest contribution towards changing that landscape: A new open-source German/English embedding model - 
<a href="https://huggingface.co/mixedbread-ai/deepset-mxbai-embed-de-large-v1" target="_blank" rel="noopener">deepset-mxbai-embed-de-large-v1</a>.</p>
<p>Our model is based on 
<a href="https://huggingface.co/intfloat/multilingual-e5-large" target="_blank" rel="noopener">intfloat/multilingual-e5-large</a> and was fine-tuned on 30+ million pairs of German data for retrieval tasks. On the 
<a href="https://www.evidentlyai.com/ranking-metrics/ndcg-metric" target="_blank" rel="noopener">NDCG</a>@10 metric, which compares the list of retrieval results against an ideally ordered list of expected results, our model not only sets a new standard for open-source German embedding models but is also competitive with commercial alternatives.</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Avg. Performance (NDCG@10)</th>
          <th>Binary Support</th>
          <th>MRL Support</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>deepset-mxbai-embed-de-large-v1</strong></td>
          <td><strong>51.7</strong></td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td>multilingual-e5-large</td>
          <td>50.5</td>
          <td>❌</td>
          <td>❌</td>
      </tr>
      <tr>
          <td>jina-embeddings-v2-base-de</td>
          <td>50.0</td>
          <td>✅</td>
          <td>❌</td>
      </tr>
      <tr>
          <td>Commercial Models</td>
          <td></td>
          <td></td>
          <td></td>
      </tr>
      <tr>
          <td>Cohere Multilingual v3</td>
          <td><em>52.4</em></td>
          <td>✅</td>
          <td>-</td>
      </tr>
  </tbody>
</table>
<p>Nvidia enabled this work by providing cutting-edge computational resources. All training and evaluation was done on a <strong>Nvidia DGX with 8xA100</strong>, sponsored by Nvidia. We are extremely grateful for their contribution to this project.</p>
<blockquote>
<p>To learn more and get a deeper dive into benchmarks on real-world data, 
<a href="https://www.mixedbread.ai/blog/deepset-mxbai-embed-de-large-v1" target="_blank" rel="noopener">read our full announcement article with Mixedbread</a>. You can find an overview of the benchmarks in this 
<a href="https://docs.google.com/spreadsheets/d/1RIwLk7Ldy5CI03ckqJuOE2BJTYzP4JbVqP6Rz3WO1kw/edit?usp=sharing" target="_blank" rel="noopener">spreadsheet</a>.</p>
</blockquote>
<h2 id="storage-and-inference-efficiency">Storage and Inference Efficiency</h2>
<p>Beyond support for the German language, we also focused on improving the storage and inference efficiency of this new embedding model using the following methods:</p>
<p><strong>Matryoshka Representation Learning (MRL):</strong> 
<a href="https://huggingface.co/blog/matryoshka" target="_blank" rel="noopener">Matryoshka representation learning</a> reduces the number of output dimensions in an embedding model without significant accuracy loss. This is done by modifying the loss function to prioritise the representation of important information in the initial dimensions of the embedding vector, enabling the truncation of later dimensions.</p>
<p><strong>Binary Quantization:</strong> 
<a href="https://huggingface.co/blog/embedding-quantization" target="_blank" rel="noopener">Binary quantization</a>  reduces the size of each dimension by converting float32 values to binary values, significantly enhancing memory and disk space efficiency while retaining high performance during inference.</p>
<h2 id="start-using-it-with-haystack">Start Using it With Haystack</h2>
<p>You can start using deepset-mxbai-embed-de-large-v1 today with the 
<a href="https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder" target="_blank" rel="noopener">SentenceTransformersDocumentEmbedder</a> and 
<a href="https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder" target="_blank" rel="noopener">SentenceTransformersTextEmbedder</a> components in Haystack, as well as the 
<a href="https://haystack.deepset.ai/integrations/mixedbread-ai" target="_blank" rel="noopener">Mixedbread integrations</a> of <code>MixedbreadDocumentEmbedder</code> and <code>MixedbreadTextEmbedder</code>:</p>
<h3 id="use-it-with-the-sentence-transformers-embedders">Use it with the Sentence Transformers Embedders</h3>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.embedders</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">SentenceTransformersTextEmbedder</span>, <span style="color:#268bd2">SentenceTransformersDocumentEmbedder</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">text_embedder</span> = <span style="color:#268bd2">SentenceTransformersTextEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"mixedbread-ai/deepset-mxbai-embed-de-large-v1"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_embedder</span> = <span style="color:#268bd2">SentenceTransformersDocumentEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"mixedbread-ai/deepset-mxbai-embed-de-large-v1"</span>)
</span></span></code></pre></div><h3 id="use-it-with-the-mixedbread-embedders">Use it with the Mixedbread Embedders</h3>
<p>To start using this model with the 
<a href="https://haystack.deepset.ai/integrations/mixedbread-ai" target="_blank" rel="noopener">Mixedbread integration for Haystack</a>, install <code>mixedbread-ai-haystack</code> and export your Mixedbread API key to <code>MXBAI_API_KEY</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">mixedbread_ai_haystack</span> <span style="color:#dc322f;font-weight:bold">import</span>  <span style="color:#268bd2">MixedbreadAITextEmbedder</span>, <span style="color:#268bd2">MixedbreadAIDocumentEmbedder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">mixedbread_ai</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">EncodingFormat</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">text_embedder</span> = <span style="color:#268bd2">MixedbreadAITextEmbedder</span>( <span style="color:#268bd2">model</span>=<span style="color:#2aa198">"mixedbread-ai/deepset-mxbai-embed-de-large-v1"</span>,
</span></span><span style="display:flex;"><span>                                          <span style="color:#268bd2">encoding_format</span>=<span style="color:#268bd2">EncodingFormat</span>.<span style="color:#268bd2">BINARY</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_embedder</span> = <span style="color:#268bd2">MixedbreadAIDocumentEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"mixedbread-ai/deepset-mxbai-embed-de-large-v1"</span>,
</span></span><span style="display:flex;"><span>                                                 <span style="color:#268bd2">encoding_format</span>=<span style="color:#268bd2">EncodingFormat</span>.<span style="color:#268bd2">BINARY</span>)
</span></span></code></pre></div><p>Like our influential 
<a href="https://www.deepset.ai/german-bert" target="_blank" rel="noopener">German BERT model</a>, we hope that this state-of-the-art model will enable the German-speaking AI community to build innovative products in the field of retrieval-augmented generation (RAG) and beyond!</p>
<p>Join our 
<a href="https://discord.com/invite/xYvH6drSmA" target="_blank" rel="noopener">Discord community</a> to explore Haystack.</p>

      </description>
    </item>
    
    <item>
      <title>Benchmarking Haystack Pipelines for Optimal Performance</title>
      <link>https://haystack.deepset.ai/blog/benchmarking-haystack-pipelines/</link>
      <pubDate>Mon, 24 Jun 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/benchmarking-haystack-pipelines/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;In this article, we will show you how to use Haystack to evaluate the performance of a RAG pipeline. Note that the code in this article is meant to be illustrative and may not run as is; if you want to run the code, please refer to the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-evaluation/blob/main/evaluations/evaluation_aragog.py&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;python script&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;This article will guide you through building a Retrieval-Augmented Generation (RAG) pipeline using Haystack, adjusting various parameters, and evaluating it with the ARAGOG dataset. The dataset consists of pairs of questions and answers, and our objective is to assess the RAG pipeline’s efficiency in retrieving the correct context and generating accurate answers. To do this, we will use the following evaluation metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ContextRelevance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/faithfulnessevaluator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Faithfulness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/sasevaluator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Semantic Answer Similarity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We did this experiment by relying on three different Haystack pipelines with different purposes: one pipeline for indexing, another for RAG, and one for evaluation. We describe each of these pipelines in detail and show how to combine them together to evaluate the RAG pipeline.&lt;/p&gt;
&lt;p&gt;The article is organized as follows: we first describe the origin and authorship of the ARAGOG dataset, then we build the pipelines. We then demonstrate how to integrate everything, performing multiple runs over the dataset and adjusting parameters. These parameters were chosen based on feedback from our community, reflecting how users optimize their pipelines:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;top_k&lt;/code&gt;: the maximum number of documents returned by the retriever. For this experiment, we tested our pipeline with &lt;code&gt;top_k&lt;/code&gt; value of &lt;code&gt;[1, 2, 3]&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;embedding_model&lt;/code&gt;: the model used to encode the documents and the question. For this example, we used these sentence-transformers models:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;msmarco-distilroberta-base-v2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;all-mpnet-base-v2&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;chunk_size&lt;/code&gt;: the number of tokens in the input text that makes up segments of text to be embedded and indexed. For this experiment, we tested our pipeline with &lt;code&gt;chunk_size&lt;/code&gt; of &lt;code&gt;[64, 128, 256]&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We end by discussing the results of the evaluation and sharing some lessons learned.&lt;/p&gt;
&lt;h3 id=&#34;the-aragog-advanced-rag-output-grading-dataset&#34;&gt;The “ARAGOG: Advanced RAG Output Grading” Dataset&lt;/h3&gt;
&lt;p&gt;The knowledge data, as well as the questions and answers, all stem from the 
&lt;a href=&#34;https://arxiv.org/pdf/2404.01037&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ARAGOG: Advanced RAG Output Grading&lt;/a&gt; paper. The data is a subset of the 
&lt;a href=&#34;https://huggingface.co/datasets/jamescalam/ai-arxiv&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;AI ArXiv Dataset&lt;/a&gt; and consists of 423 selected research papers centered around the themes of Transformers and Large Language Models (LLMs).&lt;/p&gt;
&lt;p&gt;The evaluation dataset comprises 107 question-answer pairs (QA) generated with the assistance of GPT-4. Each QA pair is validated and corrected by humans, ensuring that the evaluation is correct and accurately measures the RAG techniques’ performance in real-world applications.&lt;/p&gt;
&lt;p&gt;Within the scope of this article, we only considered 16 papers, the ones from which the questions were drawn, instead of the 423 papers in the original dataset, to reduce the computational cost.&lt;/p&gt;
&lt;h2 id=&#34;the-indexing-pipeline&#34;&gt;The Indexing Pipeline&lt;/h2&gt;
&lt;p&gt;The indexing pipeline is responsible for preprocessing and storing the documents in a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;DocumentStore&lt;/code&gt;&lt;/a&gt;. We will define a function that wraps a pipeline, takes the embedding model and the chunk size as parameters, and returns a DocumentStore for later use. The pipeline in the function first converts the PDF files into Documents, cleans them, splits them into chunks, and then embeds them using a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/reference/embedders-api#sentencetransformersdocumentembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;SentenceTransformers&lt;/code&gt;&lt;/a&gt; model. The embeddings are then stored in an 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/inmemorydocumentstore&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;InMemoryDocumentStore&lt;/code&gt;&lt;/a&gt;. Learn more about creating an indexing pipeline in 📚 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/30_file_type_preprocessing_index_pipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Tutorial: Preprocessing Different File Types&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For this example, we store the documents using the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/inmemorydocumentstore&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;InMemoryDocumentStore&lt;/code&gt;&lt;/a&gt;, but you can use any 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/choosing-a-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;other document store supported by Haystack&lt;/a&gt;. We split the documents by word, but you can split them by sentence or paragraph by changing the value of &lt;code&gt;split_by&lt;/code&gt; parameter in the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/documentsplitter&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;DocumentSplitter&lt;/code&gt;&lt;/a&gt; component.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We need to pass the parameters &lt;code&gt;embedding_model&lt;/code&gt; and &lt;code&gt;chunk_size&lt;/code&gt; to this indexing pipeline function since we want to experiment with different indexing approaches.&lt;/p&gt;
&lt;p&gt;The indexing pipeline function is defined as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.converters&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PyPDFToDocument&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.preprocessors&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentCleaner&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSplitter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.writers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentWriter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.types&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DuplicatePolicy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;indexing&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;int&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;files_path&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;datasets/ARAGOG/papers_for_questions&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;converter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PyPDFToDocument&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cleaner&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentCleaner&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSplitter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;split_length&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt;))  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# default splitting by word&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;writer&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentWriter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;policy&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;DuplicatePolicy&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;SKIP&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;converter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cleaner&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cleaner&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;writer&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pdf_files&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;files_path&lt;/span&gt;&#43;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;/&#34;&lt;/span&gt;&#43;&lt;span style=&#34;color:#268bd2&#34;&gt;f_name&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;f_name&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;listdir&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;files_path&lt;/span&gt;)]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;converter&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sources&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;pdf_files&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;the-rag-pipeline&#34;&gt;The RAG Pipeline&lt;/h2&gt;
&lt;p&gt;We use a simple RAG pipeline composed of a retriever, a prompt builder, a language model, and an answer builder. First, we use the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;SentenceTransformersTextEmbedder&lt;/code&gt;&lt;/a&gt; to embed the query and an 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;InMemoryEmbeddingRetriever&lt;/code&gt;&lt;/a&gt; to retrieve the &lt;code&gt;top-k&lt;/code&gt; documents relevant to the query. We then rely on an LLM to generate an answer based on the context retrieved from the documents and the query question.&lt;/p&gt;
&lt;p&gt;We used the OpenAI API through the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/openaigenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;OpenAIGenerator&lt;/code&gt;&lt;/a&gt; with the &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; model in our implementation. The 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/promptbuilder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;PromptBuilder&lt;/code&gt;&lt;/a&gt; is responsible for building the prompt to be fed to the LLM, using a template that includes the context and the question. Finally, the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/answerbuilder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;AnswerBuilder&lt;/code&gt;&lt;/a&gt; is responsible for extracting the answer from the LLM output and returning it. Learn more about creating a RAG pipeline in 📚 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/27_first_rag_pipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Tutorial: Creating Your First QA Pipeline with Retrieval-Augmentation&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note that we instruct the LLM to explicitly answer &lt;code&gt;&#34;None&#34;&lt;/code&gt; when the context is empty. We do this to avoid the LLM answering the prompt with its own internal knowledge.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After creating the pipeline, we wrap it with a function to easily initialize it with different parameters. We expect a &lt;code&gt;document_store&lt;/code&gt;, an &lt;code&gt;embedding_model&lt;/code&gt;, and the &lt;code&gt;top_k&lt;/code&gt; for this function.&lt;/p&gt;
&lt;p&gt;The RAG pipeline is defined as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;AnswerBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        You have to answer the following question based on the given context information only.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        If the context is empty or just a &#39;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;n&#39; answer with None, example: &#34;None&#34;.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Context:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        {&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        {&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Question: {{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        &#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;progress_bar&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-3.5-turbo&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;AnswerBuilder&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.query_embedding&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.replies&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer_builder.replies&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.meta&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer_builder.meta&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;the-evaluation-pipeline&#34;&gt;The Evaluation Pipeline&lt;/h2&gt;
&lt;p&gt;We will also need an evaluation pipeline, which will be responsible for computing the scoring metrics to measure the performance of the RAG pipeline. You can learn how to build an evaluation pipeline in 📚 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/35_evaluating_rag_pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Tutorial: Evaluating RAG Pipelines&lt;/a&gt;. The evaluation pipeline will include three evaluators:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ContextRelevanceEvaluator&lt;/a&gt; will assess the relevancy of the retrieved context to answer the query question&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/faithfulnessevaluator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FaithfulnessEvaluator&lt;/a&gt; evaluates whether the generated answer can be derived from the context&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/sasevaluator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SASEvaluator&lt;/a&gt; compares the embedding of a generated answer against a ground-truth answer based on a common embedding model.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This new function returns the evaluation results and the inputs used to run the evaluation. This data is useful for later analysis and understanding the pipeline’s performance in more detail and granularity. We need to pass the &lt;code&gt;questions&lt;/code&gt; and &lt;code&gt;answers&lt;/code&gt; from the dataset to the function, plus the data generated by the RAG pipeline, i.e., &lt;code&gt;retrieved_contexts&lt;/code&gt;, &lt;code&gt;predicted_answers&lt;/code&gt;, and the &lt;code&gt;embedding_model&lt;/code&gt; used for these results.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.evaluators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ContextRelevanceEvaluator&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;FaithfulnessEvaluator&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SASEvaluator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;evaluation&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;answers&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context_relevance&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ContextRelevanceEvaluator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;raise_on_failure&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;faithfulness&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;FaithfulnessEvaluator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;raise_on_failure&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sas&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SASEvaluator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline_results&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context_relevance&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;questions&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;contexts&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;faithfulness&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;questions&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;contexts&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;predicted_answers&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sas&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;predicted_answers&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ground_truth_answers&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;answers&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt; = {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context_relevance&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline_results&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;context_relevance&#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;faithfulness&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline_results&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;faithfulness&#39;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sas&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline_results&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;sas&#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		&lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt; = {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;				&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;questions&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;sample_questions&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;contexts&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;true_answers&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;sample_answers&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;predicted_answers&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;putting-it-all-together&#34;&gt;Putting it all together&lt;/h2&gt;
&lt;p&gt;We now have the building blocks to evaluate the RAG pipeline: indexing the knowledge data, generating answers using a RAG architecture, and evaluating the results. However, we still need a method to run the questions over our RAG pipeline and collect all the needed results to perform an evaluation.
We will use a function that wraps up all the interactions with the RAG pipeline. It takes as parameters a &lt;code&gt;document_store&lt;/code&gt;, the &lt;code&gt;questions&lt;/code&gt;, an &lt;code&gt;embedding_model&lt;/code&gt; and the &lt;code&gt;top_k&lt;/code&gt; and returns the retrieved contexts and the predicted answers.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run_rag&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;sample_questions&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    A function to run the basic rag model on a set of sample questions and answers
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    &#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt; = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt; = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;q&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tqdm&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;sample_questions&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;try&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_embedder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;q&lt;/span&gt;}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;q&lt;/span&gt;}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;q&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer_builder&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answers&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;([&lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;answer_builder&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;answers&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;BadRequestError&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;e&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Error with question: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;q&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;e&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;error&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notice that we wrap the call to the RAG pipeline in a try-except block to handle any errors that may occur during the pipeline’s execution. This might happen, for instance, if the prompt is too big—due to large contexts—for the model to generate an answer, if there are network errors, or simply if the model cannot generate an answer for any other reason.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You can decide if the LLM-based evaluators stop immediately if an error is found or if they ignore the evaluation for a particular sample and continue see, for instance in the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator#overview&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ContextRelevanceEvaluator&lt;/a&gt;, the &lt;code&gt;raise_on_failure&lt;/code&gt;  parameter.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Finally, we need to run whole query questions through the pipeline over the dataset for each possible combination of the parameters &lt;code&gt;top_k&lt;/code&gt;, &lt;code&gt;embedding_model&lt;/code&gt;, and &lt;code&gt;chunk_size&lt;/code&gt;. That’s handled by the next function.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note that for indexing, we only vary the &lt;code&gt;embedding_model&lt;/code&gt; and &lt;code&gt;chunk_size&lt;/code&gt;, as the &lt;code&gt;top_k&lt;/code&gt; parameter does not affect the indexing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;parameter_tuning&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;out_path&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;base_path&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;../datasets/ARAGOG/&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;open&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;base_path&lt;/span&gt; &#43; &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;eval_questions.json&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;r&#34;&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;f&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;json&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;load&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;f&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;questions&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;answers&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ground_truths&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_models&lt;/span&gt; = {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/all-MiniLM-L6-v2&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/msmarco-distilroberta-base-v2&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/all-mpnet-base-v2&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;top_k_values&lt;/span&gt; = [&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;chunk_sizes&lt;/span&gt; = [&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;64&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;128&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;256&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# create results directory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;out_path&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;out_path&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;out_path&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;mkdir&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;exist_ok&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_models&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;chunk_sizes&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Indexing documents with &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; model with a chunk_size=&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;doc_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;indexing&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;top_k_values&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;name_params&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;split&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;/&#39;&lt;/span&gt;)[-&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;]&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;__top_k:&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;__chunk_size:&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name_params&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Running RAG pipeline&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;run_rag&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;doc_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Running evaluation&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;evaluation&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;answers&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retrieved_contexts&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;predicted_answers&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;eval_results&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;EvaluationRunResult&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;run_name&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;name_params&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;eval_results&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;score_report&lt;/span&gt;().&lt;span style=&#34;color:#268bd2&#34;&gt;to_csv&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;out_path&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;/score_report_&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;name_params&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;.csv&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;index&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;eval_results&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;to_pandas&lt;/span&gt;().&lt;span style=&#34;color:#268bd2&#34;&gt;to_csv&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;out_path&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;/detailed_&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;name_params&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;.csv&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;index&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This function will store the results in a directory specified by the &lt;code&gt;out_path&lt;/code&gt; parameter. The results will be stored in &lt;code&gt;.csv&lt;/code&gt; files. For each parameter combination, there will be two files generated, one with the aggregated score report overall questions (e.g.:
&lt;code&gt;score_report_all-MiniLM-L6-v2__top_k:3__chunk_size:128.csv&lt;/code&gt;) and another with the detailed results for each question (e.g.: &lt;code&gt;detailed_all-MiniLM-L6-v2__top_k:3__chunk_size:128.csv&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Note that we make use of the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/reference/evaluation-api#evaluationrunresult&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;EvaluationRunResult&lt;/a&gt; to store the results and generate the score report and the detailed results in the &lt;code&gt;.csv&lt;/code&gt; files.&lt;/p&gt;
&lt;p&gt;In the next section, we will show the evaluation results and discuss the insights gained from the experiment.&lt;/p&gt;
&lt;h2 id=&#34;results-analysis&#34;&gt;Results Analysis&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;You can run 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-evaluation/blob/main/evaluations/analyze_aragog_parameter_search.ipynb&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this notebook&lt;/a&gt; to visualize and analyze the results. All relevant &lt;code&gt;.csv&lt;/code&gt; files can be found in the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-evaluation/tree/main/evaluations/results/aragog_parameter_search_2024_06_12&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;aragog_parameter_search_2024_06_12 folder&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To make the analysis of the results easier, we will load all the aggregated score reports from the different parameter combinations from multiple &lt;code&gt;.csv&lt;/code&gt; files into a single DataFrame. For that, we use the following code to parse the file content:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;re&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pandas&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pd&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;parse_results&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;f_name&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;pattern&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;score_report_(.*?)__top_k:(&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;d&#43;)__chunk_size:(&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;d&#43;)&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;.csv&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;match&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;re&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;search&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;pattern&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;f_name&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;match&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;embeddings_model&lt;/span&gt; = &lt;span style=&#34;color:#859900&#34;&gt;match&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;group&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;int&lt;/span&gt;(&lt;span style=&#34;color:#859900&#34;&gt;match&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;group&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;int&lt;/span&gt;(&lt;span style=&#34;color:#859900&#34;&gt;match&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;group&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;embeddings_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;No match found&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;read_scores&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;path&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;all_scores&lt;/span&gt; = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;root&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;dirs&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;files&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;walk&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;path&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;f_name&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;files&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;f_name&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;startswith&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;score_report&#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#859900&#34;&gt;continue&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;embeddings_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;parse_results&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;f_name&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pd&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;read_csv&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;path&lt;/span&gt;&#43;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;/&#34;&lt;/span&gt;&#43;&lt;span style=&#34;color:#268bd2&#34;&gt;f_name&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;rename&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;columns&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Unnamed: 0&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;metric&#39;&lt;/span&gt;}, &lt;span style=&#34;color:#268bd2&#34;&gt;inplace&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;df_transposed&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;T&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;df_transposed&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;columns&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;df_transposed&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;iloc&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;df_transposed&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;df_transposed&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;:]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Add new columns&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;df_transposed&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;embeddings&#39;&lt;/span&gt;] = &lt;span style=&#34;color:#268bd2&#34;&gt;embeddings_model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;df_transposed&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;top_k&#39;&lt;/span&gt;] = &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;df_transposed&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;chunk_size&#39;&lt;/span&gt;] = &lt;span style=&#34;color:#268bd2&#34;&gt;chunk_size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;all_scores&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;df_transposed&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pd&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;concat&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;all_scores&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;reset_index&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;drop&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inplace&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;rename_axis&lt;/span&gt;(&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;axis&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inplace&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can then read the scores from the CSV files and analyze the results.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;read_scores&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;aragog_results/&#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can now analyze the results in a single table:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;context_relevance&lt;/th&gt;
          &lt;th&gt;faithfulness&lt;/th&gt;
          &lt;th&gt;sas&lt;/th&gt;
          &lt;th&gt;embeddings&lt;/th&gt;
          &lt;th&gt;top_k&lt;/th&gt;
          &lt;th&gt;chunk_size&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;0.834891&lt;/td&gt;
          &lt;td&gt;0.738318&lt;/td&gt;
          &lt;td&gt;0.524882&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.869485&lt;/td&gt;
          &lt;td&gt;0.895639&lt;/td&gt;
          &lt;td&gt;0.633806&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.933489&lt;/td&gt;
          &lt;td&gt;0.948598&lt;/td&gt;
          &lt;td&gt;0.65133&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.843447&lt;/td&gt;
          &lt;td&gt;0.831776&lt;/td&gt;
          &lt;td&gt;0.555873&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.912355&lt;/td&gt;
          &lt;td&gt;NaN&lt;/td&gt;
          &lt;td&gt;0.661135&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.94463&lt;/td&gt;
          &lt;td&gt;0.928349&lt;/td&gt;
          &lt;td&gt;0.659311&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.912991&lt;/td&gt;
          &lt;td&gt;0.827103&lt;/td&gt;
          &lt;td&gt;0.574832&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.951702&lt;/td&gt;
          &lt;td&gt;0.925456&lt;/td&gt;
          &lt;td&gt;0.642837&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.909638&lt;/td&gt;
          &lt;td&gt;0.932243&lt;/td&gt;
          &lt;td&gt;0.676347&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.791589&lt;/td&gt;
          &lt;td&gt;0.67757&lt;/td&gt;
          &lt;td&gt;0.480863&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.82648&lt;/td&gt;
          &lt;td&gt;0.866044&lt;/td&gt;
          &lt;td&gt;0.584507&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.901218&lt;/td&gt;
          &lt;td&gt;0.890654&lt;/td&gt;
          &lt;td&gt;0.611468&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.897715&lt;/td&gt;
          &lt;td&gt;0.845794&lt;/td&gt;
          &lt;td&gt;0.538579&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.916422&lt;/td&gt;
          &lt;td&gt;0.892523&lt;/td&gt;
          &lt;td&gt;0.609728&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.948038&lt;/td&gt;
          &lt;td&gt;NaN&lt;/td&gt;
          &lt;td&gt;0.643175&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.867887&lt;/td&gt;
          &lt;td&gt;0.834112&lt;/td&gt;
          &lt;td&gt;0.560079&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.946651&lt;/td&gt;
          &lt;td&gt;0.88785&lt;/td&gt;
          &lt;td&gt;0.639072&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.941952&lt;/td&gt;
          &lt;td&gt;0.91472&lt;/td&gt;
          &lt;td&gt;0.645992&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.909813&lt;/td&gt;
          &lt;td&gt;0.738318&lt;/td&gt;
          &lt;td&gt;0.530884&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.88004&lt;/td&gt;
          &lt;td&gt;0.929907&lt;/td&gt;
          &lt;td&gt;0.600428&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.918135&lt;/td&gt;
          &lt;td&gt;0.934579&lt;/td&gt;
          &lt;td&gt;0.67328&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.885314&lt;/td&gt;
          &lt;td&gt;0.869159&lt;/td&gt;
          &lt;td&gt;0.587424&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.953649&lt;/td&gt;
          &lt;td&gt;0.919003&lt;/td&gt;
          &lt;td&gt;0.664224&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.945016&lt;/td&gt;
          &lt;td&gt;0.936916&lt;/td&gt;
          &lt;td&gt;0.68591&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.949844&lt;/td&gt;
          &lt;td&gt;0.866822&lt;/td&gt;
          &lt;td&gt;0.613355&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.952544&lt;/td&gt;
          &lt;td&gt;0.893769&lt;/td&gt;
          &lt;td&gt;0.662694&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.964182&lt;/td&gt;
          &lt;td&gt;0.943925&lt;/td&gt;
          &lt;td&gt;0.62854&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;We can see some NaN values for the faithfullness scores which is based on an LLM-based evaluator. This was due to network errors when calling the OpenAI API.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Let’s now see which parameter configuration yielded the &lt;strong&gt;best Semantic Similarity Answer&lt;/strong&gt; score&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;sort_values&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;by&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;sas&#39;&lt;/span&gt;], &lt;span style=&#34;color:#268bd2&#34;&gt;ascending&lt;/span&gt;=[&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;context_relevance&lt;/th&gt;
          &lt;th&gt;faithfulness&lt;/th&gt;
          &lt;th&gt;sas&lt;/th&gt;
          &lt;th&gt;embeddings&lt;/th&gt;
          &lt;th&gt;top_k&lt;/th&gt;
          &lt;th&gt;chunk_size&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;0.945016&lt;/td&gt;
          &lt;td&gt;0.936916&lt;/td&gt;
          &lt;td&gt;0.68591&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.909638&lt;/td&gt;
          &lt;td&gt;0.932243&lt;/td&gt;
          &lt;td&gt;0.676347&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.918135&lt;/td&gt;
          &lt;td&gt;0.934579&lt;/td&gt;
          &lt;td&gt;0.67328&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.953649&lt;/td&gt;
          &lt;td&gt;0.919003&lt;/td&gt;
          &lt;td&gt;0.664224&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.952544&lt;/td&gt;
          &lt;td&gt;0.893769&lt;/td&gt;
          &lt;td&gt;0.662694&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.912355&lt;/td&gt;
          &lt;td&gt;NaN&lt;/td&gt;
          &lt;td&gt;0.661135&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.94463&lt;/td&gt;
          &lt;td&gt;0.928349&lt;/td&gt;
          &lt;td&gt;0.659311&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.933489&lt;/td&gt;
          &lt;td&gt;0.948598&lt;/td&gt;
          &lt;td&gt;0.65133&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.941952&lt;/td&gt;
          &lt;td&gt;0.91472&lt;/td&gt;
          &lt;td&gt;0.645992&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.948038&lt;/td&gt;
          &lt;td&gt;NaN&lt;/td&gt;
          &lt;td&gt;0.643175&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.951702&lt;/td&gt;
          &lt;td&gt;0.925456&lt;/td&gt;
          &lt;td&gt;0.642837&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.946651&lt;/td&gt;
          &lt;td&gt;0.88785&lt;/td&gt;
          &lt;td&gt;0.639072&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.869485&lt;/td&gt;
          &lt;td&gt;0.895639&lt;/td&gt;
          &lt;td&gt;0.633806&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.964182&lt;/td&gt;
          &lt;td&gt;0.943925&lt;/td&gt;
          &lt;td&gt;0.62854&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.949844&lt;/td&gt;
          &lt;td&gt;0.866822&lt;/td&gt;
          &lt;td&gt;0.613355&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.901218&lt;/td&gt;
          &lt;td&gt;0.890654&lt;/td&gt;
          &lt;td&gt;0.611468&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;3&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.916422&lt;/td&gt;
          &lt;td&gt;0.892523&lt;/td&gt;
          &lt;td&gt;0.609728&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.88004&lt;/td&gt;
          &lt;td&gt;0.929907&lt;/td&gt;
          &lt;td&gt;0.600428&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.885314&lt;/td&gt;
          &lt;td&gt;0.869159&lt;/td&gt;
          &lt;td&gt;0.587424&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.82648&lt;/td&gt;
          &lt;td&gt;0.866044&lt;/td&gt;
          &lt;td&gt;0.584507&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;2&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.912991&lt;/td&gt;
          &lt;td&gt;0.827103&lt;/td&gt;
          &lt;td&gt;0.574832&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.867887&lt;/td&gt;
          &lt;td&gt;0.834112&lt;/td&gt;
          &lt;td&gt;0.560079&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;256&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.843447&lt;/td&gt;
          &lt;td&gt;0.831776&lt;/td&gt;
          &lt;td&gt;0.555873&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.897715&lt;/td&gt;
          &lt;td&gt;0.845794&lt;/td&gt;
          &lt;td&gt;0.538579&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;128&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.909813&lt;/td&gt;
          &lt;td&gt;0.738318&lt;/td&gt;
          &lt;td&gt;0.530884&lt;/td&gt;
          &lt;td&gt;msmarco-distilroberta-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.834891&lt;/td&gt;
          &lt;td&gt;0.738318&lt;/td&gt;
          &lt;td&gt;0.524882&lt;/td&gt;
          &lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;0.791589&lt;/td&gt;
          &lt;td&gt;0.67757&lt;/td&gt;
          &lt;td&gt;0.480863&lt;/td&gt;
          &lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
          &lt;td&gt;1&lt;/td&gt;
          &lt;td&gt;64&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Focusing on the &lt;strong&gt;Semantic Answer Similarity&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;msmarco-distilroberta-base-v2&lt;/code&gt; embeddings model with a &lt;code&gt;top_k=3&lt;/code&gt; and a &lt;code&gt;chunk_size=128&lt;/code&gt; yields the best results.&lt;/li&gt;
&lt;li&gt;In this evaluation, retrieving documents with &lt;code&gt;top_k=3&lt;/code&gt; will most usually yield a higher semantic similarity score than with &lt;code&gt;top_k=1&lt;/code&gt; or &lt;code&gt;top_k=2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;It also seems that regardless of the &lt;code&gt;top_k&lt;/code&gt; and &lt;code&gt;chunk_size&lt;/code&gt; the best semantic similarity scores come from using the embedding model &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; and the &lt;code&gt;msmarco-distilroberta-base-v2&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s inspect how the scores of each embedding model compare with each other in terms of &lt;strong&gt;Semantic Answer Similarity&lt;/strong&gt;. For that, we will group the results by the embeddings column and plot the scores using box plots&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;matplotlib&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pyplot&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;plt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;fig&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ax&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;plt&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;subplots&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;figsize&lt;/span&gt;=(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;6&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;boxplot&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;column&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;sas&#39;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;by&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;embeddings&#39;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ax&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;ax&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;plt&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;xlabel&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Embeddings Model&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;plt&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;ylabel&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Semantic Answer Similarity Values&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;plt&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;title&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Boxplots of Semantic Answer Similarity Values Aggregated by Embeddings&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;plt&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;show&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/benchmarking-haystack-pipelines/boxplot.png alt=&#34;Box-plot displaying the Semantic Answer Similarity Values Aggregated by Embeddings&#34; class=&#34;medium&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;The box-plots above show that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; and the &lt;code&gt;msmarco-distilroberta-base-v2&lt;/code&gt; embedding models outperform the &lt;code&gt;all-mpnet-base-v2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;msmarco-distilroberta-base-v2&lt;/code&gt; scores have less variance, indicating that this model is more stable to &lt;code&gt;top_k&lt;/code&gt; and &lt;code&gt;chunk_size&lt;/code&gt; parameter variations than the other models&lt;/li&gt;
&lt;li&gt;All three embedding models have an outlier corresponding to the highest-scoring and lowest-scoring parameter combination&lt;/li&gt;
&lt;li&gt;Not surprisingly, all the lowest scores outliers correspond to &lt;code&gt;top_k=1&lt;/code&gt; and &lt;code&gt;chunk_size=64&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The highest scores outliers correspond to &lt;code&gt;top_k=3&lt;/code&gt;  and a &lt;code&gt;chunk_size&lt;/code&gt; of &lt;code&gt;128&lt;/code&gt; or &lt;code&gt;256&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since we have the ground truth answers, we focuses on the &lt;strong&gt;Semantic Similarity Answer&lt;/strong&gt;, but let’s also look at the &lt;strong&gt;Faithfulness&lt;/strong&gt; and &lt;strong&gt;Context Relevance&lt;/strong&gt; scores for a few examples. For that, we will need to load the detailed scores:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;detailed_best_sas_df&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pd&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;read_csv&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;results/aragog_results/detailed_all-MiniLM-L6-v2__top_k:3__chunk_size:128.csv&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;inspect&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;idx&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Question: &#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;detailed_best_sas_df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;loc&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;idx&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;questions&#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;True Answer:&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;detailed_best_sas_df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;loc&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;idx&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;true_answers&#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Generated Answer:&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;detailed_best_sas_df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;loc&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;idx&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;predicted_answers&#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Context Relevance  : &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;detailed_best_sas_df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;loc&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;idx&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;context_relevance&#39;&lt;/span&gt;]&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Faithfulness       : &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;detailed_best_sas_df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;loc&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;idx&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;faithfulness&#39;&lt;/span&gt;]&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Semantic Similarity: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;detailed_best_sas_df&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;loc&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;idx&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;sas&#39;&lt;/span&gt;]&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s look at the query question 6:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;inspect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Question: 
How does BERT&#39;s performance on the GLUE benchmark compare to previous state-of-the-art models?

True Answer:
BERT achieved new state-of-the-art on the GLUE benchmark (80.5%), surpassing the previous best models.

Generated Answer:
BERT&#39;s performance on the GLUE benchmark significantly outperforms previous state-of-the-art models, achieving 4.5% and 7.0% respective average accuracy improvement over the prior state of the art.

Context Relevance  : 1.0
Faithfulness       : 1.0
Semantic Similarity: 0.9051246047019958

Contexts:
recent work in this area.
Since its release, GLUE has been used as a testbed and showcase by the developers of several
inﬂuential models, including GPT (Radford et al., 2018) and BERT (Devlin et al., 2019). As shown
in Figure 1, progress on GLUE since its release has been striking. On GLUE, GPT and BERT
achieved scores of 72.8 and 80.2 respectively, relative to 66.5 for an ELMo-based model (Peters
et al., 2018) and 63.7 for the strongest baseline with no multitask learning or pretraining above the
word level. Recent models (Liu et al., 2019d; Yang et al., 2019) have clearly surpassed estimates of
non-expert human performance on GLUE (Nangia and Bowman, 2019). The success of these models
on GLUE has been driven by ever-increasing model capacity, compute power, and data quantity, as
well as innovations in 
---------
56.0 75.1
BERT BASE 84.6/83.4 71.2 90.5 93.5 52.1 85.8 88.9 66.4 79.6
BERT LARGE 86.7/85.9 72.1 92.7 94.9 60.5 86.5 89.3 70.1 82.1
Table 1: GLUE Test results, scored by the evaluation server ( https://gluebenchmark.com/leaderboard ).
The number below each task denotes the number of training examples. The “Average” column is slightly different
than the ofﬁcial GLUE score, since we exclude the problematic WNLI set.8BERT and OpenAI GPT are single-
model, single task. F1 scores are reported for QQP and MRPC, Spearman correlations are reported for STS-B, and
accuracy scores are reported for the other tasks. We exclude entries that use BERT as one of their components.
We use a batch size of 32 and ﬁne-tune for 3
epochs over the data for all GLUE tasks. For each
task, we selected the best ﬁne-tuning learning rate
(among 5e-5, 
---------
4e-5, 3e-5, and 2e-5) on the Dev set.
Additionally, for BERT LARGE we found that ﬁne-
tuning was sometimes unstable on small datasets,
so we ran several random restarts and selected the
best model on the Dev set. With random restarts,
we use the same pre-trained checkpoint but per-
form different ﬁne-tuning data shufﬂing and clas-
siﬁer layer initialization.9
Results are presented in Table 1. Both
BERT BASE and BERT LARGE outperform all sys-
tems on all tasks by a substantial margin, obtaining
4.5% and 7.0% respective average accuracy im-
provement over the prior state of the art. Note that
BERT BASE and OpenAI GPT are nearly identical
in terms of model architecture apart from the at-
tention masking. For the largest and most widely
reported GLUE task, MNLI, BERT obtains a 4.6%
absolute accuracy improvement. On the ofﬁcial
GLUE leaderboard10, BERT LARGE obtains a score
of 
---------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In this example, the context relevancy and faithfulness scores are both 1.0. This indicates that the context is relevant to the question and our RAG LLM used this context to generate a semantically similar answer to the correct (ground-truth) answer.&lt;/p&gt;
&lt;p&gt;Let’s take a look at another example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;inspect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;44&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Question: 
How should future language model benchmarks be structured to ensure a holistic assessment of models&#39; capabilities and knowledge breadth?

True Answer:
Future benchmarks should integrate a broader spectrum of subjects and cognitive skills, emphasizing the inclusion of tasks that test models&#39; ethical reasoning, understanding of human values, and ability to perform complex problem-solving, beyond the mere scale of data and parameters.

Generated Answer:
Future language model benchmarks should be structured to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings across a diverse set of subjects that humans learn. The benchmark should cover a wide range of subjects across STEM, humanities, social sciences, and more, ranging in difficulty from elementary to advanced professional levels. It should test both world knowledge and problem-solving ability, ensuring a holistic assessment of models&#39; capabilities and knowledge breadth.

Context Relevance  : 0.6
Faithfulness       : 1.0
Semantic Similarity: 0.6483339071273804

Contexts:
learning model
usage should be developed for guiding users to learn ‘Dos’
and Dont’ in AI. Detailed policies could also be proposed
to list all user’s responsibilities before the model access.
C. Language Models Beyond ChatGPT
The examination of ethical implications associated with
language models necessitates a comprehensive examina-
tion of the broader challenges that arise within the domainof language models, in light of recent advancements in
the field of artificial intelligence. The last decade has seen
a rapid evolution of AI techniques, characterized by an
exponential increase in the size and complexity of AI
models, and a concomitant scale-up of model parameters.
The scaling laws that govern the development of language
models,asdocumentedinrecentliterature[84,85],suggest
thatwecanexpecttoencounterevenmoreexpansivemod-
els that incorporate multiple modalities in the near future.
Efforts to integrate multiple modalities into a single model
are driven by the ultimate goal of realizing the concept of
foundation models [86]. 
---------
language models are
at learning and applying knowledge from many domains.
To bridge the gap between the wide-ranging knowledge that models see during pretraining and the
existing measures of success, we introduce a new benchmark for assessing models across a diverse
set of subjects that humans learn. We design the benchmark to measure knowledge acquired during
pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the
benchmark more challenging and more similar to how we evaluate humans. The benchmark covers
57subjects across STEM, the humanities, the social sciences, and more. It ranges in difﬁculty from
an elementary level to an advanced professional level, and it tests both world knowledge and problem
solving ability. Subjects range from traditional areas, such as mathematics and history, to more
1arXiv:2009.03300v3 [cs.CY] 12 Jan 2021Published as a conference paper at 
---------
a
lack of access to the benefits of these models for people
who speak different languages and can lead to biased or
unfairpredictionsaboutthosegroups[14,15].Toovercome
this, it is crucial to ensure that the training data contains
a substantial proportion of diverse, high-quality corpora
from various languages and cultures.
b) Robustness: Another major ethical consideration
in the design and implementation of language models is
their robustness. Robustness refers to a model’s ability
to maintain its performance when given input that is
semantically or syntactically different from the input it
was trained on.
Semantic Perturbation: Semantic perturbation is a type
of input that can cause a language model to fail [40, 41].
This input has different syntax but is semantically similar
to the input used for training the model. To address this,
it is crucial to ensure that the training data is diverse and
representative of the population it will 
---------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It seems that for this question, the content is not completely relevant (Context Relevance = 0.6) and only the second context was used to generate the answer.&lt;/p&gt;
&lt;h2 id=&#34;running-your-own-experiments&#34;&gt;Running your own experiments&lt;/h2&gt;
&lt;p&gt;If you want to run this experiment yourself, follow the Python code 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-evaluation/blob/main/evaluations/evaluation_aragog.py&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;evaluation_aragog.py&lt;/code&gt;&lt;/a&gt; in the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-evaluation&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;haystack-evaluation&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;Start by cloning the repository&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;git clone https://github.com/deepset-ai/haystack-evaluation
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;cd&lt;/span&gt; haystack-evaluation
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;cd&lt;/span&gt; evaluations
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, run the Python script:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;usage: evaluation_aragog.py [-h] --output_dir OUTPUT_DIR [--sample SAMPLE]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can specify the output directory to hold the results and the sample size, i.e.: how many questions to use for the evaluation.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Don’t forget to define your Open AI API key in the environmental variable &lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OPENAI_API_KEY&lt;/span&gt;=&lt;your_key&gt; python evaluation_aragog.py --output-dir experiment_a --sample &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;execution-time-and-costs&#34;&gt;Execution Time and Costs&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;NOTE: all the numbers reported were run on an Mac Book Pro Apple M3 Pro with 36GB of RAM with Haystack 2.2.1 and Python 3.9&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;indexing&#34;&gt;Indexing&lt;/h3&gt;
&lt;p&gt;The Indexing pipeline needs to consider the parameter combinations defined below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;3 different values for &lt;code&gt;embedding_model&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;3 different &lt;code&gt;chunk_size&lt;/code&gt;  values&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, the index &lt;strong&gt;runs 9 times in total.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id=&#34;rag-pipeline&#34;&gt;RAG Pipeline&lt;/h3&gt;
&lt;p&gt;The RAG pipeline needs to run 27 times, since the following parameters affect the retrieval process:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;3 different values for &lt;code&gt;embedding_model&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;3 different &lt;code&gt;top_k&lt;/code&gt; values&lt;/li&gt;
&lt;li&gt;3 different &lt;code&gt;chunk_size&lt;/code&gt;  values&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This needs to run for each of the 107 questions, so in total, the &lt;strong&gt;RAG pipeline will run 2.889 times&lt;/strong&gt; (3 x 3 x 3 x 107) and produce &lt;strong&gt;2889 calls to OpenAI API&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id=&#34;evaluation-pipeline&#34;&gt;Evaluation Pipeline&lt;/h3&gt;
&lt;p&gt;The Evaluation pipeline also runs 27 times since all parameter combinations need to be evaluated for each of the 107 questions. Note, however, that the Evaluation pipeline contains two Evaluators that rely on an LLM through OpenAI API, so this pipeline &lt;strong&gt;runs 2.889 times&lt;/strong&gt;. However, due to the Faithfulness and ContextRelevance evaluators, it will produce &lt;strong&gt;5.778 (2 x 2.889) calls to OpenAI API&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;You can see the detailed running times for each parameter combination in the 
&lt;a href=&#34;https://docs.google.com/spreadsheets/d/1LTogSuZuzCVNDGBl7Jk5XjmaPYnBSWumaiOwn0WCOfc/edit?usp=sharing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Benchmark Times Spreadsheet&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;pricing&#34;&gt;Pricing&lt;/h3&gt;
&lt;p&gt;For detailed pricing information, visit 
&lt;a href=&#34;https://openai.com/api/pricing/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenAI Pricing&lt;/a&gt; 💸&lt;/p&gt;
&lt;h2 id=&#34;lessons-learned&#34;&gt;Lessons Learned&lt;/h2&gt;
&lt;p&gt;In this article, we have shown how to use the Haystack 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/evaluators&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Evaluators&lt;/a&gt; to find the best combination of parameters that yield the best performance of our RAG pipeline, as opposed to using only the default parameters.&lt;/p&gt;
&lt;p&gt;For this ARAGOG dataset, in particular, the best performance is achieved using the &lt;code&gt;msmarco-distilroberta-base-v2&lt;/code&gt; embeddings model instead of the default model (&lt;code&gt;sentence-transformers/all-mpnet-base-v2&lt;/code&gt;), together with a &lt;code&gt;top_k=3&lt;/code&gt; and a &lt;code&gt;chunk_size=128&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A few learnings are important to take:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When using an LLM through an external API, it is important to &lt;strong&gt;account for potential network errors or other issues&lt;/strong&gt;. Ensure that during your experiments, running the questions through the RAG pipeline or evaluating the results doesn’t crash due to an error, for instance, by wrapping the call within a &lt;code&gt;try/except&lt;/code&gt; code block.&lt;/li&gt;
&lt;li&gt;Before starting your experiments, &lt;strong&gt;estimate the costs and time involved&lt;/strong&gt;. If you plan to use an external LLM through an API, calculate approximately how many API calls you will need to run queries through your RAG pipeline and evaluate the results if you use LLM-based evaluators. This will help you understand the total costs and time required for your experiments.&lt;/li&gt;
&lt;li&gt;Depending on your dataset size and running time, &lt;strong&gt;Python notebooks might not be the best approach to run your experiments&lt;/strong&gt;; a Python script is probably a more reliable solution.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Beware of which parameters affect which components&lt;/strong&gt;. For instance, for indexing, only the &lt;code&gt;embedding_model&lt;/code&gt; and the &lt;code&gt;chunk_size&lt;/code&gt; are important - this can reduce the number of experiments you need to carry out.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Explore a variety of evaluation examples tailored to different use cases and datasets by visiting the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-evaluation&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;haystack-evaluation&lt;/a&gt; repository on GitHub.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>RAG Evaluation with Prometheus 2</title>
      <link>https://haystack.deepset.ai/blog/rag-evaluation-with-prometheus-2/</link>
      <pubDate>Mon, 17 Jun 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/rag-evaluation-with-prometheus-2/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;When building real-world applications based on Language Models (such as RAG), evaluation plays an important role. Recently, evaluating generated answers using powerful proprietary Language Models (such as GPT-4) has become popular and correlates well with human judgment, but it comes with its own limitations and challenges.&lt;/p&gt;
&lt;p&gt;Prometheus 2 is a newly released family of open-source models specifically trained to evaluate the output of other Language Models. In this article (and in the related notebook), we will see how to use Prometheus and we will experiment with it to evaluate the generated responses of a RAG Pipeline using Haystack.&lt;/p&gt;
&lt;h2 id=&#34;language-models-as-evaluators&#34;&gt;Language Models as Evaluators&lt;/h2&gt;
&lt;p&gt;With the rise of Language Models (LMs) demonstrating strong general capabilities across diverse tasks, evaluating answers generated by these models using other generative LMs has become a common and effective approach. Compared to statistical-based evaluation, this technique is convenient as it usually does not require ground truth labels.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Proprietary models&lt;/strong&gt; such as GPT-4 or Claude 3 Opus are frequently chosen for evaluation and have shown a good correlation with human judgment. However, relying on closed models has several limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;data privacy: your data exits your machine and is transmitted to the model provider&lt;/li&gt;
&lt;li&gt;transparency: the training data of these models is unknown&lt;/li&gt;
&lt;li&gt;controllability: as these models are accessed via APIs, their behavior can change over time&lt;/li&gt;
&lt;li&gt;price: despite a constant drop in prices, these large models remain expensive. Additionally, the evaluation process typically involves several cycles of testing and refinement, which can significantly increase the overall expense.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On the other hand, using &lt;strong&gt;open models&lt;/strong&gt; for evaluation is an active research area, but their practical use is often limited. They typically do not correlate well with human judgments and lack flexibility (for more details, see the 
&lt;a href=&#34;https://arxiv.org/abs/2405.01535&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Prometheus 2 paper&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id=&#34;-prometheus-2-a-strong-open-source-model-for-evaluation&#34;&gt;🔥 Prometheus 2: a strong open-source model for evaluation&lt;/h2&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-evaluation-with-prometheus-2/prometheus.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://arxiv.org/abs/2405.01535&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Prometheus 2&lt;/a&gt; is a new family of open-source models designed to bridge the gap between proprietary models and open LMs for evaluation.&lt;/p&gt;
&lt;p&gt;The authors unified two different evaluation paradigms: direct assessment (evaluating the quality of an answer generated by a single model based on a specific criterion) and pairwise ranking (choosing the best answer between two, usually produced by different models).&lt;/p&gt;
&lt;p&gt;In particular, for each variant, they started from a MistralAI base model, they trained 2 different models (one for each of the mentioned tasks) on open-source datasets and then merged their weights to create a robust evaluator Language Model.&lt;/p&gt;
&lt;p&gt;The results are impressive:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;two variants: 
&lt;a href=&#34;https://huggingface.co/prometheus-eval/prometheus-7b-v2.0&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;7B&lt;/a&gt; and 
&lt;a href=&#34;https://huggingface.co/prometheus-eval/prometheus-8x7b-v2.0&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;8x7B&lt;/a&gt;, fine-tuned from Mistral-7B and Mixtral8x7B, respectively&lt;/li&gt;
&lt;li&gt;high correlation with human evaluations and proprietary models&lt;/li&gt;
&lt;li&gt;the models are highly flexible: capable of performing direct assessments and pairwise rankings, and allowing custom evaluation criteria definition&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;prompting-prometheus-2&#34;&gt;Prompting Prometheus 2&lt;/h2&gt;
&lt;h3 id=&#34;the-prompt-template&#34;&gt;The prompt template&lt;/h3&gt;
&lt;p&gt;Prometheus 2 models are generative language models trained to perform evaluation. To achieve the best results with these models, we need to follow a precise yet customizable prompt structure. You can find the prompt templates in the paper and 
&lt;a href=&#34;https://github.com/prometheus-eval/prometheus-eval/blob/3042bfc6a263a98cc31a7aa4e704e3784e0c1b8b/libs/prometheus-eval/prometheus_eval/prompts.py&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Since we want to experiment with Prometheus 2 to evaluate a single RAG system, we are primarily interested in the 
&lt;a href=&#34;https://github.com/prometheus-eval/prometheus-eval/blob/3042bfc6a263a98cc31a7aa4e704e3784e0c1b8b/libs/prometheus-eval/prometheus_eval/prompts.py&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Direct Assessment prompt template&lt;/a&gt;, which allows evaluating the quality of an answer based on specific criteria. The following template includes a reference answer; the provided link also contains a version without it.&lt;/p&gt;
&lt;p&gt;Let’s take a look at it.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-markdown&#34; data-lang=&#34;markdown&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;You are a fair judge assistant tasked with providing clear, objective feedback 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;based on specific criteria, ensuring each assessment reflects the absolute 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;standards set for performance.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d33682&#34;&gt;###Task Description:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;An instruction (might include an Input inside it), a response to evaluate, a 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;reference answer that gets a score of 5, and a score rubric representing a 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;evaluation criteria are given.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;1.&lt;/span&gt; Write a detailed feedback that assess the quality of the response strictly 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;based on the given score rubric, not evaluating in general.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;2.&lt;/span&gt; After writing a feedback, write a score that is an integer between 1 and 5. 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;You should refer to the score rubric.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;3.&lt;/span&gt; The output format should look as follows: \\&#34;Feedback: (write a feedback for 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;criteria) [RESULT] (an integer number between 1 and 5)\\&#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;4.&lt;/span&gt; Please do not generate any other opening, closing, and explanations.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d33682&#34;&gt;###The instruction to evaluate:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{instruction}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d33682&#34;&gt;###Response to evaluate:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{response}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d33682&#34;&gt;###Reference Answer (Score 5):
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{reference_answer}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d33682&#34;&gt;###Score Rubrics:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{score_rubric}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d33682&#34;&gt;###Feedback:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this prompt template, the only parts to be customized are those enclosed in curly brackets.&lt;/p&gt;
&lt;p&gt;We should provide:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the instruction to use for evaluation, which may include an input (e.g., a user question, if evaluating a RAG pipeline)&lt;/li&gt;
&lt;li&gt;the LLM response to evaluate&lt;/li&gt;
&lt;li&gt;the reference answer: a perfect answer, scoring 5 according to the score rubric&lt;/li&gt;
&lt;li&gt;a score rubric with scores from 1 to 5, accurately describing when the response qualifies for each score.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When provided with such a prompt, the model will generate two outputs: detailed &lt;code&gt;feedback&lt;/code&gt; and a score from 1 to 5.&lt;/p&gt;
&lt;h3 id=&#34;an-example&#34;&gt;An example&lt;/h3&gt;
&lt;p&gt;Let’s say we want to evaluate the correctness of a generated answer. In this case we will have a ground truth answer, but it is not mandatory.&lt;/p&gt;
&lt;p&gt;Question: “Who won the 2022 World Cup?”
Generated Answer: “Argentina won the 2022 FIFA World Cup. France won the 2018 FIFA World Cup.”
Reference Answer: “Argentina won the 2022 FIFA World Cup.”&lt;/p&gt;
&lt;p&gt;In this case, our prompt might look like the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[...]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;###The instruction to evaluate:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Your&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;task&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;evaluate&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;generated&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;against&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;reference&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Who&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;won&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2022&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;World&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cup&lt;/span&gt;?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;###Response to evaluate:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Argentina&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;won&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2022&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;FIFA&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;World&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cup&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;France&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;won&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2018&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;FIFA&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;World&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cup&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;###Reference Answer (Score 5):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Argentina&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;won&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2022&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;FIFA&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;World&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cup&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;###Score Rubrics:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Score&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;The&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;relevant&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;does&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;align&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;reference&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Score&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;The&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;relevant&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;but&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;deviates&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;significantly&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;reference&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Score&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;The&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;relevant&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;generally&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;aligns&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;reference&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;but&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;has&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;errors&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;or&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;omissions&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Score&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;4&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;The&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;relevant&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;closely&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;matches&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;reference&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;but&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;less&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;concise&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;or&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;clear&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Score&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;The&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;highly&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;relevant&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;fully&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;accurate&lt;/span&gt;, &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;matches&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;reference&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;both&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;clarity&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;###Feedback:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Calling the model with this prompt should return a result similar to this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Both the generated response and the reference answer identify Argentina as the winner of the 2022 FIFA World Cup. This shows that the generated answer is accurate and relevant to the user’s query about who won the 2022 World Cup. However, the inclusion of an unrelated fact about France winning the 2018 World Cup does not enhance the relevance or accuracy of the response, making it somewhat less concise than the reference answer. While the information about France is not incorrect, it does not pertain to the original query about the 2022 World Cup. Thus, despite being generally correct, the additional detail in the generated response deviates from the concise nature of the reference answer, and it could be perceived as slightly less effective. Therefore, according to the score rubric, which places emphasis on relevance, conciseness, and accuracy, the generated response is not as well aligned with the reference answer as it could be. So the score is 4. [RESULT] 4&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;experimenting-with-prometheus-2-in-haystack&#34;&gt;Experimenting with Prometheus 2 in Haystack&lt;/h2&gt;
&lt;p&gt;Haystack is an orchestration framework for building and evaluating applications based on LLMs. It comes with 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/evaluation&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;its own set of evaluators and integrates with different evaluation libraries&lt;/a&gt;. It is easy to extend Haystack functionalities, by creating custom components, so we can try to incorporate Prometheus 2.&lt;/p&gt;
&lt;p&gt;This is the plan:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;build and run an Indexing pipeline&lt;/li&gt;
&lt;li&gt;build and run a RAG pipeline to evaluate&lt;/li&gt;
&lt;li&gt;implement a PrometheusLLMEvaluator component&lt;/li&gt;
&lt;li&gt;create different PrometheusLLMEvaluators&lt;/li&gt;
&lt;li&gt;build and run an evaluation pipeline with different PrometheusLLMEvaluators&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this blog post, we summarize these steps, but you can find the full implementation in the accompanying notebook.&lt;/p&gt;
&lt;h3 id=&#34;indexing-pipeline&#34;&gt;Indexing pipeline&lt;/h3&gt;
&lt;p&gt;Before running a RAG pipeline, we need to index some data.&lt;/p&gt;
&lt;p&gt;We will be using 
&lt;a href=&#34;https://huggingface.co/datasets/vblagoje/PubMedQA_instruction/viewer/default/train?row=0&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a labeled PubMed dataset&lt;/a&gt; with questions, contexts and answers. This allows us to use the contexts as Documents and provides the necessary labeled data for some of the evaluation metrics we will define.&lt;/p&gt;
&lt;p&gt;For simplicity, we will use the &lt;code&gt;InMemoryDocumentStore&lt;/code&gt;. Our indexing pipeline will include a &lt;code&gt;DocumentEmbedder&lt;/code&gt; (embedding model: &lt;code&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/code&gt;) and a &lt;code&gt;DocumentWriter&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;See the accompanying notebook for the full code on building a running an indexing pipeline.&lt;/p&gt;
&lt;h3 id=&#34;rag-pipeline&#34;&gt;RAG pipeline&lt;/h3&gt;
&lt;p&gt;Now that we have our data ready, we can create a simple RAG pipeline.&lt;/p&gt;
&lt;p&gt;Our RAG pipeline will include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;InMemoryEmbeddingRetriever&lt;/code&gt; to retrieve the relevant documents for the query (based on the same embedding model as before)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PromptBuilder&lt;/code&gt; to dynamically create prompts&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HuggingFaceLocalGenerator&lt;/code&gt; with &lt;code&gt;google/gemma-1.1-2b-it&lt;/code&gt; to generate answers to queries. It is a small model, and later we will evaluate the quality of the generated responses based on custom criteria.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AnswerBuilder&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s run our RAG pipeline with a set of questions and save the data we need for evaluation: questions, ground truth answers, and generated answers.&lt;/p&gt;
&lt;h3 id=&#34;implement-a-prometheusllmevaluator-component&#34;&gt;Implement a PrometheusLLMEvaluator component&lt;/h3&gt;
&lt;p&gt;To perform evaluation, we create 
&lt;a href=&#34;https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/prometheus2_evaluation.ipynb#scrollTo=XVWnGSIJekey&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a custom Haystack Evaluator component&lt;/a&gt;, based on Prometheus 2.&lt;/p&gt;
&lt;p&gt;This component allows you to develop a diverse range of evaluators.&lt;/p&gt;
&lt;p&gt;You can find the implementation in the accompanying notebook. Let’s take a high-level overview of this component:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;init parameters
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;template&lt;/code&gt;: A Jinja2 prompt template that adheres to the Prometheus 2 prompt structure, with placeholders for input data that we want to pass at runtime (e.g., &lt;code&gt;question&lt;/code&gt;, &lt;code&gt;generated_answer&lt;/code&gt;, &lt;code&gt;ground_truth_answer&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;inputs&lt;/code&gt;: A list of tuples in the format (&lt;code&gt;input_name&lt;/code&gt;, &lt;code&gt;input_type&lt;/code&gt;). These are the inputs that the evaluator expects and uses for evaluation. They should match those defined in the template.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;generator&lt;/code&gt;: (hacky) Allows passing different types of Haystack generators to use the Prometheus 2 model. Examples: &lt;code&gt;HuggingFaceLocalGenerator&lt;/code&gt;, &lt;code&gt;LlamaCPPGenerator&lt;/code&gt;, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;run&lt;/code&gt; method: for each example to evaluate, the inputs are validated, integrated into the prompt and passed to the model. The model output is parsed to extract score and feedback. This method returns a dictionary containing an aggregate &lt;code&gt;score&lt;/code&gt;, &lt;code&gt;individual_scores&lt;/code&gt; and &lt;code&gt;feedbacks&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;create-different-evaluators&#34;&gt;Create different Evaluators&lt;/h3&gt;
&lt;p&gt;Let’s see how we can use the &lt;code&gt;PrometheusLLMEvaluator&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We start by creating a Correctness Evaluator, similar to the above example.&lt;/p&gt;
&lt;p&gt;First, we initialize a generator to load the Prometheus 2 model; in particular, we are using the small variant (7B).&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceLocalGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceLocalGenerator&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prometheus-eval/prometheus-7b-v2.0&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;task&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text2text-generation&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this example, we are using the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;HuggingFaceLocalGenerator&lt;/code&gt;&lt;/a&gt;, which can run on the free GPU provided by Colab, but there are several other options, depending on your environment: 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/llamacppgenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;LlamaCPPGenerator&lt;/code&gt;&lt;/a&gt; for resource-constrained environments (even without a GPU); TGI (via 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;HuggingFaceAPIGenerator&lt;/a&gt;) and 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/vllm&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;vLLM&lt;/a&gt; for production environments with available GPU resources.&lt;/p&gt;
&lt;p&gt;Next, let’s prepare the prompt template for the Correctness evaluator. Note that we are inserting placeholders for &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;generated_answer&lt;/code&gt; and &lt;code&gt;reference_answer&lt;/code&gt;. These fields will be populated dynamically based on RAG results and ground truth answers.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;correctness_prompt_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;###The instruction to evaluate:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Your task is to evaluate the generated answer against the reference answer for the question: {{query}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;###Response to evaluate:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;generated answer: {{generated_answer}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;###Reference Answer (Score 5): {{reference_answer}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;###Score Rubrics:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Score 1: The answer is not relevant to the question and does not align with the reference answer.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Score 2: The answer is relevant to the question but deviates significantly from the reference answer.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Score 3: The answer is relevant to the question and generally aligns with the reference answer but has errors or omissions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Score 4: The answer is relevant to the question and closely matches the reference answer but is less concise or clear.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Score 5: The answer is highly relevant, fully accurate, and matches the reference answer in both content and clarity.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;###Feedback:&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, let’s initialize our evaluator, specifying which inputs it should expect at runtime (they should match the placeholders of the above prompt template).&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;correctness_evaluator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PrometheusLLMEvaluator&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;correctness_prompt_template&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=[
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        (&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        (&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generated_answer&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        (&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reference_answer&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;]),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Similarly, we can create other evaluators:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Response Relevance: Evaluates the generated answer in terms of its relevance to the user’s question.&lt;/li&gt;
&lt;li&gt;Logical Robustness: Evaluates the logical organization and progression of the response.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These evaluators do not require ground truth labels. For details about prompt templates and required inputs, refer to the accompanying notebook.&lt;/p&gt;
&lt;h3 id=&#34;evaluation-pipeline&#34;&gt;Evaluation pipeline&lt;/h3&gt;
&lt;p&gt;We can now put our evaluators in a pipeline, and run it to see how our small model performs.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;correctness_evaluator&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;correctness_evaluator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;response_relevance_evaluator&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;response_relevance_evaluator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;logical_robustness_evaluator&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;logical_robustness_evaluator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;eval_results&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;eval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;correctness_evaluator&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generated_answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;rag_answers&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reference_answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;ground_truth_answers&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;response_relevance_evaluator&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generated_answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;rag_answers&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;logical_robustness_evaluator&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generated_answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;rag_answers&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once we’ve run our evaluation pipeline, we can also create a full evaluation report. Haystack provides an &lt;code&gt;EvaluationRunResult&lt;/code&gt; which we can use to display a &lt;code&gt;score_report&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.evaluation.eval_run_result&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;EvaluationRunResult&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt; = {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;questions&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;ground_truth_answers&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;predicted_answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;rag_answers&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;evaluation_result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;EvaluationRunResult&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;run_name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;pubmed_rag_pipeline&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;eval_results&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;evaluation_result&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;score_report&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In our experiment (involving a small sample of 10 examples), we get the following results:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Evaluation&lt;/th&gt;
          &lt;th&gt;Score&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;correctness_evaluator&lt;/td&gt;
          &lt;td&gt;3.9&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;response_relevance_evaluator&lt;/td&gt;
          &lt;td&gt;4.3&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;logical_robustness_evaluator&lt;/td&gt;
          &lt;td&gt;3.5&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Gemma-1.1-2b-it seems to generate relevant answers, but the responses differ from ground truth answers and the logical organization is not optimal.&lt;/p&gt;
&lt;p&gt;To inspect these results in more detail, we can convert the &lt;code&gt;evaluation_result&lt;/code&gt; to a Pandas dataframe and also look at the individual feedbacks of each evaluator for each example.&lt;/p&gt;
&lt;h2 id=&#34;wrapping-it-up&#34;&gt;Wrapping it up&lt;/h2&gt;
&lt;p&gt;In this post, you have learned about Prometheus 2: a new family of SOTA open-source models for evaluation.&lt;/p&gt;
&lt;p&gt;After introducing the models and their specific usage, we have put them in action in Haystack and created different Evaluators to assess the quality of the responses produced by a RAG pipeline, along several axes.&lt;/p&gt;
&lt;p&gt;The results of our experiments are interesting and promising. However, before using these models for real-world applications, you should assess them for your specific use case. Moreover, in this rapidly changing world, perhaps the day is not far off when general-purpose open models can be used effectively for evaluation.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Building RAG Applications with NVIDIA NIM and Haystack on K8s</title>
      <link>https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/</link>
      <pubDate>Sun, 02 Jun 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Retrieval-augmented generation (RAG) systems combine generative AI with information retrieval for contextualized answer generation. Building reliable and performant RAG applications at scale is challenging. In this blog, we show how to use Haystack and NVIDIA NIM to create a RAG solution which is easy to deploy/maintain, standardized and enterprise-ready, that can run on-prem as well as on cloud native environments. This recipe is applicable in the cloud, on-premise or even in air-gapped environments.&lt;/p&gt;
&lt;h2 id=&#34;about-haystack&#34;&gt;About Haystack&lt;/h2&gt;
&lt;p&gt;
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt;, by 
&lt;a href=&#34;https://www.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset&lt;/a&gt;, is an open source framework for building production-ready LLM applications, RAG pipelines and state-of-the-art search systems that work intelligently over large document collections.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-1.png alt=&#34;Diagram of a Haystack RAG pipeline&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Haystack’s 
&lt;a href=&#34;https://haystack.deepset.ai/integrations&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;growing ecosystem of community integrations&lt;/a&gt; provide tooling for evaluation, monitoring, transcription, data ingestion and more. The 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/nvidia&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA Haystack integration&lt;/a&gt; allows using NVIDIA models and NIMs in Haystack pipelines, 
&lt;a href=&#34;https://haystack.deepset.ai/blog/haystack-nvidia-integration&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;giving the flexibility to pivot from prototyping in the cloud to deploying on-prem&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;about-nvidia-nim&#34;&gt;About NVIDIA NIM&lt;/h2&gt;
&lt;p&gt;NVIDIA NIM is a collection of containerized microservices designed for optimized inference of state-of-the-art  AI models.  The container uses a variety of components to serve AI models and exposes them via standard API. Models are optimized using 
&lt;a href=&#34;https://developer.nvidia.com/tensorrt&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;TensorRT&lt;/a&gt; or 
&lt;a href=&#34;https://github.com/NVIDIA/TensorRT-LLM&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;TensorRT-LLM&lt;/a&gt; (depending on the type of the model), applying procedures such as quantization, model distribution, optimized kernel/runtimes and inflight- or continuous batching among others allowing even further optimization if needed. Learn more about NIM 
&lt;a href=&#34;https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This tutorial shows how to build a Haystack RAG pipeline leveraging NIMs hosted on the 
&lt;a href=&#34;https://build.nvidia.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt;. Then, we provide instructions on deploying NIMs
on your infrastructure in a Kubernetes environment for self-hosting 
&lt;a href=&#34;https://www.nvidia.com/en-us/ai-data-science/foundation-models/?_gl=1*3m0pk5*_gcl_au*ODg0NTI0MDQ3LjE3MTczMTI1MDE.&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;AI foundation models&lt;/a&gt;. Note that hosting NIMs requires 
&lt;a href=&#34;https://www.nvidia.com/en-us/data-center/products/ai-enterprise/?_gl=1*1crq8g6*_gcl_au*NzMwODYxMzc1LjE3MTczMTIxMzg.&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA AI Enterprise license&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;build-a-haystack-rag-pipeline-with-nvidia-nims-hosted-on-the-nvidia-api-catalog&#34;&gt;Build a Haystack RAG Pipeline with NVIDIA NIMs hosted on the NVIDIA API Catalog&lt;/h2&gt;
&lt;p&gt;For RAG pipelines, Haystack provides 3 components that can be connected with NVIDIA NIMs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiagenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NvidiaGenerator&lt;/a&gt;: Text generation with LLM NIM.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiadocumentembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NvidiaDocumentEmbedder&lt;/a&gt;: Document embedding with 
&lt;a href=&#34;https://build.nvidia.com/nvidia/nv-embedqa-e5-v5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA NeMo Retriever Text Embedding NIM&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nvidiatextembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NvidiaTextEmbedder&lt;/a&gt;: Query embedding with  NVIDIA NeMo Retriever Text Embedding NIM.







    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-2.png alt=&#34;Fig. 1 -  Haystack Indexing and RAG pipeline with NVIDIA NIMs&#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;Fig. 1 - Haystack Indexing and RAG pipelines with NVIDIA NIMs&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For this section, we have provided scripts and instructions for building a RAG pipeline leveraging NIMs hosted on the 
&lt;a href=&#34;https://build.nvidia.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt; as part of the 
&lt;a href=&#34;https://github.com/deepset-ai/nvidia-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub repository&lt;/a&gt;. We also provide a 
&lt;a href=&#34;https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/rag-with-nims.ipynb&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Jupyter Notebook&lt;/a&gt; for building the same RAG pipeline using NIMs deployed on your infrastructure in a Kubernetes environment.&lt;/p&gt;
&lt;h2 id=&#34;vectorize-documents-with-haystack-indexing-pipelines&#34;&gt;Vectorize Documents with Haystack Indexing Pipelines&lt;/h2&gt;
&lt;p&gt;Our indexing pipeline implementation is available in the 
&lt;a href=&#34;https://github.com/deepset-ai/nvidia-haystack/blob/main/indexing.py&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;indexing tutorial&lt;/a&gt;. Haystack provides several 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/preprocessors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;preprocessing&lt;/a&gt; components for document cleaning, splitting, 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/converters&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;embedders&lt;/a&gt;, as well as 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/converters&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;converters&lt;/a&gt; extracting data from files in different formats. In this tutorial, we will store PDF files in a &lt;code&gt;QdrantDocumentStore&lt;/code&gt;. &lt;code&gt;NvidiaDocumentEmbedder&lt;/code&gt; is used to connect with NIMs hosted on the 
&lt;a href=&#34;https://build.nvidia.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt;. Below is an example of how to initialize the embedder component with the 
&lt;a href=&#34;https://build.nvidia.com/snowflake/arctic-embed-l&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;snowflake/arctic-embed-l&lt;/code&gt;&lt;/a&gt; NIM hosted on the NVIDIA API catalog.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.utils.auth&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.embedders.nvidia&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaDocumentEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;snowflake/arctic-embed-l&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                  &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://ai.api.nvidia.com/v1/retrieval/snowflake/arctic-embed-l&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                  &lt;span style=&#34;color:#268bd2&#34;&gt;batch_size&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;creating-the-haystack-rag-pipeline&#34;&gt;Creating the Haystack RAG Pipeline&lt;/h2&gt;
&lt;p&gt;In our example, we will create a simple question/answering RAG pipeline using both NVIDIA NeMo Retriever Text Embedding NIM and LLM NIM. For this pipeline, we use the &lt;code&gt;NvidiaTextEmbedder&lt;/code&gt; to embed the query for retrieval, and the &lt;code&gt;NvidiaGenerator&lt;/code&gt; to generate a response. Example below shows how to instantiate the generator using 
&lt;a href=&#34;https://build.nvidia.com/meta/llama3-70b&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;meta/llama3-70b-instruct&lt;/code&gt;&lt;/a&gt; LLM NIM hosted on the NVIDIA API catalog.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;meta/llama3-70b-instruct&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://integrate.api.nvidia.com/v1&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model_arguments&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1024&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We use Haystack pipelines to connect various components of this RAG pipeline including query embedders and LLM generators. Below is an example of a RAG pipeline:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.utils.auth&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.embedders.nvidia&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.nvidia&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.retrievers.qdrant&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.document_stores.qdrant&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantDocumentStore&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_dim&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1024&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;host&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;qdrant&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaTextEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;snowflake/arctic-embed-l&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                  &lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_env_var&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;NVIDIA_EMBEDDINGS_KEY&#34;&lt;/span&gt;), 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                  &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://ai.api.nvidia.com/v1/retrieval/snowflake/arctic-embed-l&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Answer the question given the context.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: {{ query }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answer:&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;meta/llama3-70b-instruct&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://integrate.api.nvidia.com/v1&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model_arguments&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1024&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.embedding&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.query_embedding&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;indexing-files-and-deploying-the-haystack-rag-pipeline&#34;&gt;Indexing Files and Deploying the Haystack RAG Pipeline&lt;/h2&gt;
&lt;p&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/hayhooks&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hayhooks&lt;/a&gt; allows the deployment of RAG pipelines in a containerized environment. In our example, we have provided a 
&lt;a href=&#34;https://github.com/deepset-ai/nvidia-haystack/blob/main/docker-compose.yml&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;docker-compose file&lt;/a&gt; to setup both the Qdrant database, and the RAG pipeline. As we are leveraging NIMs hosted on the 
&lt;a href=&#34;https://build.nvidia.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt;, we need to set the API keys for the NIMs in the &lt;code&gt;.env&lt;/code&gt; file. The instructions below expect &lt;code&gt;NVIDIA_API_KEY&lt;/code&gt; (for &lt;code&gt;NvidiaGenerator&lt;/code&gt;) and &lt;code&gt;NVIDIA_EMBEDDINGS_KEY&lt;/code&gt; (for &lt;code&gt;NvidiaDocumentEmbedder&lt;/code&gt; and &lt;code&gt;NvidiaTextEmbedder&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Executing &lt;code&gt;docker-compose up&lt;/code&gt; will launch &lt;code&gt;3&lt;/code&gt; containers: &lt;strong&gt;qdrant&lt;/strong&gt;, &lt;strong&gt;hayhooks&lt;/strong&gt; and &lt;strong&gt;qdrant-setup&lt;/strong&gt; (which will run our indexing pipeline and stop). The Qdrant database will be deployed on the localhost and exposed at port &lt;code&gt;6333&lt;/code&gt;. The Qdrant dashboard allows users to inspect the vectorized documents at 
&lt;a href=&#34;localhost:6333/dashboard&#34;&gt;localhost:6333/dashboard&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;serializing-pipelines&#34;&gt;Serializing Pipelines&lt;/h3&gt;
&lt;p&gt;Haystack pipelines defined in Python can be serialized to YAML by calling &lt;code&gt;dump()&lt;/code&gt; on the pipeline object, as shown in our 
&lt;a href=&#34;https://github.com/deepset-ai/nvidia-haystack/blob/77cc316193e718de51b8a56e756749604b8032e9/rag.py#L44C1-L45C16&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;RAG pipeline tutorial&lt;/a&gt;. The 
&lt;a href=&#34;https://github.com/deepset-ai/nvidia-haystack/blob/main/rag.yaml&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;YAML&lt;/a&gt; definition is as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;components&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;embedder&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;haystack_integrations.components.embedders.nvidia.text_embedder.NvidiaTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;generator&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;init_parameters&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;api_key&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;haystack_integrations.components.generators.nvidia.generator.NvidiaGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;prompt&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;init_parameters&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;template&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Answer the question given the context.\nQuestion: {{ query }}\nContext:\n\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        {% for document in documents %}\n    {{ document.content }}\n{% endfor %}\n\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Answer:&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;haystack.components.builders.prompt_builder.PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;retriever&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;init_parameters&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;document_store&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;init_parameters&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#2aa198&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;haystack_integrations.document_stores.qdrant.document_store.QdrantDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#2aa198&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;haystack_integrations.components.retrievers.qdrant.retriever.QdrantEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;connections&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;receiver&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;retriever.query_embedding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;sender&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;embedder.embedding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;receiver&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;prompt.documents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;sender&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;retriever.documents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;receiver&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;generator.prompt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;sender&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;prompt.prompt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;max_loops_allowed&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;: {}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;deploy-the-rag-pipeline&#34;&gt;Deploy the RAG Pipeline&lt;/h3&gt;
&lt;p&gt;To deploy the RAG pipeline, execute &lt;code&gt;hayhooks deploy rag.yaml&lt;/code&gt; which will expose the pipeline on 
&lt;a href=&#34;http://localhost:1416/rag&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;http://localhost:1416/rag&lt;/a&gt; by default. You can then visit 
&lt;a href=&#34;http://localhost:1416/docs&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;http://localhost:1416/docs&lt;/a&gt; for the API docs and try out the pipeline.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-3.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-4.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;For production, Haystack provides Helm charts and 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/kubernetes&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;instructions&lt;/a&gt; to create services running Hayhooks with a container orchestrator like Kubernetes.&lt;/p&gt;
&lt;p&gt;In the next sections, we will show how to deploy, monitor and autoscale NIMs
on your infrastructure in a Kubernetes environment for self-hosting 
&lt;a href=&#34;https://www.nvidia.com/en-us/ai-data-science/foundation-models/?_gl=1*3m0pk5*_gcl_au*ODg0NTI0MDQ3LjE3MTczMTI1MDE.&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;AI foundation models&lt;/a&gt;. Finally, we will provide instructions on how to use them in the Haystack RAG pipeline.&lt;/p&gt;
&lt;h2 id=&#34;self-hosting-nvidia-nims-on-a-kubernetes-cluster&#34;&gt;Self-hosting NVIDIA NIMs on a Kubernetes cluster&lt;/h2&gt;
&lt;h3 id=&#34;kubernetes-cluster-environment&#34;&gt;Kubernetes Cluster Environment&lt;/h3&gt;
&lt;p&gt;In this tutorial, the setup environment consists of a DGX H100 with 8 H100 GPUs each having 80GB of memory as host and with Ubuntu as the operating system. Docker is used as the container runtime. Kubernetes is deployed on it using 
&lt;a href=&#34;https://minikube.sigs.k8s.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Minikube&lt;/a&gt;. To enable GPU utilization in Kubernetes, we install essential NVIDIA software components using the 
&lt;a href=&#34;https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GPU Operator&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;nvidia-nims-deployment&#34;&gt;NVIDIA NIMs Deployment&lt;/h3&gt;
&lt;p&gt;As part of this setup, we deploy following NVIDIA NIMs into the Kubernetes cluster using Helm charts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The LLM NIM, which uses the model 
&lt;a href=&#34;https://build.nvidia.com/meta/llama3-8b&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;llama3-8b-instruct&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The NeMo Retriever Text Embedding NIM, which uses the model 
&lt;a href=&#34;https://build.nvidia.com/nvidia/nv-embedqa-e5-v5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;nvidia/nv-embedqa-e5-v5&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Helm chart for the LLM NIM is located in 
&lt;a href=&#34;https://github.com/NVIDIA/nim-deploy&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub&lt;/a&gt; whereas the helm chart for NVIDIA NeMo Retriever Embedding NIM can be fetched from the the 
&lt;a href=&#34;https://helm.ngc.nvidia.com/nim/nvidia/charts/text-embedding-nim-1.0.0.tgz&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NGC registry&lt;/a&gt;.  Figure 3 illustrates the deployment of NIMs on a Kubernetes cluster running on a DGX H100. The GPU Operator components are deployed via its Helm chart and are part of the GPU Operator stack. Prometheus and Grafana are deployed via Helm charts for monitoring the Kubernetes cluster and the NIM.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-5.png alt=&#34;Figure 3 - NVIDIA NIMs and  other components deployment on a Kubernetes cluster &#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;Figure 3 - NVIDIA NIMs and  other components deployment on a Kubernetes cluster&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;The LLM NIM Helm chart contains the LLM NIM container, which runs within a pod and references the model via 
&lt;a href=&#34;https://kubernetes.io/docs/concepts/storage/persistent-volumes/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Persistent Volume&lt;/a&gt; (PV) and 
&lt;a href=&#34;https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Persistent Volume Claim&lt;/a&gt; (PVC). The LLM NIM pods are autoscaled using the 
&lt;a href=&#34;https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Horizontal Pod Autoscaler&lt;/a&gt; (HPA) based on custom metrics and are exposed via Kubernetes 
&lt;a href=&#34;https://kubernetes.io/docs/concepts/services-networking/service/#type-clusterip&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ClusterIP&lt;/a&gt; service. To access the LLM NIM, we deploy an 
&lt;a href=&#34;https://kubernetes.io/docs/concepts/services-networking/ingress/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ingress&lt;/a&gt; and expose it at the &lt;code&gt;/llm&lt;/code&gt; endpoint.&lt;/p&gt;
&lt;p&gt;Similarly, the NeMo Retriever Text Embedding NIM Helm chart includes the NeMo Retriever Text Embedding NIM container, which runs within a pod and references the model on the host via PV and PVC. The NeMo Retriever Text Embedding NIM pods are also autoscaled via HPA and are exposed via Kubernetes ClusterIP service. To access the NeMo Retriever Text Embedding NIM, we deploy an ingress and expose it at the &lt;code&gt;/embedding&lt;/code&gt; endpoint.&lt;/p&gt;
&lt;p&gt;Users and other applications can access the exposed NVIDIA NIMs via the ingress.
The vector database Qdrant is deployed using this 
&lt;a href=&#34;https://qdrant.tech/documentation/guides/installation/#kubernetes&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;helm chart&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now, let’s take a closer look at the deployment process for each NIM:&lt;/p&gt;
&lt;h3 id=&#34;llm-nim-deployment&#34;&gt;LLM NIM deployment&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Create the namespace, if it is not already created yet:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;kubectl create namespace nim-llm&lt;/code&gt;&lt;/p&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;Add a Docker registry secret that will be used for pulling NIM containers from NGC and replace &lt;code&gt;&lt;ngc-cli-api-key&gt;&lt;/code&gt; with the API key from NGC. Follow this 
&lt;a href=&#34;https://docs.nvidia.com/ngc/gpu-cloud/ngc-user-guide/index.html#generating-api-key&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;link&lt;/a&gt; for generating an API key in NGC.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl create secret -n nim-llm docker-registry nvcrimagepullsecret \
    --docker-server=nvcr.io \
    --docker-username=&#39;$oauthtoken&#39; --docker-password=&lt;ngc-cli-api-key&gt;  
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;Create a generic secret &lt;code&gt;ngc-api&lt;/code&gt;, which is used to pull the model within the NIM container.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl create secret -n nim-llm generic ngc-api \
    --from-literal=NGC_CLI_API_KEY=&lt;ngc-cli-api-key&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;Create &lt;code&gt;nim-llm-values.yaml&lt;/code&gt; file with the below content. Adjust &lt;code&gt;repository&lt;/code&gt; and &lt;code&gt;tag&lt;/code&gt; values depending on your environment.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;image&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Adjust to the actual location of the image and version you want&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;repository&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nvcr.io/nim/meta/llama3-8b-instruct&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;tag&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1.0.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;imagePullSecrets&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nvcrimagepullsecret&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;model&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;meta/llama3-8b-instruct&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;ngcAPISecret&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;ngc-api&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;persistence&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;enabled&lt;/span&gt;: &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;size&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;30Gi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;helm.sh/resource-policy&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;keep&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;statefulSet&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;enabled&lt;/span&gt;: &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;limits&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;nvidia.com/gpu&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;5&#34;&gt;
&lt;li&gt;We assume that the helm chart for the LLM NIM is located here: &lt;code&gt;./nims/helm/nim-llm/&lt;/code&gt;. You can change the command accordingly depending on where the helm chart is located.  Deploy the LLM NIM by running the following command:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;helm -n nim-llm install nim-llm -f ./nims/helm/nim-llm/ nim-llm-values.yaml&lt;/code&gt;&lt;/p&gt;
&lt;ol start=&#34;6&#34;&gt;
&lt;li&gt;The deployment takes a few minutes to start the containers, download models, and become ready. You can monitor the pods with the below command:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl get pods -n nim-llm
&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&#34;example-output&#34;&gt;Example Output&lt;/h4&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;NAME        READY   STATUS    RESTARTS   AGE
nim-llm-0   1/1     Running   0          8m21s
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;7&#34;&gt;
&lt;li&gt;Install an 
&lt;a href=&#34;https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ingress controller&lt;/a&gt;, if it has not been installed already. Then, create a file &lt;code&gt;ingress-nim-llm.yaml&lt;/code&gt; with the below content to create the ingress for the LLM NIM. Make sure to change the host (here nims.example.com) with your fully qualified domain name.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Ingress&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm-ingress&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespace&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;nginx.ingress.kubernetes.io/use-regex&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;true&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;nginx.ingress.kubernetes.io/rewrite-target&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;/$2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;rules&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;host&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nims.example.com&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;http&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;paths&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;path&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;/llm(/|$)(.*)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;pathType&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;ImplementationSpecific&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;backend&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;service&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;port&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;number&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;8000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Deploy the ingress with the below command:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl apply -f ingress-nim-llm.yaml
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;8&#34;&gt;
&lt;li&gt;Access the exposed service by making a curl request for testing (replace &lt;code&gt;nims.example.com&lt;/code&gt; with you own fully qualified domain name)&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl -X &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;POST&#39;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;http://nims.example.com/llm/v1/chat/completions&#39;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -H &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;accept: application/json&#39;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -H &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Content-Type: application/json&#39;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -d &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  &#34;messages&#34;: [
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;      &#34;content&#34;: &#34;You are a polite and respectful chatbot helping people plan a vacation.&#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;      &#34;role&#34;: &#34;system&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    },
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;      &#34;content&#34;: &#34;What shall i do in France in one line?&#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;      &#34;role&#34;: &#34;user&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  ],
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  &#34;model&#34;: &#34;meta/llama3-8b-instruct&#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  &#34;temperature&#34;: 0.5,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  &#34;max_tokens&#34;: 1024,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  &#34;top_p&#34;: 1,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  &#34;stream&#34;: false
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id=&#34;example-output-1&#34;&gt;Example output:&lt;/h4&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;id&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cmpl-44c301e7f12942fb830fc53a58e98e2a&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;object&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;chat.completion&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;created&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1724399020&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;model&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;meta/llama3-8b-instruct&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;choices&#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;index&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;message&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;role&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;assistant&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;content&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Indulge in the rich culture and history of France by visiting iconic landmarks like the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum, and savor the country&#39;s renowned cuisine and wine in charming cities like Paris, Lyon, and Bordeaux.&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;logprobs&#34;&lt;/span&gt;: &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;finish_reason&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;stop&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;stop_reason&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;128009&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;usage&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;prompt_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;39&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;total_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;94&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;completion_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;55&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, we have the LLM NIM up and running.&lt;/p&gt;
&lt;h3 id=&#34;nemo-retriever-text-embedding-nim-deployment&#34;&gt;NeMo Retriever Text Embedding NIM deployment&lt;/h3&gt;
&lt;p&gt;The deployment of the NeMo Retriever Text Embedding NIM is similar to the LLM NIM.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Follow steps 1 - 3 as LLM NIM deployment but replace namespace with &lt;code&gt;nim-embedding&lt;/code&gt; in the commands.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create &lt;code&gt;nim-embedding-values.yaml&lt;/code&gt; file with the below content. Adjust &lt;code&gt;repository&lt;/code&gt; and &lt;code&gt;tag&lt;/code&gt; values depending on your environment.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;image&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;repository&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nvcr.io/nim/nvidia/nv-embedqa-e5-v5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;tag&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1.0.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;pullPolicy&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;IfNotPresent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;resources&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;limits&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;ephemeral-storage&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;30Gi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;nvidia.com/gpu&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;memory&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;12Gi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;cpu&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;16000m&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;requests&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;ephemeral-storage&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;3Gi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;nvidia.com/gpu&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;memory&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;8Gi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;cpu&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;4000m&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metrics&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;enabled&lt;/span&gt;: &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;Fetch the NeMo Retriever Text Embedding NIM Helm chart, by running following commands (you would need to create NGC CLI API key, check 
&lt;a href=&#34;https://docs.nvidia.com/ngc/gpu-cloud/ngc-user-guide/index.html#generating-api-key&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt; on how to create it):&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;helm fetch https://helm.ngc.nvidia.com/nim/nvidia/charts/text-embedding-nim-1.0.0.tgz --username=&#39;$oauthtoken&#39; --password=&lt;NGC_CLI_API_KEY&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;We assume that the helm chart for the NeMo Retriever Text Embedding NIM is located here: &lt;code&gt;./nims/helm/nim-embedding/&lt;/code&gt;. You can change the command accordingly depending on where the helm chart is located. Deploy the NeMo Retriever Text Embedding NIM by running the following command&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;helm -n nim-embedding install nim-embedding -f embedding-nim-values.yaml /nims/helm/nim-embedding/text-embedding-nim-1.0.0.tgz
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;The deployment takes a few minutes to start the container, download model, and become ready. You can monitor the pod with the below command:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl get pods -n nim-embedding
&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&#34;example-output-2&#34;&gt;Example Output&lt;/h4&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;NAME                                 READY   STATUS     RESTARTS   AGE
nemo-embedding-ms-58dd974469-tpdvn   1/1     Running   0          5m
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;5&#34;&gt;
&lt;li&gt;Create a file &lt;code&gt;ingress-nim-embedding.yaml&lt;/code&gt; similar to the LLM NIM ingress with service name &lt;code&gt;nemo-embedding-ms&lt;/code&gt;, port &lt;code&gt;8080&lt;/code&gt;, and path &lt;code&gt;/embedding(/|$)(.*)&lt;/code&gt; as below:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Ingress&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-embedding-ingress&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespace&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-embedding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;nginx.ingress.kubernetes.io/use-regex&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;true&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;nginx.ingress.kubernetes.io/rewrite-target&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;/$2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;rules&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;host&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nims.example.com&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;http&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;paths&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;path&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;/embedding(/|$)(.*)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;pathType&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;ImplementationSpecific&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;backend&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;service&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nemo-embedding-ms&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;port&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;number&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;8080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;6&#34;&gt;
&lt;li&gt;Access the exposed service by making a curl request for testing (replace in below the &lt;code&gt;nims.example.com&lt;/code&gt; with your fully qualified domain name).&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;GET&#39;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;http://nims.example.com/embedding/v1/models&#39;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -H &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;accept: application/json&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id=&#34;example-output-3&#34;&gt;Example output:&lt;/h4&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;object&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;list&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;data&#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;id&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nvidia/nv-embedqa-e5-v5&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;created&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;object&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;model&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;owned_by&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;organization-owner&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;7 You can test the NeMo Retriever Text Embedding NIM by creating embeddings for sample text as below:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl -X &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;POST&#34;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://nims.example.com/embedding/v1/embeddings&#34;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -H &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;accept: application/json&#39;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -H &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Content-Type: application/json&#39;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  -d &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;input&#34;: [&#34;Hello world&#34;],
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;model&#34;: &#34;nvidia/nv-embedqa-e5-v5&#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;input_type&#34;: &#34;query&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Example output:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;object&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;list&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;data&#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;index&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;embedding&#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;-0.0003485679626464844&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;-0.017822265625&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.0262298583984375&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.0222015380859375&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ...
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;-0.00823974609375&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;object&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedding&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;model&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nvidia/nv-embedqa-e5-v5&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;usage&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;prompt_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;6&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;&#34;total_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, we have the NeMo Retriever Text Embedding NIM up and running.&lt;/p&gt;
&lt;p&gt;Once the above procedure is completed, you will have API endpoints of LLM NIM and  NeMo Retriever Text Embedding NIM.&lt;/p&gt;
&lt;h2 id=&#34;operational-considerations&#34;&gt;Operational Considerations&lt;/h2&gt;
&lt;p&gt;Monitoring and autoscaling are essential for deployed NVIDIA NIMs to ensure efficient, effective, and reliable operation. Monitoring tracks performance metrics, detects errors, and optimizes resource utilization, while autoscaling dynamically adjusts resources to match changing workloads, ensuring the NVIDIA NIMs can handle sudden spikes or dips in demand. This enable NVIDIA NIMs to provide accurate and timely responses, even under heavy loads, while optimizing costs and maintaining high availability. In this section, we will delve into details of deploying monitoring and enabling autoscaling for NVIDIA NIMs.&lt;/p&gt;
&lt;h3 id=&#34;monitoring&#34;&gt;Monitoring&lt;/h3&gt;
&lt;p&gt;NVIDIA NIM metrics are collected with the open-source tool 
&lt;a href=&#34;https://prometheus.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Prometheus&lt;/a&gt; and visualized with the 
&lt;a href=&#34;https://grafana.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Grafana&lt;/a&gt; dashboards. NVIDIA 
&lt;a href=&#34;https://docs.nvidia.com/datacenter/cloud-native/kubernetes/dcgme2e.html#gpu-telemetry&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dcgm-exporter&lt;/a&gt; is the preferred tool to collect GPU telemetry. We follow the instructions from 
&lt;a href=&#34;https://docs.nvidia.com/datacenter/cloud-native/gpu-telemetry/latest/kube-prometheus.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt; for the deployment of Prometheus and Grafana.&lt;/p&gt;
&lt;h4 id=&#34;visualizing-nvidia-nim-metrics&#34;&gt;Visualizing NVIDIA NIM Metrics&lt;/h4&gt;
&lt;h5 id=&#34;llm-nim&#34;&gt;LLM NIM&lt;/h5&gt;
&lt;p&gt;By default, NVIDIA LLM NIM metrics are exposed at 
&lt;a href=&#34;http://localhost:8000/metrics&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;http://localhost:8000/metrics&lt;/a&gt; by the LLM NIM container. All the exposed metrics are listed 
&lt;a href=&#34;https://docs.nvidia.com/nim/large-language-models/latest/observability.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;. Using a Prometheus ServiceMonitor they can be published to Prometheus and be viewed in the Grafana. The 
&lt;a href=&#34;https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/running-exporters.md#the-goal-of-servicemonitors&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Prometheus ServiceMonitor&lt;/a&gt; is used to define application to scrape metrics from within Kubernetes cluster.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a file &lt;code&gt;service-monitor-nim-llm.yaml&lt;/code&gt; with the below content.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;ServiceMonitor&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm-sm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespace&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;endpoints&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;interval&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;30s&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;targetPort&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;8000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;path&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;/metrics&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespaceSelector&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;matchNames&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;selector&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;app.kubernetes.io/name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;Create a Prometheus ServiceMonitor using the below command:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f service-monitor-nim-llm.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the prometheus UI under &lt;code&gt;Status -&gt; Targets&lt;/code&gt;, you will see the below ServiceMonitor once it’s deployed.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-6.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;Let’s check some inference metrics on the Prometheus UI. Figure 4 shows the stacked graph for &lt;code&gt;request_success_total&lt;/code&gt; NIM metric.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-7.png alt=&#34;Figure 4 - Prometheus UI showing the plot of request_success_total metric indicating number of finished requests. &#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;Figure 4 - Prometheus UI showing the plot of request_success_total metric indicating number of finished requests.&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h5 id=&#34;nemo-retriever-text-embedding-nim&#34;&gt;NeMo Retriever Text Embedding NIM&lt;/h5&gt;
&lt;p&gt;NeMo Retriever Text Embedding NIM exposes underlying 
&lt;a href=&#34;https://developer.nvidia.com/triton-inference-server&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Triton&lt;/a&gt; metrics on port 8002. All the exposed metrics are listed 
&lt;a href=&#34;https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;. We will create a service monitor to expose these metrics so as to publish them in Prometheus and visualize in Grafana.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a file &lt;code&gt;service-monitor-nim-embedding.yaml&lt;/code&gt; with the below content.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;ServiceMonitor&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-embedding-sm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespace&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-embedding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;endpoints&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;interval&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;30s&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;targetPort&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;8002&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;path&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;/metrics&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespaceSelector&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;matchNames&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#2aa198&#34;&gt;nim-embedding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;selector&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;app.kubernetes.io/name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;text-embedding-nim&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;Create a Prometheus ServiceMonitor using the below command:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f service-monitor-nim-embedding.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the prometheus UI under &lt;code&gt;Status -&gt; Targets&lt;/code&gt;, you will see the below ServiceMonitor once it’s deployed.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-8.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;We can check some triton metrics on the Prometheus UI. Figure 5 shows the stacked graph for &lt;code&gt;nv_inference_count&lt;/code&gt; metric.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-9.png alt=&#34;Figure 5 - Prometheus UI showing the stacked plot of nv_inference_count metric indicating number of inferences performed. &#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;Figure 5 - Prometheus UI showing the stacked plot of nv_inference_count metric indicating number of inferences performed.&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id=&#34;autoscaling-nvidia-nim&#34;&gt;Autoscaling NVIDIA NIM&lt;/h3&gt;
&lt;p&gt;In this tutorial, we use the 
&lt;a href=&#34;https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Kubernetes Horizontal Pod Autoscaler&lt;/a&gt; - HPA to adjust the scaling of the NIM pods. We have defined custom metrics to monitor the average GPU usage of each NVIDIA NIM and used by the Horizontal Pod Autoscaler (HPA) to dynamically adjust the number of NIM pods. See the metric definition below:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Metrics&lt;/th&gt;
          &lt;th&gt;Expressions&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;nim_llm_gpu_avg&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;avg by (kubernetes_node, pod, namespace, gpu) (DCGM_FI_DEV_GPU_UTIL{pod=~&#34;nim-llm-.*&#34;})&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;nim_embedding_gpu_avg&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;avg by (kubernetes_node, pod, namespace, gpu) (DCGM_FI_DEV_GPU_UTIL{pod=~&#34;nemo-embedding-ms-.*&#34;})&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These metrics are example metrics and one should adjust them based on their environment.&lt;/p&gt;
&lt;p&gt;Let’s deploy the HPA.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a file with the name &lt;code&gt;prometheus_rule_nims.yaml&lt;/code&gt; with the below content to create  the prometheus rules for the above custom metric. Adjust the labels (app, other prometheus labels) according to the current deployed prometheus instance.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;PrometheusRule&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;labels&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;app&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;kube-prometheus-stack&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;app.kubernetes.io/instance&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;kube-prometheus-stack-1710254997&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;app.kubernetes.io/managed-by&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Helm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;app.kubernetes.io/part-of&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;kube-prometheus-stack&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;app.kubernetes.io/version&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;56.8.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;chart&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;kube-prometheus-stack-56.8.2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;heritage&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Helm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;release&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;kube-prometheus-stack-1710254997&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;kube-prometheus-stack-1709-gpu.rules&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespace&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;prometheus&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;groups&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;gpu.rules&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;rules&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;expr&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;avg by (kubernetes_node, pod, namespace, gpu) (DCGM_FI_DEV_GPU_UTIL{pod=~&#34;nim-llm-.*&#34;})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;record&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim_llm_gpu_avg&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;expr&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;avg by (kubernetes_node, pod, namespace, gpu) (DCGM_FI_DEV_GPU_UTIL{pod=~&#34;nemo-embedding-ms-.*&#34;})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;record&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim_embedding_gpu_avg&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;Create custom prometheus recording rules by running the below command:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f prometheus_rule_nims.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;In prometheus UI, under &lt;code&gt;Status -&gt; Rules&lt;/code&gt;, you can see the above two created rules as shown in Figure 6.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-10.png alt=&#34;Figure 6 - Prometheus rules tab showing the created custom rules to record GPU usage by NVIDIA NIM. &#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;Figure 6 - Prometheus rules tab showing the created custom rules to record GPU usage by NVIDIA NIM.&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;Install 
&lt;a href=&#34;https://github.com/kubernetes-sigs/prometheus-adapter&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;prometheus-adapter&lt;/a&gt; to query the custom metrics based on the custom recording rules created above and register them to the custom metrics API for HPA to fetch. Replace in below command &lt;code&gt;&lt;prometheus-service-name&gt;&lt;/code&gt; with the name of the prometheus service in Kubernetes.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;helm upgrade --install prometheus-adapter prometheus-community/prometheus-adapter --set prometheus.url=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://&lt;prometheus-service-name&gt;.prometheus.svc.cluster.local&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;5&#34;&gt;
&lt;li&gt;Query the custom metrics API to see if the metrics have been registered using the below command:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq -r . | grep llms
&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&#34;example-output-4&#34;&gt;Example Output:&lt;/h4&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&#34;name&#34;: &#34;pods/nim_embedding_gpu_avg&#34;,
&#34;name&#34;: &#34;namespaces/nim_embedding_gpu_avg&#34;,
&#34;name&#34;: &#34;pods/nim_llm_gpu_avg&#34;,
&#34;name&#34;: &#34;namespaces/nim_llm_gpu_avg&#34;,
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;6&#34;&gt;
&lt;li&gt;A separate HPA definition is created for the two NVIDIA NIMs. Within this definition, we specify the minimum and maximum number of replicas, the metric to monitor, and the target value for that metric. Below is the definition for the LLM NIM HPA and you can create the similar for the NeMo Retriever Text Embedding NIM using &lt;code&gt;nim_embedding_gpu_avg&lt;/code&gt; metric.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;LLM NIM HPA file:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;autoscaling/v2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;HorizontalPodAutoscaler&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm-hpa&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespace&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;scaleTargetRef&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;apps/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Deployment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim-llm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;minReplicas&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;maxReplicas&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metrics&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Pods&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;pods&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metric&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;nim_llm_gpu_avg&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;target&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;AverageValue&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;averageValue&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;30&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;7&#34;&gt;
&lt;li&gt;Create the two HPAs using the below commands:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl apply -f hpa_nim_llm.yaml
kubectl apply -f hpa_nim_embedding.yaml
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;8&#34;&gt;
&lt;li&gt;Check the status of HPAs:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;kubectl get hpa -A&lt;/code&gt;&lt;/p&gt;
&lt;h4 id=&#34;example-output-5&#34;&gt;Example Output:&lt;/h4&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;NAMESPACE       NAME                REFERENCE                      TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
nim-embedding   nim-embedding-hpa   Deployment/nemo-embedding-ms   0/30           1         2         1          80s
nim-llm         nim-llm-hpa         Deployment/nim-llm             0/30           1         2         1          2m11s
&lt;/code&gt;&lt;/pre&gt;&lt;ol start=&#34;9&#34;&gt;
&lt;li&gt;Send some requests to LLM NIM and see the LLM NIM pod getting scaled as shown below:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;NAME        READY   STATUS    RESTARTS   AGE
nim-llm-0   1/1     Running   0          3h47m
nim-llm-1   1/1     Running   0          3m30s
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Also, Figure 6 shows the prometheus graph showing the scaling of LLM NIM.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide/nvidia-image-11.png alt=&#34;Figure 7 - Prometheus graph showing the scaling of LLM NIM. &#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;Figure 7 - Prometheus graph showing the scaling of LLM NIM.&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;We have now deployed NVIDIA NIMs on your infrastructure in a scalable fashion. We can now use them in the RAG pipeline. The next section provides the details for the same.&lt;/p&gt;
&lt;h2 id=&#34;use-self-hosted-nvidia-nims-in-the-rag-pipeline&#34;&gt;Use Self-hosted NVIDIA NIMs in the RAG Pipeline&lt;/h2&gt;
&lt;p&gt;This section provides instructions to use previously deployed NVIDIA NIMs on your infrastructure in a Kubernetes cluster for &lt;code&gt;NvidiaTextEmbedder&lt;/code&gt;, &lt;code&gt;NvidiaDocumentEmbedder&lt;/code&gt; and &lt;code&gt;NvidiaGenerator&lt;/code&gt; in the Haystack RAG pipeline, replacing  &lt;code&gt;&lt;self-hosted-emedding-nim-url&gt;&lt;/code&gt; with the endpoint of the NeMo Retriever Text Embedding NIM and &lt;code&gt;&lt;self-hosted-llm-nim-url&gt;&lt;/code&gt; with the LLM NIM. The provided 
&lt;a href=&#34;https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/rag-with-nims.ipynb&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;notebook&lt;/a&gt; in the repository has examples of how to use the self-hosted NVIDIA NIMs.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;NvidiaDocumentEmbedder&lt;/em&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaDocumentEmbedder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_nim_model&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://&lt;self-hosted-emedding-nim-url&gt;/v1&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;em&gt;NvidiaTextEmbedder&lt;/em&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# initialize NvidiaTextEmbedder with the self-hosted NeMo Retriever Text Embedding NIM URL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaTextEmbedder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_nim_model&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://&lt;self-hosted-embedding-nim-url&gt;/v1&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;em&gt;NvidiaGenerator&lt;/em&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# initialize NvidiaGenerator with the self-hosted LLM NIM URL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;NvidiaGenerator&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;llm_nim_model_name&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;api_url&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://&lt;self-hosted-llm-nim-url&gt;/v1&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;model_arguments&lt;/span&gt;={
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;temperature&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.5&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;top_p&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.7&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2048&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this blog, we provide a comprehensive walkthrough for building robust and scalable RAG applications using Haystack and NVIDIA NIMs. We cover building the RAG pipeline by leveraging NVIDIA NIMs hosted on the 
&lt;a href=&#34;https://build.nvidia.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA API catalog&lt;/a&gt; and also using self-hosted NVIDIA NIMs deployed on your infrastructure in a Kubernetes environment. Our step-by-step instructions detail how to deploy NVIDIA NIMs in a Kubernetes cluster, monitor their performance, and scale them as needed.&lt;/p&gt;
&lt;p&gt;By leveraging proven deployment patterns, our architecture ensures a responsive user experience and predictable query times, even in the face of high or bursty user queries and document indexing workloads. Moreover, our deployment recipe is flexible, allowing for easy implementation in cloud, on-premise, or air-gapped environments. With this guide, we aim to provide a resource for anyone looking to build reliable and performant RAG applications at scale.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Monitor and trace your Haystack pipelines with Langfuse</title>
      <link>https://haystack.deepset.ai/blog/langfuse-integration/</link>
      <pubDate>Fri, 17 May 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/langfuse-integration/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Getting your LLM application into production is a huge milestone, but that’s only the beginning. It’s critical to monitor how your pipeline is performing in the real world so you can keep improving performance and cost, and proactively address any issues that might arise.&lt;/p&gt;
&lt;p&gt;With the new 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/langfuse&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Langfuse integration&lt;/a&gt;, it’s now easier than ever to have visibility into your pipelines. In this post, we’ll explain more about Langfuse, and demonstrate how to trace an end to end request to a Haystack pipeline.&lt;/p&gt;
&lt;h3 id=&#34;what-is-langfuse&#34;&gt;What is Langfuse?&lt;/h3&gt;
&lt;p&gt;Langfuse is an open source LLM engineering platform. It offers a ton of features to help you understand how your LLM application is performing under the hood.&lt;/p&gt;
&lt;h3 id=&#34;langfuse-features-and-benefits&#34;&gt;Langfuse features and benefits&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Track model usage and cost&lt;/li&gt;
&lt;li&gt;Collect user feedback&lt;/li&gt;
&lt;li&gt;Identify low-quality outputs&lt;/li&gt;
&lt;li&gt;Build fine-tuning and testing datasets&lt;/li&gt;
&lt;li&gt;Open source 💙&lt;/li&gt;
&lt;li&gt;Self-hosted version available&lt;/li&gt;
&lt;li&gt;Frequent releases with new features and improvements&lt;/li&gt;
&lt;li&gt;as of the time of this writing, free to try out 🤑&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;getting-started&#34;&gt;Getting started&lt;/h2&gt;
&lt;p&gt;In order to use this integration, you’ll need to 
&lt;a href=&#34;https://langfuse.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;sign up for a Langfuse account&lt;/a&gt;. See 
&lt;a href=&#34;https://langfuse.com/docs&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;the Langfuse docs for the most up-to-date information&lt;/a&gt; about features and pricing.&lt;/p&gt;
&lt;h3 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h3&gt;
&lt;p&gt;First, 
&lt;a href=&#34;https://langfuse.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;sign up for an account on the Langfuse website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;On the Langfuse dashboard, make a note of your &lt;code&gt;LANGFUSE_SECRET_KEY&lt;/code&gt; and &lt;code&gt;LANGFUSE_PUBLIC_KEY&lt;/code&gt;.  Set them as environment variables. While you’re at it, set the &lt;code&gt;HAYSTACK_CONTENT_TRACING_ENABLED&lt;/code&gt; environment variable to &lt;code&gt;true&lt;/code&gt; in order to enable Haystack tracing in your pipeline.&lt;/p&gt;
&lt;p&gt;The following code examples also require an &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable to be set. Haystack is model-agnostic and you can use 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/generators&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;any model provider we support&lt;/a&gt;, by changing the generator in the code samples below.&lt;/p&gt;
&lt;h3 id=&#34;installation&#34;&gt;Installation&lt;/h3&gt;
&lt;p&gt;To install the integration run the following command in your terminal:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install langfuse-haystack
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To use Langfuse in a pipeline you’ll need a few additional dependencies:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install sentence-transformers datasets
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;use-langfuse-in-a-rag-pipeline&#34;&gt;Use Langfuse in a RAG pipeline&lt;/h2&gt;
&lt;p&gt;First, import all the modules you’ll need.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;datasets&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;load_dataset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.connectors.langfuse&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;LangfuseConnector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, write a function that takes a &lt;code&gt;DocumentStore&lt;/code&gt; and returns a Haystack RAG pipeline. Add the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/langfuseconnector&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;LangfuseConnector&lt;/code&gt;&lt;/a&gt; to your pipeline, but don’t connect it to any other component in the pipeline.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;get_pipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    Given the following information, answer the question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    Context:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    Question: {{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    &#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Add components to your pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;tracer&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;LangfuseConnector&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Basic RAG Pipeline&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text_embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/all-MiniLM-L6-v2&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-3.5-turbo&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;n&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;}))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Now, connect the components to each other&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# NOTE: the tracer component doesn&#39;t need to be connected to anything in order to work&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text_embedder.embedding&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.query_embedding&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;basic_rag_pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, instantiate the pipeline using an &lt;code&gt;InMemoryDocumentStore&lt;/code&gt; to keep things simple. Generate some embeddings based on the 
&lt;a href=&#34;https://huggingface.co/datasets/bilgeyucel/seven-wonders&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;7 wonders of the world dataset&lt;/a&gt;, and populate them into our document store. If you were running this code in production, you’d probably want to use an indexing pipeline to load the data into the store, but for demo purposes this approach reduces complexity.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;dataset&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;load_dataset&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;bilgeyucel/seven-wonders&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;train&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/all-MiniLM-L6-v2&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docs_with_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;([&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(**&lt;span style=&#34;color:#268bd2&#34;&gt;ds&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ds&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;dataset&lt;/span&gt;]).&lt;span style=&#34;color:#268bd2&#34;&gt;get&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;or&lt;/span&gt; []  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# type: ignore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs_with_embeddings&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Run the pipeline and ask it a question.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;get_pipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What does Rhodes Statue look like?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text_embedder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Setting the &lt;code&gt;HAYSTACK_CONTENT_TRACING_ENABLED&lt;/code&gt; environment variable automatically traces every request that the pipeline runs. If all goes well you should receive something like the following output:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# {&#39;tracer&#39;: {&#39;name&#39;: &#39;Basic RAG Pipeline&#39;, &#39;trace_url&#39;: &#39;https://cloud.langfuse.com/trace/3d52b8cc-87b6-4977-8927-5e9f3ff5b1cb&#39;}, &#39;llm&#39;: {&#39;replies&#39;: [&#39;The Rhodes Statue was described as being about 105 feet tall, with iron tie bars and brass plates forming the skin. It was built on a white marble pedestal near the Rhodes harbour entrance. The statue was filled with stone blocks as construction progressed.&#39;, &#39;The Rhodes Statue was described as being about 32 meters (105 feet) tall, built with iron tie bars, brass plates for skin, and filled with stone blocks. It stood on a 15-meter-high white marble pedestal near the Rhodes harbor entrance.&#39;], &#39;meta&#39;: [{&#39;model&#39;: &#39;gpt-3.5-turbo-0125&#39;, &#39;index&#39;: 0, &#39;finish_reason&#39;: &#39;stop&#39;, &#39;usage&#39;: {&#39;completion_tokens&#39;: 100, &#39;prompt_tokens&#39;: 453, &#39;total_tokens&#39;: 553}}, {&#39;model&#39;: &#39;gpt-3.5-turbo-0125&#39;, &#39;index&#39;: 1, &#39;finish_reason&#39;: &#39;stop&#39;, &#39;usage&#39;: {&#39;completion_tokens&#39;: 100, &#39;prompt_tokens&#39;: 453, &#39;total_tokens&#39;: 553}}]}}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Dumping tracing output in the terminal, is pretty cool, but the integration also sends the info to Langfuse. The Langfuse dashboard has a much more comprehensive and beautiful UI so you can make sense of your pipeline. Let’s hop over there and take a look.&lt;/p&gt;
&lt;h2 id=&#34;use-langfuse-in-a-rag-pipeline-with-chat&#34;&gt;Use Langfuse in a RAG pipeline with chat&lt;/h2&gt;
&lt;p&gt;Agent and chat use cases are rising in popularity. If you wanted to use the integration to trace a pipeline that includes a chat generator component, here’s an example of how to do so.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DynamicChatPromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.chat&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.connectors.langfuse&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;LangfuseConnector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;tracer&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;LangfuseConnector&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Chat example&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DynamicChatPromptBuilder&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIChatGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-3.5-turbo&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.messages&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_system&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Always respond in German even if some input data is in other languages.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Tell me about {{location}}&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;template_variables&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;location&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Berlin&#34;&lt;/span&gt;}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_source&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;}}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;tracer&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;trace_url&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# ChatMessage(content=&#39;Berlin ist die Hauptstadt von Deutschland und zugleich eines der bekanntesten kulturellen Zentren Europas. Die Stadt hat eine faszinierende Geschichte, die bis in die Zeiten des Zweiten Weltkriegs und des Kalten Krieges zurückreicht. Heute ist Berlin für seine vielfältige Kunst- und Musikszene, seine historischen Stätten wie das Brandenburger Tor und die Berliner Mauer sowie seine lebendige Street-Food-Kultur bekannt. Berlin ist auch für seine grünen Parks und Seen beliebt, die den Bewohnern und Besuchern Raum für Erholung bieten.&#39;, role=&lt;ChatRole.ASSISTANT: &#39;assistant&#39;&gt;, name=None, meta={&#39;model&#39;: &#39;gpt-3.5-turbo-0125&#39;, &#39;index&#39;: 0, &#39;finish_reason&#39;: &#39;stop&#39;, &#39;usage&#39;: {&#39;completion_tokens&#39;: 137, &#39;prompt_tokens&#39;: 29, &#39;total_tokens&#39;: 166}})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# https://cloud.langfuse.com/trace/YOUR_UNIQUE_IDENTIFYING_STRING&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;explore-the-langfuse-dashboard&#34;&gt;Explore the Langfuse dashboard&lt;/h2&gt;
&lt;p&gt;Once you’ve run these code samples, 
&lt;a href=&#34;https://langfuse.com/docs/demo&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;head over to the Langfuse dashboard&lt;/a&gt; to see and interact with traces. As of the time of this writing, the demo is free to try.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/langfuse-integration/langfuse-tracing-dashboard.png alt=&#34;Screenshot of the Langfuse dashboard showing Traces, Scores, Model Cost, Model Usage.&#34;  /&gt;
&lt;/p&gt;
&lt;h3 id=&#34;trace-detail&#34;&gt;Trace Detail&lt;/h3&gt;
&lt;p&gt;Trace details show  cost and latency for a specific end-to-end request. This data is helpful for estinating usage and cost of a RAG application in production. For example, here is the trace detail for the text embedder step of the pipeline we just ran. For 
&lt;a href=&#34;https://langfuse.com/docs/tracing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a comprehensive explanation of LLM tracing, see the Langfuse docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/langfuse-integration/langfuse-embedder-span.png alt=&#34;Screenshot of the Langfuse dashboard showing a trace detail including inputs, outputs, and metadata for the text_embedder component of a Haystack RAG pipeline.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The right sidebar shows latency for every step of the pipeline, which helps to pinpoint performance bottlenecks.&lt;/p&gt;
&lt;p&gt;Trace details that are tagged “generation” also show the monetary cost of the request.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/langfuse-integration/langfuse-generation-span.png alt=&#34;Screenshot of the Langfuse dashboard showing a trace detail including inputs, outputs, cost, and metadata for the generator component of a Haystack RAG pipeline.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Traces can also be segmented by user or by session, so you can more granularly understand the user’s journey.&lt;/p&gt;
&lt;h3 id=&#34;evaluation&#34;&gt;Evaluation&lt;/h3&gt;
&lt;p&gt;Evaluation helps us understand the quality of the results the LLM application is returning to the end user. There are currently 4 ways to add scores into Langfuse:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Manual evaluation&lt;/li&gt;
&lt;li&gt;User feedback&lt;/li&gt;
&lt;li&gt;Model-based evaluation&lt;/li&gt;
&lt;li&gt;Custom via SDKs/API&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the sake of time, this post will only cover manual evaluation, but 
&lt;a href=&#34;https://langfuse.com/docs/scores/overview&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;see the Langfuse docs for comprehensive info on all the evaluation methods&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Clicking on a trace, you can manually add a score to note the quality of that specific request.&lt;/p&gt;
&lt;p&gt;For this trace, the &lt;code&gt;input&lt;/code&gt; shows us our prompt, interpolated with the actual context that was passed to the LLM. Cool!&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Input:

    Given the following information, answer the question.
    Context:
    
        Within it, too, are to be seen large masses of rock, by the weight of which the artist steadied it while erecting it.[22][23]
Destruction of the remains[edit]
The ultimate fate of the remains of the statue is uncertain. Rhodes has two serious earthquakes per century, owing to its location on the seismically unstable Hellenic Arc. Pausanias tells us, writing ca. 174, how the city was so devastated by an earthquake that the Sibyl oracle foretelling its destruction was considered fulfilled.[24] This means the statue could not have survived for long if it was ever repaired. By the 4th century Rhodes was Christianized, meaning any further maintenance or rebuilding, if there ever was any before, on an ancient pagan statue is unlikely. The metal would have likely been used for coins and maybe also tools by the time of the Arab wars, especially during earlier conflicts such as the Sassanian wars.[9]
The onset of Islamic naval incursions against the Byzantine empire gave rise to a dramatic account of what became of the Colossus. 
    
        Construction[edit]
Timeline and map of the Seven Wonders of the Ancient World, including the Colossus of Rhodes
Construction began in 292 BC. Ancient accounts, which differ to some degree, describe the structure as being built with iron tie bars to which brass plates were fixed to form the skin. The interior of the structure, which stood on a 15-metre-high (49-foot) white marble pedestal near the Rhodes harbour entrance, was then filled with stone blocks as construction progressed.[14] Other sources place the Colossus on a breakwater in the harbour. According to most contemporary descriptions, the statue itself was about 70 cubits, or 32 metres (105 feet) tall.[15] Much of the iron and bronze was reforged from the various weapons Demetrius&#39;s army left behind, and the abandoned second siege tower may have been used for scaffolding around the lower levels during construction.

    
    Question: What does Rhodes Statue look like?
    Answer:
    
&lt;/code&gt;&lt;/pre&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Output:
The Rhodes Statue was described as being about 105 feet tall, with iron tie bars and brass plates forming the skin. It was built on a white marble pedestal near the Rhodes harbour entrance. The statue was filled with stone blocks as construction progressed.&#34;
1: &#34;The Rhodes Statue was described as being about 32 meters (105 feet) tall, built with iron tie bars, brass plates for skin, and filled with stone blocks. It stood on a 15-meter-high white marble pedestal near the Rhodes harbor entrance.&#34;
]
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This seems like a decent quality response, based on the inputs and outputs. Click on the “Add score” button and give it a score of 1. The score is even editable, in case you make a mistake.&lt;/p&gt;
&lt;p&gt;Now clicking on the “Scores” section, the score we added is visible. Over time, this data helps build a comprehensive picture of the quality of our LLM application.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/langfuse-integration/langfuse-score.png alt=&#34;Screenshot of the Langfuse dashboard showing a manually added score for the Haystack demo RAG pipeline.&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;wrapping-it-up&#34;&gt;Wrapping it up&lt;/h2&gt;
&lt;p&gt;If you’ve been following along, today you’ve learned:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How Langfuse can help give you better visibility into your Haystack pipelines, giving you confidence to ship into production&lt;/li&gt;
&lt;li&gt;How to integrate Langfuse into Haystack RAG and chat pipelines&lt;/li&gt;
&lt;li&gt;The basics of LLM tracing and evaluation with the Langfuse dashboard&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a small team, Langfuse ships new features with incredible velocity. We can’t wait to see what they build next. To stay in the loop for future updates, be sure to follow 
&lt;a href=&#34;https://x.com/langfuse&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Langfuse&lt;/a&gt; and 
&lt;a href=&#34;https://twitter.com/haystack_ai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; on Twitter. Thanks for reading!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Advanced Retrieval: Extract Metadata from Queries to Improve Retrieval</title>
      <link>https://haystack.deepset.ai/blog/extracting-metadata-filter/</link>
      <pubDate>Mon, 13 May 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/extracting-metadata-filter/</guid>
      <description>
      
      
      

      
      

      <blockquote>
<p>This is part one of the <strong>Advanced Use Cases</strong> series:</p>
<p>1️⃣ <strong>Extract Metadata from Queries to Improve Retrieval</strong></p>
<p>2️⃣ 
<a href="/blog/query-expansion">Query Expansion</a></p>
<p>3️⃣ 
<a href="/blog/query-decomposition">Query Decomposition</a></p>
<p>4️⃣ 
<a href="/cookbook/metadata_enrichment">Automated Metadata Enrichment</a></p>
</blockquote>
<p>In Retrieval-Augmented Generation (RAG) applications, the retrieval step, which provides relevant context to your large language model (LLM), is vital for generating high-quality responses. There are possible ways of improving retrieval and <strong>metadata filtering</strong> is one of the easiest ways. 
<a href="https://docs.haystack.deepset.ai/docs/metadata-filtering" target="_blank" rel="noopener">Metadata filtering</a>, the approach of limiting the search space based on some concrete metadata,  can really enhance the quality of the retrieved documents. Here are some advantages of using metadata filtering:</p>
<ol>
<li><strong>Relevance</strong>: Metadata filtering narrows down the information being retrieved. This ensures that the generated responses align with the specific query or topic.</li>
<li><strong>Accuracy</strong>: Filtering based on metadata such as domain, source, date, or topic guarantees that the information used for generation is accurate and trustworthy. This is particularly important for applications where accuracy is paramount. For instance, if you need information about a specific year, using the year as a metadata filter will retrieve only pertinent data.</li>
<li><strong>Efficiency</strong>: Eliminating irrelevant or low-quality information boosts the efficiency of your RAG application, reduces the amount of processing needed, and speeds up retrieval response times.</li>
</ol>
<p>You have two options for applying the metadata filter: you can either specify it directly when running the pipeline or, you can extract it from the query itself. In this article, we’ll focus on extracting  filters from a query to improve the quality of generated responses in RAG applications. Let’s get started.</p>
<h2 id="introduction-to-metadata-filters">Introduction to Metadata Filters</h2>
<p>First things first, what is metadata? Metadata (or meta tag) is actually data about your data, used to categorize, sort, and filter information based on various attributes such as date, topic, source, or any other information that you find relevant. After incorporating meta information into your data, you can apply filters to queries used with 
<a href="https://docs.haystack.deepset.ai/docs/retrievers" target="_blank" rel="noopener">Retrievers</a> to limit the scope of your search based on this metadata and ensure that your answers come from a specific slice of your data.</p>
<p>Imagine that you have following Documents in your document store:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">documents</span> = [
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"Some text about revenue increase"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2022</span>, <span style="color:#2aa198">"company"</span>: <span style="color:#2aa198">"Nvidia"</span>, <span style="color:#2aa198">"name"</span>:<span style="color:#2aa198">"A"</span>}),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"Some text about revenue increase"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2023</span>, <span style="color:#2aa198">"company"</span>: <span style="color:#2aa198">"Nvidia"</span>, <span style="color:#2aa198">"name"</span>:<span style="color:#2aa198">"B"</span>}),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"Some text about revenue increase"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2022</span>, <span style="color:#2aa198">"company"</span>: <span style="color:#2aa198">"BMW"</span>, <span style="color:#2aa198">"name"</span>:<span style="color:#2aa198">"C"</span>}),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"Some text about revenue increase"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2023</span>, <span style="color:#2aa198">"company"</span>: <span style="color:#2aa198">"BMW"</span>, <span style="color:#2aa198">"name"</span>:<span style="color:#2aa198">"D"</span>}),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"Some text about revenue increase"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2022</span>, <span style="color:#2aa198">"company"</span>: <span style="color:#2aa198">"Mercedes"</span>, <span style="color:#2aa198">"name"</span>:<span style="color:#2aa198">"E"</span>}),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"Some text about revenue increase"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2023</span>, <span style="color:#2aa198">"company"</span>: <span style="color:#2aa198">"Mercedes"</span>, <span style="color:#2aa198">"name"</span>:<span style="color:#2aa198">"F"</span>}),
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p>When the query is “<em>Causes of the revenue increase</em>”, the retriever returns all documents as they all contain some information about revenue. However, the metadata filter below ensures that any returned document by the retriever has a value of <code>2022</code> in the <code>year</code> metadata field and either <code>BMW</code> or <code>Mercedes</code> in the <code>company</code> metadata field. So, only documents with name “<strong>C</strong>” and “<strong>E</strong>” are retrieved.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">run</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">data</span>={
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"retriever"</span>:{
</span></span><span style="display:flex;"><span>            <span style="color:#2aa198">"query"</span>: <span style="color:#2aa198">"Causes of the revenue increase"</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#2aa198">"filters"</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#2aa198">"operators"</span>: <span style="color:#2aa198">"AND"</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#2aa198">"conditions"</span>: [
</span></span><span style="display:flex;"><span>                    {<span style="color:#2aa198">"field"</span>: <span style="color:#2aa198">"meta.year"</span>, <span style="color:#2aa198">"operator"</span>: <span style="color:#2aa198">"=="</span>, <span style="color:#2aa198">"value"</span>: <span style="color:#2aa198">"2022"</span>},
</span></span><span style="display:flex;"><span>                    {<span style="color:#2aa198">"field"</span>: <span style="color:#2aa198">"meta.company"</span>, <span style="color:#2aa198">"operator"</span>: <span style="color:#2aa198">"in"</span>, <span style="color:#2aa198">"value"</span>: [<span style="color:#2aa198">"BMW"</span>, <span style="color:#2aa198">"Mercedes"</span>]}
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p>In this example, we pass the filter explicitly, but sometimes, the query itself might contain information that can be used as a metadata filter during the querying process. In this case, we need to <em>preprocess</em> the query to extract filters before we use it with a retriever.</p>
<h2 id="extracting-metadata-filters-from-a-query">Extracting Metadata Filters from a Query</h2>
<p>In LLM-based applications, queries are written in natural language. From time to time, they include valuable hints that can be used as metadata filters to improve the retrieval. We can extract these hints, formulate them as metadata filters and use them with the retriever alongside the query. For instance, when the query is “<em>What was the revenue of Nvidia in 2022?</em>”, we can extract <code>2022</code> as <code>years</code> and <code>Nvidia</code> as <code>companies</code>. Based on this information, formulated metadata filter to use with a retriever should look like:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#2aa198">"filters"</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"operators"</span>: <span style="color:#2aa198">"AND"</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"conditions"</span>: [
</span></span><span style="display:flex;"><span>        {<span style="color:#2aa198">"field"</span>: <span style="color:#2aa198">"meta.years"</span>, <span style="color:#2aa198">"operator"</span>: <span style="color:#2aa198">"=="</span>, <span style="color:#2aa198">"value"</span>: <span style="color:#2aa198">"2022"</span>},
</span></span><span style="display:flex;"><span>        {<span style="color:#2aa198">"field"</span>: <span style="color:#2aa198">"meta.companies"</span>, <span style="color:#2aa198">"operator"</span>: <span style="color:#2aa198">"=="</span>, <span style="color:#2aa198">"value"</span>: <span style="color:#2aa198">"Nvidia"</span>}
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Thankfully, LLMs are highly capable of extracting structured information from unstructured text. Let’s see step-by-step how we can implement a custom component that uses an LLM to extract keywords, phrases, or entities from the query and formulate the metadata filter.</p>
<h2 id="implementing-querymetadataextractor">Implementing <code>QueryMetadataExtractor</code></h2>
<blockquote>
<p>🧑‍🍳 You can find and run all the code in our cookbook 
<a href="https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/extracting_metadata_filters_from_a_user_query.ipynb" target="_blank" rel="noopener">Extrating Metadata Filter from a Query</a></p>
</blockquote>
<p>We start by creating a 
<a href="https://docs.haystack.deepset.ai/docs/custom-components" target="_blank" rel="noopener">custom component</a>, <code>QueryMetadataExtractor</code>, which takes <code>query</code> and <code>metadata_fields</code> as inputs and outputs <code>filters</code>. This component encapsulates a generative pipeline, made up of 
<a href="https://docs.haystack.deepset.ai/docs/promptbuilder" target="_blank" rel="noopener"><code>PromptBuilder</code></a> and 
<a href="https://docs.haystack.deepset.ai/docs/openaigenerator" target="_blank" rel="noopener"><code>OpenAIGenerator</code></a>. The pipeline instructs the LLM to extract keywords, phrases, or entities from a given query which can then be used as metadata filters. In the prompt, we include instructions to ensure the output format is in JSON and provide <code>metadata_fields</code> along with the <code>query</code> to ensure the correct entities are extracted from the query.</p>
<p>Once the pipeline is initialized in the <code>init</code> method of the component, we post-process the LLM output in the <code>run</code> method. This step ensures the extracted metadata is correctly formatted to be used as a metadata filter.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">json</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">typing</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Dict</span>, <span style="color:#268bd2">List</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>, <span style="color:#268bd2">component</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.builders</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptBuilder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.generators</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">OpenAIGenerator</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">@component</span>()
</span></span><span style="display:flex;"><span><span style="color:#859900">class</span> <span style="color:#cb4b16">QueryMetadataExtractor</span>:
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#859900">def</span> <span style="color:#268bd2">__init__</span>(<span style="color:#cb4b16">self</span>):
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">prompt</span> = <span style="color:#2aa198">"""
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        You are part of an information system that processes users queries.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Given a user query you extract information from it that matches a given list of metadata fields.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        The information to be extracted from the query must match the semantics associated with the given metadata fields.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        The information that you extracted from the query will then be used as filters to narrow down the search space
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        when querying an index.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Just include the value of the extracted metadata without including the name of the metadata field.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        The extracted information in 'Extracted metadata' must be returned as a valid JSON structure.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        ###
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Example 1:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Query: "What was the revenue of Nvidia in 2022?"
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Metadata fields: {"company", "year"}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Extracted metadata fields: {"company": "nvidia", "year": 2022}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        ###
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Example 2:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Query: "What were the most influential publications in 2023 regarding Alzheimer's disease?"
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Metadata fields: {"disease", "year"}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Extracted metadata fields: {"disease": "Alzheimer", "year": 2023}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        ###
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Example 3:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Query: "{{query}}"
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Metadata fields: "{{metadata_fields}}"
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        Extracted metadata fields:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">        """</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cb4b16">self</span>.<span style="color:#268bd2">pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span>        <span style="color:#cb4b16">self</span>.<span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">name</span>=<span style="color:#2aa198">"builder"</span>, <span style="color:#268bd2">instance</span>=<span style="color:#268bd2">PromptBuilder</span>(<span style="color:#268bd2">prompt</span>))
</span></span><span style="display:flex;"><span>        <span style="color:#cb4b16">self</span>.<span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">name</span>=<span style="color:#2aa198">"llm"</span>, <span style="color:#268bd2">instance</span>=<span style="color:#268bd2">OpenAIGenerator</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"gpt-3.5-turbo"</span>))
</span></span><span style="display:flex;"><span>        <span style="color:#cb4b16">self</span>.<span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"builder"</span>, <span style="color:#2aa198">"llm"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">@component.output_types</span>(<span style="color:#268bd2">filters</span>=<span style="color:#268bd2">Dict</span>[<span style="color:#cb4b16">str</span>, <span style="color:#cb4b16">str</span>])
</span></span><span style="display:flex;"><span>    <span style="color:#859900">def</span> <span style="color:#268bd2">run</span>(<span style="color:#cb4b16">self</span>, <span style="color:#268bd2">query</span>: <span style="color:#cb4b16">str</span>, <span style="color:#268bd2">metadata_fields</span>: <span style="color:#268bd2">List</span>[<span style="color:#cb4b16">str</span>]):
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">result</span> = <span style="color:#cb4b16">self</span>.<span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">run</span>({<span style="color:#2aa198">'builder'</span>: {<span style="color:#2aa198">'query'</span>: <span style="color:#268bd2">query</span>, <span style="color:#2aa198">'metadata_fields'</span>: <span style="color:#268bd2">metadata_fields</span>}})
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">metadata</span> = <span style="color:#268bd2">json</span>.<span style="color:#268bd2">loads</span>(<span style="color:#268bd2">result</span>[<span style="color:#2aa198">'llm'</span>][<span style="color:#2aa198">'replies'</span>][<span style="color:#2aa198;font-weight:bold">0</span>])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#93a1a1;font-style:italic"># this can be done with specific data structures and in a more sophisticated way</span>
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">filters</span> = []
</span></span><span style="display:flex;"><span>        <span style="color:#859900">for</span> <span style="color:#268bd2">key</span>, <span style="color:#268bd2">value</span> <span style="color:#859900">in</span> <span style="color:#268bd2">metadata</span>.<span style="color:#268bd2">items</span>():
</span></span><span style="display:flex;"><span>            <span style="color:#268bd2">field</span> = <span style="color:#2aa198">f</span><span style="color:#2aa198">"meta.</span><span style="color:#2aa198">{</span><span style="color:#268bd2">key</span><span style="color:#2aa198">}</span><span style="color:#2aa198">"</span>
</span></span><span style="display:flex;"><span>            <span style="color:#268bd2">filters</span>.<span style="color:#268bd2">append</span>({<span style="color:#2aa198">f</span><span style="color:#2aa198">"field"</span>: <span style="color:#268bd2">field</span>, <span style="color:#2aa198">"operator"</span>: <span style="color:#2aa198">"=="</span>, <span style="color:#2aa198">"value"</span>: <span style="color:#268bd2">value</span>})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#859900">return</span> {<span style="color:#2aa198">"filters"</span>: {<span style="color:#2aa198">"operator"</span>: <span style="color:#2aa198">"AND"</span>, <span style="color:#2aa198">"conditions"</span>: <span style="color:#268bd2">filters</span>}}
</span></span></code></pre></div><p>First, let’s test the <code>QueryMetadataExtractor</code> in isolation, passing a query and a list of metadata fields.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">extractor</span> = <span style="color:#268bd2">QueryMetadataExtractor</span>()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">query</span> = <span style="color:#2aa198">"What were the most influential publications in 2022 regarding Parkinson's disease?"</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">metadata_fields</span> = {<span style="color:#2aa198">"disease"</span>, <span style="color:#2aa198">"year"</span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">result</span> = <span style="color:#268bd2">extractor</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">query</span>, <span style="color:#268bd2">metadata_fields</span>)
</span></span><span style="display:flex;"><span><span style="color:#cb4b16">print</span>(<span style="color:#268bd2">result</span>)
</span></span></code></pre></div><p>The result should look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>{<span style="color:#2aa198">'filters'</span>: {<span style="color:#2aa198">'operator'</span>: <span style="color:#2aa198">'AND'</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#2aa198">'conditions'</span>: [
</span></span><span style="display:flex;"><span>    {<span style="color:#2aa198">'field'</span>: <span style="color:#2aa198">'meta.disease'</span>, <span style="color:#2aa198">'operator'</span>: <span style="color:#2aa198">'=='</span>, <span style="color:#2aa198">'value'</span>: <span style="color:#2aa198">'Alzheimers'</span>},
</span></span><span style="display:flex;"><span>    {<span style="color:#2aa198">'field'</span>: <span style="color:#2aa198">'meta.year'</span>, <span style="color:#2aa198">'operator'</span>: <span style="color:#2aa198">'=='</span>, <span style="color:#2aa198">'value'</span>: 2023}
</span></span><span style="display:flex;"><span>  ]}
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Notice that the <code>QueryMetadataExtractor</code> has extracted the metadata fields from the query and returned them in a format that can be used as filters passed directly to a <code>Retriever</code>. By default, the <code>QueryMetadataExtractor</code> will use all metadata fields as conditions together with an <code>AND</code> operator.</p>
<h2 id="using-querymetadataextractor-in-a-pipeline">Using <code>QueryMetadataExtractor</code> in a Pipeline</h2>
<p>Now, let’s plug the <code>QueryMetadataExtractor</code> into a <code>Pipeline</code> with a <code>Retriever</code> connected to a <code>DocumentStore</code> to see how it works in practice.</p>
<p>We start by creating a 
<a href="https://docs.haystack.deepset.ai/docs/inmemorydocumentstore" target="_blank" rel="noopener"><code>InMemoryDocumentStore</code></a> and adding some documents to it. We include info about “year” and “disease” in the “meta” field of each document.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Document</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores.in_memory</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">InMemoryDocumentStore</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores.types</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DuplicatePolicy</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">documents</span> = [
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"some publication about Alzheimer prevention research done over 2023 patients study"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2022</span>, <span style="color:#2aa198">"disease"</span>: <span style="color:#2aa198">"Alzheimer"</span>, <span style="color:#2aa198">"author"</span>: <span style="color:#2aa198">"Michael Butter"</span>}),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"some text about investigation and treatment of Alzheimer disease"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2023</span>, <span style="color:#2aa198">"disease"</span>: <span style="color:#2aa198">"Alzheimer"</span>, <span style="color:#2aa198">"author"</span>: <span style="color:#2aa198">"John Bread"</span>}),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"A study on the effectiveness of new therapies for Parkinson's disease"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2022</span>, <span style="color:#2aa198">"disease"</span>: <span style="color:#2aa198">"Parkinson"</span>, <span style="color:#2aa198">"author"</span>: <span style="color:#2aa198">"Alice Smith"</span>}
</span></span><span style="display:flex;"><span>    ),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"An overview of the latest research on the genetics of Parkinson's disease and its implications for treatment"</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">"year"</span>: <span style="color:#2aa198;font-weight:bold">2023</span>, <span style="color:#2aa198">"disease"</span>: <span style="color:#2aa198">"Parkinson"</span>, <span style="color:#2aa198">"author"</span>: <span style="color:#2aa198">"David Jones"</span>}
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_store</span> = <span style="color:#268bd2">InMemoryDocumentStore</span>(<span style="color:#268bd2">bm25_algorithm</span>=<span style="color:#2aa198">"BM25Plus"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_store</span>.<span style="color:#268bd2">write_documents</span>(<span style="color:#268bd2">documents</span>=<span style="color:#268bd2">documents</span>, <span style="color:#268bd2">policy</span>=<span style="color:#268bd2">DuplicatePolicy</span>.<span style="color:#268bd2">OVERWRITE</span>)
</span></span></code></pre></div><p>We then create a pipeline consisting of the <code>QueryMetadataExtractor</code> and a 
<a href="https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever" target="_blank" rel="noopener"><code>InMemoryBM25Retriever</code></a> connected to the <code>InMemoryDocumentStore</code> created above.</p>
<blockquote>
<p>Learn about connecting components and creating pipelines in 
<a href="https://docs.haystack.deepset.ai/docs/creating-pipelines" target="_blank" rel="noopener">Docs: Creating Pipelines</a>.</p>
</blockquote>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>, <span style="color:#268bd2">Document</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.retrievers.in_memory</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">InMemoryBM25Retriever</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">retrieval_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">metadata_extractor</span> = <span style="color:#268bd2">QueryMetadataExtractor</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">retriever</span> = <span style="color:#268bd2">InMemoryBM25Retriever</span>(<span style="color:#268bd2">document_store</span>=<span style="color:#268bd2">document_store</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">retrieval_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">metadata_extractor</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"metadata_extractor"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">retrieval_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">retriever</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"retriever"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">retrieval_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"metadata_extractor.filters"</span>, <span style="color:#2aa198">"retriever.filters"</span>)
</span></span></code></pre></div><p>Now define a query and metadata fields and pass them to the pipeline:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">query</span> = <span style="color:#2aa198">"publications 2023 Alzheimer's disease"</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">metadata_fields</span> = {<span style="color:#2aa198">"year"</span>, <span style="color:#2aa198">"author"</span>, <span style="color:#2aa198">"disease"</span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">retrieval_pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">data</span>={<span style="color:#2aa198">"metadata_extractor"</span>: {<span style="color:#2aa198">"query"</span>: <span style="color:#268bd2">query</span>, <span style="color:#2aa198">"metadata_fields"</span>: <span style="color:#268bd2">metadata_fields</span>}, <span style="color:#2aa198">"retriever"</span>:{<span style="color:#2aa198">"query"</span>: <span style="color:#268bd2">query</span>}})
</span></span></code></pre></div><p>This returns only documents whose metadata field <code>year = 2023</code> and <code>disease = Alzheimer</code></p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>{<span style="color:#2aa198">'documents'</span>: 
</span></span><span style="display:flex;"><span> [<span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>     <span style="color:#cb4b16">id</span>=<span style="color:#268bd2">e3b0bfd497a9f83397945583e77b293429eb5bdead5680cc8f58dd4337372aa3</span>, 
</span></span><span style="display:flex;"><span>     <span style="color:#268bd2">content</span>: <span style="color:#2aa198">'some text about investigation and treatment of Alzheimer disease'</span>, 
</span></span><span style="display:flex;"><span>     <span style="color:#268bd2">meta</span>: {<span style="color:#2aa198">'year'</span>: <span style="color:#2aa198;font-weight:bold">2023</span>, <span style="color:#2aa198">'disease'</span>: <span style="color:#2aa198">'Alzheimer'</span>, <span style="color:#2aa198">'author'</span>: <span style="color:#2aa198">'John Bread'</span>}, 
</span></span><span style="display:flex;"><span>     <span style="color:#268bd2">score</span>: <span style="color:#2aa198;font-weight:bold">2.772588722239781</span>)]
</span></span><span style="display:flex;"><span>     }
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>Metadata filtering stands out as a powerful technique for improving the relevance and accuracy of retrieved documents, thus enabling the generation of high-quality responses in RAG applications. Using the custom component <code>QueryMetadataExtractor</code> we implemented, we can extract filters from user queries and directly use them with Retrievers.</p>
<p>This article was part one of the <strong>Advanced Use Cases</strong> series. If you want to stay on top of the latest Haystack developments, you can 
<a href="https://landing.deepset.ai/haystack-community-updates" target="_blank" rel="noopener">subscribe to our newsletter</a> or 
<a href="https://discord.gg/DzJEUKkuHp" target="_blank" rel="noopener">join our Discord community</a> 💙</p>

      </description>
    </item>
    
    <item>
      <title>Chatting with SQL Databases 3 Ways</title>
      <link>https://haystack.deepset.ai/blog/chatting-with-sql-databases-3-ways/</link>
      <pubDate>Mon, 22 Apr 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/chatting-with-sql-databases-3-ways/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Let’s talk about how we build AI applications that can interact with, even chat to SQL databases. Heads-up that this is a mini project that I tried out, and it’s objectively a simple approach to interacting with SQL with natural language. I’m &lt;em&gt;certain&lt;/em&gt; that this is not a golden bullet that works with all types of SQL tables etc. But, nonetheless, it’s cool, it works, and you can try it along with me.&lt;/p&gt;
&lt;p&gt;My aim setting off with this project was to be able to ask questions in natural language, and get back an answer based on the contents in a SQL database. For example, assuming we have a database that stores information on employees and their time spent at work; “&lt;em&gt;How many people spend more than 8 hours at work a day?&lt;/em&gt;”, “&lt;em&gt;A total of 20 employees spend more than 8 hours at work a day.&lt;/em&gt;”.&lt;/p&gt;
&lt;p&gt;A few struggles I knew I might get with a system based on a generative LLM:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What if the question is &lt;em&gt;unrelated&lt;/em&gt; to the contents in the SQL database?&lt;/li&gt;
&lt;li&gt;I can get the LLM to maybe produce a SQL query, but how do I actually query the database?&lt;/li&gt;
&lt;li&gt;What if the SQL query is just &lt;em&gt;wrong?&lt;/em&gt; 🙂&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, here are 4 things I did and that I’m showing you here. Not all solve all of the problems above, but they’re a good starting point. And as a heads up, I used Haystack for this project, so the following walk-through has some Haystack terminology.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I created &lt;strong&gt;a component that can query a SQL database&lt;/strong&gt; given a query. I used SQLite for this.&lt;/li&gt;
&lt;li&gt;I created a &lt;strong&gt;pipeline using an LLM that can ingest a question, produce a SQL query based on the structure of my database, and query it&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Next, to resolve the problem of ‘what if the query is unrelated’, I “upgraded” my pipeline to use &lt;em&gt;&lt;strong&gt;conditional routing.&lt;/strong&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Finally, so as to have a proper chat experience, I changed tactics and defined everything as a function and used the &lt;em&gt;&lt;strong&gt;function-calling&lt;/strong&gt;&lt;/em&gt; capabilities of the latest LLMs. I then wrapped that into a mini Gradio app that you can run from within the Cookbook I’ve provided.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Before we dive in, a quick note on the data at hand. I won’t show all the code to get that into the database here because, frankly, it’s the most boring step… A few bullet points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I used a CSV file called ‘Absenteeism_at_work’ that the JupySQL team uses to show their examples. Very handy! I took inspiration from 
&lt;a href=&#34;https://ploomber.io/blog/jupysql-haystack-agent/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;the first demo we created with them&lt;/a&gt; and wrote this table to my SQLite database.&lt;/li&gt;
&lt;li&gt;The resulting table has the following columns:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ID;Reason_for_absence;Month_of_absence;Day_of_the_week;Seasons;Transportation_expense;Distance_from_Residence_to_Work;Service_time;Age;Work_load_Average_day_;Hit_target;Disciplinary_failure;Education;Son;Social_drinker;Social_smoker;Pet;Weight;Height;Body_mass_index;Absenteeism_time_in_hours
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;I’ve populated a database called &lt;code&gt;absenteeism.db&lt;/code&gt; that represents this table, which we will be using here on out.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;🧑‍🍳 You can see and run all the code required for the SQL database setup in the 
&lt;a href=&#34;https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/chat_with_SQL_3_ways.ipynb&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;provided Cookbook&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;
&lt;a href=&#34;#part-1-query-sql-with-natural-language&#34;&gt;Part 1: Query SQL with Natural Language&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;#part-2-skip-unrelated-questions-with-conditional-routing&#34;&gt;Part 2: Skip Unrelated Questions with Conditional Routing&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;#part-3-build-a-chat-app-with-function-calling--gradio&#34;&gt;Part 3: Build a Chat App with Function Calling (&amp; Gradio)&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;part-1-query-sql-with-natural-language&#34;&gt;Part 1: Query SQL with Natural Language&lt;/h2&gt;
&lt;p&gt;First, let’s tackle the easiest of the lot. Let’s create a pipeline that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Accepts a question&lt;/li&gt;
&lt;li&gt;Creates a SQL query&lt;/li&gt;
&lt;li&gt;Actually queries our DB.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To this end, here’s something we’re going to need. A &lt;code&gt;SQLQuery&lt;/code&gt; component which Haystack does not have. But Haystack &lt;em&gt;does&lt;/em&gt; provide a uniform interface for component creation. And here’s what mine looks like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;sqlite3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@component&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;SQLQuery&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;sql_database&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connection&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;sqlite3&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;sql_database&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;check_same_thread&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;@component.output_types&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;], &lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;]):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt; = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pd&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;read_sql&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;, &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connection&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;results&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;queries&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now I have an &lt;code&gt;SQLQuery&lt;/code&gt; component, I can initialize it with &lt;code&gt;sql_query = SQLQuery(&#39;absenteeism.db)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To test it, I can run the following SQL query with this component:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;sql_query&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;SELECT Age, SUM(Absenteeism_time_in_hours) as Total_Absenteeism_Hours FROM absenteeism WHERE Disciplinary_failure = 0 GROUP BY Age ORDER BY Total_Absenteeism_Hours DESC LIMIT 3;&#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;results&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Resulting in:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   Age  Total_Absenteeism_Hours
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;   &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;28&lt;/span&gt;                      &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;651&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;   &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;33&lt;/span&gt;                      &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;538&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;   &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;38&lt;/span&gt;                      &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;482&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;build-the-sql-query-pipeline&#34;&gt;Build the SQL Query Pipeline&lt;/h3&gt;
&lt;p&gt;Now that we have a component that can ingest SQL queries and query our desired database, let’s add it as a step in a full AI application. We will build a Haystack pipeline that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Uses a prompt that instructs an LLM (as an example here, we’re using GPT-4, if you want to stick to this one you’ll have to provide your API key as the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable) to generate a SQL query based on the &lt;code&gt;columns&lt;/code&gt; present in our database and the natural language &lt;code&gt;question&lt;/code&gt; that we provide&lt;/li&gt;
&lt;li&gt;Forwards the generated SQL query to our newly creates &lt;code&gt;SQLQuery&lt;/code&gt; component&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.openai&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Please generate an SQL query. The query should answer the following Question: {{question}};
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            The query is to be answered for the table is called &#39;absenteeism&#39; with the following
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            Columns: {{columns}};
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            Answer:&#34;&#34;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sql_query&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SQLQuery&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;absenteeism.db&#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-4&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sql_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sql_querier&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;sql_query&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.replies&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sql_querier.queries&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The resulting &lt;code&gt;sql_pipeline&lt;/code&gt; looks like this (produces with &lt;code&gt;sql_pipeline.show()&lt;/code&gt; in the cookbook) 👇&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/chatting-with-sql-databases-3-ways/pipeline1.png alt=&#34;Simple Pipeline&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Let’s try running it, and here’s what we get 👇&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;On which days of the week does the average absenteeism time exceed 4 hours?&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;columns&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;columns&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sql_querier&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;results&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#   Day_of_the_week&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#0                2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#1                3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#2                4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#3                5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#4                6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;part-2-skip-unrelated-questions-with-conditional-routing&#34;&gt;Part 2: Skip Unrelated Questions with Conditional Routing&lt;/h2&gt;
&lt;p&gt;Next, let’s add a simple improvement. What if the provided &lt;code&gt;question&lt;/code&gt; is unrelated? It’s better if we skip it, rather than querying the database with a SQL query that’s going to result in an error.&lt;/p&gt;
&lt;p&gt;For this step, I used the &lt;code&gt;[ConditionalRouter](https://docs.haystack.deepset.ai/docs/conditionalrouter)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To start with, let’s have an LLM reply with &lt;code&gt;no_answer&lt;/code&gt; if the provided question cannot be answered with the SQL table we have:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.openai&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Please generate an SQL query. The query should answer the following Question: {{question}};
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            If the question cannot be answered given the provided table and columns, return &#39;no_answer&#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            The query is to be answered for the table is called &#39;absenteeism&#39; with the following
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            Columns: {{columns}};
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            Answer:&#34;&#34;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-4&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, let’s define some routes, and our &lt;code&gt;ConditionalRouter&lt;/code&gt;. What we want to accomplish here is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the reply from our LLM is &lt;code&gt;no_answer&lt;/code&gt; we want to go down the &lt;code&gt;go_to_fallback&lt;/code&gt; branch of our pipeline&lt;/li&gt;
&lt;li&gt;If there’s no &lt;code&gt;no_answer&lt;/code&gt;, we want to go down the &lt;code&gt;sql&lt;/code&gt; route.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.routers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ConditionalRouter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;routes&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;condition&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;{{&#39;no_answer&#39; not in replies[0]}}&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;{{replies}}&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output_name&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sql&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output_type&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;condition&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;{{&#39;no_answer&#39; in replies[0]}}&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;{{question}}&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output_name&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;go_to_fallback&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;output_type&#34;&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;router&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ConditionalRouter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;routes&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, let’s decide on what to do if we want to go down the fallback route. For this example, I’ve simply created a &lt;code&gt;fallback_llm&lt;/code&gt; that is going to tell the user &lt;em&gt;why&lt;/em&gt; the question cannot be answered:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;fallback_prompt&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;User entered a query that cannot be answerwed with the given table.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;                                            The query was: {{question}} and the table had columns: {{columns}}.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;                                            Let the user know why the question cannot be answered&#34;&#34;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;fallback_llm&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-4&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When we add all of these, and our &lt;code&gt;sql_query&lt;/code&gt; component into a pipeline, it looks like this 👇&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;router&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;router&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fallback_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;fallback_prompt&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fallback_llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;fallback_llm&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sql_querier&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;sql_query&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.replies&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;router.replies&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;router.sql&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sql_querier.queries&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;router.go_to_fallback&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fallback_prompt.question&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fallback_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fallback_llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/chatting-with-sql-databases-3-ways/pipeline2.png alt=&#34;Pipeline with Conditional Routing&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;If I ask the question “On which days of the week does the average absenteeism time exceed 4 hours?” to this pipeline I get:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;On which days of the week does the average absenteeism time exceed 4 hours?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;conditional_sql_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                  &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;columns&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;columns&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                       &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;router&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                       &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fallback_prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;columns&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;columns&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;sql_querier&#39;&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;sql_querier&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;results&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;elif&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;fallback_llm&#39;&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;fallback_llm&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;replies&#39;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#   Day_of_the_week&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#0                2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#1                3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#2                4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#3                5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#4                6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;But if I ask “When is my birthday?” I get the following reply:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The query cannot be answered as the provided table does not contain information regarding the user’s personal data such as birthdays. The table primarily focuses on absence-related data for presumably work or similar situations. Please provide the relevant data to get the accurate answer.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;part-3-build-a-chat-app-with-function-calling--gradio&#34;&gt;Part 3: Build a Chat App with Function Calling (&amp; Gradio)&lt;/h2&gt;
&lt;p&gt;The last thing I tried was to use function calling instead of conditional routing. The aim here is to have a human-like discussion with our SQL database. By doing this, we allow the LLM to &lt;em&gt;decide&lt;/em&gt; whether the tool (our function) should be selected to resolve a question. The resulting system has the following flow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;User asks question&lt;/li&gt;
&lt;li&gt;LLM decides whether any of the tools (functions) provided to it can resolve the question&lt;/li&gt;
&lt;li&gt;The LLM generates the inputs for the tool it has selected to use. These inputs should be what the tool needs to actually run&lt;/li&gt;
&lt;li&gt;We run the tool with the generated inputs.&lt;/li&gt;
&lt;li&gt;The LLM then generates a human-like answer based on the returned result from the tool.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I already have a &lt;code&gt;SQLQuery&lt;/code&gt; component that is pretty straightforward, so here’s how I wrapped that into a function:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;sql_query&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SQLQuery&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;absenteeism.db&#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;sql_query_func&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;]):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;try&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;sql_query&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reply&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;results&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Exception&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;e&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2&#34;&gt;reply&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;There was an error running the SQL Query = &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;              The error is &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;              You should probably try again.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;              &#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;reply&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;reply&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Since I’m using GPT-4 as my demo LLM here, I’ll be providing this function as a tool:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;type&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;function&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;function&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;name&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sql_query_func&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;description&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;This a tool useful to query a SQL table called &#39;absenteeism&#39; with the following Columns: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;columns&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;parameters&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;type&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;object&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;properties&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;queries&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;type&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;array&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;description&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The query to use in the search. Infer this from the user&#39;s message. It should be a question or a statement&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;items&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;type&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;string&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;required&#34;&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once I had this in place, I simply added this as a tool to an &lt;code&gt;OpenAIChatGenerator&lt;/code&gt; and followed it up by creating a mini app that keeps track of conversation history. I used Gradio (thank you for the example in the 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/40_building_chat_application_with_function_calling&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Building a Chat Application Tutorial&lt;/a&gt; Bilge!!) to wrap this into a demo app.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;See the cookbook for the full code example&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;demo&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;gr&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;ChatInterface&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;fn&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;chatbot_with_fc&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;examples&lt;/span&gt;=[
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Find the top 3 ages with the highest total absenteeism hours, excluding disciplinary failures&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;On which days of the week does the average absenteeism time exceed 4 hours?&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Who lives in London?&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;title&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Chat with your SQL Database&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;demo&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;launch&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/chatting-with-sql-databases-3-ways/chat.gif alt=&#34;Gradio Chat App&#34;  /&gt;
&lt;/p&gt;
&lt;h3 id=&#34;wrapping-it-up-chatting-with-sql-and-haystack-three-ways&#34;&gt;Wrapping it up: Chatting with SQL and Haystack three ways&lt;/h3&gt;
&lt;p&gt;In this post, you’ve learned a few approaches to chat with your SQL databases using Haystack 👇&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Building a Haystack custom component to serve as a SQL query interface, and using that component in a pipeline.&lt;/li&gt;
&lt;li&gt;Improving the pipeline by skipping irrelevant queries with conditional routing.&lt;/li&gt;
&lt;li&gt;Using function calling, tools, and Gradio to build a full stack text to SQL app 🚀&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thanks for following along!&lt;/p&gt;
&lt;p&gt;If you’re hungry to learn more about SQL &#43; LLMs, check out our post on &lt;strong&gt;
&lt;a href=&#34;https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Using Generative AI to Query Large BI Tables&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Level up Your RAG Application with Speaker Diarization</title>
      <link>https://haystack.deepset.ai/blog/level-up-rag-with-speaker-diarization/</link>
      <pubDate>Thu, 21 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/level-up-rag-with-speaker-diarization/</guid>
      <description>
      
      
      

      
      

      <p>LLMs work wonders on text data. Using LLMs, you can get answers to complex questions on long documents without having to read the document or even do a CTRL+F search. But what if you work with audio or video recordings?</p>
<p>The easiest way is to provide the LLM with the transcription of the recording. That way you can capture everything that is uttered in the audio or video. But what about information that was not spoken?</p>
<p>On audio or video recordings with multiple speakers, only transcribing the recording will not provide all there is to know to the LLM. The data on how many people are speaking, what each of them says is lost in written format.</p>
<p>So is it not possible to use LLMs to their full potential with multi-speaker recordings? Of course it is with <strong>Speaker Diarization</strong>!</p>
<h2 id="introduction-to-speaker-diarization">Introduction to Speaker Diarization</h2>
<p>With the 
<a href="https://pypi.org/project/assemblyai-haystack/" target="_blank" rel="noopener">assemblyai-haystack Python package</a> by 
<a href="https://www.assemblyai.com/" target="_blank" rel="noopener">AssemblyAI</a>, this would take you only 10 minutes to build. The Speaker Diarization model lets you detect multiple speakers in an audio file and what each speaker said. Thanks to the built-in speaker diarization feature, you can get the transcript of your audio or video recording in plain format and also speaker annotated format where each utterance in the recording is attributed to a speaker. Here is what it looks like:</p>
<blockquote>
<p>…</p>
<p><strong>Speaker B</strong> : <em>Good morning.</em></p>
<p><strong>Speaker A</strong>: <em>So what is it about the conditions right now that have caused this round of wildfires to affect so many people so far away?</em></p>
<p><strong>Speaker B</strong>: <em>Well, there’s a couple of things. The season has been pretty dry already, and then the fact that we’re getting hit in the US. Is because there’s a couple of weather systems that are essentially channeling the smoke from those Canadian wildfires through Pennsylvania into the Mid Atlantic and the Northeast and kind of just dropping the smoke there.</em></p>
<p><strong>Speaker A</strong>: <em>So what is it in this haze that makes it harmful? And I’m assuming it is.</em></p>
<p>…</p>
</blockquote>
<p>Let’s first see the benefits of passing the speaker annotated transcription to the LLM with examples. And later, we will go through the code that you can use to build with speaker labeled transcription.</p>
<p>For the comparison, we ran two RAG applications, one using the transcription of a recording and another speaker-labeled transcription to compare their responses. We’ll call them transcription application and speaker application from now on.</p>
<blockquote>
<p>Here are the YouTube videos used in the examples:</p>
<ul>
<li>Panel discussion on AI example audio: 
<a href="https://www.youtube.com/watch?v=z86LGEFyQpo" target="_blank" rel="noopener">https://www.youtube.com/watch?v=z86LGEFyQpo</a></li>
<li>Netflix earnings call: 
<a href="https://www.youtube.com/watch?v=v5EjcFtq_bc" target="_blank" rel="noopener">https://www.youtube.com/watch?v=v5EjcFtq_bc</a></li>
<li>Working from home debate: 
<a href="https://youtu.be/w8aOZ8rBZ-E?si=pVbwDZoDLpBEcPlc" target="_blank" rel="noopener">https://youtu.be/w8aOZ8rBZ-E?si=pVbwDZoDLpBEcPlc</a></li>
</ul>
</blockquote>
<h3 id="1-using-the-speaker-labels-we-can-get-each-persons-opinion-separately">#1: Using the speaker labels, we can get each person’s opinion separately</h3>
<p>In this panel discussion, the speakers are talking about AI technology and how they leverage it in their own organizations. Around the midpoint of this discussion, they answer the moderator’s question on what they think about building AI in-house vs. using third party providers.</p>
<p>The transcription application can only comment on the general thoughts that were conveyed during the panel. It cannot make the distinction between different speakers’ opinions. Whereas the speaker application reports on the opinion of each speaker.</p>
<p><strong>Audio file: Panel</strong> 👇</p>
<p><strong>Question: “What are each speakers’ opinions on building in-house or using third parties?”</strong></p>
<table>
  <thead>
      <tr>
          <th>Transcription application</th>
          <th>Speaker application</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>“The speakers believe that there is no one-size-fits-all approach to building in-house or using third parties. They take a mixed approach, considering factors such as core IP, feasibility, and value. They emphasize the importance of being clear about how to use time and resources, and of collaboration and partnerships. They also acknowledge that there may be friction between different teams, but this can be mitigated through change management.”</td>
          <td>“Speaker A is interested in understanding how companies decide between building in-house solutions or using third parties. Speaker C believes that the decision depends on whether the component is part of the company’s core IP and that a combination of market standards and in-house solutions is often used. Speaker D takes a mixed approach, using open source and partnering, and emphasizes the importance of embedding AI into the business. Speaker C thinks that the decision should be based on feasibility and value.”</td>
      </tr>
  </tbody>
</table>
<p>In this other recording, there is a quick exchange of opinions on the matter of working from home versus returning to the office. The presenters in the studio are of the opinion that working from home is just as productive as working in an office but the guest on the phone line disagrees with them.</p>
<p>The transcription application is able to identify opposing views yet can only name the people holding an opinion if they are explicitly named in the recording. By using speaker labels, you can clearly identify all the views in a discussion as well as the speaker who voiced it.</p>
<p><strong>Audio file: Working from home debate 👇</strong></p>
<p><strong>Question: “What are the two opposing opinions and how many people are on each side?”</strong></p>
<table>
  <thead>
      <tr>
          <th>Transcription application</th>
          <th>Speaker application</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>The opposing opinions in the transcription are between those who believe that returning to the office full-time is necessary for collaboration and productivity, and those who believe that remote work can be just as effective and offers more flexibility. The first side, represented by Jim, is skeptical about returning to the office full-time and is concerned about the environmental impact of commuting. The second side, represented by the speaker, acknowledges the importance of collaboration but believes that a hybrid approach of remote and in-office work can be beneficial.</td>
          <td>The opposing opinions are about the necessity of returning to the office and the balance of power between employees and managers. <strong>Speaker A and Speaker C seem to be on the side of flexibility and remote work, while Speaker B is more open to the idea of returning to the office but acknowledges the need for flexibility.</strong></td>
      </tr>
  </tbody>
</table>
<h3 id="2-using-the-speaker-labels-we-can-extract-practical-information-from-the-file">#2: Using the speaker labels, we can extract practical information from the file</h3>
<p>One of the important things you might want to analyze in recordings with multiple speakers is how many speakers there are or what their role is in this recording.</p>
<p>Unless it is explicitly mentioned in the recording, the transcription application cannot tell the number of speakers. It can, at best, identify that there are multiple people in this recording:</p>
<p><strong>Audio file: Panel 👇</strong></p>
<p><strong>Question: “How many people are speaking in this recording?”</strong></p>
<table>
  <thead>
      <tr>
          <th>Transcription application</th>
          <th>Speaker application</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>There are multiple people speaking in this recording, but the exact number is not provided in the transcription.</td>
          <td>There are three people speaking in this recording: Speaker A, Speaker B, and Speaker D.</td>
      </tr>
  </tbody>
</table>
<p><strong>Audio file: Netflix 👇</strong></p>
<p><strong>Question: “How many speakers and moderators are in this call?”</strong></p>
<table>
  <thead>
      <tr>
          <th>Transcription application</th>
          <th>Speaker application</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>There are four speakers and one moderator on this call.</td>
          <td>There are three speakers in this call: Ted Sorandos, Greg Peters, and Spence Newman. Spencer Wong is the moderator.</td>
      </tr>
  </tbody>
</table>
<p><strong>Audio file: Working from home debate 👇</strong></p>
<p><strong>Question: “How many people are speaking in this recording?”</strong></p>
<table>
  <thead>
      <tr>
          <th>Transcription application</th>
          <th>Speaker application</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>The transcription does not provide enough information to determine the exact number of people speaking in this recording.</td>
          <td>There are three people speaking in this recording.</td>
      </tr>
  </tbody>
</table>
<h2 id="code-explanation">Code explanation</h2>
<p>Let’s now see how to build an application with speaker labels using Haystack and AssemblyAI. In this 
<a href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/using_speaker_diarization_with_assemblyai.ipynb" target="_blank" rel="noopener">Colab notebook</a>, you will find the code if you’d like to get started quickly.</p>
<p>
<a href="https://github.com/deepset-ai/haystack" target="_blank" rel="noopener">Haystack</a> is an open-source Python framework for building production-ready LLM applications. It provides the tools needed to implement an end-to-end application and comes with the 
<a href="https://haystack.deepset.ai/integrations/assemblyai" target="_blank" rel="noopener">AssemblyAI integration</a> that allows you to incorporate transcription, summarization, and speaker labeling for audio data into your pipeline. For more, check out 
<a href="https://docs.haystack.deepset.ai/docs/intro" target="_blank" rel="noopener">Haystack documentation</a>.</p>
<p>We will use haystack as well as 
<a href="https://pypi.org/project/assemblyai-haystack/" target="_blank" rel="noopener">assemblyai-haystack package</a> for this application. Install all required packages using pip.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>pip install haystack
</span></span><span style="display:flex;"><span>pip install assemblyai-haystack
</span></span><span style="display:flex;"><span>pip install <span style="color:#2aa198">"sentence-transformers>=2.2.0"</span>
</span></span></code></pre></div><p>Next, we will set up the pipeline with the <code>AssemblyAITranscriber</code> to generate the speaker-labeled transcription of any audio file we pass.</p>
<p>Check out the 
<a href="https://www.assemblyai.com/docs/integrations/haystack" target="_blank" rel="noopener">documentation</a> to find out more about the assemblyai-haystack Python package.</p>
<p>To use this piece of code, make sure you create an 
<a href="https://www.assemblyai.com/" target="_blank" rel="noopener">AssemblyAI account</a> and get your free API key. You need an 
<a href="https://huggingface.co/settings/tokens" target="_blank" rel="noopener">access token from Hugging Face</a> to access the free Inference API as well.</p>
<p>First, initialize the required Haystack components to index documents with embeddings:</p>
<ul>
<li>
<a href="https://docs.haystack.deepset.ai/docs/inmemorydocumentstore" target="_blank" rel="noopener"><code>InMemoryDocumentStore</code></a>: to store your documents without external dependencies or extra setup</li>
<li>
<a href="https://haystack.deepset.ai/integrations/assemblyai" target="_blank" rel="noopener"><code>AssemblyAITranscriber</code></a>: to create speaker_labels for the given audio file and convert them into documents</li>
<li>
<a href="https://docs.haystack.deepset.ai/docs/documentsplitter" target="_blank" rel="noopener"><code>DocumentSplitter</code></a>: to split your documents into smaller chunks</li>
<li>
<a href="https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder" target="_blank" rel="noopener"><code>SentenceTransformersDocumentEmbedder</code></a>: to create embeddings for each document using sentence-transformers models</li>
<li>
<a href="https://docs.haystack.deepset.ai/docs/documentwriter" target="_blank" rel="noopener"><code>DocumentWriter</code></a>: to write these documents into your document store</li>
</ul>
<p>After creating all these components, initialize a Pipeline object, add these components to your pipeline, and <code>connect()</code> the components by indicating which component should be connected to which component next.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.writers</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DocumentWriter</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.preprocessors</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DocumentSplitter</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.embedders</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">SentenceTransformersDocumentEmbedder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores.in_memory</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">InMemoryDocumentStore</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">assemblyai_haystack.transcriber</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">AssemblyAITranscriber</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores.types</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DuplicatePolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.utils</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">ComponentDevice</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_document_store</span> = <span style="color:#268bd2">InMemoryDocumentStore</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">transcriber</span> = <span style="color:#268bd2">AssemblyAITranscriber</span>(<span style="color:#268bd2">api_key</span>=<span style="color:#268bd2">ASSEMBLYAI_API_KEY</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_splitter</span> = <span style="color:#268bd2">DocumentSplitter</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">split_by</span> = <span style="color:#2aa198">"sentence"</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">split_length</span> = <span style="color:#2aa198;font-weight:bold">10</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">split_overlap</span> = <span style="color:#2aa198;font-weight:bold">1</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_embedder</span> = <span style="color:#268bd2">SentenceTransformersDocumentEmbedder</span>(<span style="color:#268bd2">device</span>=<span style="color:#268bd2">ComponentDevice</span>.<span style="color:#268bd2">from_str</span>(<span style="color:#2aa198">"cuda:0"</span>))
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_writer</span> = <span style="color:#268bd2">DocumentWriter</span>(<span style="color:#268bd2">speaker_document_store</span>, <span style="color:#268bd2">policy</span>=<span style="color:#268bd2">DuplicatePolicy</span>.<span style="color:#268bd2">SKIP</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">transcriber</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"transcriber"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">speaker_splitter</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"speaker_splitter"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">speaker_embedder</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"speaker_embedder"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">speaker_writer</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"speaker_writer"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"transcriber.speaker_labels"</span>, <span style="color:#2aa198">"speaker_splitter"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"speaker_splitter"</span>, <span style="color:#2aa198">"speaker_embedder"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"speaker_embedder"</span>, <span style="color:#2aa198">"speaker_writer"</span>)
</span></span></code></pre></div><p>And then we’re ready to run this pipeline with any audio file. Make sure to set <code>speaker_labels</code> to True. It is also possible to get a summary of the contents of the audio file but we will not use it in this tutorial. We can set it to None or False.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">run</span>(
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"transcriber"</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#2aa198">"file_path"</span>: <span style="color:#2aa198">"/content/Netflix_Q4_2023_Earnings_Interview.mp3"</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#2aa198">"summarization"</span>: <span style="color:#859900;font-weight:bold">None</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#2aa198">"speaker_labels"</span>: <span style="color:#859900;font-weight:bold">True</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p>Next, it is time to set up the retrieval augmentation (RAG) pipeline for speaker labels. For a RAG pipeline, we need:</p>
<ul>
<li>
<a href="https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder" target="_blank" rel="noopener"><code>SentenceTransformersTextEmbedder</code></a>: To create an embedding for the user query using sentence-transformers models</li>
<li>
<a href="https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever" target="_blank" rel="noopener"><code>InMemoryEmbeddingRetriever</code></a>: to retrieve <code>top_k</code> relevant documents to the user query</li>
<li>
<a href="https://docs.haystack.deepset.ai/docs/promptbuilder" target="_blank" rel="noopener"><code>PromptBuilder</code></a>: to provide a RAG prompt template with instructions to be filled with retrieved documents and the user query</li>
<li>
<a href="https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator" target="_blank" rel="noopener"><code>HuggingFaceAPIGenerator</code></a>: to infer models served through Hugging Face free Serverless Inference API or Hugging Face TGI</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.builders.prompt_builder</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptBuilder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.generators</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">HuggingFaceAPIGenerator</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.embedders</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">SentenceTransformersTextEmbedder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.retrievers.in_memory</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">InMemoryEmbeddingRetriever</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.utils</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">ComponentDevice</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">open_chat_prompt</span> = <span style="color:#2aa198">"""
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">GPT4 Correct User: You will be provided with a transcription of a recording with each sentence or group of sentences attributed to a Speaker by the word "Speaker" followed by a letter representing the person uttering that sentence. Answer the given question based on the given context.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">If you think that given transcription is not enough to answer the question, say so.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Transcription:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% f</span><span style="color:#2aa198">or doc in documents %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  {</span><span style="color:#2aa198">% i</span><span style="color:#2aa198">f doc.meta["speaker"] %} Speaker {{doc.meta["speaker"]}}: {</span><span style="color:#2aa198">% e</span><span style="color:#2aa198">ndif %}{{doc.content}}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% e</span><span style="color:#2aa198">ndfor %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Question: {{ question }}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198"><|end_of_turn|>
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">GPT4 Correct Assistant:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">"""</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">retriever</span> = <span style="color:#268bd2">InMemoryEmbeddingRetriever</span>(<span style="color:#268bd2">speaker_document_store</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">text_embedder</span> = <span style="color:#268bd2">SentenceTransformersTextEmbedder</span>(<span style="color:#268bd2">device</span>=<span style="color:#268bd2">ComponentDevice</span>.<span style="color:#268bd2">from_str</span>(<span style="color:#2aa198">"cuda:0"</span>))
</span></span><span style="display:flex;"><span><span style="color:#268bd2">answer_generator</span> = <span style="color:#268bd2">HuggingFaceAPIGenerator</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">api_type</span>=<span style="color:#2aa198">"serverless_inference_api"</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">api_params</span>={<span style="color:#2aa198">"model"</span>: <span style="color:#2aa198">"openchat/openchat-3.5-0106"</span>},
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">generation_kwargs</span>={<span style="color:#2aa198">"max_new_tokens"</span>:<span style="color:#2aa198;font-weight:bold">500</span>})
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_builder</span> = <span style="color:#268bd2">PromptBuilder</span>(<span style="color:#268bd2">template</span>=<span style="color:#268bd2">open_chat_prompt</span>)
</span></span></code></pre></div><p>After creating all these components, as you did for the first pipeline, initialize a Pipeline object, add these components to your pipeline, and <code>connect()</code> the components by indicating which component should be connected to which component next. As you create the connections, sometimes, you need to be more explicit about the output and input names. Explicitly connect the “documents”  output of <code>retriever</code> with “documents” input of the <code>prompt_builder</code> to make the connection obvious as <code>prompt_builder</code> has two inputs (“documents” and “question” variables we defined in the prompt template).</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_rag_pipe</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_rag_pipe</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"text_embedder"</span>, <span style="color:#268bd2">text_embedder</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_rag_pipe</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"retriever"</span>, <span style="color:#268bd2">retriever</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_rag_pipe</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"prompt_builder"</span>, <span style="color:#268bd2">prompt_builder</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_rag_pipe</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"llm"</span>, <span style="color:#268bd2">answer_generator</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_rag_pipe</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"text_embedder.embedding"</span>, <span style="color:#2aa198">"retriever.query_embedding"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_rag_pipe</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"retriever.documents"</span>, <span style="color:#2aa198">"prompt_builder.documents"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">speaker_rag_pipe</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"prompt_builder.prompt"</span>, <span style="color:#2aa198">"llm.prompt"</span>)
</span></span></code></pre></div><p>Once it’s done, you can use the following code to test the results on your own examples. Change the <code>top_k</code> value based on how many relevant documents you’d like to provide to your LLM:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">question</span> = <span style="color:#2aa198">"Who are the speakers in this recording?"</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">result</span> = <span style="color:#268bd2">speaker_rag_pipe</span>.<span style="color:#268bd2">run</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"prompt_builder"</span>:{<span style="color:#2aa198">"question"</span>: <span style="color:#268bd2">question</span>},
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"text_embedder"</span>:{<span style="color:#2aa198">"text"</span>: <span style="color:#268bd2">question</span>},
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"retriever"</span>:{<span style="color:#2aa198">"top_k"</span>: <span style="color:#2aa198;font-weight:bold">10</span>}
</span></span><span style="display:flex;"><span>})
</span></span><span style="display:flex;"><span><span style="color:#268bd2">result</span>[<span style="color:#2aa198">"llm"</span>][<span style="color:#2aa198">"replies"</span>][<span style="color:#2aa198;font-weight:bold">0</span>]
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>Thanks for reading! By combining the transcription capabilities of AssemblyAI with the power of Haystack, you can enhance your RAG systems with speaker labels, ensuring a more comprehensive and accurate understanding of the content.</p>
<p>If you want to stay on top of the latest Haystack developments, you can 
<a href="https://landing.deepset.ai/haystack-community-updates" target="_blank" rel="noopener">subscribe to our newsletter</a> or 
<a href="https://discord.com/invite/haystack" target="_blank" rel="noopener">join our Discord community</a>. Don’t forget to 
<a href="https://www.assemblyai.com/" target="_blank" rel="noopener">get your free API key</a> from AssemblyAI and 
<a href="https://www.youtube.com/@AssemblyAI" target="_blank" rel="noopener">subscribe to AssemblyAI’s YouTube channel</a> for weekly videos and tutorials on the latest developments in the AI world.</p>

      </description>
    </item>
    
    <item>
      <title>Hosted or self-hosted RAG? Full flexibility with NVIDIA NIM integration in Haystack</title>
      <link>https://haystack.deepset.ai/blog/haystack-nvidia-integration/</link>
      <pubDate>Mon, 18 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/haystack-nvidia-integration/</guid>
      <description>
      
      
      

      
      

      <h2 id="iteration-speed-across-design-options-matters">Iteration speed across design options matters</h2>
<p>Retrieval-augmented generation (RAG) is one of the most common architectures today for
building LLM applications. With RAG you can very quickly build a prototype that is tailored to your use case by connecting the LLM to the relevant data sources.</p>
<p>Once you have your first prototype up and running, you typically iterate a lot on your pipeline design before you go live in production: switching embedding models or generative LLMs, adding rerankers, or leveraging the metadata of your documents.</p>
<p>We often see our users iterate fifty or more times before settling on the design that meets their requirements. Besides optimizing the answer quality, users often have additional requirements like costs, latency or data privacy that determines their target architecture and deployment strategy for going to production.</p>
<p>Two crucial features are needed to support rapid prototyping during RAG development: the flexibility to experiment with many different models and features, and the ability to quickly switch between them.</p>
<p>Both are at the heart of our design for Haystack 2.0. New integrations with microservices from the NVIDIA AI Enterprise software platform, including NVIDIA NIM and NeMo Retriever, now allow you to iterate even faster across your RAG development cycle and give you full flexibility with deployment options: - Visit 
<a href="https://ai.nvidia.com" target="_blank" rel="noopener">NVIDIA API Catalog</a> to quickly experiment with the latest GPU-accelerated generative AI models on NVIDIA-hosted API endpoints. And then switch easily to self-hosted NVIDIA NIM microservice containers that can run in your own VPC or self-hosted cloud. 
<a href="https://developer.nvidia.com/blog/how-to-take-a-rag-application-from-pilot-to-production-in-four-steps/" target="_blank" rel="noopener">Learn more about taking RAG applications from pilot to production in this NVIDIA blog</a>.</p>
<h2 id="what-is-nvidia-nim">What is NVIDIA NIM?</h2>
<p>NVIDIA NIM is a set of easy-to-use microservices designed to speed up generative AI deployment in enterprises. Supporting a wide range of leading community and proprietary AI models, hosted on NVIDIA API catalog, NIM ensures seamless, scalable AI inferencing, whether on-premises or in the cloud, leveraging industry-standard APIs.</p>
<p>To get started, developers can visit the NVIDIA API catalog to interact with GPU-accelerated generative AI models directly from a web browser. The API catalog interface generates application code for every interaction. Developers can paste this code into their own applications to interact with the model endpoints programmatically. When ready
to deploy, NVIDIA AI Enterprise subscribers can export the model to a downloadable NVIDIA NIM container and run it anywhere. This gives developers ownership to their customizations and full control of their IP and AI application.</p>
<h2 id="use-nvidia-ai-with-haystack">Use NVIDIA AI with Haystack</h2>
<p>Now we’ll walk through the nitty-gritty details of how to build a RAG pipeline with Haystack and hosted NVIDIA APIs. You will build two 
<a href="https://docs.haystack.deepset.ai/docs/pipelines" target="_blank" rel="noopener">Haystack pipelines</a>: an indexing pipeline to create and store documents, and a RAG pipeline to query those documents.</p>
<p>In order for this code to work, you will need an 
<a href="https://org.ngc.nvidia.com/setup" target="_blank" rel="noopener">NVIDIA API key</a>. Set it as an environment variable, <code>NVIDIA_API_KEY</code>.</p>
<p>First: install the Haystack NVIDIA connector:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install nvidia-haystack
</span></span></code></pre></div><p>Next, build an indexing pipeline. This example uses one of NVIDIA’s embedding models to turn
documents into vectors, and adds them to the <code>document_store</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack_integrations.components.generators.nvidia</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">NvidiaGenerator</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack_integrations.components.embedders.nvidia</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">NvidiaEmbeddingModel</span>, <span style="color:#268bd2">NvidiaDocumentEmbedder</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.dataclasses</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Document</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.writers</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DocumentWriter</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores.in_memory</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">InMemoryDocumentStore</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">documents</span> = [
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(<span style="color:#268bd2">content</span>=<span style="color:#2aa198">"My name is Jean and I live in Paris."</span>),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(<span style="color:#268bd2">content</span>=<span style="color:#2aa198">"My name is Mark and I live in Berlin."</span>),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(<span style="color:#268bd2">content</span>=<span style="color:#2aa198">"My name is Giorgio and I live in Rome."</span>)]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_store</span> = <span style="color:#268bd2">InMemoryDocumentStore</span>()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_embedder</span> = <span style="color:#268bd2">NvidiaDocumentEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"nvolveqa_40k"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">writer</span> = <span style="color:#268bd2">DocumentWriter</span>(<span style="color:#268bd2">document_store</span>=<span style="color:#268bd2">document_store</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">document_embedder</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"document_embedder"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">writer</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"writer"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"document_embedder.documents"</span>, <span style="color:#2aa198">"writer.documents"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">data</span>={<span style="color:#2aa198">"document_embedder"</span>:{<span style="color:#2aa198">"documents"</span>: <span style="color:#268bd2">documents</span>}})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># Calling filter with no arguments will print the contents of the document store</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_store</span>.<span style="color:#268bd2">filter_documents</span>({})
</span></span></code></pre></div><p>Create a RAG pipeline to query the data.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores.in_memory</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">InMemoryDocumentStore</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.retrievers.in_memory</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">InMemoryEmbeddingRetriever</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.builders</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptBuilder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack_integrations.components.generators.nvidia</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">NvidiaGenerator</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack_integrations.components.embedders.nvidia</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">NvidiaEmbeddingModel</span>, <span style="color:#268bd2">NvidiaTextEmbedder</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt</span> = <span style="color:#2aa198">""" Answer the query, based on the
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">content in the documents.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">If you can't answer based on the given documents, say so.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Documents:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% f</span><span style="color:#2aa198">or doc in documents %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198"> {{doc.content}}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% e</span><span style="color:#2aa198">ndfor %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Query: {{query}}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">"""</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">text_embedder</span> = <span style="color:#268bd2">NvidiaTextEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"playground_nvolveqa_40k"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">retriever</span> = <span style="color:#268bd2">InMemoryEmbeddingRetriever</span>(<span style="color:#268bd2">document_store</span>=<span style="color:#268bd2">document_store</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_builder</span> = <span style="color:#268bd2">PromptBuilder</span>(<span style="color:#268bd2">template</span>=<span style="color:#268bd2">prompt</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">generator</span> = <span style="color:#268bd2">NvidiaGenerator</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"playground_nv_llama2_rlhf_70b"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">text_embedder</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"text_embedder"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">retriever</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"retriever"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">prompt_builder</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"prompt_builder"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">generator</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"generator"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"text_embedder.embedding"</span>, <span style="color:#2aa198">"retriever.query_embedding"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"retriever.documents"</span>, <span style="color:#2aa198">"prompt_builder.documents"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"prompt_builder"</span>, <span style="color:#2aa198">"generator"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">question</span> = <span style="color:#2aa198">"Who lives in Rome?"</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">result</span> = <span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">data</span>={<span style="color:#2aa198">"text_embedder"</span>:{<span style="color:#2aa198">"text"</span>: <span style="color:#268bd2">question</span>},
</span></span><span style="display:flex;"><span>                           	                                      <span style="color:#2aa198">"prompt_builder"</span>:{<span style="color:#2aa198">"query"</span>: <span style="color:#268bd2">question</span>}})
</span></span><span style="display:flex;"><span><span style="color:#cb4b16">print</span>(<span style="color:#268bd2">result</span>)
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># {'text_embedder': {'meta': {'usage': {'prompt_tokens': 10, 'total_tokens': 10}}}, 'generator': {'replies': [Giorgio], 'meta': [{'role': 'assistant', 'finish_reason': 'stop'}], 'usage': {'completion_tokens': 3, 'prompt_tokens': 101, 'total_tokens': 104}}}</span>
</span></span></code></pre></div><p>The pipeline example above uses the API catalog endpoint for LLM inference and embedding. Switching from the API catalog endpoint to a self-hosted NIM microservice only requires the small addition of an <code>api_url</code> init
parameter to the embedder and generator.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">document_embedder</span> = <span style="color:#268bd2">NvidiaDocumentEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"nvolveqa_40k"</span>, <span style="color:#268bd2">api_url</span>=<span style="color:#2aa198">"<container_endpoint_url>"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">text_embedder</span> = <span style="color:#268bd2">NvidiaTextEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"nvolveqa_40k"</span>, <span style="color:#268bd2">api_url</span>=<span style="color:#2aa198">"<container_endpoint_url>"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">generator</span> = <span style="color:#268bd2">NvidiaGenerator</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"mixtral_8x7b"</span>, <span style="color:#268bd2">api_url</span>=<span style="color:#2aa198">"<container_endpoint_url>"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">generator</span>.<span style="color:#268bd2">warm_up</span>()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">result</span> = <span style="color:#268bd2">generator</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">prompt</span>=<span style="color:#2aa198">"When was the Golden Gate Bridge built?"</span>)
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># The Golden Gate Bridge was built in 1937 and was completed and opened to the public on May 28, 1937... </span>
</span></span></code></pre></div><p>Using the NVIDIA Haystack connector, Deepset customers can apply the benefits of GPU-acceleration to their RAG
applications quickly and easily.</p>
<h2 id="wrapping-it-up">Wrapping it up</h2>
<ul>
<li>Want to learn more about Haystack 2.0? Get started with our 
<a href="https://haystack.deepset.ai/tutorials" target="_blank" rel="noopener">tutorials</a>, 
<a href="https://github.com/deepset-ai/haystack-cookbook/" target="_blank" rel="noopener">cookbooks</a>, and

<a href="https://haystack.deepset.ai/blog" target="_blank" rel="noopener">blog posts</a>.</li>
<li>Have more questions? 
<a href="https://discord.com/invite/xYvH6drSmA" target="_blank" rel="noopener">Join our community on Discord</a> or 
<a href="https://landing.deepset.ai/haystack-community-updates" target="_blank" rel="noopener">sign up for our monthly newsletter</a>.</li>
<li>Want to learn more about NVIDIA NIM and how to run the containers? Be on the lookout
for a more in-depth technical blog about how to use NIM with Haystack.</li>
</ul>

      </description>
    </item>
    
    <item>
      <title>Haystack 2.0:  The Composable Open-Source LLM Framework</title>
      <link>https://haystack.deepset.ai/blog/haystack-2-release/</link>
      <pubDate>Mon, 11 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/haystack-2-release/</guid>
      <description>
      
      
      

      
      

      <p>Today we are happy to announce 
<a href="/release-notes/2.0.0">the stable release of Haystack 2.0</a> - we’ve been working on this for a while, and some of you have already been 
<a href="/blog/introducing-haystack-2-beta-and-advent">testing the beta since its first release in December 2023</a>.</p>
<p>Haystack is an open-source Python framework for building production-ready LLM applications, with integrations to almost all major model providers and databases.</p>
<p>At its core, Haystack 2.0 is a major rework of the previous version with a very clear goal in mind: making it possible to implement composable AI systems that are easy to use, customize, extend, optimise, evaluate and ultimately deploy to production.</p>
<p>We encourage you to start using Haystack 2.0 as of today, whether you’ve been a Haystack user before or not. You can get started by installing <code>haystack-ai</code>, our new package for Haystack 2.0</p>
<blockquote>
<p>⭐️ <strong>To get started:</strong></p>
<p><code>pip install haystack-ai</code> and follow the 
<a href="/overview/quick-start">get started</a> instructions to build your first LLM app with just a few lines of code.</p>
</blockquote>
<p>If you’re already using Haystack 1.0 in production, don’t worry! If your applications depend on <code>farm-haystack</code> and you’re not ready to migrate just yet, you don’t have to take any action: we will keep supporting Haystack 1.0, releasing security updates and critical bug fixes, giving everybody enough time to migrate. In the coming weeks, we will also start sharing some migration guides to help you along the way.</p>
<h2 id="why-haystack-20">Why Haystack 2.0?</h2>
<p>
<a href="https://github.com/deepset-ai/haystack/releases/tag/0.2.1" target="_blank" rel="noopener">Haystack was first officially released in 2020</a>, in the good old days when the forefront of NLP was semantic search, retrieval, and extractive question-answering. During this time, we established the core of what makes Haystack <em>Haystack</em>: 
<a href="https://docs.haystack.deepset.ai/docs/nodes_overview" target="_blank" rel="noopener">Components</a> and 
<a href="https://docs.haystack.deepset.ai/docs/pipelines" target="_blank" rel="noopener">Pipelines</a>. These allowed users to build end-to-end applications by combining their desired language models (embedding, extractive QA, ranking) with their database of choice.</p>
<p>The boom of LLMs in 2023 made two things clear:</p>
<ol>
<li>👍 The pipeline-component structure is a great abstraction for building composable LLM applications with many moving parts.</li>
<li>👎 Haystack 1.0 often assumed that you would be doing retrieval and extractive QA over a set of documents, imposing limitations and providing a developer experience far from ideal when building LLM applications.</li>
</ol>
<p>So, we decided that the best thing we could do for Haystack and our community was to rewrite the component and pipeline architecture to keep up with the fast-paced AI industry. While Haystack 2.0 is a complete rewrite, the underlying principle of composing components into flexible pipelines remains the same.</p>
<p>With that, let’s take a look at the pillars of Haystack 2.0:</p>
<ul>
<li>Composable and customizable pipelines</li>
<li>A common interface for storing data</li>
<li>A clear path to production</li>
<li>Optimization and Evaluation for Retrieval Augmentation</li>
</ul>
<h2 id="composable-and-customizable-pipelines">Composable and customizable Pipelines</h2>
<p>Modern LLM applications 
<a href="https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/" target="_blank" rel="noopener">comprise many moving parts</a>: retrievers, rankers, LLMs, and many more such as entity extractors, summarizers, format converters and data cleaners. Each one of these ‘subtasks’ is a <em>component</em> in Haystack.</p>
<p>With the first version of Haystack we proved that pipelines are a good abstraction for connecting all those moving parts, but some of the assumptions we made in Haystack 1.0 dated back to a pre-LLM era and needed rethinking.</p>
<p>One important limitation in Haystack 1.0 is that loops are not allowed, and the pipeline graph has to be acyclic. This makes it difficult to implement, for example, agents, which are often designed with a reasoning flow that loops until a task is resolved.</p>
<p>In Haystack 2.0 the pipeline graph can have cycles. Combined with decision components (think about if-then-else clauses in the execution flow) and routers (components that direct the execution flow towards a specific subgraph depending on the input) this can be used to build sophisticated loops that model agentic behavior.</p>
<h3 id="customizable-components">Customizable Components</h3>
<p>We believe that the design of an AI framework should meet the following requirements:</p>
<ul>
<li><strong>Be technology agnostic:</strong> Allow users the flexibility to decide what vendor or technology they want for <em>each</em> of these components and make it easy to switch out any component for another.</li>
<li><strong>Be explicit:</strong> Make it transparent as to how these components can “talk” to each other.</li>
<li><strong>Be flexible:</strong> Make it possible to create custom components whenever custom behavior is desirable.</li>
<li><strong>Be extensible:</strong> Provide a uniform and easy way for the community and third parties to build their own components and foster an open ecosystem around Haystack.</li>
</ul>
<p>All components in Haystack 2.0 (including 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/integrations" target="_blank" rel="noopener">Haystack Integrations</a>) are built with a common “component” interface. The principle is simple:</p>
<ul>
<li>A component implements some logic in a method called <code>run</code></li>
<li>The <code>run</code> method receives one or more input values</li>
<li>The <code>run</code> method returns one or more output values</li>
</ul>
<p>Take 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/embedders" target="_blank" rel="noopener">embedders</a> as an example: these components expect text as input and create vector representations (embeddings) that they return as output. On the other hand, 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/retrievers" target="_blank" rel="noopener">retrievers</a> may need embeddings as input and return documents as output. When creating a new component, to decide what inputs and outputs it should have is part of the ideation process.</p>
<p>While there are many ready-made components built into Haystack, we want to highlight that 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/custom-components" target="_blank" rel="noopener">building your own custom components</a> is also a core functionality of Haystack 2.0.</p>
<blockquote>
<p>In fact, we’ve taken advantage of this ourselves. For example, you can 
<a href="https://haystack.deepset.ai/blog/optimizing-retrieval-with-hyde" target="_blank" rel="noopener">read about how to use the latest optimization techniques</a> (like HyDE) in Haystack pipelines with custom components.</p>
</blockquote>
<h3 id="sharing-custom-components">Sharing Custom Components</h3>
<p>Since the release of Haystack 2.0-Beta, we’ve seen the benefits of having a well-defined simple interface for components. We, our community, and third parties have already created many components, available as additional packages for you to install.</p>
<p>We share these on the 
<a href="https://haystack.deepset.ai/integrations" target="_blank" rel="noopener">Haystack Integrations</a> page, which has expanded to include all sorts of components over the last few months (with contributions from 
<a href="https://haystack.deepset.ai/integrations/assemblyai" target="_blank" rel="noopener">Assembly AI</a>, 
<a href="https://haystack.deepset.ai/integrations/jina" target="_blank" rel="noopener">Jina AI</a>, 
<a href="https://haystack.deepset.ai/integrations/mixedbread-ai" target="_blank" rel="noopener">mixedbread ai</a> and more). We will continue to expand this page with new integrations and you can help us by creating a PR on 
<a href="https://github.com/deepset-ai/haystack-integrations" target="_blank" rel="noopener">haystack-integrations</a> if you’d like to share a component with the community. To learn more about integrations and how to share them, you can check out our 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/integrations" target="_blank" rel="noopener">“Introduction to Integrations” documentation</a>.</p>
<h2 id="a-common-interface-for-storing-data">A common interface for storing data</h2>
<p>Most NLP applications work on large amounts of data. A common design pattern is to connect your internal knowledge base to a Large Language Model (LLM) so that it can answer questions, summarize or translate documents, and extract specific information. For example, in retrieval-augment generative pipelines (RAG), you often use an LLM to answer questions about some data that was previously retrieved.</p>
<p>This data has to come from somewhere, and Haystack 2.0 provides a common interface to access it in a consistent way, independently from where data comes from. This interface is called “Document Store”, and it’s implemented for many different storage services, to make data easily available from within Haystack pipelines.</p>
<p>Today, we are releasing Haystack 2.0 with a 
<a href="https://haystack.deepset.ai/integrations?type=Document+Store" target="_blank" rel="noopener">large selection of database and vector store integrations</a>. These include 
<a href="https://haystack.deepset.ai/integrations/chroma-documentstore" target="_blank" rel="noopener">Chroma</a>, 
<a href="https://haystack.deepset.ai/integrations/weaviate-document-store" target="_blank" rel="noopener">Weaviate</a>, 
<a href="https://haystack.deepset.ai/integrations/pinecone-document-store" target="_blank" rel="noopener">Pinecone</a>, 
<a href="https://haystack.deepset.ai/integrations/qdrant-document-store" target="_blank" rel="noopener">Qdrant</a>, 
<a href="https://haystack.deepset.ai/integrations/elasticsearch-document-store" target="_blank" rel="noopener">Elasticsearch</a>, 
<a href="https://haystack.deepset.ai/integrations/opensearch-document-store" target="_blank" rel="noopener">Open Search</a>, 
<a href="https://haystack.deepset.ai/integrations/pgvector-documentstore" target="_blank" rel="noopener">pgvector</a>, 
<a href="https://haystack.deepset.ai/integrations/mongodb" target="_blank" rel="noopener">MongoDB</a>, 
<a href="https://haystack.deepset.ai/integrations/astradb" target="_blank" rel="noopener">AstraDB</a>, 
<a href="https://haystack.deepset.ai/integrations/neo4j-document-store" target="_blank" rel="noopener">Neo4j</a>, 
<a href="https://haystack.deepset.ai/integrations/marqo-document-store" target="_blank" rel="noopener">Marqo DB</a>, and the list will keep growing. And if your storage service is not supported yet, or should you need a high degree of customization on top of an existing one, by following our 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/creating-custom-document-stores" target="_blank" rel="noopener">guide to creating custom document stores</a>, you can connect your Haystack pipelines to your data from pretty much any storage service.</p>
<h2 id="a-clear-path-to-production">A clear path to production</h2>
<p>The experience we got over the last couple of years, working on Haystack 1.0 and interacting with its community, taught us two things:</p>
<ol>
<li>It’s essential for any AI application framework to be feature-complete and developer-friendly.</li>
<li>It’s only after the deployment phase that AI-based applications can truly make an impact.</li>
</ol>
<p>While rewriting the framework from scratch, we took the opportunity to incorporate specific features that would simplify the deployment of Haystack-based AI applications in a production-grade environment:</p>
<ul>
<li>A customizable 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/logging" target="_blank" rel="noopener">logging system</a> that supports structured logging and tracing correlation out of the box.</li>
<li>
<a href="https://docs.haystack.deepset.ai/v2.0/docs/tracing" target="_blank" rel="noopener">Code instrumentation collecting spans and traces</a> in strategic points of the execution path, with support for Open Telemetry and Datadog already in place.</li>
</ul>
<p>In addition, we decided to start a dedicated project to simplify deploying Haystack pipelines behind a RESTful API: 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/hayhooks" target="_blank" rel="noopener">Hayhooks</a>.</p>
<p>Hayhooks is a client-server application that allows you to deploy Haystack pipelines, serving them through HTTP endpoints dynamically spawned. Two foundational features of Haystack 2.0 made this possible:</p>
<ol>
<li>
<a href="https://docs.haystack.deepset.ai/v2.0/reference/pipeline-api#pipelineinputs" target="_blank" rel="noopener">The ability to introspect a pipeline</a>, determining its inputs and outputs at runtime. This means that every REST endpoint has well-defined, dynamically generated schemas for the request and response body, all depending on the specific pipeline structure.</li>
<li>
<a href="https://docs.haystack.deepset.ai/v2.0/docs/serialization" target="_blank" rel="noopener">A robust serialization mechanism</a>. This allows for the conversion of Haystack pipelines from Python to a preferred data serialization format, and vice versa. The default format is YAML but Haystack is designed to easily extend support for additional serialization formats.</li>
</ol>
<h2 id="optimization-and-evaluation-of-retrieval-augmentation">Optimization and Evaluation of Retrieval Augmentation</h2>
<p>We’ve already been seeing the benefits of the new Haystack design, with pipeline optimization and evaluation being good examples of how we’ve been leveraging Haystack 2.0. How?:</p>
<ul>
<li>It’s easier to extend the capabilities of Haystack</li>
<li>It’s easy to implement new integrations</li>
</ul>
<h3 id="implementing-the-latest-retrieval-optimizations">Implementing the latest retrieval optimizations</h3>
<p>Retrieval is a crucial step for successful RAG pipelines. And there’s been a lot of work to optimize this step. With Haystack 2.0, we’ve been able to:</p>
<ul>
<li>Implement Hypothetical Document Embeddings (HyDE) easily, and we’ve already published 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/hypothetical-document-embeddings-hyde" target="_blank" rel="noopener">a guide to HyDE</a> along with 
<a href="https://haystack.deepset.ai/blog/optimizing-retrieval-with-hyde" target="_blank" rel="noopener">an example walkthrough</a></li>
<li>Added an integration for 
<a href="https://haystack.deepset.ai/integrations/optimum" target="_blank" rel="noopener">Optimum</a> embedders by Hugging Face</li>
</ul>
<p>And we will be able to add more optimization techniques along the way!</p>
<h3 id="evaluation">Evaluation</h3>
<p>Haystack 2.0 is being released with a few evaluation framework integrations in place:</p>
<ul>
<li>
<a href="https://haystack.deepset.ai/integrations/ragas" target="_blank" rel="noopener">Ragas</a></li>
<li>
<a href="https://haystack.deepset.ai/integrations/deepeval" target="_blank" rel="noopener">DeepEval</a></li>
</ul>
<p>Along with a 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/model-based-evaluation" target="_blank" rel="noopener">guide to model-based evaluation</a>.</p>
<h2 id="start-using-haystack-20">Start using Haystack 2.0</h2>
<p>Alongside Haystack 2.0, today we are also releasing a whole set of new tutorials, documentation, resources and more to help you get started:</p>
<ul>
<li>
<a href="https://docs.haystack.deepset.ai/docs" target="_blank" rel="noopener">Documentation</a>: full technical documentation on all Haystack concepts and components</li>
<li>
<a href="https://haystack.deepset.ai/tutorials" target="_blank" rel="noopener">Tutorials</a>: step-by-step, runnable Colab notebooks. Start with our first 2.0 tutorial 
<a href="https://haystack.deepset.ai/tutorials/27_first_rag_pipeline" target="_blank" rel="noopener">“Creating Your First QA Pipeline with Retrieval-Augmentation”</a></li>
<li>
<a href="https://github.com/deepset-ai/haystack-cookbook" target="_blank" rel="noopener">Cookbooks</a>: A collection of useful notebooks that showcase Haystack in various scenarios, using a number of our integrations.</li>
</ul>
<p>And, as always, keep an eye out on our 
<a href="https://haystack.deepset.ai/blog" target="_blank" rel="noopener">blog</a> and 
<a href="https://haystack.deepset.ai/integrations" target="_blank" rel="noopener">integrations</a> for updates and new content.</p>
<h2 id="join-the-community">Join the Community</h2>
<p>Stay up-to-date with Haystack:</p>
<ul>
<li>
<a href="https://discord.com/invite/xYvH6drSmA" target="_blank" rel="noopener">Discord</a></li>
<li>
<a href="https://landing.deepset.ai/haystack-community-updates" target="_blank" rel="noopener">Subscribe to our newsletter</a></li>
<li>
<a href="https://twitter.com/Haystack_AI" target="_blank" rel="noopener">Twitter</a></li>
<li>
<a href="https://github.com/deepset-ai/haystack" target="_blank" rel="noopener">GitHub</a></li>
</ul>

      </description>
    </item>
    
    <item>
      <title>Optimizing Retrieval with HyDE</title>
      <link>https://haystack.deepset.ai/blog/optimizing-retrieval-with-hyde/</link>
      <pubDate>Wed, 28 Feb 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/optimizing-retrieval-with-hyde/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/hypothetical-document-embeddings-hyde&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hypothetical Document Embeddings (HyDE)&lt;/a&gt; is a technique proposed in the paper “
&lt;a href=&#34;https://aclanthology.org/2023.acl-long.99/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Precise Zero-Shot Dense Retrieval without Relevance Labels&lt;/a&gt;” which improves retrieval by generating “fake” hypothetical documents based on a given query, and then uses those “fake” documents embeddings to retrieve similar documents from the same embedding space.&lt;/p&gt;
&lt;p&gt;In this article, we will see how to implement and incorporate it into Haystack by creating a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/custom-components&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;custom component&lt;/a&gt; that implements HyDE.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To learn more about how HyDE works, and where it’s useful, check out our guide on 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/hypothetical-document-embeddings-hyde&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hypothetical Document Embeddings (HyDE)&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;build-a-pipeline-to-create-hypothetical-document-embeddings&#34;&gt;Build a Pipeline to Create Hypothetical Document Embeddings&lt;/h2&gt;
&lt;p&gt;First, let’s build a simple pipeline to generate these hypothetical documents. To do so, we will use the following Haystack components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;PromptBuilder&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/openaigenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;OpenAIGenerator&lt;/code&gt;&lt;/a&gt; to query an instruction-following language model and generate hypothetical documents.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/sentencetransformersdocumentembedder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;SentenceTransformersDocumentEmbedder&lt;/code&gt;&lt;/a&gt; encodes the hypothetical documents into vector embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/outputadapter&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;OutputAdapter&lt;/code&gt;&lt;/a&gt; to adapt the output of the &lt;code&gt;Generator&lt;/code&gt; to be compatible with the input of the &lt;code&gt;SentenceTransformersDocumentEmbedder&lt;/code&gt;, which expects &lt;code&gt;List[Document]&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;To use the &lt;code&gt;OpenAIGenerator&lt;/code&gt;, you need to set your &lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;export&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OPENAI_API_KEY&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;secret_string&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;We first build a way to query an instruction-following language model to generate hypothetical documents.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.openai&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-3.5-turbo&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;							&lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;n&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;temperature&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.75&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;400&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Given a question, generate a paragraph of text that answers the question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;			Question: {{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;			Paragraph:&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This will output a list of 5 hypothetical documents, the same number the authors used for the experiments in the paper. We then use the &lt;code&gt;SentenceTransformersDocumentEmbedder&lt;/code&gt; to encode these hypothetical documents into embeddings.&lt;/p&gt;
&lt;p&gt;But, the &lt;code&gt;SentenceTransformersDocumentEmbedder&lt;/code&gt; expects &lt;code&gt;List[Document]&lt;/code&gt; objects as input, so we need to adapt the output of the &lt;code&gt;OpenAIGenerator&lt;/code&gt; to be compatible with the input of the &lt;code&gt;SentenceTransformersDocumentEmbedder&lt;/code&gt;. For this, we use an &lt;code&gt;OutputAdapter&lt;/code&gt; with a &lt;code&gt;custom filter&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.converters&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OutputAdapter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;adapter&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OutputAdapter&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;{{answers | build_doc}}&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;output_type&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;custom_filters&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;build_doc&#34;&lt;/span&gt;: &lt;span style=&#34;color:#859900&#34;&gt;lambda&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;: [&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;]}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/all-MiniLM-L6-v2&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can now create a custom component, &lt;code&gt;HypotheticalDocumentEmbedder&lt;/code&gt;, that expects &lt;code&gt;documents&lt;/code&gt; and can return a list of &lt;code&gt;hypotethetical_embeddings&lt;/code&gt; which is the average of the embeddings from the “hypothetical” (fake) documents.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;numpy&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;array&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;mean&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@component&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;HypotheticalDocumentEmbedder&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@component.output_types&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;hypothetical_embedding&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;float&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;]):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;stacked_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;array&lt;/span&gt;([&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;embedding&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;avg_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;mean&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;stacked_embeddings&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;axis&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;hyde_vector&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;avg_embeddings&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;reshape&lt;/span&gt;((&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#cb4b16&#34;&gt;len&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;avg_embeddings&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;hypothetical_embedding&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;hyde_vector&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;tolist&lt;/span&gt;()}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can add all of these into a pipeline and generate hypothetical document embeddings.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;hyde&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HypotheticalDocumentEmbedder&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;adapter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;adapter&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;hyde&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;hyde&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator.replies&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;adapter.answers&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;adapter.output&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;hyde.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What should I do if I have a fever?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Below a graphical representation of the pipeline we created&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/optimizing-retrieval-with-hyde/hyde.png alt=&#34;hyde.png&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;build-a-complete-hyde-component&#34;&gt;Build a Complete HyDE Component&lt;/h2&gt;
&lt;p&gt;Optionally, we could also create a  &lt;code&gt;HypotheticalDocumentEmbedder&lt;/code&gt;  that encapsulates the entire logic that we saw above. This way, we would be able to use this one components for improved retrieval.&lt;/p&gt;
&lt;p&gt;This component can do a few things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Allow the user to pick the LLM which generates the hypothetical documents&lt;/li&gt;
&lt;li&gt;Allow users to define how many documents should be created with  &lt;code&gt;nr_completions&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Allow users to define the embedding model they want to use to generate the HyDE embeddings.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;default_to_dict&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;default_from_dict&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.converters&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OutputAdapter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders.sentence_transformers_document_embedder&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators.openai&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Dict&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Any&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;numpy&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;array&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;mean&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.utils&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@component&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;HypotheticalDocumentEmbedder&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;__init__&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-3.5-turbo&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm_api_key&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_env_var&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;OPENAI_API_KEY&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;nr_completions&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;int&lt;/span&gt; = &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;embedder_model&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/all-MiniLM-L6-v2&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm_api_key&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm_api_key&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;nr_completions&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;nr_completions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;embedder_model&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;embedder_model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm_api_key&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;n&#34;&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;nr_completions&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;temperature&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.75&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;400&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Given a question, generate a paragraph of text that answers the question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            Question: {{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            Paragraph:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;            &#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;adapter&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OutputAdapter&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;{{answers | build_doc}}&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;output_type&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;custom_filters&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;build_doc&#34;&lt;/span&gt;: &lt;span style=&#34;color:#859900&#34;&gt;lambda&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;: [&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;d&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;]},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;embedder_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;progress_bar&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;adapter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;adapter&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;embedder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator.replies&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;adapter.answers&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;adapter.output&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to_dict&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Dict&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Any&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;default_to_dict&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm_api_key&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm_api_key&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;nr_completions&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;nr_completions&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;embedder_model&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;embedder_model&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;pipeline&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;to_dict&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;@classmethod&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;from_dict&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;cls&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Dict&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Any&lt;/span&gt;]) -&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;HypotheticalDocumentEmbedder&#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;hyde_obj&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;default_from_dict&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;cls&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;hyde_obj&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_dict&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;pipeline&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;hyde_obj&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;@component.output_types&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;hypothetical_embedding&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;float&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# return a single query vector embedding representing the average of the hypothetical document embeddings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;stacked_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;array&lt;/span&gt;([&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;embedding&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;]])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;avg_embeddings&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;mean&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;stacked_embeddings&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;axis&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;hyde_vector&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;avg_embeddings&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;reshape&lt;/span&gt;((&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#cb4b16&#34;&gt;len&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;avg_embeddings&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;hypothetical_embedding&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;hyde_vector&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;tolist&lt;/span&gt;()}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;using-the-hypotheticaldocumentembedder-for-retrieval&#34;&gt;Using the &lt;code&gt;HypotheticalDocumentEmbedder&lt;/code&gt; for Retrieval&lt;/h3&gt;
&lt;p&gt;As a final step, let’s see how we can use our new component in a retrieval pipeline. To start, we can create a document store that has some data in it.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;datasets&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;load_dataset&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Dataset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.preprocessors&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentCleaner&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSplitter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.writers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentWriter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;embedder_model&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence-transformers/all-MiniLM-L6-v2&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;index_docs&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Dataset&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cleaner&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentCleaner&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSplitter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;split_by&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split_length&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersDocumentEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;embedder_model&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;writer&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentWriter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;policy&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;skip&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cleaner&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;writer&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cleaner&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;documents&#34;&lt;/span&gt;: [&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_dict&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;doc&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;train&#34;&lt;/span&gt;]]}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;load_dataset&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Tuana/game-of-thrones&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;doc_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;index_docs&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now that we’ve populated an &lt;code&gt;InMemoryDocumentStore&lt;/code&gt; with some data, let’s see how we can use the &lt;code&gt;HypotheticalDocumentEmbedder&lt;/code&gt; as a way to retrieve documents 👇&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;retriever_with_hyde&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;doc_store&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;hyde&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HypotheticalDocumentEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;instruct_llm&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gpt-3.5-turbo&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;nr_completions&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;doc_store&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;hyde&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_embedder&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_embedder.hypothetical_embedding&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.query_embedding&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;retriever_with_hyde&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;doc_store&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Who is Araya Stark?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retrieval_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query_embedder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;top_k&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;wrapping-it-up&#34;&gt;Wrapping it up&lt;/h2&gt;
&lt;p&gt;If you’ve gotten this far, you know how to use the HyDE technique and how easy it is to incorporate it into Haystack. To learn more about Haystack,  
&lt;a href=&#34;https://discord.gg/QMP5jgMH&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;join us on Discord&lt;/a&gt;  or  
&lt;a href=&#34;https://landing.deepset.ai/haystack-community-updates&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;sign up for our monthly newsletter&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Announcing the Astra DB Haystack Integration</title>
      <link>https://haystack.deepset.ai/blog/astradb-haystack-integration/</link>
      <pubDate>Fri, 19 Jan 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/astradb-haystack-integration/</guid>
      <description>
      
      
      

      
      

      <p>The Haystack extension family is growing so fast, it’s hard to keep up! Our latest addition is the Astra DB extension by 
<a href="https://datastax.com/" target="_blank" rel="noopener">Datastax</a>. It’s an open source package that helps you use Astra DB as a vector database for your Haystack pipelines.</p>
<p>Let’s learn about the benefits of Astra DB and how to use it with Haystack.</p>
<h3 id="benefits-of-astra-db">Benefits of Astra DB</h3>
<p>DataStax Astra DB is a serverless vector database built on 
<a href="https://cassandra.apache.org/_/index.html" target="_blank" rel="noopener">Apache Cassandra</a>. What makes Astra DB special?</p>
<ul>
<li><strong>Interoperability</strong> with Cassandra’s open source ecosystem and tooling.</li>
<li>Astra DB <strong>supports a variety of different embedding models</strong>. One Astra database instance can have multiple <code>collections</code> with different vector sizes. This makes it easy to test different embedding models and find the best one for your use case.</li>
<li><strong>It’s serverless</strong>. What does that mean for a database? You don’t have to manage individual instances, or deal with cumbersome upgrading or scaling. All of that is taken care of for you behind the scenes.</li>
<li><strong>Enterprise scalability</strong>. Astra DB can be deployed across the major cloud providers (AWS, GCP, or Azure) and across multiple regions depending on your needs.</li>
<li>At the time of this writing, <strong>there’s a free tier available</strong> so you can try it without a credit card.</li>
</ul>
<h3 id="create-your-astra-db-database">Create your Astra DB database</h3>
<p>To ensure these instructions remain up to date, we’re going to point you to the Astra DB docs to explain how to create a database.</p>
<ol>
<li>
<a href="https://docs.datastax.com/en/astra/astra-db-vector/databases/create-database.html#create-vector-database" target="_blank" rel="noopener">Create a free Astra DB database</a>. Make a note of your credentials - you’ll need your Astra API endpoint and Astra application token  to use the Haystack extension.</li>
<li>Choose the number of dimensions that matches the 
<a href="https://haystack.deepset.ai/blog/what-is-text-vectorization-in-nlp" target="_blank" rel="noopener">embedding model</a> you plan on using. For this example we’ll use a 384-dimension model, 
<a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2" target="_blank" rel="noopener"><code>sentence-transformers/all-MiniLM-L6-v2</code></a>.</li>
<li>
<a href="https://docs.datastax.com/en/astra/astra-db-vector/databases/manage-collections.html#create-collection" target="_blank" rel="noopener">Create a collection</a> with the same number of dimensions as your embedding model. Save the name of your collection since you’ll need this as well.</li>
</ol>
<h3 id="get-started-with-the-astra-db-haystack-integration">Get started with the Astra DB Haystack Integration</h3>
<p>First, install the integration:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install astra-haystack sentence-transformers
</span></span></code></pre></div><p>Remember earlier when I mentioned you were going to need your credentials? I hope you saved them. If not, that’s okay, you can go back to the 
<a href="https://astra.datastax.com/" target="_blank" rel="noopener">Astra Portal</a> and grab them.</p>
<blockquote>
<p>Note: if you were running this code in production, you’d want to save these as environment variables to keep things nice and secure.</p>
</blockquote>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">getpass</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">getpass</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">os</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">os</span>.<span style="color:#268bd2">environ</span>[<span style="color:#2aa198">"OPENAI_API_KEY"</span>] = <span style="color:#268bd2">getpass</span>(<span style="color:#2aa198">"Enter your openAI key:"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">os</span>.<span style="color:#268bd2">environ</span>[<span style="color:#2aa198">"ASTRA_DB_API_ENDPOINT"</span>] = <span style="color:#268bd2">getpass</span>(<span style="color:#2aa198">"Enter your Astra API Endpoint:"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">os</span>.<span style="color:#268bd2">environ</span>[<span style="color:#2aa198">"ASTRA_DB_APPLICATION_TOKEN"</span>] = <span style="color:#268bd2">getpass</span>(<span style="color:#2aa198">"Enter your Astra application token (e.g.AstraCS:xxx ):"</span>)
</span></span></code></pre></div><h2 id="using-the-astra-documentstore-in-an-index-pipeline">Using the Astra DocumentStore in an index pipeline</h2>
<p>Next, we’ll make a Haystack pipeline, create some embeddings from documents, and add them into the 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/astradocumentstore" target="_blank" rel="noopener"><code>AstraDocumentStore</code></a>.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">logging</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Document</span>, <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.embedders</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">SentenceTransformersDocumentEmbedder</span>, <span style="color:#268bd2">SentenceTransformersTextEmbedder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.writers</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DocumentWriter</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores.types</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DuplicatePolicy</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack_integrations.document_stores.astra</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">AstraDocumentStore</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">logger</span> = <span style="color:#268bd2">logging</span>.<span style="color:#268bd2">getLogger</span>(<span style="color:#268bd2">__name__</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">logging</span>.<span style="color:#268bd2">basicConfig</span>(<span style="color:#268bd2">level</span>=<span style="color:#268bd2">logging</span>.<span style="color:#268bd2">INFO</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">embedding_model_name</span> = <span style="color:#2aa198">"sentence-transformers/all-MiniLM-L6-v2"</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># Make sure ASTRA_DB_API_ENDPOINT and ASTRA_DB_APPLICATION_TOKEN environment variables are set before proceeding</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># embedding_dim is the number of dimensions the embedding model supports.</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_store</span> = <span style="color:#268bd2">AstraDocumentStore</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">duplicates_policy</span>=<span style="color:#268bd2">DuplicatePolicy</span>.<span style="color:#268bd2">SKIP</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">embedding_dimension</span>=<span style="color:#2aa198;font-weight:bold">384</span>,
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># Add Documents</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">documents</span> = [
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(<span style="color:#268bd2">content</span>=<span style="color:#2aa198">"There are over 7,000 languages spoken around the world today."</span>),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"Elephants have been observed to behave in a way that indicates"</span>
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">" a high level of self-awareness, such as recognizing themselves in mirrors."</span>
</span></span><span style="display:flex;"><span>    ),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">Document</span>(
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">content</span>=<span style="color:#2aa198">"In certain parts of the world, like the Maldives, Puerto Rico, "</span>
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"and San Diego, you can witness the phenomenon of bioluminescent waves."</span>
</span></span><span style="display:flex;"><span>    ),
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span><span style="color:#268bd2">index_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">index_pipeline</span>.<span style="color:#268bd2">add_component</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">instance</span>=<span style="color:#268bd2">SentenceTransformersDocumentEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#268bd2">embedding_model_name</span>),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"embedder"</span>,
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">index_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">DocumentWriter</span>(<span style="color:#268bd2">document_store</span>=<span style="color:#268bd2">document_store</span>, <span style="color:#268bd2">policy</span>=<span style="color:#268bd2">DuplicatePolicy</span>.<span style="color:#268bd2">SKIP</span>), <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"writer"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">index_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"embedder.documents"</span>, <span style="color:#2aa198">"writer.documents"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">index_pipeline</span>.<span style="color:#268bd2">run</span>({<span style="color:#2aa198">"embedder"</span>: {<span style="color:#2aa198">"documents"</span>: <span style="color:#268bd2">documents</span>}})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cb4b16">print</span>(<span style="color:#268bd2">document_store</span>.<span style="color:#268bd2">count_documents</span>())
</span></span></code></pre></div><p>If all has gone well, there should be 3 documents. 🎉</p>
<h2 id="use-the-astraembeddingretriever-in-a-haystack-rag-pipeline">Use the <code>AstraEmbeddingRetriever</code> in a Haystack RAG pipeline</h2>
<p>In Haystack, every <code>DocumentStore</code> is tightly coupled with the <code>Retriever</code> that fetches from it. Astra DB is no exception. Here we’ll create a RAG pipeline, where the 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/astraretriever" target="_blank" rel="noopener"><code>AstraEmbeddingRetriever</code></a> will fetch documents relevant to our query.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.builders.answer_builder</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">AnswerBuilder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.builders.prompt_builder</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptBuilder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.generators</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">OpenAIGenerator</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack_integrations.components.retrievers.astra</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">AstraEmbeddingRetriever</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_template</span> = <span style="color:#2aa198">"""
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">                Given these documents, answer the question.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">                Documents:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">                {</span><span style="color:#2aa198">% f</span><span style="color:#2aa198">or doc in documents %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">                    {{ doc.content }}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">                {</span><span style="color:#2aa198">% e</span><span style="color:#2aa198">ndfor %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">                Question: {{question}}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">                Answer:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">                """</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">add_component</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">instance</span>=<span style="color:#268bd2">SentenceTransformersTextEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#268bd2">embedding_model_name</span>),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"embedder"</span>,
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">AstraEmbeddingRetriever</span>(<span style="color:#268bd2">document_store</span>=<span style="color:#268bd2">document_store</span>), <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"retriever"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">PromptBuilder</span>(<span style="color:#268bd2">template</span>=<span style="color:#268bd2">prompt_template</span>), <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"prompt_builder"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">OpenAIGenerator</span>(), <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"llm"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">AnswerBuilder</span>(), <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"answer_builder"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"embedder"</span>, <span style="color:#2aa198">"retriever"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"retriever"</span>, <span style="color:#2aa198">"prompt_builder.documents"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"prompt_builder"</span>, <span style="color:#2aa198">"llm"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"llm.replies"</span>, <span style="color:#2aa198">"answer_builder.replies"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"llm.meta"</span>, <span style="color:#2aa198">"answer_builder.meta"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"retriever"</span>, <span style="color:#2aa198">"answer_builder.documents"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># Run the pipeline</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">question</span> = <span style="color:#2aa198">"How many languages are there in the world today?"</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">result</span> = <span style="color:#268bd2">rag_pipeline</span>.<span style="color:#268bd2">run</span>(
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"embedder"</span>: {<span style="color:#2aa198">"text"</span>: <span style="color:#268bd2">question</span>},
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"retriever"</span>: {<span style="color:#2aa198">"top_k"</span>: <span style="color:#2aa198;font-weight:bold">2</span>},
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"prompt_builder"</span>: {<span style="color:#2aa198">"question"</span>: <span style="color:#268bd2">question</span>},
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"answer_builder"</span>: {<span style="color:#2aa198">"query"</span>: <span style="color:#268bd2">question</span>},
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cb4b16">print</span>(<span style="color:#268bd2">result</span>)
</span></span></code></pre></div><p>The output should look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>{<span style="color:#2aa198">'answer_builder'</span>: {<span style="color:#2aa198">'answers'</span>: [GeneratedAnswer(<span style="color:#268bd2">data</span>=<span style="color:#2aa198">'There are over 7,000 languages spoken around the world today.'</span>, <span style="color:#268bd2">query</span>=<span style="color:#2aa198">'How many languages are there in the world today?'</span>, <span style="color:#268bd2">documents</span>=[Document(<span style="color:#268bd2">id</span>=cfe93bc1c274908801e6670440bf2bbba54fad792770d57421f85ffa2a4fcc94, content: <span style="color:#2aa198">'There are over 7,000 languages spoken around the world today.'</span>, score: 0.9267925, embedding: vector of size 384), Document(<span style="color:#268bd2">id</span>=6f20658aeac3c102495b198401c1c0c2bd71d77b915820304d4fbc324b2f3cdb, content: <span style="color:#2aa198">'Elephants have been observed to behave in a way that indicates a high level of self-awareness, such ...'</span>, score: 0.5357444, embedding: vector of size 384)], <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">'model'</span>: <span style="color:#2aa198">'gpt-3.5-turbo-0613'</span>, <span style="color:#2aa198">'index'</span>: 0, <span style="color:#2aa198">'finish_reason'</span>: <span style="color:#2aa198">'stop'</span>, <span style="color:#2aa198">'usage'</span>: {<span style="color:#2aa198">'completion_tokens'</span>: 14, <span style="color:#2aa198">'prompt_tokens'</span>: 83, <span style="color:#2aa198">'total_tokens'</span>: 97}})]}}
</span></span></code></pre></div><h2 id="wrapping-it-up">Wrapping it up</h2>
<p>If you’ve gotten this far, now you know how to use Astra DB as a data source for your Haystack pipeline. To learn more about Haystack, 
<a href="https://discord.gg/QMP5jgMH" target="_blank" rel="noopener">join us on Discord</a> or 
<a href="https://landing.deepset.ai/haystack-community-updates" target="_blank" rel="noopener">sign up for our monthly newsletter</a>.</p>

      </description>
    </item>
    
    <item>
      <title>PDF-Based Question Answering with Amazon Bedrock and Haystack</title>
      <link>https://haystack.deepset.ai/blog/pdf-qa-application-with-bedrock/</link>
      <pubDate>Wed, 17 Jan 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/pdf-qa-application-with-bedrock/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;
&lt;a href=&#34;https://aws.amazon.com/bedrock/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Amazon Bedrock&lt;/a&gt; is a fully managed service that provides high-performing foundation models from leading AI startups and Amazon through a single API. You can choose from various foundation models to find the one best suited for your use case.&lt;/p&gt;
&lt;p&gt;In this article, I’ll guide you through the process of &lt;strong&gt;creating a generative question answering application&lt;/strong&gt; tailored for PDF files using the newly added 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/amazon-bedrock&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Amazon Bedrock integration&lt;/a&gt; with 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; and 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/opensearch-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearch&lt;/a&gt; to store our documents efficiently. The demo will illustrate the step-by-step development of a QA application designed specifically for the Bedrock documentation, demonstrating the power of Bedrock in the process 🚀&lt;/p&gt;
&lt;h2 id=&#34;benefits-of-amazon-bedrock&#34;&gt;Benefits of Amazon Bedrock&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;It provides access to a &lt;strong&gt;diverse range of foundation models&lt;/strong&gt; sourced from leading AI startups, including AI21 Labs, Anthropic, Cohere, Meta, and Stability AI, along with Amazon Titan models.&lt;/li&gt;
&lt;li&gt;You can &lt;strong&gt;experiment with different large language models (LLMs)&lt;/strong&gt; seamlessly using Amazon Bedrock. No need for multiple API keys; just modify the model name and test your application with various prompts and configurations to identify the optimal model for your specific use case.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Amazon Bedrock doesn’t use your prompts and continuations&lt;/strong&gt; in training AWS models or share them with third parties. Your training data is not employed in training the core Amazon Titan models or distributed externally. Additionally, other usage data, such as timestamps and account IDs, is not employed in model training. 
&lt;a href=&#34;https://docs.aws.amazon.com/bedrock/latest/userguide/data-protection.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;You &lt;strong&gt;don’t need to manage infrastructure&lt;/strong&gt; for hosting your models when deploying your application after the prototyping phase. Amazon Bedrock takes care of the hosting infrastructure, providing a seamless deployment experience.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;set-up-amazon-bedrock&#34;&gt;Set Up Amazon Bedrock&lt;/h2&gt;
&lt;p&gt;To use Amazon Bedrock, start by 
&lt;a href=&#34;https://portal.aws.amazon.com/billing/signup&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;signing up for an AWS account&lt;/a&gt;. Once you have successfully logged in, go to 
&lt;a href=&#34;https://us-east-1.console.aws.amazon.com/bedrock/home&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Amazon Bedrock Console&lt;/a&gt; and get started. Keep in mind that, by default, users do not have model access. You should request access from 
&lt;a href=&#34;https://us-east-1.console.aws.amazon.com/bedrock/home#/modelaccess&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Model Access Page&lt;/a&gt;. For this application, we’re going to use “Titan Text G1 - Express” model by Amazon. Unfortunately, there is no free tier for Amazon Bedrock, therefore, you might need to provide payment information at this step.&lt;/p&gt;
&lt;p&gt;Learn more about the process in this 
&lt;a href=&#34;https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;set up Amazon Bedrock&lt;/a&gt; guide.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/pdf-qa-application-with-bedrock/model-access.png alt=&#34;Model access page in AWS Console with model names and an orange ‘Manage model access’ button&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;Manage model access&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;api-keys&#34;&gt;API Keys&lt;/h2&gt;
&lt;p&gt;To use Amazon Bedrock, you need &lt;code&gt;aws_access_key_id&lt;/code&gt;, &lt;code&gt;aws_secret_access_key&lt;/code&gt;, and indicate the &lt;code&gt;aws_region_name&lt;/code&gt;. Once logged into your account, create the access keys in “Security Credentials” section. For detailed guidance, refer to the documentation on 
&lt;a href=&#34;https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Managing access keys for IAM users&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/pdf-qa-application-with-bedrock/security-credentials.png alt=&#34;User dropdown menu in AWS Console with several options including “Security Credentials”&#34;  /&gt;
        &lt;figcaption&gt;1- Find the access keys under ‘Security Credentials’ section&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/pdf-qa-application-with-bedrock/create-keys.png alt=&#34;Access keys section in AWS Console, the button with text ‘Create access key’ is marked in a red box&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;2- Create a new key (or use an existing one)&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/pdf-qa-application-with-bedrock/retrieve-keys.png alt=&#34;Retrieve Keys page in in AWS Console, giving details about the newly created keys&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;3- Copy and store your &lt;strong&gt;&lt;code&gt;aws_access_key_id&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;aws_secret_access_key&lt;/code&gt;&lt;/strong&gt; keys before you press ‘Done’&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;development-environment&#34;&gt;Development Environment&lt;/h2&gt;
&lt;h3 id=&#34;install-dependencies&#34;&gt;Install dependencies&lt;/h3&gt;
&lt;p&gt;Let’s install 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/amazon-bedrock&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Amazon Bedrock&lt;/a&gt; and 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/opensearch-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearch&lt;/a&gt; integration of Haystack and 
&lt;a href=&#34;https://pypi.org/project/pypdf/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;pypdf&lt;/a&gt; package to process PDF files.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install opensearch-haystack amazon-bedrock-haystack pypdf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;download-files&#34;&gt;Download Files&lt;/h3&gt;
&lt;p&gt;For this application, we’ll use the user guide of Amazon Bedrock. Amazon Bedrock provides the 
&lt;a href=&#34;https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PDF form of its guide&lt;/a&gt;. You can download the PDF from its source or by running the code below 👇🏼&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;boto3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;botocore&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;UNSIGNED&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;botocore.config&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Config&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;s3&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;boto3&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;client&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;s3&#39;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;config&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;Config&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;signature_version&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;UNSIGNED&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Download the PDF file from our s3 bucket to `/content/bedrock-documentation.pdf` directory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;s3&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;download_file&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;core-engineering&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;public/blog-posts/bedrock-documentation.pdf&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;/content/bedrock-documentation.pdf&#39;&lt;/span&gt;) &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;initialize-an-opensearch-instance&#34;&gt;Initialize an OpenSearch Instance&lt;/h3&gt;
&lt;p&gt;
&lt;a href=&#34;https://opensearch.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearch&lt;/a&gt; is a fully open-source search and analytics engine that seamlessly integrates with the 
&lt;a href=&#34;https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Amazon OpenSearch Service&lt;/a&gt;, allowing you to deploy, manage, and scale your OpenSearch cluster effortlessly. To get started, install OpenSearch and launch an instance by following the detailed instructions provided in the 
&lt;a href=&#34;https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearch documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here’s how to launch OpenSearch with Docker (recommended):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker pull opensearchproject/opensearch:2.11.0
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker run -p 9200:9200 -p 9600:9600 -e &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;discovery.type=single-node&#34;&lt;/span&gt; -e &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;OPENSEARCH_JAVA_OPTS=-Xms1024m -Xmx1024m&#34;&lt;/span&gt; opensearchproject/opensearch:2.11.0
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;building-the-indexing-pipeline&#34;&gt;Building the Indexing Pipeline&lt;/h2&gt;
&lt;p&gt;Our indexing pipeline will convert the PDF file into a Haystack Document using 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/pypdftodocument&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PyPDFToDocument&lt;/a&gt; and preprocess it by cleaning and splitting it into chunks before storing them in 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/opensearch-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearchDocumentStore&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s run the pipeline below and index our file to our document store:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pathlib&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.converters&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PyPDFToDocument&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.preprocessors&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentCleaner&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSplitter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.writers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentWriter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.types&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DuplicatePolicy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.document_stores.opensearch&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenSearchDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Initialize the OpenSearchDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenSearchDocumentStore&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;hosts&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://localhost:9200&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;use_ssl&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;verify_certs&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;http_auth&lt;/span&gt;=(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;admin&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;admin&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Create pipeline components&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;converter&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PyPDFToDocument&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;cleaner&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentCleaner&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;splitter&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSplitter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;split_by&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sentence&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split_length&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split_overlap&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;writer&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentWriter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;policy&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;DuplicatePolicy&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;SKIP&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Add components to the pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;converter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;converter&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cleaner&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;cleaner&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;splitter&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;writer&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;writer&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Connect the components to each other&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;converter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cleaner&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cleaner&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;writer&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Run the pipeline with the files you want to index.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;converter&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sources&#34;&lt;/span&gt;: [&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;/content/bedrock-documentation.pdf&#34;&lt;/span&gt;)]}})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;Pipelines for indexing files into document stores are convenient when you need to add additional files to your document store. Feel free to rerun the pipeline with new documents whenever necessary.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;building-the-query-pipeline&#34;&gt;Building the Query Pipeline&lt;/h2&gt;
&lt;p&gt;Let’s create another pipeline to query our application. In this pipeline, we’ll use 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/opensearchbm25retriever&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearchBM25Retriever&lt;/a&gt; to retrieve relevant information from the OpenSearchDocumentStore and an Amazon Titan model &lt;code&gt;amazon.titan-text-express-v1&lt;/code&gt; to generate answers with 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/amazonbedrockgenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;AmazonBedrockGenerator&lt;/a&gt;. You can find other model options in 
&lt;a href=&#34;https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;supported foundation models&lt;/a&gt;. Next, we’ll create a prompt for our task using the Retrieval-Augmented Generation (RAG) approach with 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PromptBuilder&lt;/a&gt;. This prompt will help generate answers by considering the provided context. Finally, we’ll connect these three components to complete the pipeline.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.pipeline&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;amazon_bedrock_haystack.generators.amazon_bedrock&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AmazonBedrockGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.retrievers.opensearch&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenSearchBM25Retriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Create pipeline components&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenSearchBM25Retriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;15&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Set the AWS credentials as environment variables&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;AWS_ACCESS_KEY_ID&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;aws_access_key_id&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;AWS_SECRET_ACCESS_KEY&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;aws_secret&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;AWS_REGION_NAME&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;aws_region_name&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Initialize the AmazonBedrockGenerator with an Amazon Bedrock model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;AmazonBedrockGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;amazon.titan-text-express-v1&#39;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;max_length&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;500&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Please answer the question based on the given information from Amazon Bedrock documentation.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Add components to the pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Connect components to each other&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, ask your question and learn about the Amazon Bedrock service using Amazon Bedrock models!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What is Amazon Bedrock??&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Example result:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; your use through a uniﬁed API. You can choose from a wide range of foundation models to ﬁnd the model that is best suited &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; your use &lt;span style=&#34;color:#859900&#34;&gt;case&lt;/span&gt;. Amazon Bedrock also offers a broad &lt;span style=&#34;color:#cb4b16&#34;&gt;set&lt;/span&gt; of capabilities to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top foundation models &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; your use cases, privately customize them with your data using techniques such as ﬁne-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;With Amazon Bedrock&#39;s serverless experience, you can get started quickly, privately customize foundation models with your own data, and easily and securely integrate and deploy them into your applications using AWS tools without having to manage any infrastructure.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;Keep in mind that generative models are not deterministic and the response you get might differ.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;other-questions-you-can-try&#34;&gt;Other questions you can try&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;How can I set up Amazon Bedrock?&lt;/li&gt;
&lt;li&gt;How should I form my prompts for Amazon Titan models?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;next-steps&#34;&gt;Next Steps&lt;/h2&gt;
&lt;p&gt;As the next step, feel free to enhance your document store by indexing additional files or replacing existing ones with your own data.&lt;/p&gt;
&lt;p&gt;For this demonstration, we applied the BM25 method to generate text vectors. If you’d like to improve the application further, select an 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/embedders&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Embedder&lt;/a&gt; and generate dense embeddings for each file using a chosen embedding model before incorporating them into the document store.&lt;/p&gt;
&lt;p&gt;Thank you for your interest! Stay informed about the latest developments in Haystack by 
&lt;a href=&#34;https://landing.deepset.ai/haystack-community-updates&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;subscribing to our newsletter&lt;/a&gt; or 
&lt;a href=&#34;https://discord.com/invite/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;joining our Discord community&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Using Jina Embeddings v2 with Haystack pipelines to summarize legal documents</title>
      <link>https://haystack.deepset.ai/blog/using-jina-embeddings-haystack/</link>
      <pubDate>Wed, 10 Jan 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/using-jina-embeddings-haystack/</guid>
      <description>
      
      
      

      
      

      <p>
<a href="https://jina.ai/news/jina-ai-launches-worlds-first-open-source-8k-text-embedding-rivaling-openai/" target="_blank" rel="noopener">Jina.ai</a> recently upgraded and expanded the capabilities of their previous embedding model in a v2 release.</p>
<p>With the 
<a href="https://haystack.deepset.ai/integrations/jina" target="_blank" rel="noopener">Jina Haystack extension</a>, you can now take advantage of these new text embedders in your Haystack pipelines! In this post, we’ll show what’s cool about Jina Embeddings v2 and how to use them.</p>
<blockquote>
<p>You can follow along in the accompanying 
<a href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/jina-embeddings-v2-legal-analysis-rag.ipynb" target="_blank" rel="noopener">Colab notebook of a RAG pipeline that uses the Jina Haystack extension</a>.</p>
</blockquote>
<h2 id="advantages-of-jina-embeddings-v2">Advantages of Jina Embeddings v2</h2>
<ul>
<li><strong>Handling long documents.</strong> The large token window, accommodating up to 8192 tokens, allows you to break the embeddings into larger chunks. It’s more computationally and memory-efficient to use a few larger vectors than a lot of small ones, so this allows Jina v2 to process large documents efficiently.</li>
<li><strong>Improved semantic understanding.</strong> Larger text chunks also contain more <em>context</em> within each chunk, which can help LLMs better understand your documents. Improved understanding means better long document retrieval, semantic textual similarity, text reranking, recommendation, RAG and LLM-based generative search.</li>
<li><strong>Short vector length</strong>: Jina Embeddings v2 emits embedding vectors of length 768 (base model) or 512 (small model), which are both significantly less than that of the only other embedding model that supports 8k tokens input length, while not compromising on the quality of retrieval, similarity, reranking or other downstream tasks. A shorter vector length implies cost-savings for the vector database, which typically price based on stored vector dimensions.</li>
<li><strong>Fully open source 💙</strong> There are both small and large embedding models available, depending on your computing resources and requirements. To run the embedding models yourself, 
<a href="https://huggingface.co/jinaai/jina-embeddings-v2-base-en" target="_blank" rel="noopener">check out this documentation on HuggingFace</a>.  Alternately, you can use Jina’s fully managed embedding service to handle that for you, which we’ll be doing for this demo.</li>
</ul>
<h2 id="getting-started-using-jina-embeddings-v2-with-haystack">Getting started using Jina Embeddings v2 with Haystack</h2>
<p>To use the integration you’ll need a free Jina api key - get one 
<a href="https://jina.ai/embeddings/" target="_blank" rel="noopener">here</a>.</p>
<p>You can use Jina Embedding models with two Haystack components: <code>JinaTextEmbedder</code> and <code>JinaDocumentEmbedder</code>.</p>
<p>To create semantic embeddings for documents, use 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/jinadocumentembedder" target="_blank" rel="noopener"><code>JinaDocumentEmbedder</code></a> in your indexing pipeline. For generating embeddings for queries, use 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/jinatextembedder" target="_blank" rel="noopener"><code>JinaTextEmbedder</code></a>.</p>
<p>In the following code we’ll demonstrate how to use both components. You can also 
<a href="https://haystack.deepset.ai/integrations/jina" target="_blank" rel="noopener">see the Haystack docs for some minimum viable code examples.</a></p>
<h2 id="summarizing-legal-text-with-a-haystack-rag-pipeline">Summarizing legal text with a Haystack RAG pipeline</h2>
<p>I’m not a lawyer, and neither are large language models. But LLMs are good at analyzing long, complex documents. So let’s try using the Jina v2 embedding models for some legal summarization.</p>
<p>In October 2023, I narrowly escaped jury duty. I had slight FOMO since the case sounded interesting (Google v. Sonos). Let’s see how it turned out.</p>
<p>To follow along with this demo, in addition to a Jina api key you’ll also need a 
<a href="https://huggingface.co/docs/hub/security-tokens" target="_blank" rel="noopener">Hugging Face access token</a>, since we’ll use the 
<a href="https://mistral.ai/news/mixtral-of-experts/" target="_blank" rel="noopener">Mixtral 8x7b LLM</a> for question answering.</p>
<p>First, let’s install all the packages we’ll need.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install jina-haystack chroma-haystack pypdf
</span></span></code></pre></div><p>Then let’s input our credentials. Or you can set them as environment variables instead if you’re feeling fancy.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">getpass</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">getpass</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">os</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">os</span>.<span style="color:#268bd2">environ</span>[<span style="color:#2aa198">"JINA_API_KEY"</span>] = <span style="color:#268bd2">getpass</span>(<span style="color:#2aa198">"JINA api key:"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">os</span>.<span style="color:#268bd2">environ</span>[<span style="color:#2aa198">"HF_API_TOKEN"</span>] = <span style="color:#268bd2">getpass</span>(<span style="color:#2aa198">"Enter your HuggingFace api token: "</span>)
</span></span></code></pre></div><h2 id="building-the-indexing-pipeline">Building the indexing pipeline</h2>
<p>Our indexing pipeline will preprocess the legal document, turn it into vectors, and store them. We’ll use the 
<a href="https://docs.trychroma.com/getting-started" target="_blank" rel="noopener">Chroma DocumentStore</a> to store the vector embeddings, via the 
<a href="https://haystack.deepset.ai/integrations/chroma-documentstore" target="_blank" rel="noopener">Chroma Document Store Haystack integration</a>.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">chroma_haystack.document_store</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">ChromaDocumentStore</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_store</span> = <span style="color:#268bd2">ChromaDocumentStore</span>()
</span></span></code></pre></div><p>At a high level, the 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/linkcontentfetcher" target="_blank" rel="noopener"><code>LinkContentFetcher</code></a> pulls this document from its URL. Then we convert it from a PDF into a Document object Haystack can understand.</p>
<p>We preprocess it by removing whitespace and redundant substrings. Then split it into chunks, generate embeddings, and write these embeddings into the <code>ChromaDocumentStore</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.fetchers</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">LinkContentFetcher</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.converters</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PyPDFToDocument</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.writers</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DocumentWriter</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.preprocessors</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DocumentCleaner</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.preprocessors</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DocumentSplitter</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">chroma_haystack.retriever</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">ChromaEmbeddingRetriever</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores.types</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">DuplicatePolicy</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">jina_haystack.document_embedder</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">JinaDocumentEmbedder</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">jina_haystack.text_embedder</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">JinaTextEmbedder</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">fetcher</span> = <span style="color:#268bd2">LinkContentFetcher</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">converter</span> = <span style="color:#268bd2">PyPDFToDocument</span>()
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># remove repeated substrings to get rid of headers/footers</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">cleaner</span> = <span style="color:#268bd2">DocumentCleaner</span>(<span style="color:#268bd2">remove_repeated_substrings</span>=<span style="color:#859900;font-weight:bold">True</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># Since jina-v2 can handle 8192 tokens, 500 words seems like a safe chunk size</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">splitter</span> = <span style="color:#268bd2">DocumentSplitter</span>(<span style="color:#268bd2">split_by</span>=<span style="color:#2aa198">"word"</span>, <span style="color:#268bd2">split_length</span>=<span style="color:#2aa198;font-weight:bold">500</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># DuplicatePolicy.SKIP is optional but helps avoid errors if you want to re-run the pipeline</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">writer</span> = <span style="color:#268bd2">DocumentWriter</span>(<span style="color:#268bd2">document_store</span>=<span style="color:#268bd2">document_store</span>, <span style="color:#268bd2">policy</span>=<span style="color:#268bd2">DuplicatePolicy</span>.<span style="color:#268bd2">SKIP</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">retriever</span> = <span style="color:#268bd2">ChromaEmbeddingRetriever</span>(<span style="color:#268bd2">document_store</span>=<span style="color:#268bd2">document_store</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_embedder</span> = <span style="color:#268bd2">JinaDocumentEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"jina-embeddings-v2-base-en"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">fetcher</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"fetcher"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">converter</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"converter"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">cleaner</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"cleaner"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">splitter</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"splitter"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">document_embedder</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"embedder"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">writer</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"writer"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"fetcher.streams"</span>, <span style="color:#2aa198">"converter.sources"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"converter.documents"</span>, <span style="color:#2aa198">"cleaner.documents"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"cleaner.documents"</span>, <span style="color:#2aa198">"splitter.documents"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"splitter.documents"</span>, <span style="color:#2aa198">"embedder.documents"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"embedder.documents"</span>, <span style="color:#2aa198">"writer.documents"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># This case references Google V Sonos, October 2023</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">urls</span> = [<span style="color:#2aa198">"https://cases.justia.com/federal/district-courts/california/candce/3:2020cv06754/366520/813/0.pdf"</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">data</span>={<span style="color:#2aa198">"fetcher"</span>: {<span style="color:#2aa198">"urls"</span>: <span style="color:#268bd2">urls</span>}})
</span></span></code></pre></div><h2 id="building-the-query-pipeline">Building the query pipeline</h2>
<p>Now the real fun begins. Let’s create a query pipeline so we can actually start asking questions. We write a prompt allowing us to pass our documents to the Mixtral-8x7B LLM. Then we initiatialize the LLM via the <code>HuggingFaceAPIGenerator</code>.</p>
<p>To use this model, you need to accept the conditions here: 
<a href="https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1" target="_blank" rel="noopener">https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1</a></p>
<p>In Haystack 2.0 <code>retriever</code>s are tightly coupled to <code>DocumentStores</code>. If we pass the document store in the <code>retriever</code> we initialized earlier, this pipeline can access those embeddings we generated, and pass them to the LLM.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.generators</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">HuggingFaceAPIGenerator</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.builders.prompt_builder</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptBuilder</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">jina_haystack.text_embedder</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">JinaTextEmbedder</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt</span> = <span style="color:#2aa198">""" Answer the question, based on the
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">content in the documents. If you can't answer based on the documents, say so.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Documents:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% f</span><span style="color:#2aa198">or doc in documents %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  {{doc.content}}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% e</span><span style="color:#2aa198">ndfor %}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">question: {{question}}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">"""</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">text_embedder</span> = <span style="color:#268bd2">JinaTextEmbedder</span>(<span style="color:#268bd2">model</span>=<span style="color:#2aa198">"jina-embeddings-v2-base-en"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">generator</span> = <span style="color:#268bd2">HuggingFaceAPIGenerator</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">api_type</span>=<span style="color:#2aa198">"serverless_inference_api"</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">api_params</span>={<span style="color:#2aa198">"model"</span>: <span style="color:#2aa198">"mistralai/Mixtral-8x7B-Instruct-v0.1"</span>})  
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_builder</span> = <span style="color:#268bd2">PromptBuilder</span>(<span style="color:#268bd2">template</span>=<span style="color:#268bd2">prompt</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">query_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">query_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"text_embedder"</span>,<span style="color:#268bd2">text_embedder</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">query_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">prompt_builder</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"prompt_builder"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">query_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"retriever"</span>, <span style="color:#268bd2">retriever</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">query_pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"generator"</span>, <span style="color:#268bd2">generator</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">query_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"text_embedder.embedding"</span>, <span style="color:#2aa198">"retriever.query_embedding"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">query_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"retriever.documents"</span>, <span style="color:#2aa198">"prompt_builder.documents"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">query_pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"prompt_builder.prompt"</span>, <span style="color:#2aa198">"generator.prompt"</span>)
</span></span></code></pre></div><p>Time to ask a question!</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">question</span> = <span style="color:#2aa198">"Summarize what happened in Google v. Sonos"</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">result</span> = <span style="color:#268bd2">query_pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">data</span>={<span style="color:#2aa198">"text_embedder"</span>:{<span style="color:#2aa198">"text"</span>: <span style="color:#268bd2">question</span>},
</span></span><span style="display:flex;"><span>                                  <span style="color:#2aa198">"retriever"</span>: {<span style="color:#2aa198">"top_k"</span>: <span style="color:#2aa198;font-weight:bold">3</span>},
</span></span><span style="display:flex;"><span>                                  <span style="color:#2aa198">"prompt_builder"</span>:{<span style="color:#2aa198">"question"</span>: <span style="color:#268bd2">question</span>},
</span></span><span style="display:flex;"><span>                                  <span style="color:#2aa198">"generator"</span>: {<span style="color:#2aa198">"generation_kwargs"</span>: {<span style="color:#2aa198">"max_new_tokens"</span>: <span style="color:#2aa198;font-weight:bold">350</span>}}})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cb4b16">print</span>(<span style="color:#268bd2">result</span>[<span style="color:#2aa198">'generator'</span>][<span style="color:#2aa198">'replies'</span>][<span style="color:#2aa198;font-weight:bold">0</span>])
</span></span></code></pre></div><pre tabindex="0"><code>Answer: Google v. Sonos is a patent infringement case in which Sonos sued Google for infringing on two of its patents related to customizing and saving overlapping groups of smart speakers or other zone players according to a common theme..
</code></pre><h2 id="exploring-more-questions-and-documents">Exploring more questions and documents</h2>
<p>You can swap the <code>question</code> variable out and then call <code>pipeline.run</code> again:</p>
<ul>
<li>What role did If This Then That play in Google v. Sonos?</li>
<li>What judge presided over Google v. Sonos?</li>
<li>What should Sonos have done differently?</li>
</ul>
<p>The indexing pipeline is written so that you can swap in other documents and analyze them. You can try plugging the following URLs (or any PDF written in English) into the indexing pipeline and re-running all the code blocks below it.</p>
<ul>
<li>Google v. Oracle: 
<a href="https://supreme.justia.com/cases/federal/us/593/18-956/case.pdf" target="_blank" rel="noopener">https://supreme.justia.com/cases/federal/us/593/18-956/case.pdf</a></li>
<li>JACK DANIEL’S PROPERTIES, INC. v. VIP PRODUCTS
LLC: 
<a href="https://www.supremecourt.gov/opinions/22pdf/22-148_3e04.pdf" target="_blank" rel="noopener">https://www.supremecourt.gov/opinions/22pdf/22-148_3e04.pdf</a></li>
</ul>
<p>Note: if you want to change the prompt template, you’ll also need to re-run the code blocks starting where the <code>DocumentStore</code> is defined.</p>
<h2 id="wrapping-it-up">Wrapping it up</h2>
<p>Thanks for reading! If you want to stay on top of the latest Haystack developments, you can 
<a href="https://landing.deepset.ai/haystack-community-updates" target="_blank" rel="noopener">subscribe to our newsletter</a> or 
<a href="https://discord.com/invite/haystack" target="_blank" rel="noopener">join our Discord community</a>.</p>
<p>To learn more about the technologies used here, check out these blog posts:</p>
<ul>
<li>
<a href="https://jina.ai/news/embeddings-in-depth/" target="_blank" rel="noopener">Embeddings in Depth</a></li>
<li>
<a href="https://haystack.deepset.ai/blog/what-is-text-vectorization-in-nlp" target="_blank" rel="noopener">What is text vectorization in NLP?</a></li>
<li>
<a href="https://haystack.deepset.ai/blog/the-definitive-guide-to-bertmodels" target="_blank" rel="noopener">The definitive guide to BERT models</a></li>
</ul>

      </description>
    </item>
    
    <item>
      <title>Highlights of 2023</title>
      <link>https://haystack.deepset.ai/blog/highlights-of-2023/</link>
      <pubDate>Fri, 05 Jan 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/highlights-of-2023/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This post was originally shared through the Haystack newsletter. 
&lt;a href=&#34;https://landing.deepset.ai/haystack-community-updates&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Subscribe now&lt;/a&gt; to stay in the loop on all things Haystack, including the latest updates, new features, captivating content, and upcoming exciting events!&lt;/em&gt; 🗞️&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What a fantastic year it has been! In 2023, the world of AI saw tremendous progress, making it an exciting time for learning and growth. As we wind down and approach the new year, it’s a good moment to take a breather, look back on the past year, and appreciate the highlights before gearing up for what’s ahead. We’ve taken some time to review the noteworthy moments that made 2023 special for Haystack and its community.&lt;/p&gt;
&lt;p&gt;A big thank you for being part of this journey! Here’s to a wonderful New Year filled with dizzying generative AI and NLP advancements 🥂✨&lt;/p&gt;
&lt;h2 id=&#34;-haystack-highlights&#34;&gt;💫 Haystack Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The biggest highlight of the year was 
&lt;a href=&#34;https://haystack.deepset.ai/blog/introducing-haystack-2-beta-and-advent&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;Haystack 2.0-Beta&lt;/strong&gt;&lt;/a&gt;, which is a more flexible, extendable, customizable version of Haystack 💯&lt;/li&gt;
&lt;li&gt;Haystack surpassed 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;12,000 stars&lt;/strong&gt;&lt;/a&gt; on GitHub 🌟&lt;/li&gt;
&lt;li&gt;It has been exactly 2 years since the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack/releases/tag/v1.0.0&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;first stable release&lt;/a&gt; of Haystack 🎉&lt;/li&gt;
&lt;li&gt;We introduced 
&lt;a href=&#34;https://haystack.deepset.ai/integrations&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Integrations&lt;/a&gt; to highlight community-contributed external packages and incorporate additional technologies like &lt;strong&gt;Amazon&lt;/strong&gt;, &lt;strong&gt;VertexAI&lt;/strong&gt;, &lt;strong&gt;Elasticsearch&lt;/strong&gt;, &lt;strong&gt;OpenSearch&lt;/strong&gt;, &lt;strong&gt;Chroma&lt;/strong&gt;, &lt;strong&gt;Pinecone&lt;/strong&gt;, &lt;strong&gt;Cohere&lt;/strong&gt;, &lt;strong&gt;Gradient&lt;/strong&gt;, &lt;strong&gt;Jina&lt;/strong&gt;, and &lt;strong&gt;Ollama&lt;/strong&gt; that seamlessly integrate with Haystack 🧩&lt;/li&gt;
&lt;li&gt;We hosted the first 
&lt;a href=&#34;https://haystack.deepset.ai/advent-of-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;Advent of Haystack&lt;/strong&gt;&lt;/a&gt; 🎄&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Over 80 pull requests from the community&lt;/strong&gt; were merged 🚀&lt;/li&gt;
&lt;li&gt;We launched 
&lt;a href=&#34;https://haystack.deepset.ai/blog&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Blog&lt;/a&gt;, where we publish articles about Haystack, LLMs, demos, and use cases 📚 &lt;strong&gt;P.S.&lt;/strong&gt; Did you know that you can 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-home/pulls&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;contribute to Haystack Blog&lt;/a&gt;?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/highlights-of-2023/haystack-highlight.png alt=&#34;Festive-themed visual with snowman, christmas tree and text displaying ‘Haystack 2.0-Beta’ and ‘Advent of Haystack.&#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;We announced Haystack 2.0-Beta and hosted the first Advent of Haystack&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;-community-highlights&#34;&gt;👥 Community Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Our community has reached &lt;strong&gt;2300&#43; members&lt;/strong&gt; on Discord, 
&lt;a href=&#34;https://discord.gg/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;join the Haystack Discord community&lt;/a&gt; to ask questions, stay informed about Haystack updates, and connect with fellow Haystackers 🤖&lt;/li&gt;
&lt;li&gt;Discord has been the platform for numerous &lt;strong&gt;office hours&lt;/strong&gt;, a handful of &lt;strong&gt;live coding sessions,&lt;/strong&gt; and engaging 
&lt;a href=&#34;https://drive.google.com/drive/folders/15IbtdRbafnPpUd4yAcNtu4vUveOs33Qn?usp=sharing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;Road to 2.0&lt;/strong&gt;&lt;/a&gt; sessions 🛣️&lt;/li&gt;
&lt;li&gt;We proudly participated in &lt;strong&gt;Hacktoberfest&lt;/strong&gt; for the first time and welcomed new Haystack contributors 🎃&lt;/li&gt;
&lt;li&gt;We introduced our monthly &lt;strong&gt;Haystack community newsletter&lt;/strong&gt; 🗞️&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;120&#43; Advent of Haystack participants&lt;/strong&gt; took the time to give us feedback on Haystack 2.0-Beta &lt;strong&gt;🎄&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;We participated in the 
&lt;a href=&#34;https://partiful.com/e/pQHQrWPg1A6P31AYZMTd&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;Anthropic Hackathon&lt;/strong&gt;&lt;/a&gt; 🧑‍💻&lt;/li&gt;
&lt;li&gt;Haystack has spread its wings on Twitter/X, follow us at 
&lt;a href=&#34;https://twitter.com/Haystack_AI&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack_AI&lt;/a&gt; 🐦&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/highlights-of-2023/community-highlight.jpeg alt=&#34;A spacious room filled with people, gathered and listening to a speaker from an elevated position.&#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;A photo from the Anthropic Hackathon&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;-webinar-highlights&#34;&gt;📆 Webinar Highlights&lt;/h2&gt;
&lt;p&gt;We collected some of the webinars that the Haystack community found interesting in 2023:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://youtu.be/-TfPkX3IoUQ?feature=shared&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Prompting LLMs Using Haystack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://youtu.be/1NPcnlqPf2U?feature=shared&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Building Applications with LLM-Based Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://youtu.be/Gn5SV8q887s?feature=shared&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;From Hybrid Retrieval to RAG with OpenSearch and Haystack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://community.analyticsvidhya.com/c/datahour/optimizing-llms-with-retrieval-augmented-generation-and-haystack-2-0&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Optimizing LLMs with Retrieval Augmented Generation and Haystack 2.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/highlights-of-2023/webinar-highlight.png alt=&#34;A slide explaining LLM strenght and weaknesses from “Optimizing LLMs with Retrieval Augmented Generation and Haystack 2.0” webinar by Sara&#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;‘Optimizing LLMs with Retrieval Augmented Generation and Haystack 2.0’ by Sara&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;-blog-post-highlights&#34;&gt;📖 Blog Post Highlights&lt;/h2&gt;
&lt;p&gt;Here are the most visited blog posts of 2023:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://haystack.deepset.ai/blog/gemini-models-with-google-vertex-for-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Gemini Models with Google Vertex AI Integration for Haystack&lt;/a&gt;: Multi-Modal components and function calling with the new Gemini integrations for Haystack 2.0&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://haystack.deepset.ai/blog/beginners-guide-to-llm-prompting&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The Beginner’s Guide to LLM Prompting&lt;/a&gt;: Engineer your natural language prompts to get the best answers from a large language model&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://haystack.deepset.ai/blog/rag-deployment&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Deploying RAG to Production&lt;/a&gt;: From a working prototype to a RAG AI system in production: an overview for data scientists&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;How to Prevent Prompt Injections&lt;/a&gt;: Prevent prompt injections leveraging our new open-source model and dataset&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://haystack.deepset.ai/blog/enhancing-rag-pipelines-in-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Enhancing RAG Pipelines in Haystack&lt;/a&gt;: How the latest rankers optimize LLM context window utilization in RAG pipelines&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/highlights-of-2023/blog-highlight.jpg alt=&#34;An image of a robot looking at pictures of other robots&#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;We published ‘Gemini Models with Google Vertex AI Integration for Haystack’&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;-some-of-your-work&#34;&gt;👏 Some of Your Work&lt;/h2&gt;
&lt;p&gt;In 2023, your contributions to the community were truly outstanding, and we want to highlight some of these works that we enjoyed seeing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;
&lt;a href=&#34;https://github.com/cclauss&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;@cclauss&lt;/a&gt;, 
&lt;a href=&#34;https://github.com/Faaany&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;@faaany&lt;/a&gt;, 
&lt;a href=&#34;https://github.com/danielbichuetti&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;@danielbichuetti&lt;/a&gt;, 
&lt;a href=&#34;https://github.com/awinml&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;@awinml&lt;/a&gt;, 
&lt;a href=&#34;https://github.com/zoltan-fedor&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;@zoltan-fedor&lt;/a&gt;, 
&lt;a href=&#34;https://github.com/benheckmann&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;@benheckmann&lt;/a&gt;, 
&lt;a href=&#34;https://github.com/nickprock&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;@nickprock&lt;/a&gt; and 
&lt;a href=&#34;https://github.com/recrudesce&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;@recrudesce&lt;/a&gt; contributed the most with their PRs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;
&lt;a href=&#34;https://www.youtube.com/@AIAnytime&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;@AIAnytime&lt;/a&gt; on YouTube delivered two invaluable videos on Haystack:&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://www.youtube.com/watch?v=01_2-Dy57ys&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Chat with Data App: RAG using Mistral 7B, Haystack, and Chainlit&lt;/a&gt; and 
&lt;a href=&#34;https://www.youtube.com/watch?v=C5mqILmVUEo&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;RAG Implementation using Mistral 7B, Haystack, Weaviate, and FastAPI&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;
&lt;a href=&#34;https://medium.com/@lgutierrwr?source=post_page-----b782731ceb3d--------------------------------&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Laura Funderburk&lt;/a&gt; hosted a 
&lt;a href=&#34;https://medium.com/@lgutierrwr/how-i-orchestrated-a-five-week-data-engineering-mentorship-program-a-chronicle-of-growth-and-b782731ceb3d&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mentorship program&lt;/a&gt; to help users build with Haystack and open source projects, in addition to her great Haystack 
&lt;a href=&#34;https://github.com/lfunderburk/llmops-with-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;demos&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and numerous members showcased their projects with the community on the #show-and-tell channel of Discord, as well as on Linkedin and Twitter 🤩&lt;/p&gt;
&lt;h2 id=&#34;-deepset-highlights&#34;&gt;💙 deepset Highlights&lt;/h2&gt;
&lt;p&gt;deepset, the company behind Haystack, announced a &lt;strong&gt;$30 million funding round&lt;/strong&gt; led by Balderton Capital, with participation from existing investors GV, Harpoon, System.One, and Lunar. Read more about it 
&lt;a href=&#34;https://www.deepset.ai/blog/funding-announcement-balderton-capital&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/highlights-of-2023/deepset-highlight.png alt=&#34;deepset logo positioned above the ‘$30M’ text, with confetti in the foreground against a blue background.&#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;&lt;em&gt;deepset raised $30 million funding&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h2 id=&#34;-next&#34;&gt;🔜 Next&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The stable version of Haystack 2.0 will be out in Q1 2024. Check out 
&lt;a href=&#34;https://github.com/orgs/deepset-ai/projects/3&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack public roadmap&lt;/a&gt; to see the upcoming features.&lt;/li&gt;
&lt;li&gt;Dive into the world of Haystack 2.0-beta with our latest discussion entry, designed to be your go-to resource until the stable release of Haystack 2.0. Find the roadmap, release notes, and continually updated feature table in 
&lt;a href=&#34;https://github.com/deepset-ai/haystack/discussions/6684&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;the ultimate guide for Haystack 2.0-beta&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;We have just introduced a new repository: 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-cookbook&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Cookbook&lt;/a&gt;. Stay tuned as we expand this collection with even more example notebooks throughout 2024.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To stay informed about everything related to Haystack, such as the latest updates, new features, captivating content, and upcoming exciting events, be sure to 
&lt;a href=&#34;https://landing.deepset.ai/haystack-community-updates&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;subscribe to the monthly Haystack newsletter&lt;/a&gt;! 🗞️&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Building a Healthcare Chatbot with Mixtral 8x7b, Haystack, and PubMed</title>
      <link>https://haystack.deepset.ai/blog/mixtral-8x7b-healthcare-chatbot/</link>
      <pubDate>Tue, 02 Jan 2024 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/mixtral-8x7b-healthcare-chatbot/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Unfortunately, countless people around the world have inadequate access to healthcare. I’m lucky to have health insurance and good medical providers taking care of me. However, I still want to educate myself before walking into a doctor’s office.&lt;/p&gt;
&lt;p&gt;Technology can empower people to take charge of their health. Large language models can power chatbots where people can ask medical questions.&lt;/p&gt;
&lt;p&gt;In this post, I’ll show you how I built a medical chatbot with Haystack 2.0-Beta, and the Mixtral 8x7B model by pulling research papers from PubMed.&lt;/p&gt;
&lt;p&gt;You can follow along 
&lt;a href=&#34;https://colab.research.google.com/drive/1Pl8gyfWthqoj7PTCQrteAvtOsswHvkzV&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;with this Colab&lt;/a&gt;. You’ll need a HuggingFace API key. 
&lt;a href=&#34;https://huggingface.co/join&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Sign up for a free account here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;challenges&#34;&gt;Challenges&lt;/h2&gt;
&lt;p&gt;Building a medical chatbot presents some challenges.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Lack of data&lt;/em&gt;. HIPAA and other privacy regulations make it harder to find public QA datasets to fine-tune a model.&lt;/li&gt;
&lt;li&gt;The human impact of providing wrong answers. 😬&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Staying up to date&lt;/em&gt;. It takes a long time to train a LLM. By the time they are released, their knowledge is already a bit stale. Medical research breakthroughs are happening all the time. I am particularly interested in long COVID, which has both impacted humanity on a massive scale, and is the subject of ongoing research.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, I decided to use a RAG pipeline to combine PubMed data with a LLM.&lt;/p&gt;
&lt;h2 id=&#34;the-pubmed-rag-pipeline&#34;&gt;The PubMed RAG Pipeline&lt;/h2&gt;
&lt;p&gt;Retrieval augmented generation, or RAG, is a way of giving a LLM context so it can better answer questions.&lt;/p&gt;
&lt;p&gt;You pass the LLM some documents, along with a query, and prompt the LLM to use the documents when answering the question.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/mixtral-8x7b-healthcare-chatbot/RAG.png alt=&#34;A flowchart diagram showing the architecture of a retrieval augmented generative pipeline. The boxes are Query, Retriever, Your Documents, Prompt, LLM, Output.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://pubmed.ncbi.nlm.nih.gov/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PubMed&lt;/a&gt; has up to date, trustworthy medical information so it seemed like a solid document source. Plus, there’s a 
&lt;a href=&#34;https://github.com/gijswobben/pymed&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PyMed wrapper for the PubMed API&lt;/a&gt; that made querying easy peasy. We’ll wrap this in a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/custom-components&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack custom component&lt;/a&gt; to format the results as &lt;code&gt;Document&lt;/code&gt;s so that Haystack can use them, and add some light error handling.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;pymed&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PubMed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pubmed&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PubMed&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;tool&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Haystack2.0Prototype&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;email&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;tilde.thurium@deepset.ai&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;documentize&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;article&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;article&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;abstract&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;meta&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;title&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;article&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;title&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;keywords&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;article&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;keywords&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;@component&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;PubMedFetcher&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;@component.output_types&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;articles&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;]):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;cleaned_queries&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;strip&lt;/span&gt;().&lt;span style=&#34;color:#268bd2&#34;&gt;split&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;articles&lt;/span&gt; = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;try&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;cleaned_queries&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pubmed&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;max_results&lt;/span&gt; = &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;documentize&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;article&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;article&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;articles&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;extend&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Exception&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;e&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;e&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Couldn&#39;t fetch articles for queries: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;queries&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt; = {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;articles&#39;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;articles&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For the model, I went with Mixtral’s 8x7b. Mixtral is a unique kind of model that uses 8 &lt;em&gt;“experts”&lt;/em&gt; and an internal &lt;em&gt;“routing”&lt;/em&gt; mechanism that routes a token to a specific expert. This also means that during inferencing, not all parameters are used, which allows the model to response remarkably fast.  
&lt;a href=&#34;https://huggingface.co/blog/moe&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;This HuggingFace blog post explains MoE&lt;/a&gt; in more detail.&lt;/p&gt;
&lt;h2 id=&#34;generating-keywords-for-pubmed-with-mixtralllms&#34;&gt;Generating Keywords for PubMed with Mixtral/LLMs&lt;/h2&gt;
&lt;p&gt;First, I tried an approach where I passed a plain query to PubMed. e.g. &lt;em&gt;“What are the most current treatments for long COVID?”&lt;/em&gt; Unfortunately, that didn’t work too well. The articles returned weren’t very relevant. Which makes sense, because PubMed isn’t optimized for natural language search. It is optimized for keywords, though. And you know what’s great at generating keywords? LLMs!&lt;/p&gt;
&lt;p&gt;So now our flow is as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Our user inputs a question, such as, “What are the most current treatments for long COVID?”&lt;/li&gt;
&lt;li&gt;We prompt the LLM to turn the question into keywords&lt;/li&gt;
&lt;li&gt;Search PubMed and return top_k articles based on those keywords&lt;/li&gt;
&lt;li&gt;Pass those articles to the LLM and ask them to reference it when formulating an answer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/mixtral-8x7b-healthcare-chatbot/HealthcareChatbotArchitecture.png alt=&#34;A flowchart showing the architecture of the Healthcare Chatbot. The boxes are Query, Keyword Prompt, LLM, Keywords, PubMedFetcher, Articles, Query Prompt, LLM, Ansswer.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;First, initialize the LLMs and warm them up.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceTGIGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.utils&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;keyword_llm&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceTGIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;mistralai/Mixtral-8x7B-Instruct-v0.1&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;token&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_token&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;huggingface_token&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;keyword_llm&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceTGIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;mistralai/Mixtral-8x7B-Instruct-v0.1&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;token&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;Secret&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_token&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;huggingface_token&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;warm_up&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, we create our prompts and our pipeline and hook everything up.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders.prompt_builder&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;keyword_prompt_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Your task is to convert the follwing question into 3 keywords that can be used to find relevant medical research papers on PubMed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Here is an examples:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;question: &#34;What are the latest treatments for major depressive disorder?&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;keywords:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Antidepressive Agents
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Depressive Disorder, Major
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Treatment-Resistant depression
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;question: {{ question }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;keywords:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answer the question truthfully based on the given documents.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;If the documents don&#39;t contain an answer, use your existing knowledge base.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;q: {{ question }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Articles:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or article in articles %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {{article.content}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  keywords: {{article.meta[&#39;keywords&#39;]}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  title: {{article.meta[&#39;title&#39;]}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;keyword_prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;keyword_prompt_template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;fetcher&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PubMedFetcher&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;keyword_prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;keyword_llm&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;pubmed_fetcher&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;fetcher&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;llm&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_prompt_builder.prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_llm.prompt&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_llm.replies&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;pubmed_fetcher.queries&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;pubmed_fetcher.articles&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.articles&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.prompt&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Try it for yourself and see!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What are the most current treatments for long COVID?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;keyword_prompt_builder&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;:&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generation_kwargs&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_new_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;500&lt;/span&gt;}}})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;What are the most current treatments &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; long COVID?
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  The COVID-19 pandemic has led to a significant increase in the number of patients with post-acute COVID-19 syndrome (PACS), also known as long COVID. PACS is a complex, multisystem disorder that can affect various organs and systems, including the respiratory, cardiovascular, neurological, and gastrointestinal systems. The pathophysiology of PACS is not yet fully understood, but it is believed to be related to immune dysregulation, persistent inflammation, and microvascular injury.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;The management of PACS is challenging due to its heterogeneous presentation and the lack of evidence-based treatments. Current treatment approaches are mainly supportive and aim to alleviate symptoms and improve quality of life. These include:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Pulmonary rehabilitation &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; respiratory symptoms
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Cardiac rehabilitation &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; cardiovascular symptoms
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Cognitive-behavioral therapy &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; neurological symptoms
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Dietary modifications and medications &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; gastrointestinal symptoms
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Vaccination to prevent reinfection and further complications
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Symptomatic treatment with medications such as nonsteroidal anti-inflammatory drugs (NSAIDs), corticosteroids, and antihistamines
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Experimental treatments such as antiviral therapy, immunomodulatory therapy, and cell-based therapy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;potential-issues-and-workarounds&#34;&gt;Potential issues and workarounds&lt;/h2&gt;
&lt;p&gt;PubMed API wasn’t built for high scalability, so this approach wouldn’t be great for a high traffic production system. In that case, you could consider pulling articles into a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack &lt;code&gt;DocumentStore&lt;/code&gt;&lt;/a&gt; that is backed by persistent storage&lt;/p&gt;
&lt;p&gt;Alternately, you could experiment with models that has been trained on a medical dataset, such as 
&lt;a href=&#34;https://gradient.ai/healthcare&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Gradient&lt;/a&gt;’s.&lt;/p&gt;
&lt;h2 id=&#34;wrapping-it-up&#34;&gt;Wrapping it up&lt;/h2&gt;
&lt;p&gt;Today you learned how to use Mixtral 8x7B and Haystack RAG pipelines to build a medical chatbot. Thanks for reading! If you want to learn more about Haystack 2.0 or RAG Pipelines, these posts might be of interest:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/mixtral-8x7b-for-web-qa.ipynb&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Getting started with Mixtral 8x7b and Haystack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Customizing RAG Pipelines to Summarize Latest Hacker News Posts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Gemini Models with Google Vertex AI Integration for Haystack</title>
      <link>https://haystack.deepset.ai/blog/gemini-models-with-google-vertex-for-haystack/</link>
      <pubDate>Mon, 18 Dec 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/gemini-models-with-google-vertex-for-haystack/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;In this article, we will introduce you to the new Google Vertex AI Integration for Haystack. While this integration introduces several new components to the Haystack eco-system (feel free to explore the full integration repo!), we’d like to start by showcasing two components in particular: the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/vertexaigeminigenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;VertexAIGeminiGenerator&lt;/code&gt;&lt;/a&gt; and the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/vertexaigeminichatgenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;VertexAIGeminiChatGenerator&lt;/code&gt;&lt;/a&gt;, using the &lt;code&gt;gemini-pro&lt;/code&gt; and &lt;code&gt;gemini-1.5-flash&lt;/code&gt; models.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;💚 &lt;em&gt;You can run the example code showcased in this article in the accompanying&lt;/em&gt; &lt;em&gt;
&lt;a href=&#34;https://haystack.deepset.ai/cookbook/vertexai-gemini-examples&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Notebook&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The great news is, to authenticate for access to the Gemini models, you will only need to do a Google authentication in the Colab (instructions in the Colab)&lt;/p&gt;
&lt;h2 id=&#34;vertexaigeminigenerator-for-question-answering-on-images&#34;&gt;VertexAIGeminiGenerator for Question Answering on Images&lt;/h2&gt;
&lt;p&gt;The new &lt;code&gt;VertexAIGeminiGenerator&lt;/code&gt; component allows you to query Gemini models such as &lt;code&gt;gemini-pro&lt;/code&gt; and &lt;code&gt;gemini-1.5-flash&lt;/code&gt;. In this example, let’s use the latter, allowing us to also make use of images in our queries.&lt;/p&gt;
&lt;p&gt;To get started, you will need to install Haystack and the &lt;code&gt;google-vertex-haystack&lt;/code&gt; the integration:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;!pip install haystack-ai google-vertex-haystack
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Just like any other 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/generators&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;generator component in Haystack 2.0-Beta&lt;/a&gt;, to run the &lt;code&gt;GeminiGenerator&lt;/code&gt; on its own, we simply have to call the &lt;code&gt;run()&lt;/code&gt; method. However, unlike our other components, the run method here expects &lt;code&gt;parts&lt;/code&gt; as input. A &lt;code&gt;Part&lt;/code&gt; in the Google Vertex AI API can be anything from a message, to images, or even function calls. Here are the docstrings from the source code for the most up-to-date reference we could find 
&lt;a href=&#34;https://github.com/googleapis/python-aiplatform/blob/5f6ad8df5a08e78a121a72a21e21d95abb072e58/vertexai/generative_models/_generative_models.py#L1427-L1446&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here.&lt;/a&gt; Let’s run this component with a simple query 👇&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.google_vertex&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;VertexAIGeminiGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;gemini&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;VertexAIGeminiGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini-1.5-flash&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;project_id&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;YOUR-GCP-PROJECT-ID&#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;gemini&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;parts&lt;/span&gt; = [&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What is the most interesting thing you know?&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;querying-with-images&#34;&gt;Querying with Images&lt;/h3&gt;
&lt;p&gt;Next, let’s make use of the flexibility of &lt;code&gt;parts&lt;/code&gt; and pass in some images alongside a question too. In the example below, we are providing 4 images containing robots, and asking &lt;code&gt;gemini-1.5-flash&lt;/code&gt; what it can tell us about them.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/gemini-models-with-google-vertex-for-haystack/robots.png alt=&#34;Pictures of 4 robots&#34;  /&gt;
&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;requests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses.byte_stream&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ByteStream&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;URLS&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://raw.githubusercontent.com/silvanocerza/robots/main/robot1.jpg&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://raw.githubusercontent.com/silvanocerza/robots/main/robot2.jpg&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://raw.githubusercontent.com/silvanocerza/robots/main/robot3.jpg&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://raw.githubusercontent.com/silvanocerza/robots/main/robot4.jpg&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;images&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;ByteStream&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;requests&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;url&lt;/span&gt;).&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;mime_type&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;image/jpeg&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;url&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;URLS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;gemini&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;parts&lt;/span&gt; = [&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What can you tell me about this robots?&#34;&lt;/span&gt;, *&lt;span style=&#34;color:#268bd2&#34;&gt;images&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Fun fact:&lt;/strong&gt; We’ve notices that Gemini consistently misidentifies the 3rd robot! Often the response is: “The third image is of Gort from the 1951 film The Day the Earth Stood Still. Gort is a robot who is sent to Earth to warn humanity about the dangers of nuclear war. He is a powerful and intelligent robot, but he is also compassionate and understanding.”. However, this robot is 
&lt;a href=&#34;https://en.wikipedia.org/wiki/Marvin_the_Paranoid_Android&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Marvin the Paranoid Android&lt;/a&gt; from the &lt;em&gt;
&lt;a href=&#34;https://en.wikipedia.org/wiki/The_Hitchhiker%27s_Guide_to_the_Galaxy&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The Hitchhiker’s Guide to the Galaxy&lt;/a&gt;&lt;/em&gt; series. It would have been pretty silly if Gort looked like that! 😅&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;vertexaigeminichatgenerator-for-function-calling&#34;&gt;VertexAIGeminiChatGenerator for Function Calling&lt;/h2&gt;
&lt;p&gt;With &lt;code&gt;gemini-pro&lt;/code&gt;, we can also start introducing function calling! So let’s see how we can do that. An important feature to note here is that function calling in this context refers to using Gemini to identify &lt;em&gt;how&lt;/em&gt; a function should be called. To see what we mean by this, let’s see if we can build a system that can run a &lt;code&gt;get_current_weather&lt;/code&gt; function, based on a question asked in natural language.&lt;/p&gt;
&lt;p&gt;For this section, we will be using the new &lt;code&gt;VertexAIGeminiChatGenerator&lt;/code&gt; component, which can optionally be initialized by providing a list of &lt;code&gt;tools&lt;/code&gt;. This will become handy in a moment because we will be able to define functions and provide them to the generator as a list of tools.&lt;/p&gt;
&lt;p&gt;For demonstration purposes, we’re simply creating a &lt;code&gt;get_current_weather&lt;/code&gt; function that returns an object which will &lt;em&gt;always&lt;/em&gt; tell us it’s ‘Sunny, and 21.8 degrees’.. If it’s Celsius, that’s a good day! ☀️&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Annotated&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;get_current_weather&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;location&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Annotated&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The city for which to get the weather, e.g. &#39;San Francisco&#39;&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Munich&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;unit&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Annotated&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The unit for the temperature, e.g. &#39;celsius&#39;&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;celsius&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;weather&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sunny&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;temperature&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;21.8&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;unit&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;unit&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, we transform the function into a Haystack 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/tool&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;Tool&lt;/code&gt;&lt;/a&gt; object.
The description of the parameteres (provided using &lt;code&gt;Annotated&lt;/code&gt;) will be included in the schema of the tool.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;create_tool_from_function&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;weather_tool&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;create_tool_from_function&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;get_current_weather&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can use this tool with the &lt;code&gt;VertexAIGeminiChatGenerator&lt;/code&gt; and ask it to tell us how the function should be called to answer the question “What is the temperature in celsius in Berlin?”:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.google_vertex&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;VertexAIGeminiChatGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;gemini_chat&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;VertexAIGeminiChatGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini-pro&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;project_id&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;YOUR-GCP-PROJECT-ID&#39;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;weather_tool&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;user_message&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;ChatMessage&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_user&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What is the temperature in celsius in Berlin?&#34;&lt;/span&gt;)]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;replies&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;gemini_chat&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;user_message&lt;/span&gt;)[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;replies&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the response we get from this interaction, we can call the function &lt;code&gt;get_current_weather&lt;/code&gt; using the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/toolinvoker&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;ToolInvoker&lt;/code&gt; component&lt;/a&gt; and proceed with our chat:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.tools&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ToolInvoker&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;tool_invoker&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ToolInvoker&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;tools&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;weather_tool&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;tool_messages&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;tool_invoker&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;replies&lt;/span&gt;)[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;tool_messages&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;user_message&lt;/span&gt; &#43; &lt;span style=&#34;color:#268bd2&#34;&gt;replies&lt;/span&gt; &#43; &lt;span style=&#34;color:#268bd2&#34;&gt;tool_messages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;res&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;gemini_chat&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;res&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;text&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;building-a-full-retrieval-augmented-generative-pipeline&#34;&gt;Building a Full Retrieval-Augmented Generative Pipeline&lt;/h2&gt;
&lt;p&gt;Alongside the individual use of the new Gemini components above, you can of course also use them in full 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack pipelines&lt;/a&gt;. Here is an example of a RAG pipeline that does question-answering on webpages using the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/linkcontentfetcher&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;LinkContentFetcher&lt;/code&gt;&lt;/a&gt; and the &lt;code&gt;VertexAIGeminiGenerator&lt;/code&gt; using the &lt;code&gt;gemini-1.5-flash&lt;/code&gt; model 👇&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As we are working on the full release of Haystack 2.0, components that are currently available in the Beta release are mostly focused on text. So, truly multi-modal applications as full Haystack pipelines is not yet possible. We are creating components that can easily handle other medias like images, audio, and video and will be back with examples soon!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.google_vertex&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;VertexAIGeminiGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.fetchers.link_content&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;LinkContentFetcher&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.converters&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HTMLToDocument&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.preprocessors&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSplitter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.rankers&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;TransformersSimilarityRanker&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders.prompt_builder&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;fetcher&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;LinkContentFetcher&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;converter&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HTMLToDocument&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_splitter&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSplitter&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;split_by&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;word&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split_length&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;50&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;similarity_ranker&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;TransformersSimilarityRanker&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;gemini&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;VertexAIGeminiGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini-1.5-flash&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;project_id&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;project_id&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;According to these documents:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or doc in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {{ doc.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answer the given question: {{question}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fetcher&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;fetcher&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;converter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;converter&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;document_splitter&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ranker&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;similarity_ranker&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;gemini&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fetcher.streams&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;converter.sources&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;converter.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;splitter.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ranker.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ranker.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini&#34;&lt;/span&gt;) 
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once we have the pipeline, we can run it with a query about Haystack 2.0-Beta:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What do graphs have to do with Haystack?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;question&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                   &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ranker&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                   &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;fetcher&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;urls&#34;&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://haystack.deepset.ai/blog/introducing-haystack-2-beta-and-advent&#34;&lt;/span&gt;]}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;result&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;gemini&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now you’ve seen some of what Gemini can do, as well as how to integrate it with Haystack 🫶 If you want to learn more:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check out the Haystack 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;docs&lt;/a&gt; or 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tutorials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Try out the 
&lt;a href=&#34;https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/tutorials/python_quickstart.ipynb#scrollTo=IqFXdgDFRvlU&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Gemini quickstart colab from Google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Participate in the 
&lt;a href=&#34;https://haystack.deepset.ai/advent-of-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Advent of Haystack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Use Gradient Models with Haystack Pipelines</title>
      <link>https://haystack.deepset.ai/blog/using-gradient-models-with-haystack/</link>
      <pubDate>Mon, 11 Dec 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/using-gradient-models-with-haystack/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;One of the more cumbersome aspects of creating LLM applications is model management. Especially in cases where we need to fine-tune, host, and scale the models ourselves. In this case, having options at hand can be great. Today, we’ve expanded the Haystack 2.0 ecosystem with a new integration that can help you with just that&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://gradient.ai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Gradient&lt;/a&gt; is an LLM development platform that offers web APIs for fine-tuning, embeddings, and inference on state-of-the-art open-source models. In this article, let’s take a look at the new Gradient integration for Haystack, and how you can use it in your retrieval-augmented generative pipelines.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You can find 
&lt;a href=&#34;https://colab.research.google.com/drive/1CQZoKrU0AM3bJtWcAicYA2q5FTu-nPJj?usp=sharing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;an example Colab here&lt;/a&gt;, which uses embedding and generative models from Gradient for a RAG pipeline on Notion pages.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;what-the-gradient-integration-provides&#34;&gt;What the Gradient Integration Provides&lt;/h2&gt;
&lt;p&gt;The 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/gradient&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Gradient Integration&lt;/a&gt; for Haystack comes with three new components for Haystack pipelines:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;GradientDocumentEmbedder&lt;/code&gt;: You can use this component to create embeddings of documents.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;GradientTextEmbedder&lt;/code&gt;: You can use this component to create embeddings for text snippets such as queries.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;GradientGenerator&lt;/code&gt;: You can use this component to generate responses using LLMs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-to-use-the-gradient-integration&#34;&gt;How to use the Gradient Integration&lt;/h2&gt;
&lt;p&gt;The Gradient platform provides an embeddings endpoint (at the time of writing, it supports 
&lt;a href=&#34;https://huggingface.co/BAAI/bge-large-en-v1.5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;bge-large&lt;/a&gt;) and the fine-tuning and deployment of LLMs such as Llama-2, Bloom, and more (with more planned).&lt;/p&gt;
&lt;p&gt;To fine-tune and use models via Gradient, the first step is to create a workspace.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/using-gradient-models-with-haystack/workspace.png alt=&#34;Create a workspace&#34; class=&#34;medium&#34; /&gt;
        &lt;figcaption&gt;Create a workspace&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Once you have a workspace, you will be able to select a base model and start a fine-tuning job.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/using-gradient-models-with-haystack/fine-tuning.png alt=&#34;Fine-tune a model&#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;Fine-tune a model&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id=&#34;using-the-gradientgenerator&#34;&gt;Using the GradientGenerator&lt;/h3&gt;
&lt;p&gt;You can use the &lt;code&gt;GradientGenerator&lt;/code&gt; either with any of the available 
&lt;a href=&#34;https://docs.gradient.ai/docs/models-1#%EF%B8%8F-gradient-hosted-llms&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;base models that Gradient provides&lt;/a&gt;, or with a model that you have fine-tuned on the platform.&lt;/p&gt;
&lt;p&gt;For example, to use the &lt;code&gt;llama2-7b-chat&lt;/code&gt; model:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;GRADIENT_ACCESS_TOKEN&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;YOUR_GRADIENT_ACCESS_TOKEN&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;GRADIENT_WORKSPACE_ID&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;YOUR_WORKSPACE_ID&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;GradientGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;base_model_slug&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llama2-7b-chat&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                              &lt;span style=&#34;color:#268bd2&#34;&gt;max_generated_token_count&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;350&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Or, to use a model that you’ve fine-tuned on the Gradient platform, provide your &lt;code&gt;model_adapter_id&lt;/code&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;GRADIENT_ACCESS_TOKEN&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;YOUR_GRADIENT_ACCESS_TOKEN&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;GRADIENT_WORKSPACE_ID&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;YOUR_WORKSPACE_ID&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;GradientGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model_adapter_id&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;your_finetuned_model_adapter_id&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                              &lt;span style=&#34;color:#268bd2&#34;&gt;max_generated_token_count&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;350&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;building-a-rag-pipeline&#34;&gt;Building a RAG pipeline&lt;/h3&gt;
&lt;p&gt;In this article, I’ve provided an example 
&lt;a href=&#34;https://colab.research.google.com/drive/1CQZoKrU0AM3bJtWcAicYA2q5FTu-nPJj?usp=sharing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Colab&lt;/a&gt; that does question-answering on your private Notion pages, using the &lt;code&gt;NotionExporter&lt;/code&gt; integration. We use the &lt;code&gt;GradientDocumentEmbedder&lt;/code&gt; component to create embeddings of our notion pages and index them into an &lt;code&gt;InMemoryDocumentStore&lt;/code&gt;. As for the RAG pipeline, you can use the &lt;code&gt;GradientTextEmbedder&lt;/code&gt; and &lt;code&gt;GradientGenerator&lt;/code&gt; to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Embed the user query to retrieve the most relevant documents from our Notion pages&lt;/li&gt;
&lt;li&gt;Generate a response using our own fine-tuned LLM from Gradient:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.embedders.gradient&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;GradientTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack_integrations.components.generators.gradient&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;GradientGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34; Answer the query, based on the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;content in the documents.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Documents:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or doc in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {{doc.content}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Query: {{query}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;GRADIENT_ACCESS_TOKEN&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;YOUR_GRADIENT_ACCESS_TOKEN&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;GRADIENT_WORKSPACE_ID&#34;&lt;/span&gt;] = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;YOUR_WORKSPACE_ID&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;text_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;GradientTextEmbedder&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;GradientGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model_adapter_id&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;your_finetuned_model_adapter_id&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                              &lt;span style=&#34;color:#268bd2&#34;&gt;max_generated_token_count&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;350&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;text_embedder&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text_embedder&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;instance&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text_embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;generator&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To run this pipeline:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What are the steps for creating a custom component?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;data&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text_embedder&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
      </description>
    </item>
    
    <item>
      <title>Introducing Haystack 2.0-Beta and Advent of Haystack</title>
      <link>https://haystack.deepset.ai/blog/introducing-haystack-2-beta-and-advent/</link>
      <pubDate>Mon, 04 Dec 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/introducing-haystack-2-beta-and-advent/</guid>
      <description>
      
      
      

      
      

      <p>Today, we are really happy to announce that we have released 
<a href="https://github.com/deepset-ai/haystack/releases/tag/v2.0.0-beta.1" target="_blank" rel="noopener">Haystack 2.0-Beta</a>, alongside our first-ever 
<a href="https://haystack.deepset.ai/advent-of-haystack" target="_blank" rel="noopener">Advent of Haystack</a>: a set of 10 challenges that we will publish throughout the month of December, each introducing the features and design of Haystack 2.0-Beta.</p>
<h2 id="what-does-this-release-mean-for-me">What does this release mean for me?</h2>
<p>Since the first day we started building Haystack 2.0, we’ve involved our community with our design decisions and the feedback we got on our proposals on GitHub and via our Discord community proved to be incredibly valuable. While this is not yet the full stable release of Haystack 2.0, we want to make this first official commitment to the new design available for you to test and truly experience how Haystack is improving. We are committed to redesigning our LLM framework, and we need your help to shape it. To participate, complete and submit a challenge, with any feedback you would like to give us about your experience.</p>
<p>So, should you migrate your Haystack pipelines yet? No. This is not yet a stable release of Haystack 2.0, and by the time 2.0.0 is released, there will be a set of migration guides to help you with the task. The good news is Haystack 2.0-Beta is coming to you in a brand new package <code>haystack-ai</code>, so you do not have to change anything with your already existing Haystack deployments that depend on <code>farm-haystack</code>. We will keep up the maintenance of Haystack 1.x (currently on 1.22.1) which will continue to be released under <code>farm-haystack</code>.</p>
<p>But should you use Haystack 2.0-Beta? Yes! Because Haystack 2.0-Beta introduces some major improvements to how you will build and customize production-ready LLM applications. And the best way to do so is by participating in Advent of Haystack and checking out the 
<a href="https://docs.haystack.deepset.ai/v2.0/docs" target="_blank" rel="noopener">2.0-Beta documentation</a>.</p>
<p>To get started: <code>pip install haystack-ai</code> 🎉</p>
<blockquote>
<p>Haystack 2.0-Beta does not have feature parity with Haystack 1.x yet. The stable release of Haystack 2.0 will happen once a higher level of feature parity exists. To see a full list of available features in this beta release, check out our 
<a href="https://github.com/deepset-ai/haystack/releases/tag/v2.0.0-beta.1" target="_blank" rel="noopener">release notes</a>.</p>
</blockquote>
<blockquote>
<p>Over the last few months, we’ve also been working with some of our partners and community members on extending the 
<a href="https://haystack.deepset.ai/integrations" target="_blank" rel="noopener">Haystack integrations</a> with integrations compatible with Haystack 2.0. Today, you can also start using those with this Beta release.</p>
</blockquote>
<h2 id="why-is-haystack-changing">Why is Haystack Changing</h2>
<p>When the 
<a href="https://github.com/deepset-ai/haystack/releases/tag/0.2.1" target="_blank" rel="noopener">first proper release of Haystack</a> came about in 2020, a lot of the design centered around retrieval, embedding creation, indexing, semantic search and extractive QA. As you may have seen, that completely pivoted in the past (nearly two) years. With the rise of LLMs, we want to build more applications that make use of retrieval-augmented generation (RAG), agents and the ever-expanding LLM capabilities.</p>
<p>While the Haystack 1.x design - and especially the pipeline architecture - was fit for these use cases, the developer experience was sometimes unintuitive and difficult to customize, especially when diverting from the standard semantic search use cases.. This was mainly due to some major assumptions in the design of 1.x. For example, if you’ve used Haystack you know that a pipeline <em>has</em> to start with either a “Query” or a “File” input, even if at the end of the day you don’t want to use them. Haystack 1.x was still able to be the basis on top of which many LLM applications have been built with the addition of the <code>PromptNode</code> in 
<a href="https://github.com/deepset-ai/haystack/releases/tag/v1.12.1" target="_blank" rel="noopener">Haystack 1.12.1</a>. However, the design of our framework often made it difficult to navigate the vast optionality that it actually provides, with many model providers and capabilities being seemingly “hidden” in larger concepts such as the PromptNode, or even the <code>EmbeddingRetriever</code>.</p>
<p>With Haystack 2.0, our aim is to be a lot more explicit about what each component does, and (to possibly exploit an overused term) to make the code self-explanatory, on top of making the Pipeline and Component architecture a lot more flexible, extendable, customizable while maintaining our high standards in terms of production readiness.</p>
<h2 id="how-is-haystack-changing">How is Haystack Changing</h2>
<p>Although Haystack 2.0 is a complete overhaul of the framework, it remains rooted in the fundamental abstractions that contributed to its prior success: users can continue using pipelines, document stores and nodes, which have now been rebranded as “components”.</p>
<p>Here we will briefly highlight what some of the major changes are, with particular regard to developer experience. To see where we are in terms of feature parity with Haystack 1.x, check out the table in our 
<a href="https://github.com/deepset-ai/haystack/releases/tag/v2.0.0-beta.1" target="_blank" rel="noopener">release notes</a>.</p>
<h3 id="components">Components</h3>
<p>Nodes have always been considered the building blocks of Haystack pipelines, but that analogy isn’t completely true. If somebody is given building blocks, they assume they can arrange them however they want, but that’s not the case with nodes: for example, you can only put a Node of type Document Store at the very end of a pipeline. If you wonder why this happens, imagine building something with Legos, just the bricks all have different stubs and tubes sizes, and you have to try which one can possibly fit the other.</p>
<p>We decided to change the name Node to Component in Haystack 2.0 to stress how different the new design is. Components are still the building blocks of a Pipeline, but this time the analogy is for real:</p>
<ul>
<li>Every Component declares its input and output types, making clear to which other Component can be connected. Connections can therefore be validated already while building a Pipeline.</li>
<li>Every Component is self contained and fulfils a specific purpose: we like to say that a good Component should only have one job.</li>
<li>Every Component must respect a strict contract to be used within a Pipeline, but everything else is left to the developer to provide maximum flexibility. You can stretch this flexibility up to the point of running a single Component “standalone”, without the need of a Pipeline.</li>
</ul>
<p>For example, below is a component that takes in a <code>query</code> and returns a list of documents as <code>documents</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">typing</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">List</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">component</span>, <span style="color:#268bd2">Document</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">@component</span>  
</span></span><span style="display:flex;"><span><span style="color:#859900">class</span> <span style="color:#cb4b16">MyCustomComponent</span>():  
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>  <span style="color:#268bd2">@component.output_types</span>(<span style="color:#268bd2">documents</span>=<span style="color:#268bd2">List</span>[<span style="color:#268bd2">Document</span>])  
</span></span><span style="display:flex;"><span>  <span style="color:#859900">def</span> <span style="color:#268bd2">run</span>(<span style="color:#cb4b16">self</span>, <span style="color:#268bd2">query</span>: <span style="color:#cb4b16">str</span>):  
</span></span><span style="display:flex;"><span>    <span style="color:#93a1a1;font-style:italic"># do something  </span>
</span></span><span style="display:flex;"><span>    <span style="color:#859900">return</span> {<span style="color:#2aa198">'documents'</span>: <span style="color:#268bd2">docs</span>}
</span></span></code></pre></div><p>For a full example, you can check out a custom component Tuana built that 
<a href="https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2" target="_blank" rel="noopener">fetches the latest Hacker News articles</a>.</p>
<blockquote>
<p>You can find the complete documentation of Haystack 2.0-Beta components 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/components" target="_blank" rel="noopener">here</a>.</p>
</blockquote>
<h3 id="pipelines">Pipelines</h3>
<p>Pipelines are a core offering of Haystack, and that’s not changing. What’s different is what a Pipeline can do and how it can be assembled. In fact, the first challenge in Advent of Haystack exposes you to the most obvious changes.</p>
<h4 id="connections">Connections</h4>
<p>Connecting pipeline components is becoming a lot more flexible. We’re moving away from having a rigid contract requiring “Query” or “File” as the first input, and every component can be connected to any other compatible one (or even more than one), being explicit about which output is being connected to which input. As a result, the final input and output of a Pipeline run will be solely determined by how components are laid out.</p>
<p><strong>In Haystack 1.x</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">my_component_1</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"My_Component_1"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Query"</span>])
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">my_component_2</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"My_Component_2"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"My_Component_1"</span>])
</span></span></code></pre></div><p><strong>In Haystack 2.0-Beta</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">my_component_1</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"my_component_1"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#268bd2">instance</span>=<span style="color:#268bd2">my_component_2</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"my_component_2"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"my_component_1.output_1"</span>, <span style="color:#2aa198">"my_component_2.input_4"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"my_component_1.output_3"</span>, <span style="color:#2aa198">"my_component_2.input_1"</span>)
</span></span></code></pre></div><h4 id="directed-multigraphs-out-with-the-acyclical">Directed (Multi)Graphs (out with the acyclical)</h4>
<p>Haystack 1.x pipelines are implemented as directed acyclic graphs (DAGs). Massi’s analogy of it is that using a pipeline in Haystack 1.x is like going down a water slide. A very powerful architecture that can branch out from a platform and join up at the main pool, building pipelines in Haystack 1.x is a matter of stringing together the Nodes that you require to build out the NLP application you’re aiming for. But it’s always one directional with a clear start and end.</p>
<p>In Haystack 2.0, we are getting rid of the A in DAG (acyclic), meaning we can have pipelines that can branch out, join, and also cycle back to another component. This allows us to set the Haystack framework up for pipelines that can retry, loop back, and potentially even run <em>forever</em> as a service. One of the first things we tried to build was a “retry” Component that allows a Pipeline to loop back if it deems an output to not be good enough.</p>
<p>These new pipelines are also technically multigraphs, meaning a single component with multiple outputs can connect to another single component with multiple inputs.</p>
<h4 id="serialization">Serialization</h4>
<p>Serialization means converting a pipeline to a format that you can save on your disk or send over the wire to load it later. In both Haystack 1.x and in Haystack 2.0-beta we use YAML for this, although we will be extending the support to other formats for Haystack 2.0.</p>
<p>However, one of the major changes to Haystack in terms of serialization is the addition of “Marshallers” to the core project. Marshallers are how we make available different serialization formats to pipelines, and they can be used to add any format that Haystack doesn’t support out of the box. For example, if you want to represent pipelines with TOML you can create a TOML Marshaller that you can pass to the serialization API.</p>
<blockquote>
<p>You can find our complete documentation on Haystack 2.0-Beta Pipelines 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/pipelines" target="_blank" rel="noopener">here</a>. You can find our complete documentation on Haystack 2.0-Beta Pipeline Serialization 
<a href="https://docs.haystack.deepset.ai/v2.0/docs/serialization" target="_blank" rel="noopener">here</a></p>
</blockquote>
<h3 id="prompt-templating">Prompt Templating</h3>
<p>One other change we are excited about is how prompt templating is changing in Haystack 2.0, which is already available in this Beta release. We are now using Jinja templating for prompts, making it very clean and readable to build prompts that have loops and that can even make use of functions inside the prompt. For example, below is a simple prompt template in Haystack 2.0 which loops through documents and also adds some meta information from those documents into the prompt.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.builders</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptBuilder</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_template</span> = <span style="color:#2aa198">""" Answer the question based on the context. Refer to the URL
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">in the generated answer.
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Context:  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% f</span><span style="color:#2aa198">or doc in documents %}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  {{doc.content}}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  URL: {{article.meta['url']}}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% e</span><span style="color:#2aa198">ndfor %}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Question: {{question}}
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">"""</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_builder</span> = <span style="color:#268bd2">PromptBuilder</span>(<span style="color:#268bd2">template</span>=<span style="color:#268bd2">prompt_template</span>)
</span></span></code></pre></div><h2 id="join-us-in-testing-haystack-20-beta">Join us in testing Haystack 2.0-Beta</h2>
<p>The release of Haystack 2.0-Beta marks a significant milestone in the evolution of the Haystack framework. This announcement comes with an added bonus - the inaugural Advent of Haystack, featuring 10 challenges throughout December, offering you a hands-on opportunity to explore the features and design of the latest release. The development of Haystack 2.0 has been a collaborative effort with the community and while not yet the stable release, this Beta version invites you to test and engage in shaping the future of Haystack.</p>
<p>Cheers to the evolution of Haystack and the exciting developments ahead! 🎉</p>

      </description>
    </item>
    
    <item>
      <title>Using Generative AI to Query Large BI Tables: Our Findings</title>
      <link>https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/</link>
      <pubDate>Wed, 29 Nov 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;In organizations with large collections of data in tabular form, it’s the job of a data analyst to make sense of that data by extracting, transforming, and building stories around it. An analyst’s primary tool for accessing the data is SQL. Given the impressive capabilities of large language models (LLMs), it’s natural to wonder if AI can help us translate our information needs into well-formed SQL queries.&lt;/p&gt;
&lt;p&gt;Granted, most LLMs can output at least some SQL queries based on natural language input. But can they handle large, multi-table databases in a real-world setting? In other words, can they do the job of a data analyst? A small team here at deepset set out to answer that question. Over the course of three months, we tried to find the best way to generate SQL queries on a real dataset.&lt;/p&gt;
&lt;p&gt;Their findings in a nutshell: business intelligence (BI) is hard, and so is evaluating BI use cases. Current approaches that claim high accuracy probably don’t tell the whole story, and GPT-4 is one of the best models available for generating SQL queries. If you want to learn more about our findings – and find out which of the approaches we evaluated performed the best – then this article is for you.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We’re also releasing our benchmark: 
&lt;a href=&#34;https://github.com/deepset-ai/biqa-llm&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;the code used to prompt and evaluate the LLMs&lt;/a&gt;, as well as the 
&lt;a href=&#34;https://huggingface.co/datasets/deepset/stackoverflow-survey-2023-text-sql&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataset we created for this project&lt;/a&gt;. We hope you find them useful 🙂!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;text-to-sql-is-different-from-table-qa&#34;&gt;Text-to-SQL is different from Table QA&lt;/h2&gt;
&lt;p&gt;At 
&lt;a href=&#34;https://www.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset&lt;/a&gt;, we have long been interested in the task of using language models to answer questions based on tabular data. Most recently, we wrote about 
&lt;a href=&#34;https://haystack.deepset.ai/blog/airbus-case-study/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our project with Airbus&lt;/a&gt;, which involved retrieving tables from pilot manuals and extracting answers from them – sometimes even using simple aggregation functions such as finding the maximum or average value of a column.&lt;/p&gt;
&lt;p&gt;BI use cases, however, operate on a completely different scale. A query language like SQL can execute much more powerful operations on much larger tables; for example, it can perform queries across multiple tables, join them, and reorder them according to complex criteria. To distinguish this problem from the simpler (and mostly solved) 
&lt;a href=&#34;https://www.deepset.ai/blog/question-answering-on-structured-data&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;task of Table QA&lt;/a&gt;, we called it “Business Intelligence Question Answering,” or BIQA for short. It’s also often referred to as “text-to-SQL”.&lt;/p&gt;
&lt;p&gt;The BI use case is difficult, even for humans. This is because of the inherent ambiguity of natural language, which becomes particularly apparent when we try to map it to a formal language such as SQL. As a result, a question in natural language can have multiple interpretations in SQL. Let’s say we want to find out how many developers use Python in their day-to-day work. When we formulate this as an SQL query, we have to be specific: who exactly counts as a developer? How do we define “day-to-day”?&lt;/p&gt;
&lt;p&gt;High standards for data quality can go a long way toward simplifying the ambiguity problem: aptly named columns, thorough schema descriptions, and logically organized tables all contribute to tabular databases that are easier to manage, for both humans and machines.&lt;/p&gt;
&lt;h2 id=&#34;the-project&#34;&gt;The project&lt;/h2&gt;
&lt;p&gt;The goal of our project was to find the best solution for converting natural language text into valid SQL queries that produce the correct result for the user. The generated query is returned along with the result so that users with some knowledge of SQL can verify that it is correct.&lt;/p&gt;
&lt;p&gt;The ability to query large databases in natural language would save people a lot of time. The LLM approach also means that you can make typos, query in languages other than English, and even integrate the SQL capability into a more complex workflow – it could be used as a 
&lt;a href=&#34;https://haystack.deepset.ai/blog/introducing-haystack-agents&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tool for an agent&lt;/a&gt;, for example.&lt;/p&gt;
&lt;h3 id=&#34;existing-approaches&#34;&gt;Existing approaches&lt;/h3&gt;
&lt;p&gt;When we started, it seemed like we had a wide range of approaches to choose from. Over the past few months, many teams have tried to use the natural language understanding (NLU) capabilities of LLMs to generate SQL queries.&lt;/p&gt;
&lt;p&gt;But when we looked more closely at their results, we found that they were lacking: Often, the best-performing model only solved a very specific problem at hand and wasn’t able to generalize to other BI use cases. This also meant that models performed very differently across benchmarks. We decided to create our own dataset and benchmark because we wanted data points that best reflected a common, real-world use case – and to ensure that the evaluation set consisted of data that the LLMs hadn’t seen before.&lt;/p&gt;
&lt;h3 id=&#34;the-dataset&#34;&gt;The dataset&lt;/h3&gt;
&lt;p&gt;For our dataset, we used the latest 
&lt;a href=&#34;https://survey.stackoverflow.co/2023/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Developer Survey by StackOverflow&lt;/a&gt;. Every year, the platform polls developers and community members at large about their use of programming languages and other tools. The survey results are displayed as interactive charts on their website. In addition, StackOverflow publishes the entire dataset of questions and answers. For our dataset, we reverse-engineered StackOverflow’s SQL queries and ended up with about 120 query-answer pairs.&lt;/p&gt;
&lt;p&gt;To make the database more like a real production environment, we normalized the database schema. For example, we split tables whose columns allowed multiple values at once into separate, related tables. Each data point consists of a natural language query and a corresponding label, the SQL equivalent of the query.&lt;/p&gt;
&lt;h3 id=&#34;the-conundrum-of-evaluating-text-to-sql&#34;&gt;The conundrum of evaluating text to SQL&lt;/h3&gt;
&lt;p&gt;In our deep dive into existing approaches, we also found that even when they reported high-accuracy results, they were often based on questionable evaluation methods. Due to the ambiguity of natural language described above, the evaluation of text to SQL is still an unsolved problem with no standardized approaches. In our opinion, it always requires at least a manual component. This is an uncomfortable truth that is often glossed over by research teams - perhaps because they don’t want to talk about the fact that they only used a tiny evaluation dataset to make up for the time-consuming practice of manually evaluating the results.&lt;/p&gt;
&lt;p&gt;After trying several methods, we settled on a semi-automated evaluation procedure. Our evaluation script pre-processed the output of the LLM, classifying each result as either correct (if it fully matched the ground truth in our evaluation dataset), incorrect (if the result was None), or requiring manual evaluation. We then manually processed this last set of results to verify that the result was indeed incorrect, or that the model had simply chosen a different SQL query that still produced the same result.&lt;/p&gt;
&lt;p&gt;Obviously, this approach is very time-consuming. That’s why, over time, we added several accepted queries to our benchmark dataset. This allowed us to iterate faster because we spent less time manually evaluating.&lt;/p&gt;
&lt;h2 id=&#34;summarizing-our-approaches&#34;&gt;Summarizing our approaches&lt;/h2&gt;
&lt;p&gt;We tried a number of different approaches that allowed us to explore the full range of NLP pipelines with LLMs: agents, retrieval augmentation, fine-tuning, and more. Here we summarize our results. At the end of the section, we provide an overview of all the approaches and their performance.&lt;/p&gt;
&lt;h3 id=&#34;schema-aware-prompting&#34;&gt;Schema-aware prompting&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: In our baseline approach, we included the schema of our database in the prompt to the LLM. The schema describes the layout of the database: table names and descriptions, along with column names. This approach gives the LLM a lot of context.&lt;/p&gt;
&lt;p&gt;We also experimented with including rows from the database in the prompt to help the model better parse ambiguous column names and to give it more information about what to expect from a given cell. For instance, in the example below, we can see that HTML/CSS is counted as one language in the Languages table. This way, the model understands that the two are always combined, and can adjust its queries to the database accordingly.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/html-css.png alt=&#34;“Screenshot of shell with three tables, named ‘Employment’, ‘Languages’, and ‘Coding activities’, with three rows each. One of the values in the languages table is ‘HTML/CSS’.”&#34; class=&#34;small&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observations&lt;/strong&gt;: Schema-aware prompting produces very long prompts. We have 66 tables in our database, and including detailed information about each of them resulted in a prompt with about 3000 tokens. That’s not only expensive, it also means that the model has to process more information at once, which can negatively affect its output. The simple combination of schema-aware prompting and GPT-4 got us to about 48% accuracy. The model errors ranged from invalid SQL queries that didn’t match the actual schema, to more subtle errors where the SQL query returned a result that wasn’t the correct answer to the question. Including sample rows in the prompt certainly improved performance.&lt;/p&gt;
&lt;h3 id=&#34;agent-driven-navigation-of-the-database&#34;&gt;Agent-driven navigation of the database&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: An 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/agent&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;agent&lt;/a&gt; is an LLM that can act autonomously, given a set of tools. For example, it can iterate on the output of another LLM before returning it to the user. In our approach, we instruct the model to navigate through the database using increasingly complex SQL queries. This means that we don’t have to provide the database schema ourselves, but can let the model figure it out on a need-to-know basis. In fact, we asked the agent to begin its exploration by retrieving information about all the tables in the database. From there, it had to figure out the rest of the strategy on its own.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observations&lt;/strong&gt;: The agent approach allowed us to have the LLM fix invalid queries. One benefit was that the queries were always ultimately valid SQL. Here’s an example of the agent-driven approach in action (there’s a breakdown of the individual steps after the screenshot):&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/agent.png alt=&#34;“Screenshot of the agent navigating the database.”&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;em&gt;# Question: How many respondents consider themselves to be “definitely” or “somewhat” a part of the Stack Overflow community by age in %?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;{“query”: “PRAGMA table_info(Responses);”, “show_user”: false}&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;# First up the agent gets the info for the Responses table.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;{“query”: “SELECT DISTINCT SOComm FROM Responses;”, “show_user”: false}&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;# It figures out that it needs to look at the SOComm column, specifically the distinct values.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;{“query”: “SELECT DISTINCT Age FROM Responses;”, “show_user”: false}&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;# It also checks the Age column.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;{“query”: “SELECT Age, COUNT(*) as Total, SUM(CASE WHEN SOComm IN (‘Yes, definitely’, ‘Yes, somewhat’) THEN 1 ELSE 0 END) as PartOfSOCommunity FROM Responses GROUP BY Age;”, “show_user”: false}&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;# Performs the necessary operation but still has show_user: false.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;{“query”: “SELECT Age, ROUND((SUM(CASE WHEN SOComm IN (‘Yes, definitely’, ‘Yes, somewhat’) THEN 1 ELSE 0 END) * 100.0) / COUNT(*), 2) as Percentage FROM Responses GROUP BY Age;”, “show_user”: true}&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;# Rounds out the percentage answer and sets show_user: true to end the generation.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;While this approach allowed us to initially enter shorter prompts, the agent’s iterative method meant that the historical data was passed along with each new step the agent took: the prompt grew. So we’re faced with the same problem of a large prompt that’s hard for the model to manage and results in increasingly expensive API calls. There’s also a problem with the black-box nature of the agent approach, which sometimes resulted in errors that were harder to understand than with schema-aware prompting.&lt;/p&gt;
&lt;h3 id=&#34;include-raw-schema-descriptions&#34;&gt;Include raw schema descriptions&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: In this approach, we provided not only the schema generated by the database, but also StackOverflow’s descriptions of what the columns mean. We prepended these raw descriptions in the prompt before any table schema descriptions. These hand-written descriptions contain valuable information, such as the questions used to elicit the answers. Therefore, they are expected to clarify any ambiguities in the database schema. For example, in the following screenshot, we see that the &lt;code&gt;YearsCode&lt;/code&gt; column is defined as years of programming experience “including any education”.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/detailed-descriptions.png alt=&#34;“Screenshot of a table with raw schema descriptions.”&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observations&lt;/strong&gt;: Adding the descriptions for each column made the prompt even longer (~7k tokens). It improved accuracy by 10 percentage points to 58%.&lt;/p&gt;
&lt;h3 id=&#34;including-column-level-descriptions-and-distinct-values&#34;&gt;Including column-level descriptions and distinct values&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: In this approach, we sought to enrich the column-level information available to the LLM by including descriptions for each column as part of the schema. We also included up to 20 distinct values that a column could contain along with the descriptions. This resulted in a query of ~6.5k tokens.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/descriptions.png alt=&#34;“Screenshot of prompt with descriptions and distinct values.”&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observations&lt;/strong&gt;: This further improved the results, bringing the accuracy up to 65%.&lt;/p&gt;
&lt;h3 id=&#34;including-few-shot-examples-and-custom-instructions&#34;&gt;Including few-shot examples and custom instructions&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: After analyzing some of the problems the model was encountering, we tried modifying the prompt and adding few-shot examples to mitigate the error. A common mistake the LLM made was counting people repeatedly if they appeared in multiple rows. Instead, we want it to add a &lt;code&gt;DISTINCT&lt;/code&gt; clause to the &lt;code&gt;COUNT()&lt;/code&gt; function to remove the repeated appearance of the same data.&lt;/p&gt;
&lt;p&gt;Therefore, we added the following instruction to the prompt:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Also, for percentage calculations over responses use either the main table “responses” accounting for nulls or distinct values from the Associative table. Not full counts from the Associative Tables.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In addition, we included some examples to show the LLM the desired behavior.&lt;/p&gt;
&lt;p&gt;It could be argued that this goes beyond prompt engineering for the database or domain and comes close to overfitting the evaluation set. Nevertheless, it is a demonstration of the LLM’s responsiveness to these changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observations&lt;/strong&gt;: The model responded to the instruction and examples and mostly stopped making the observed mistakes. This brought the accuracy to 70%.&lt;/p&gt;
&lt;h3 id=&#34;retrieval-augmentation&#34;&gt;Retrieval augmentation&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: The model doesn’t need to know all the tables in the database to answer a question. For example, most of the questions in the evaluation set could be answered using five or fewer of the 66 tables. A retrieval module could retrieve the right tables, which would shorten the prompt and help the model focus on only relevant information. It’s important to note, however, that no retriever tuned for the specific task of matching a natural language query to a table schema was available. We used 
&lt;a href=&#34;https://huggingface.co/deepset/all-mpnet-base-v2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset/all-mpnet-base-v2&lt;/a&gt;, which we had seen perform well on previous table retrieval tasks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observations&lt;/strong&gt;: The retrieval component didn’t work very well. The embedding model simply wasn’t able to select the relevant tables per query. The graph below plots the number of tables retrieved on the x-axis against the recall (i.e., the percentage of tables retrieved correctly) on the y-axis. It shows that even with an increased &lt;code&gt;top_k&lt;/code&gt; value, the retrieval component didn’t manage to identify all the relevant tables. Of course, this results in the LLM not seeing the right context to generate its SQL queries in the first place.&lt;/p&gt;
&lt;p&gt;Column-level retrieval (i.e., embedding columns separately and returning the whole table if a column has a high value) performed better than table-level retrieval. This is probably because it’s easier to match queries to individual columns, since a table with many columns that are irrelevant to the query can drown out the signal from a single column. Also, we’re dealing with a setup where a specific table (“Responses”) with a large number of columns should almost always be retrieved.&lt;/p&gt;
&lt;p&gt;We also tried Perfect Retrieval (passing only the schemas for relevant tables) to determine the upper bound for the retrieval-based approach. Much to our surprise, it performed worse than just passing all tables (60% vs. 65.8%), due to an increased number of errors made by the LLM.&lt;/p&gt;
&lt;p&gt;The poor performance of the RAG approach is partly due to the unbalanced dataset, where the number of columns varies greatly between tables. We could try to improve it (e.g. by switching to a better retrieval model or reformulating the setup), but given the underperformance of even Perfect Retrieval on this dataset, this exploration would be better done on a different dataset/setup.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/roc.png alt=&#34;“Roc curve of different top_k values for retrieval plotted against their recall. It shows poor performance across the board.”&#34;  /&gt;
&lt;/p&gt;
&lt;h3 id=&#34;other-models&#34;&gt;Other models&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: Besides GPT-4, we also tried many open weight models: Starcoderbase, New Hope, Codegen 2.5, sqlcoder models, etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observations&lt;/strong&gt;: Among them, the recently released sqlcoder-34b-alpha (loaded in 8bit) gave the most promising results (with 35.8%), but fell short of GPT-4. The main problem we encountered was that it often hallucinated non-existent columns, e.g. &lt;code&gt;work_exp&lt;/code&gt;, &lt;code&gt;response_op_sys_professional_use&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;fine-tuning-an-existing-oss-model&#34;&gt;Fine-tuning an existing OSS model&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: We can use a smaller open source LLM and fine-tune it on a suitable dataset. For that purpose, we fine-tuned a number of models on an EC2 instance using the resource-efficient 
&lt;a href=&#34;https://arxiv.org/abs/2305.14314&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;QLoRA approach&lt;/a&gt; and a subset of the 
&lt;a href=&#34;https://yale-lily.github.io/spider&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Spider dataset&lt;/a&gt;. These models included Llama 2 70B and, in particular, all code-specific models available at the time (e.g., Starcoderbase, New Hope, Codegen 2.5).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observations&lt;/strong&gt;: Unfortunately, all of our OSS approaches failed, most likely because the dataset we used for fine-tuning wasn’t up to the task. The Spider dataset is academically oriented and doesn’t really capture the real-world business use cases that we were interested in (and that our evaluation dataset reflects quite accurately). The performance scores were correspondingly low, around 10%.&lt;/p&gt;
&lt;h2 id=&#34;an-improved-schema--evaluation-results&#34;&gt;An improved schema &#43; evaluation results&lt;/h2&gt;
&lt;p&gt;From our qualitative analysis of model predictions and error patterns, we noticed that many table or column names were difficult to map back to the original question. This seemed to be confirmed by the better performance of the LLM when it was given descriptions of the columns in addition to the schema.&lt;/p&gt;
&lt;p&gt;The initial task was therefore akin to giving a new business analyst a tangled database with dozens of tables and an arcane schema without any documentation or help, and then expecting them to come up with the right answers to our questions. We felt we hadn’t given the model a fair chance to do the job.&lt;/p&gt;
&lt;p&gt;Good data hygiene is essential for BI, so we decided to sit down and change the schema, structure the tables in a way that was easier to understand, and change the column naming to be self-explanatory. For example, we renamed a column from &lt;code&gt;LearnCode&lt;/code&gt; to &lt;code&gt;LearningToCodeMethods&lt;/code&gt;, which greatly reduced the ambiguity of the initial column name.&lt;/p&gt;
&lt;p&gt;All of the evaluation results we report in this article are based on the new schema, which has significantly improved the performance of various approaches across the board.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Approach&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;Overall % Correct&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Agent-driven&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;40.8&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Schema-aware&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;48.3&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Schema-aware &#43; raw descriptions&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;58.3&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Schema-aware &#43; column-level descriptions &#43; distinct-20&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;65.8&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Schema-aware &#43; perfect retrieval &#43; distinct-20&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;60.0&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Schema-aware &#43; column-level descriptions &#43; distinct-20 &#43; few-shot&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;70.0&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Schema-aware &#43; perfect retrieval (sqlcoder-34b-alpha @ 8bit)&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;35.8&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;challenges-and-takeaways&#34;&gt;Challenges and takeaways&lt;/h2&gt;
&lt;p&gt;Creating a general-purpose, LLM-driven solution for business intelligence in SQL is still a largely unsolved problem - which makes it all the more interesting to explore further. The main challenges relate to the following two factors:&lt;/p&gt;
&lt;h3 id=&#34;dataset-creation-and-evaluation-is-very-time-consuming&#34;&gt;Dataset creation and evaluation is very time-consuming&lt;/h3&gt;
&lt;p&gt;Creating a dataset for evaluating text-to-SQL approaches is much harder than for, say, extractive question answering. Essentially, you have to write a piece of code (the SQL query) for each data point. That query needs to be tested and debugged before it can go into the dataset.&lt;/p&gt;
&lt;p&gt;And it’s not just the creation of the dataset - the evaluation itself is time-consuming because, as we described earlier, it cannot be fully automated. Text-to-SQL evaluation, as we can do it now, isn’t exactly scalable. Projects that want an accurate evaluation of their methods need to take into account that it will take a lot of their time.&lt;/p&gt;
&lt;h3 id=&#34;the-ambiguity-of-language-and-databases&#34;&gt;The ambiguity of language and databases&lt;/h3&gt;
&lt;p&gt;As we’ve seen, natural language queries can be quite ambiguous, making them difficult to translate into SQL queries. On the other hand, there is often more than one correct way to extract a piece of information from a database, and our evaluation dataset may not consider all of them. This brings us back to the fact that we need to at least partially evaluate the results of the model by hand after each run.&lt;/p&gt;
&lt;p&gt;Another factor that comes into play here is that databases themselves can be ambiguous - even more so if the schema isn’t particularly well designed. That’s why it helps to have someone in the loop who knows the database well - it will help you to better evaluate the generated queries. You can even use the model’s output as a sort of check on your naming practices - if a model keeps misusing a column, for example, that might be a sign that it’s time to rename it or update its description.&lt;/p&gt;
&lt;h2 id=&#34;demo&#34;&gt;Demo&lt;/h2&gt;
&lt;p&gt;In addition to our dataset and benchmarks, we also published a 
&lt;a href=&#34;https://cloud.deepset.ai/shared_prototypes?share_token=prototype_eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NDIwMjgyMjUuOTkzLCJhdWQiOiJleHRlcm5hbCB1c2VyIiwiaXNzIjoiZEMiLCJ3b3Jrc3BhY2VfaWQiOiIwNTdjOWI5MC1jNzQzLTRlOTEtYjI3OS02ZWEwNTcwMThlYjIiLCJ3b3Jrc3BhY2VfbmFtZSI6InN0YWNrb3ZlcmZsb3ctc3VydmV5Iiwib3JnYW5pemF0aW9uX2lkIjoiNGM2MTkwMGYtMTBiNi00MDljLTkzNjQtMGE2NzlhY2NjMWM5Iiwic2hhcmVfaWQiOiJhOTEyYzllYS0yNTIxLTRkMzctYTM3ZS02ZDVkYWI2ZDQ1MjUifQ.pULl5Oud-piPgHx3ff5lovH0xZ3KjrEjObfjEiZje-0&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;demo&lt;/a&gt; of our best-performing text-to-SQL approach to share with colleagues, friends, and now you, our readers. It serves to validate whether our accuracy results are representative of a real-world scenario. We found that user feedback was quite close to our results. The demo is hosted on deepset’s 
&lt;a href=&#34;https://www.deepset.ai/deepset-cloud&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;enterprise platform, deepset Cloud&lt;/a&gt;, which creates an out-of-the-box user interface. Here’s what it looks like:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/dc.png alt=&#34;“Screenshot of deepset Cloud search interface.”&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Let’s try to recreate a question from the developer survey:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;How are experienced developers vs newbies planning to use AI tools in the future? Please provide the numbers.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The result is a table followed by the generated query itself. Let’s have a look at the query first:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/experience-query.png alt=&#34;“Screenshot of the SQL query generated by the LLM.”&#34; class=&#34;medium&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;Note that the LLM has decided to come up with its own definition of who counts as an “experienced developer,” namely someone who has been coding for more than five years. This is an example of how the model has to compensate for the ambiguity in our natural language query. Here is a snapshot of the table we get as a result:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/table.png alt=&#34;“Table returned in response to the query.”&#34; class=&#34;medium&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;We can now reuse this generated content by copying the table into a csv file (or other structured file format). We can even visualize it using tools such as the Python libraries pandas and matplotlib:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/business-intelligence-sql-queries-llm/pie-charts.png alt=&#34;“Two pie charts showing the distribution of how experienced developers and newbies are planning to use AI tools.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Now we can see that even though the class of newbies (as defined by the LLM) is much larger than the cohort of experienced developers, the expectations of both groups from AI tools seem to be very similar.&lt;/p&gt;
&lt;p&gt;This is just a glimpse of what our text-to-SQL solution can do. Feel free to play around with the demo and give us your feedback!&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The executive’s dream of simply asking AI for any insight or dashboard without going through a business analyst won’t come true anytime soon. The approaches we have tested are still too fragile to blindly trust their results for business-critical decisions.&lt;/p&gt;
&lt;p&gt;However, in the same way that Copilot increases developer productivity, an AI companion for BI could significantly increase the productivity of any business analyst. They could use the LLM to generate an initial query, review the generated SQL and output, and only edit if necessary. Just from using the tool ourselves, we’re sure it would boost analyst productivity in a similar way to how code assistants have impacted developers.&lt;/p&gt;
&lt;p&gt;It’s really hard to create a general-purpose SQL model – most success stories involve very specific use cases. For now, you can create specialized models with better fine-tuning datasets for specific use cases only. But keep in mind that creating datasets is time consuming.&lt;/p&gt;
&lt;p&gt;A robust evaluation that still allows you to move quickly is essential to making progress. If your organization has good data management practices in place, you’re likely to be more successful in your text-to-SQL efforts.&lt;/p&gt;
&lt;p&gt;Interested in moving beyond the AI hype and using LLMs where they can add real value? Check out our 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack repo&lt;/a&gt; on GitHub and 
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;join us on Discord&lt;/a&gt;!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>RAG Pipelines From Scratch</title>
      <link>https://haystack.deepset.ai/blog/rag-pipelines-from-scratch/</link>
      <pubDate>Tue, 21 Nov 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/rag-pipelines-from-scratch/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Retrieval Augmented Generation (RAG) is quickly becoming an essential technique to make LLMs more reliable and effective at answering any question, regardless of how specific. To stay relevant in today’s NLP landscape, Haystack must enable it.&lt;/p&gt;
&lt;p&gt;Let’s see how to build such applications with Haystack 2.0, from a direct call to an LLM to a fully-fledged, production-ready RAG pipeline that scales. At the end of this post, we will have an application that can answer questions about world countries based on data stored in a private database. At that point, the knowledge of the LLM will be only limited by the content of our data store, and all of this can be accomplished without fine-tuning language models.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;💡 &lt;em&gt;I recently gave a talk about RAG applications in Haystack 2.0, so if you prefer videos to blog posts, you can find the recording 
&lt;a href=&#34;https://zansara.dev/talks/2023-10-12-office-hours-rag-pipelines/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;. Keep in mind that the code might be slightly outdated.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;what-is-rag&#34;&gt;What is RAG?&lt;/h2&gt;
&lt;p&gt;The idea of Retrieval Augmented Generation was first defined in a 
&lt;a href=&#34;https://arxiv.org/abs/2005.11401&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;paper&lt;/a&gt; by Meta in 2020. It was designed to solve a few of the inherent limitations of seq2seq models (language models that, given a sentence, can finish writing it for you), such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Their internal knowledge, as vast as it may be, will always be limited and at least slightly out of date.&lt;/li&gt;
&lt;li&gt;They work best on generic topics rather than niche and specific areas unless they’re fine-tuned on purpose, which is a costly and slow process.&lt;/li&gt;
&lt;li&gt;All models, even those with subject-matter expertise, tend to “hallucinate”: they confidently produce false statements backed by apparently solid reasoning.&lt;/li&gt;
&lt;li&gt;They cannot reliably cite their sources or tell where their knowledge comes from, which makes fact-checking their replies nontrivial.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;RAG solves these issues of “grounding” the LLM to reality by providing some relevant, up-to-date, and trusted information to the model together with the question. In this way, the LLM doesn’t need to draw information from its internal knowledge, but it can base its replies on the snippets provided by the user.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-pipelines-from-scratch/rag-paper-image.png alt=&#34;RAG Paper diagram&#34;  /&gt;
        &lt;figcaption&gt;A visual representation of RAG from the original paper&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;As you can see in the image above (taken directly from the original paper), a system such as RAG is made of two parts: one that finds text snippets that are relevant to the question asked by the user and a generative model, usually an LLM, that rephrases the snippets into a coherent answer for the question.&lt;/p&gt;
&lt;p&gt;Let’s build one of these with Haystack 2.0!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;💡 &lt;em&gt;Do you want to see this code in action? Check out the Colab notebook 
&lt;a href=&#34;https://colab.research.google.com/drive/1FkDNS3hTO4oPXHFbXQcldls0kf-KTq-r?usp=sharing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Warning:&lt;/strong&gt; &lt;em&gt;This code was tested on &lt;code&gt;haystack-ai==2.0.0b5&lt;/code&gt;. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components however stay the same.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;generators-haystacks-llm-components&#34;&gt;Generators: Haystack’s LLM components&lt;/h2&gt;
&lt;p&gt;As every NLP framework that deserves its name, Haystack supports LLMs in different ways. The easiest way to query an LLM in Haystack 2.0 is through a Generator component: depending on which LLM and how you intend to query it (chat, text completion, etc…), you should pick the appropriate class.&lt;/p&gt;
&lt;p&gt;We’re going to use &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; (the model behind ChatGPT) for these examples, so the component we need is 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/openaigenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;OpenAIGenerator&lt;/code&gt;&lt;/a&gt;. Here is all the code required to use it to query OpenAI’s &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; :&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What&#39;s the official language of France?&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns {&#34;replies&#34;: [&#39;The official language of France is French.&#39;]}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can select your favorite OpenAI model by specifying a &lt;code&gt;model&lt;/code&gt; at initialization, for example, &lt;code&gt;gpt-4&lt;/code&gt;. It also supports setting an &lt;code&gt;api_base_url&lt;/code&gt; for private deployments, a &lt;code&gt;streaming_callback&lt;/code&gt; if you want to see the output generated live in the terminal, and optional &lt;code&gt;kwargs&lt;/code&gt; to let you pass whatever other parameter the model understands, such as the number of answers (&lt;code&gt;n&lt;/code&gt;), the temperature (&lt;code&gt;temperature&lt;/code&gt;), etc.&lt;/p&gt;
&lt;p&gt;Note that in this case, we’re passing the API key to the component’s constructor. This is unnecessary: &lt;code&gt;OpenAIGenerator&lt;/code&gt; can read the value from the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable and also from the &lt;code&gt;api_key&lt;/code&gt; module variable of 
&lt;a href=&#34;https://github.com/openai/openai-python#usage&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;openai&lt;/code&gt;’s SDK&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Right now, Haystack supports HuggingFace models through the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/huggingfacelocalgenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;HuggingFaceLocalGenerator&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/huggingfacetgigenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;HuggingFaceTGIGenerator&lt;/code&gt;&lt;/a&gt; components, and many more LLMs are coming soon.&lt;/p&gt;
&lt;h2 id=&#34;promptbuilder-structured-prompts-from-templates&#34;&gt;PromptBuilder: structured prompts from templates&lt;/h2&gt;
&lt;p&gt;Let’s imagine that our LLM-powered application also comes with some pre-defined questions that the user can select instead of typing in full. For example, instead of asking them to type &lt;code&gt;What&#39;s the official language of France?&lt;/code&gt;, we let them select &lt;code&gt;Tell me the official languages&lt;/code&gt; from a list, and they simply need to type “France” (or “Wakanda” for a change - our chatbot needs some challenges too).&lt;/p&gt;
&lt;p&gt;In this scenario, we have two pieces of the prompt: a variable (the country name, like “France”) and a prompt template, which in this case is &lt;code&gt;&#34;What&#39;s the official language of {{ country }}?&#34;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Haystack offers a component that can render variables into prompt templates: it’s called 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;PromptBuilder&lt;/code&gt;&lt;/a&gt;. As the generators we’ve seen before, also &lt;code&gt;PromptBuilder&lt;/code&gt; is nearly trivial to initialize and use.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders.prompt_builder&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What&#39;s the official language of {{ country }}?&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;country&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;France&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns {&#39;prompt&#39;: &#34;What&#39;s the official language of France?&#34;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note how we defined a variable, &lt;code&gt;country&lt;/code&gt;, by wrapping its name in double curly brackets. PromptBuilder lets you define any input variable that way: if the prompt template was &lt;code&gt;&#34;What&#39;s the official language of {{ nation }}?&#34;&lt;/code&gt;, the &lt;code&gt;run()&lt;/code&gt; method of &lt;code&gt;PromptBuilder&lt;/code&gt; would have expected a &lt;code&gt;nation&lt;/code&gt; input.&lt;/p&gt;
&lt;p&gt;This syntax comes from 
&lt;a href=&#34;https://jinja.palletsprojects.com/en/3.0.x/intro/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Jinja2&lt;/a&gt;, a popular templating library for Python. If you have ever used Flask, Django, or Ansible, you will feel at home with &lt;code&gt;PromptBuilder&lt;/code&gt;. Instead, if you never heard of any of these libraries, you can check out the 
&lt;a href=&#34;https://jinja.palletsprojects.com/en/3.0.x/templates/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;syntax&lt;/a&gt; on Jinja’s documentation. Jinja has a powerful templating language and offers way more features than you’ll ever need in prompt templates, ranging from simple if statements and for loops to object access through dot notation, nesting of templates, variables manipulation, macros, full-fledged import and encapsulation of templates, and more.&lt;/p&gt;
&lt;h2 id=&#34;a-simple-generative-pipeline&#34;&gt;A Simple Generative Pipeline&lt;/h2&gt;
&lt;p&gt;With these two components, we can assemble a minimal pipeline to see how they work together. Connecting them is trivial: &lt;code&gt;PromptBuilder&lt;/code&gt; generates a &lt;code&gt;prompt&lt;/code&gt; output, and &lt;code&gt;OpenAIGenerator&lt;/code&gt; expects an input with the same name and type.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders.prompt_builder&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What&#39;s the official language of {{ country }}?&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;country&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;France&#34;&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns {&#34;llm&#34;: {&#34;replies&#34;: [&#39;The official language of France is French.&#39;] }}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here is the pipeline graph:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-pipelines-from-scratch/simple-llm-pipeline.png alt=&#34;Simple LLM pipeline&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;make-the-llm-cheat&#34;&gt;Make the LLM cheat&lt;/h2&gt;
&lt;p&gt;Building the Generative part of a RAG application was very simple! So far, we only provided the question to the LLM, but no information to base its answers on. Nowadays, LLMs possess a lot of general knowledge, so questions about famous countries such as France or Germany are easy for them to reply to correctly. However, when using an app about world countries, some users may be interested in knowing more about obscure or defunct microstates that don’t exist anymore. In this case, ChatGPT is unlikely to provide the correct answer without any help.&lt;/p&gt;
&lt;p&gt;For example, let’s ask our pipeline something &lt;em&gt;really&lt;/em&gt; obscure.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;country&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;the Republic of Rose Island&#34;&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     &#34;llm&#34;: {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         &#34;replies&#34;: [&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#             &#39;The official language of the Republic of Rose Island was Italian.&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         ]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The answer is an educated guess but is not accurate: although it was located just outside of Italy’s territorial waters, according to 
&lt;a href=&#34;https://en.wikipedia.org/wiki/Republic_of_Rose_Island&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Wikipedia&lt;/a&gt; the official language of this short-lived micronation was Esperanto.&lt;/p&gt;
&lt;p&gt;How can we get ChatGPT to reply to such a question correctly? One way is to make it “cheat” by providing the answer as part of the question. In fact, &lt;code&gt;PromptBuilder&lt;/code&gt; is designed to serve precisely this use case.&lt;/p&gt;
&lt;p&gt;Here is our new, more advanced prompt:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Given the following information, answer the question.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Context: {{ context }}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Question: {{ question }}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s build a new pipeline using this prompt!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;context_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Given the following information, answer the question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context: {{ context }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: {{ question }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;language_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What&#39;s the official language of {{ country }}?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;context_template&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;language_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;language_template&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;language_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context_prompt.question&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context_prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context_prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto.&#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;language_prompt&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;country&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;the Republic of Rose Island&#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     &#34;llm&#34;: {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         &#34;replies&#34;: [&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#             &#39;The official language of the Republic of Rose Island is Esperanto.&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         ]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s look at the graph of our Pipeline:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-pipelines-from-scratch/double-promptbuilder-pipeline.png alt=&#34;Double PromptBuilder pipeline&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The beauty of &lt;code&gt;PromptBuilder&lt;/code&gt; lies in its flexibility. It allows users to chain instances together to assemble complex prompts from simpler schemas: for example, we used the output of the first &lt;code&gt;PromptBuilder&lt;/code&gt; as the value of &lt;code&gt;question&lt;/code&gt; in the second prompt.&lt;/p&gt;
&lt;p&gt;However, in this specific scenario, we can build a simpler system by merging the two prompts into one.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Given the following information, answer the question.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Context: {{ context }}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Question: What&#39;s the official language of {{ country }}?
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Using this new prompt, the resulting pipeline becomes again very similar to our first.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Given the following information, answer the question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context: {{ context }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: What&#39;s the official language of {{ country }}?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto.&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;country&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;the Republic of Rose Island&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     &#34;llm&#34;: {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         &#34;replies&#34;: [&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#             &#39;The official language of the Republic of Rose Island is Esperanto.&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         ]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-pipelines-from-scratch/double-variable-promptbuilder-pipeline.png alt=&#34;PromptBuilder with two inputs pipeline&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;retrieving-the-context&#34;&gt;Retrieving the context&lt;/h2&gt;
&lt;p&gt;For now, we’ve been playing with prompts, but the fundamental question remains unanswered: where do we get the correct text snippet for the question the user is asking? We can’t expect such information as part of the input: we need our system to be able to fetch this information independently, based uniquely on the query.&lt;/p&gt;
&lt;p&gt;Thankfully, retrieving relevant information from large 
&lt;a href=&#34;https://en.wikipedia.org/wiki/Text_corpus&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;corpora&lt;/a&gt; (a technical term for extensive collections of data, usually text) is a task that Haystack excels at since its inception: the components that perform this task are called 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/docs/retrievers&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Retrievers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Retrieval can be performed on different data sources: to begin, let’s assume we’re searching for data in a local database, which is the use case that most Retrievers are geared towards.&lt;/p&gt;
&lt;p&gt;Let’s create a small local database to store information about some European countries. Haystack offers a neat object for these small-scale demos: &lt;code&gt;InMemoryDocumentStore&lt;/code&gt;. This document store is little more than a Python dictionary under the hood but provides the same exact API as much more powerful data stores and vector stores, such as 
&lt;a href=&#34;https://github.com/deepset-ai/haystack-core-integrations/tree/main/document_stores/elasticsearch&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Elasticsearch&lt;/a&gt; or 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/chroma-documentstore&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ChromaDB&lt;/a&gt;. Keep in mind that the object is called “Document Store” and not simply “datastore” because what it stores is Haystack’s Document objects: a small dataclass that helps other components make sense of the data that they receive.&lt;/p&gt;
&lt;p&gt;So, let’s initialize an &lt;code&gt;InMemoryDocumentStore&lt;/code&gt; and write some &lt;code&gt;Documents&lt;/code&gt; into it.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.dataclasses&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;German is the the official language of Germany.&#34;&lt;/span&gt;), 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The capital of France is Paris, and its official language is French.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Italy recognizes a few official languages, but the most widespread one is Italian.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docstore&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docstore&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docstore&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;filter_documents&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns [&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     Document(content=&#34;German is the the official language of Germany.&#34;), &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     Document(content=&#34;The capital of France is Paris, and its official language is French.&#34;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     Document(content=&#34;Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.&#34;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     Document(content=&#34;Italy recognizes a few official languages, but the most widespread one is Italian.&#34;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# ]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once the document store is set up, we can initialize a retriever. In Haystack 2.0, each document store comes with its own set of highly optimized retrievers: &lt;code&gt;InMemoryDocumentStore&lt;/code&gt; offers two, one based on BM25 ranking and one based on embedding similarity.&lt;/p&gt;
&lt;p&gt;Let’s start with the BM25-based retriever, which is slightly easier to set up. Let’s first use it in isolation to see how it behaves.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryBM25Retriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryBM25Retriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;docstore&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Rose Island&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns [&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     Document(content=&#34;Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.&#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# ]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Rose Island&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns [&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     Document(content=&#34;Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.&#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     Document(content=&#34;Italy recognizes a few official languages, but the most widespread one is Italian.&#34;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     Document(content=&#34;The capital of France is Paris, and its official language is French.&#34;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# ]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We see that 
&lt;a href=&#34;https://docs.haystack.deepset.ai/v2.0/reference/retriever-api#inmemorybm25retriever&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;InMemoryBM25Retriever&lt;/code&gt;&lt;/a&gt; accepts a few parameters. &lt;code&gt;query&lt;/code&gt; is the question we want to find relevant documents for. In the case of BM25, the algorithm only searches for exact word matches. The resulting retriever is very fast, but it doesn’t fail gracefully: it can’t handle spelling mistakes, synonyms, or descriptions of an entity. For example, documents containing the word “cat” would be considered irrelevant against a query such as “felines”.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;top_k&lt;/code&gt; controls the number of documents returned. We can see that in the first example, only one document is returned, the correct one. In the second, where &lt;code&gt;top_k = 3&lt;/code&gt;, the retriever is forced to return three documents even if just one is relevant, so it picks the other two randomly. Although the behavior is not optimal, BM25 guarantees that if there is a document that is relevant to the query, it will be in the first position, so for now, we can use it with &lt;code&gt;top_k=1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Retrievers also accepts a &lt;code&gt;filters&lt;/code&gt; parameter, which lets you pre-filter the documents before retrieval. This is a powerful technique that comes useful in complex applications, but for now we have no use for it. I will talk more in detail about this topic, called metadata filtering, in a later post.&lt;/p&gt;
&lt;p&gt;Let’s now make use of this new component in our Pipeline.&lt;/p&gt;
&lt;h2 id=&#34;our-first-rag-pipeline&#34;&gt;Our first RAG Pipeline&lt;/h2&gt;
&lt;p&gt;The retriever does not return a single string but a list of Documents. How do we put the content of these objects into our prompt template?&lt;/p&gt;
&lt;p&gt;It’s time to use Jinja’s powerful syntax to do some unpacking on our behalf.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Given the following information, answer the question.
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Context: 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{% for document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{% endfor %}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Question: What&#39;s the official language of {{ country }}?
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notice how, despite the slightly alien syntax for a Python programmer, what the template does is reasonably evident: it iterates over the documents and, for each of them, renders their &lt;code&gt;content&lt;/code&gt; field.&lt;/p&gt;
&lt;p&gt;With all these pieces set up, we can finally put them all together.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Given the following information, answer the question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context: 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: What&#39;s the official language of {{ country }}?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryBM25Retriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;docstore&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;country&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;country&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;the Republic of Rose Island&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     &#34;llm&#34;: {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         &#34;replies&#34;: [&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#             &#39;The official language of the Republic of Rose Island is Esperanto.&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         ]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-pipelines-from-scratch/bm25-rag-pipeline.png alt=&#34;BM25 RAG Pipeline&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Congratulations! We’ve just built our first, true-to-its-name RAG Pipeline.&lt;/p&gt;
&lt;h2 id=&#34;scaling-up-elasticsearch&#34;&gt;Scaling up: Elasticsearch&lt;/h2&gt;
&lt;p&gt;So, we now have our running prototype. What does it take to scale this system up for production workloads?&lt;/p&gt;
&lt;p&gt;Of course, scaling up a system to production readiness is no simple task that can be addressed in a paragraph. Still, we can start this journey with one component that can readily be improved: the document store.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;InMemoryDocumentStore&lt;/code&gt; is clearly a toy implementation: Haystack supports much more performant document stores such as 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/elasticsearch-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Elasticsearch&lt;/a&gt;, 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/chroma-documentstore&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ChromaDB&lt;/a&gt; and 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/marqo-document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Marqo&lt;/a&gt;. Since we have built our app with a BM25 retriever, let’s select Elasticsearch as our production-ready document store of choice.&lt;/p&gt;
&lt;p&gt;How do we use Elasticsearch on our pipeline? All it takes is to swap out &lt;code&gt;InMemoryDocumentStore&lt;/code&gt; and &lt;code&gt;InMemoryBM25Retriever&lt;/code&gt; with their Elasticsearch counterparts, which offer nearly identical APIs.&lt;/p&gt;
&lt;p&gt;First, let’s create the document store: we will need a slightly more complex setup to connect to the Elasticearch backend. In this example, we use Elasticsearch version 8.8.0, but every Elasticsearch 8 version should work.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;elasticsearch_haystack.document_store&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ElasticsearchDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;host&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ELASTICSEARCH_HOST&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;https://localhost:9200&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;user&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;elastic&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pwd&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;environ&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ELASTICSEARCH_PASSWORD&#34;&lt;/span&gt;]  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# You need to provide this value&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docstore&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ElasticsearchDocumentStore&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;hosts&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;host&lt;/span&gt;], 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;basic_auth&lt;/span&gt;=(&lt;span style=&#34;color:#268bd2&#34;&gt;user&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;pwd&lt;/span&gt;), 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;ca_certs&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;/content/elasticsearch-8.8.0/config/certs/http_ca.crt&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, let’s write again our four documents into the store. In this case, we specify the duplicate policy, so if the documents were already present, they would be overwritten. All Haystack document stores offer three policies to handle duplicates: &lt;code&gt;FAIL&lt;/code&gt; (the default), &lt;code&gt;SKIP&lt;/code&gt;, and &lt;code&gt;OVERWRITE&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DuplicatePolicy&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt; = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;German is the the official language of Germany.&#34;&lt;/span&gt;), 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The capital of France is Paris, and its official language is French.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Italy recognizes a few official languages, but the most widespread one is Italian.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docstore&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;policy&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;DuplicatePolicy&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;OVERWRITE&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once this is done, we are ready to build the same pipeline as before, but using &lt;code&gt;ElasticsearchBM25Retriever&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;elasticsearch_haystack.bm25_retriever&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ElasticsearchBM25Retriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Given the following information, answer the question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context: 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or document in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    {{ document.content }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: What&#39;s the official language of {{ country }}?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ElasticsearchBM25Retriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;docstore&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;draw&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;elasticsearch-rag-pipeline.png&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;country&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;the Republic of Rose Island&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipe&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;country&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;country&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;country&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# returns {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     &#34;llm&#34;: {&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         &#34;replies&#34;: [&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#             &#39;The official language of the Republic of Rose Island is Esperanto.&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#         ]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# }&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-pipelines-from-scratch/elasticsearch-rag-pipeline.png alt=&#34;Elasticsearch RAG Pipeline&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;That’s it! We’re now running the same pipeline over a production-ready Elasticsearch instance.&lt;/p&gt;
&lt;h2 id=&#34;wrapping-up&#34;&gt;Wrapping up&lt;/h2&gt;
&lt;p&gt;In this post, we’ve detailed some fundamental components that make RAG applications possible with Haystack: Generators, the PromptBuilder, and Retrievers. We’ve seen how they can all be used in isolation and how you can make Pipelines out of them to achieve the same goal. Last, we’ve experimented with some of the (very early!) features that make Haystack 2.0 production-ready and easy to scale up from a simple demo with minimal changes.&lt;/p&gt;
&lt;p&gt;However, this is just the start of our journey into RAG. Stay tuned!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Multilingual Generative Question Answering with Haystack and Cohere</title>
      <link>https://haystack.deepset.ai/blog/multilingual-qa-with-cohere/</link>
      <pubDate>Wed, 08 Nov 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/multilingual-qa-with-cohere/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;In today’s interconnected world, language should never be a barrier to accessing information. This is especially crucial in the context of travel, where travelers often rely on reviews to make informed decisions about their accommodations (I know I do). But what if you’re an English-speaking traveler trying to make sense of reviews written in multiple languages and simply want to know, “Is this place too noisy to sleep?”&lt;/p&gt;
&lt;p&gt;In this blog post, we’ll delve into the details of multilingual retrieval and multilingual generation, and demonstrate how to build a &lt;strong&gt;Retrieval Augmented Generation (RAG)&lt;/strong&gt; pipeline to generate answers from multilingual hotel reviews using 
&lt;a href=&#34;https://cohere.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cohere&lt;/a&gt; models and 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This article is accompanied by a 
&lt;a href=&#34;https://github.com/bilgeyucel/multilingual-qa/blob/main/Multilingual_QA_with_Cohere_Haystack.ipynb&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Colab notebook.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;multilingual-embeddings&#34;&gt;Multilingual Embeddings&lt;/h2&gt;
&lt;p&gt;Typically, embedding models are trained to create word embeddings based on their semantic similarity. This means that the embeddings for “King” and “Queen” should be closer in the vector space compared to “King” and “Bread.” However, most of these embedding models are designed for a single language and cannot capture semantic similarities between words in different languages.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/multilingual-qa-with-cohere/monolingual.png alt=&#34;Monolingual Embedding Models&#34;  /&gt;
        &lt;figcaption&gt;Monolingual Embedding Models&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Multilingual embedding models, on the other hand, are trained to capture semantic similarities between words across languages. They can create similar embeddings for “King”, “König” (king in German), and “Kral” (king in Turkish). This remarkable capability of multilingual models makes it possible to retrieve documents regardless of the language in which the documents or the query are written.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/multilingual-qa-with-cohere/multilingual.png alt=&#34;Multilingual Embedding Models&#34;  /&gt;
        &lt;figcaption&gt;Multilingual Embedding Models&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Cohere’s multilingual embedding models are cutting-edge technology designed to handle text from multiple languages. These embeddings empower the model to understand the semantics of texts in languages like Spanish, French, German, and more while providing a consistent representation for efficient cross-lingual analysis.&lt;/p&gt;
&lt;p&gt;Let’s see these models in action using the &lt;code&gt;embed-multilingual-v2.0&lt;/code&gt; model to create embeddings for hotel reviews.&lt;/p&gt;
&lt;h2 id=&#34;storing-multilingual-embeddings&#34;&gt;Storing Multilingual Embeddings&lt;/h2&gt;
&lt;p&gt;To create a question answering system for hotel reviews, the first thing we need is a document store. We’ll use an &lt;code&gt;InMemoryDocumentStore&lt;/code&gt; to save the hotel reviews along with their embeddings.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryDocumentStore&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;embedding_dim&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;768&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;similarity&lt;/span&gt;= &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;dot_product&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;creating-an-indexing-pipeline&#34;&gt;Creating an Indexing Pipeline&lt;/h3&gt;
&lt;p&gt;Next, we’ll set up an indexing pipeline with a &lt;code&gt;PreProcessor&lt;/code&gt; and &lt;code&gt;EmbeddingRetriever&lt;/code&gt;. The PreProcessor will break down long reviews into smaller, meaningful chunks, while the EmbeddingRetriever will generate multilingual embeddings for each document.&lt;/p&gt;
&lt;p&gt;We’ll use the &lt;code&gt;embed-multilingual-v2.0&lt;/code&gt; model from Cohere in the &lt;code&gt;EmbeddingRetriever&lt;/code&gt;. You will need a Cohere API key for this model, which you can obtain 
&lt;a href=&#34;https://dashboard.cohere.com/api-keys&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;EmbeddingRetriever&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PreProcessor&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.pipelines&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;preprocessor&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PreProcessor&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;clean_empty_lines&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;clean_whitespace&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;clean_header_footer&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;split_by&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;word&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;split_length&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;200&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;split_respect_sentence_boundary&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;EmbeddingRetriever&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;embed-multilingual-v2.0&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;COHERE_API_KEY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_node&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;preprocessor&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;preprocessor&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;File&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_node&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;preprocessor&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_node&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;document_store&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;retriever&#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;indexing-reviews-with-embeddings&#34;&gt;Indexing Reviews with Embeddings&lt;/h3&gt;
&lt;p&gt;Our indexing pipeline is ready to run. We will use hotel reviews in multiple languages as documents, including Portuguese, Polish, German, Spanish, French, Dutch, and English. All of these reviews belong to the same accommodation. Let’s index these reviews by running the pipeline.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;O ar condicionado de um dos quartos deu problema, mas levaram um ventilador para ser utilizado. Também por ser em uma área bem movimentada, o barulho da rua pode ser ouvido. Porém, eles deixam protetores auriculares para o uso. Também senti falta de um espelho de corpo inteiro no apartamento. Só havia o do banheiro que mostra apenas a parte superior do corpo.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Durchgängig Lärm, weil direkt an der Partymeile; schmutziges Geschirr; unvollständige Küchenausstattung; Abzugshaube über Herd ging für zwei Stunden automatisch an und lies sich nicht abstellen; Reaktionen auf Anfragen entweder gar nicht oder unfreundlich&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Das Personal ist sehr zuvorkommend! Über WhatsApp war man im guten Kontakt und konnte alles erfragen. Auch das Angebot des Shuttleservices war super und würde ich empfehlen - sehr unkompliziert! Unser Flug hatte Verspätung und der Shuttle hat auf uns gewartet. Die Lage zur Innenstadt ist sehr gut,jedoch ist die Fensterfront direkt zur Club-Straße deshalb war es nachts bis drei/vier Uhr immer recht laut. Die Kaffeemaschine oder auch die Couch hätten sauberer sein können. Ansonsten war das Appartement aber völlig ok.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Super appartement. Juste au dessus de plusieurs bars qui ferment très tard. A savoir à l&#39;avance. (Bouchons d&#39;oreilles fournis !)&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Zapach moczu przy wejściu do budynku, może warto zainstalować tam mocne światło na czujnik ruchu, dla gości to korzystne a dla kogoś kto chciałby zrobić tam coś innego niekorzystne :-). Świetne lokalizacje w centrum niestety są na to narażane.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;El apartamento estaba genial y muy céntrico, todo a mano. Al lado de la librería Lello y De la Torre de los clérigos. Está situado en una zona de marcha, así que si vais en fin de semana , habrá ruido, aunque a nosotros no nos molestaba para dormir&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The keypad with a code is convenient and the location is convenient. Basically everything else, very noisy, wi-fi didn&#39;t work, check-in person didn&#39;t explain anything about facilities, shower head was broken, there&#39;s no cleaning and everything else one may need is charged.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;It is very central and appartement has a nice appearance (even though a lot IKEA stuff), *W A R N I N G** the appartement presents itself as a elegant and as a place to relax, very wrong place to relax - you cannot sleep in this appartement, even the beds are vibrating from the bass of the clubs in the same building - you get ear plugs from the hotel -&gt; now I understand why -&gt; I missed a trip as it was so loud and I could not hear the alarm next day due to the ear plugs.- there is a green light indicating &#39;emergency exit&#39; just above the bed, which shines very bright at night - during the arrival process, you felt the urge of the agent to leave as soon as possible. - try to go to &#39;RVA clerigos appartements&#39; -&gt; same price, super quiet, beautiful, city center and very nice staff (not an agency)- you are basically sleeping next to the fridge, which makes a lot of noise, when the compressor is running -&gt; had to switch it off - but then had no cool food and drinks. - the bed was somehow broken down - the wooden part behind the bed was almost falling appart and some hooks were broken before- when the neighbour room is cooking you hear the fan very loud. I initially thought that I somehow activated the kitchen fan&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Un peu salé surtout le sol. Manque de service et de souplesse&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;De comfort zo centraal voor die prijs.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Die Lage war sehr Zentral und man konnte alles sehenswertes zu Fuß erreichen. Wer am Wochenende nachts schlafen möchte, sollte diese Unterkunft auf keinen Fall nehmen. Party direkt vor der Tür so das man denkt, man schläft mitten drin. Sehr Sehr laut also und das bis früh 5 Uhr. Ab 7 kommt dann die Straßenreinigung die keineswegs leiser ist.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Ótima escolha! Apartamento confortável e limpo! O RoofTop é otimo para beber um vinho! O apartamento é localizado entre duas ruas de movimento noturno. Porem as janelas, blindam 90&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% d&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;o barulho. Não nos incomodou&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Nous avons passé un séjour formidable. Merci aux personnes , le bonjours à Ricardo notre taxi man, très sympathique. Je pense refaire un séjour parmi vous, après le confinement, tout était parfait, surtout leur gentillesse, aucune chaude négative. Je n&#39;ai rien à redire de négative, Ils étaient a notre écoute, un gentil message tout les matins, pour nous demander si nous avions besoins de renseignement et savoir si tout allait bien pendant notre séjour.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Boa localização. Bom pequeno almoço. A tv não se encontrava funcional.&#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;             &lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Céntrico. Muy cómodo para moverse y ver Oporto. Edificio con terraza propia en la última planta. Todo reformado y nuevo. Te traen un estupendo desayuno todas las mañanas al apartamento. Solo que se puede escuchar algo de ruido de la calle a primeras horas de la noche. Es un zona de ocio nocturno. Pero respetan los horarios.&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;indexing_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;retrieval-augmented-generation-rag&#34;&gt;Retrieval Augmented Generation (RAG)&lt;/h2&gt;
&lt;p&gt;Now that we have multilingual embeddings indexed in our document store, let’s create the part of the demo where users interact the most: Retrieval-Augmented Generation (RAG) Pipeline.&lt;/p&gt;
&lt;p&gt;A RAG pipeline consists of two parts: document retrieval and answer generation.&lt;/p&gt;
&lt;h3 id=&#34;multilingual-document-retrieval&#34;&gt;Multilingual Document Retrieval&lt;/h3&gt;
&lt;p&gt;In the document retrieval step of a RAG pipeline, the retriever creates an embedding for the query in the multilingual vector space and retrieves the most similar &lt;em&gt;top_k&lt;/em&gt; documents to the query from the document store. In our case, the retrieved documents will be hotel reviews.&lt;/p&gt;
&lt;p&gt;We’ll use the same &lt;code&gt;EmbeddingRetriever&lt;/code&gt; instance that we used in the indexing pipeline.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/multilingual-qa-with-cohere/retrieval.png alt=&#34;Retrieval Mechanism with Document Store&#34;  /&gt;
        &lt;figcaption&gt;Retrieval Mechanism with Document Store&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id=&#34;multilingual-answer-generation&#34;&gt;Multilingual Answer Generation&lt;/h3&gt;
&lt;p&gt;In the generation step of the RAG pipeline, we’ll use an LLM (a generative model) to generate an answer based on the retrieved documents.&lt;/p&gt;
&lt;p&gt;Let’s create a prompt template to use for hotel reviews. In this template, we’ll have two prompt variables: &lt;code&gt;{join(documents)}&lt;/code&gt; and &lt;code&gt;{query}&lt;/code&gt;. These variables will later be filled with the user question and the retrieved hotel reviews outputted from the retriever.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;AnswerParser&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;You will be provided with reviews in various languages for an accommodation. 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Create a concise and informative answer for a given question based solely on the given reviews.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Reviews: {join(documents)}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Question: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{query}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Answer:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;output_parser&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AnswerParser&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After defining the &lt;code&gt;PromptTemplate&lt;/code&gt;, we need to define the &lt;code&gt;PromptNode&lt;/code&gt;. For this generative case, we’ll use Cohere’s &lt;code&gt;command&lt;/code&gt; model and initialize the &lt;code&gt;PromptNode&lt;/code&gt; with an API key, model name, and the &lt;code&gt;PromptTemplate&lt;/code&gt; we defined previously. Then, we’ll connect the &lt;code&gt;EmbeddingRetriever&lt;/code&gt; to the &lt;code&gt;PromptNode&lt;/code&gt; to finalize our RAG pipeline.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptNode&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model_name_or_path&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;command&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;COHERE_API_KEY&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;default_prompt_template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_node&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Query&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_node&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;PromptNode&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Retriever&#34;&lt;/span&gt;])  
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can run the pipeline with various questions and learn if this accommodation is good to stay at! 🏡&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;rag_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Is this place too noisy to sleep?&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;params&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Retriever&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;top_k&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;}})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answers&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Based on the information provided in the reviews, it seems that the accommodation can be very noisy, especially at night. Multiple reviewers mentioned...&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To learn more about how you can use Cohere models in Haystack pipelines, check out our 
&lt;a href=&#34;https://haystack.deepset.ai/integrations/cohere&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;integration page for Cohere&lt;/a&gt;🩵&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Guide to Using Zephyr Models to Generate Answers on Your Data</title>
      <link>https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2/</link>
      <pubDate>Mon, 06 Nov 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;Hugging Face recently announced their new open-source LLM, Zephyr-7B Beta, which is a fine-tuned version of Mistral 7B V.01 that focuses on helpfulness and outperforms many larger models on MT-Bench and AlpacaEval benchmarks. In this article, we’re going to show you how to use the new Zephyr models in a full retrieval-augmented generation pipeline, in a way that can work on your own private data.&lt;/p&gt;
&lt;p&gt;Following the theme of 
&lt;a href=&#34;https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our previous article&lt;/a&gt;, we will show you how to build a pipeline that uses Zephyr with Haystack, but we will also take the opportunity and show you how to do this with the preview package of Haystack 2.0.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Update: we released Haystack 2.0-Beta on December 4th 2023, the code in this article has been updated to work with this new release.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;querying-zephyr-with-haystack&#34;&gt;Querying Zephyr with Haystack&lt;/h2&gt;
&lt;p&gt;To start with, let’s see how to query 
&lt;a href=&#34;https://huggingface.co/HuggingFaceH4/zephyr-7b-beta&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Zephyr-7B-Beta&lt;/a&gt; with Haystack as it is. To load and manage open-source LLMs in Haystack 2.0, we can use the &lt;code&gt;HuggingFaceLocalGenerator&lt;/code&gt;. If you are using a free Colab instance (with limited resources), you can load the model using &lt;strong&gt;4-bit quantization&lt;/strong&gt; (passing the appropriate &lt;code&gt;huggingface_pipeline_kwargs&lt;/code&gt; to our Generator). For an introduction to Quantization in Hugging Face Transformers, you can read 
&lt;a href=&#34;https://huggingface.co/blog/merve/quantization&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this simple blog post&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceLocalGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceLocalGenerator&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;HuggingFaceH4/zephyr-7b-beta&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                          &lt;span style=&#34;color:#268bd2&#34;&gt;huggingface_pipeline_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;device_map&#34;&lt;/span&gt;:&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;auto&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;model_kwargs&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;load_in_4bit&#34;&lt;/span&gt;:&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                          &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;bnb_4bit_use_double_quant&#34;&lt;/span&gt;:&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                          &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;bnb_4bit_quant_type&#34;&lt;/span&gt;:&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nf4&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                          &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;bnb_4bit_compute_dtype&#34;&lt;/span&gt;:&lt;span style=&#34;color:#268bd2&#34;&gt;torch&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;bfloat16&lt;/span&gt;}},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                          &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_new_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;350&lt;/span&gt;})
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once we have initialized this generator, we can simply call &lt;code&gt;run()&lt;/code&gt; with a query. Here’s an example that we tried:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Please write a rhyme about Italy.&#34;&lt;/span&gt;) 
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&#39;replies&#39;: [
        &#34; &lt;|assistant|&gt;\n\nIn sunny Italy, the land so bright,\nWhere pasta&#39;s served with every sight,\nThe streets
are filled with laughter&#39;s light,\nAnd love is in the air, day and night.\n\nThe Colosseum stands, a testament,\nTo
history&#39;s might, a story told,\nThe Vatican&#39;s beauty, a grandament,\nA sight that leaves one&#39;s heart so 
bold.\n\nThe rolling hills, a painter&#39;s dream,\nThe Tuscan sun, a golden hue,\nThe Amalfi Coast, a scene so 
gleam,\nA place where love and beauty pursue.\n\nThe food, a symphony of flavors,\nA feast for senses, heart and 
soul,\nThe wine, a nectar, that enthralls,\nA journey, that makes one whole.\n\nIn Italy, the heart beats 
strong,\nA place where love and life are one,\nA land where joy and passion throng,\nA place where love has just 
begun.&#34;
    ]
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;using-zephyr-in-a-retrieval-augmented-generative-pipeline&#34;&gt;Using Zephyr in a Retrieval-Augmented Generative Pipeline&lt;/h2&gt;
&lt;p&gt;Now that we’ve seen the pretty simple process of initializing a Haystack 2.0 (preview) component that can query Zephyr, let’s have a look at how we can set up a full pipeline that works on our own data. This pipeline will use Zephyr-7B-Beta as the LLM to generate responses.&lt;/p&gt;
&lt;h3 id=&#34;the-prompt&#34;&gt;The Prompt&lt;/h3&gt;
&lt;p&gt;The first step is to define a prompt template that will effectively describe the manner in which we interact with an LLM. In Haystack 2.0 (preview, but eventually also the actual major release), prompt templates can be defined using the Jinja2 templating language. This provides a wide range of customizability to your prompts at query time. These templates are then used by a &lt;code&gt;PromptBuilder&lt;/code&gt; that builds a full prompt before sending it to an LLM.&lt;/p&gt;
&lt;p&gt;We create a prompt template with the following format, using tags that Zephyr expects to identify “user”, “assistant” and “system”: &lt;code&gt;&lt;|system|&gt;\nSYSTEM MESSAGE&lt;/s&gt;\n&lt;|user|&gt;\nUSER MESSAGE&lt;/s&gt;\n&lt;|assistant|&gt;&lt;/code&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;|system|&gt;Using the information contained in the context, 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;give a comprehensive answer to the question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;If the answer is contained in the context, also report the source URL.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;If the answer cannot be deduced from the context, do not give an answer.&lt;/s&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&lt;|user|&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or doc in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {{ doc.content }} URL:{{ doc.meta[&#39;url&#39;] }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %};
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  Question: {{query}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  &lt;/s&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&lt;|assistant|&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s quickly walk through what we are seeing above. We will look at both the syntax present to prompt Zephyr specifically, and the syntax that we use in Haystack 2.0 (preview) to build prompts programmatically.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Zephyr ‘syntax’&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Everything you see tagged with  &lt;code&gt;&lt;|system|&gt;&lt;/code&gt; is what the LLM expects as the main instruction, defining how it should approach user input. Everything tagged with &lt;code&gt;&lt;|user|&gt;&lt;/code&gt; is what the LLM expects as user input, or even more broadly put, what it expects at “query time”.  Finally, &lt;code&gt;&lt;|assistant|&gt;&lt;/code&gt; tags the part of the prompt that the model will &lt;em&gt;complete&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Haystack Syntax&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You will also notice that there is a for loop in the prompt. What you are seeing is fully Jinja2 templating language. We use Jinja2 with Haystack 2.0 prompt templates. You can modify this section depending on what sort of input you want to provide the LLM.&lt;/p&gt;
&lt;p&gt;Here, we are looping through every &lt;code&gt;doc&lt;/code&gt; in &lt;code&gt;documents&lt;/code&gt; and adding &lt;code&gt;doc.content&lt;/code&gt; and &lt;code&gt;doc.meta[&#39;url&#39;]&lt;/code&gt; to the prompt. After the for loop, we are also adding the &lt;code&gt;query&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In the next section, we will see how we use this prompt template in a full Haystack 2.0 (preview) pipeline so that we can fill this template to create a fully formed prompt for Zephyr.&lt;/p&gt;
&lt;h3 id=&#34;the-pipeline&#34;&gt;The Pipeline&lt;/h3&gt;
&lt;p&gt;The core concept of a pipeline is to provide a structure with which data can be funnelled from one component to the other. Each component might be responsible for doing one specific thing such as keyword search, embedding retrieval, web search, translation and so on. In this case, we would like to build a retrieval-augmented generative pipeline that performs 2 major steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Given a query by a user, it &lt;strong&gt;retrieves&lt;/strong&gt; the most relevant &lt;code&gt;documents&lt;/code&gt; from our document store. This way, we can fill the &lt;code&gt;documents&lt;/code&gt; loop in our prompt template with only the most relevant context.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prompts&lt;/strong&gt; the LLM (in this case Zephyr) with the fully formed (retrieval-augmented) prompt and receives a response.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To do so, we will be using 4 components and 1 document store. For demonstration purposes, we’ve used the &lt;code&gt;InMemoryDocumentStore&lt;/code&gt;. However, you could use a number of other document stores such as Weaviate, Pincone, Milvus etc. (see our 
&lt;a href=&#34;https://haystack.deepset.ai/integrations?type=Document&#43;Store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;document store integrations&lt;/a&gt; for a full list).&lt;/p&gt;
&lt;p&gt;Our document store has our “own” data in it. As an example, we’ve indexed documents that have information about music in our document store. Specifically, we indexed the Wikipedia pages of various bands.&lt;/p&gt;
&lt;p&gt;You can have a look at the full list of the bands we used, as well as how we set up our 
&lt;a href=&#34;https://colab.research.google.com/drive/1gvfDSWyx2uJQokxO2DG1EXBZIFI-2D0N#scrollTo=FN3CR1uGtTxH&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;indexing pipeline in our Colab example&lt;/a&gt; 😊.&lt;/p&gt;
&lt;p&gt;Now that we have our data in place in a document store, here are the rest of the 4 components we use:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;SentenceTransformersTextEmbedder&lt;/code&gt;: We use this component to create an embedding for the query we receive from the user. That way we can use this embedding to do embedding retrieval from our document store. In this example, we use &lt;code&gt;&#34;thenlper/gte-large&#34;&lt;/code&gt; as our embedding model&lt;/li&gt;
&lt;li&gt;An &lt;code&gt;InMemoryEmbeddingRetriever&lt;/code&gt;: We use this component to retrieve the most relevant documents from the &lt;code&gt;InMemoryDocumentStore&lt;/code&gt; by doing embedding retrieval.&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;PromptBuilder&lt;/code&gt;: We use the builder we defined in the previous section&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;HuggingFaceLocalGenerator&lt;/code&gt;: We use the generator we created above, which uses the &lt;code&gt;HuggingFaceH4/zephyr-7b-beta&lt;/code&gt; model&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once again, let’s define all of our components:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.builders.prompt_builder&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.embedders&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.generators&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceLocalGenerator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.components.retrievers.in_memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.utils&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ComponentDevice&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;text_embedder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersTextEmbedder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;thenlper/gte-large&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                 &lt;span style=&#34;color:#268bd2&#34;&gt;device&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;ComponentDevice&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;from_str&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cuda:0&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;InMemoryEmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;|system|&gt;Using the information contained in the context, 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;give a comprehensive answer to the question.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;If the answer is contained in the context, also report the source URL.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;If the answer cannot be deduced from the context, do not give an answer.&lt;/s&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&lt;|user|&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Context:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% f&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;or doc in documents %}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {{ doc.content }} URL:{{ doc.meta[&#39;url&#39;] }}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  {&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% e&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;ndfor %};
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  Question: {{query}}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  &lt;/s&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&lt;|assistant|&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptBuilder&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;HuggingFaceLocalGenerator&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;HuggingFaceH4/zephyr-7b-beta&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                      &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;device_map&#34;&lt;/span&gt;:&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;auto&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                       &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;model_kwargs&#34;&lt;/span&gt;:{&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;load_in_4bit&#34;&lt;/span&gt;:&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                                       &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;bnb_4bit_use_double_quant&#34;&lt;/span&gt;:&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                                       &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;bnb_4bit_quant_type&#34;&lt;/span&gt;:&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;nf4&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                                                       &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;bnb_4bit_compute_dtype&#34;&lt;/span&gt;:&lt;span style=&#34;color:#268bd2&#34;&gt;torch&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;bfloat16&lt;/span&gt;}},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                      &lt;span style=&#34;color:#268bd2&#34;&gt;generation_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;max_new_tokens&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;350&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                      )
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And then we build our pipeline. Our first step is to add these components to the pipeline, and next, we make the connections between each component:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text_embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;text_embedder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_builder&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_component&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text_embedder&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;retriever.documents&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.documents&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;connect&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder.prompt&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm.prompt&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can also visualize what our final pipeline looks like with &lt;code&gt;rag.draw(&#34;rag.png&#34;)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2/pipeline.png alt=&#34;A RAG Pipeline Using Zephyr 7B Beta&#34;  /&gt;
&lt;/p&gt;
&lt;h3 id=&#34;running-the-pipeline-and-restuls&#34;&gt;Running the Pipeline and Restuls&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;SentenceTransformersTextEmbedder&lt;/code&gt; with name &lt;code&gt;text_embedder&lt;/code&gt; expects &lt;code&gt;text&lt;/code&gt; as input. While the &lt;code&gt;prompt_builder&lt;/code&gt; expects &lt;code&gt;query&lt;/code&gt;. Our aim is to provide these inputs when we run the pipeline. The rest of the required inputs for each component are provided to them by the preceding components. Let’s run the pipeline with the question “What is the style of the Cure?”&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What is the style of the Cure?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;rag&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text_embedder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;prompt_builder&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;query&#34;&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;results&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;llm&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;replies&#34;&lt;/span&gt;][&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The result we got from the pipeline above is the following:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;The style of The Cure, as described in the context, is &#34;increasingly dark and tormented&#34; and had a strong influence
on the emerging genre of gothic rock. This style is evident in their second album, &#34;Seventeen Seconds,&#34; and is 
characterized by their use of &#34;dominant, melodic bass lines; whiny, strangulated vocals; and a lyric obsession with
existential, almost literary despair.&#34; (Source: https://en.wikipedia.org/wiki/The_Cure)

Additionally, the Cure&#39;s influence on other musical artists spanning multiple genres, as listed in the context, 
further supports the description of their style as gothic rock. Some of these artists include Billy Corgan of the 
Smashing Pumpkins, Slowdive, Ride, Miki Berenyi of Lush, and Britt Daniel of Spoon, among others. (Source: 
https://en.wikipedia.org/wiki/The_Cure)

Overall, The Cure&#39;s style can be characterized as gothic rock, with a focus on dark, introspective lyrics and a 
distinctive sound that includes dominant bass lines and whiny, strangulated vocals.
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Notice that the source URL is also reported. This was because in the prompt template, we added the URL for each &lt;code&gt;doc&lt;/code&gt; in &lt;code&gt;documents&lt;/code&gt; to the prompt with &lt;code&gt;URL:{{ doc.meta[&#39;url&#39;]&lt;/code&gt; and the instructions said: “If the answer is contained in the context, also report the source URL.”&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Customizing RAG Pipelines to Summarize Latest Hacker News Posts</title>
      <link>https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2/</link>
      <pubDate>Fri, 22 Sep 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2/</guid>
      <description>
      
      
      

      
      

      <p>Over the last few months, the team at 
<a href="https://deepset.ai" target="_blank" rel="noopener">deepset</a> has been working on a major upgrade in the Haystack repository. Along the way, we’ve been sharing our updates and design process for the upcoming 
<a href="https://github.com/deepset-ai/haystack/discussions/5568" target="_blank" rel="noopener">Haystack 2.0</a> with the community, as well as releasing new components in a preview package. This means that you can already start exploring features coming to Haystack 2.0 using the preview components available in the <code>haystack-ai</code> package (<code>pip install haystack-ai</code>).</p>
<blockquote>
<p>Update: we released Haystack 2.0-Beta on December 4th 2023, the code in this article has been updated to work with this new release.</p>
</blockquote>
<p>In this article, I’ll cover two major concepts in Haystack</p>
<ul>
<li><strong>Components:</strong> These are the smallest building blocks in Haystack. They are meant to cover one simple task. As well as using components available in the core Haystack project, it will be easier than ever in 2.0, to create your own custom components.</li>
<li><strong>Pipelines:</strong> These are made by connecting components to each other. Pipelines in 2.0 are more flexible than ever and enable you various new connection patterns between your components.</li>
</ul>
<p>While components and pipelines have been at the core of Haystack since the beginning, Haystack 2.0 introduces some significant changes to how they are constructed.</p>
<p>We’ll look at how to create custom components and pipelines using the Haystack 2.0 preview. I’ll share a custom Haystack component that fetches the latest posts from Hacker News, and show how we can use it in a retrieval-augmented generative (RAG) pipeline to generate summaries of Hacker News posts.</p>
<h2 id="components-in-haystack-20">Components in Haystack 2.0</h2>
<p>A component is a class that does <em>one thing.</em> That thing could be to ‘prompt GPT3.5’, or ‘translate’, or ‘retrieve documents’, and so on.</p>
<p>While Haystack comes with a set of components in the core project, we hope that with Haystack 2.0 you will also be able to easily build components to your own custom requirements.</p>
<p>In Haystack 2.0, a class can become a component with just two additions:</p>
<ul>
<li>A <code>@component</code> decorator on the class declaration.</li>
<li>A <code>run</code> function with a decorator <code>@component.output_types(my_output_name=my_output_type)</code> that describes what output the pipeline should expect from this component.</li>
</ul>
<p>And that’s about it.</p>
<h3 id="building-a-custom-hacker-news-component">Building a Custom Hacker News Component</h3>
<p>I’ll admit, the idea for this custom component came from one of our amazing Haystack ambassadors on Discord during a live coding session (thanks rec 💙) — and it turned out pretty well! So let’s take a look at how we create a custom component that fetches the latest <em>k</em> posts from Hacker News.</p>
<p>First, we create a <code>HackernewsNewestFetcher</code>. For it to be a valid Haystack component, it will also need a <code>run</code> function. For now, let’s create a stub function that simply returns a dictionary containing a single key <code>‘articles’</code> with the value ‘Hello world!’.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">component</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">@component</span>  
</span></span><span style="display:flex;"><span><span style="color:#859900">class</span> <span style="color:#cb4b16">HackernewsNewestFetcher</span>():  
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>  <span style="color:#268bd2">@component.output_types</span>(<span style="color:#268bd2">articles</span>=<span style="color:#cb4b16">str</span>)  
</span></span><span style="display:flex;"><span>  <span style="color:#859900">def</span> <span style="color:#268bd2">run</span>(<span style="color:#cb4b16">self</span>):  
</span></span><span style="display:flex;"><span>    <span style="color:#859900">return</span> {<span style="color:#2aa198">'articles'</span>: <span style="color:#2aa198">'Hello world!'</span>}
</span></span></code></pre></div><p>Now let’s make our component actually fetch the latest posts from Hacker News. We can use the 
<a href="https://newspaper.readthedocs.io/en/latest/" target="_blank" rel="noopener"><code>newspapers3k</code></a> package to crawl and get the contents of given URLs. We will also change the output type to return a list of Document objects.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">typing</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">List</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">component</span>, <span style="color:#268bd2">Document</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">newspaper</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Article</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">requests</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">@component</span>  
</span></span><span style="display:flex;"><span><span style="color:#859900">class</span> <span style="color:#cb4b16">HackernewsNewestFetcher</span>():  
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>  <span style="color:#268bd2">@component.output_types</span>(<span style="color:#268bd2">articles</span>=<span style="color:#268bd2">List</span>[<span style="color:#268bd2">Document</span>])  
</span></span><span style="display:flex;"><span>  <span style="color:#859900">def</span> <span style="color:#268bd2">run</span>(<span style="color:#cb4b16">self</span>, <span style="color:#268bd2">last_k</span>: <span style="color:#cb4b16">int</span>):  
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">newest_list</span> = <span style="color:#268bd2">requests</span>.<span style="color:#268bd2">get</span>(<span style="color:#268bd2">url</span>=<span style="color:#2aa198">'https://hacker-news.firebaseio.com/v0/newstories.json?print=pretty'</span>)  
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">articles</span> = []  
</span></span><span style="display:flex;"><span>    <span style="color:#859900">for</span> <span style="color:#cb4b16">id</span> <span style="color:#859900">in</span> <span style="color:#268bd2">newest_list</span>.<span style="color:#268bd2">json</span>()[<span style="color:#2aa198;font-weight:bold">0</span>:<span style="color:#268bd2">last_k</span>]:  
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2">article</span> = <span style="color:#268bd2">requests</span>.<span style="color:#268bd2">get</span>(<span style="color:#268bd2">url</span>=<span style="color:#2aa198">f</span><span style="color:#2aa198">"https://hacker-news.firebaseio.com/v0/item/</span><span style="color:#2aa198">{</span><span style="color:#cb4b16">id</span><span style="color:#2aa198">}</span><span style="color:#2aa198">.json?print=pretty"</span>)  
</span></span><span style="display:flex;"><span>      <span style="color:#859900">if</span> <span style="color:#2aa198">'url'</span> <span style="color:#859900">in</span> <span style="color:#268bd2">article</span>.<span style="color:#268bd2">json</span>():  
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">articles</span>.<span style="color:#268bd2">append</span>(<span style="color:#268bd2">article</span>.<span style="color:#268bd2">json</span>()[<span style="color:#2aa198">'url'</span>])  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">docs</span> = []  
</span></span><span style="display:flex;"><span>    <span style="color:#859900">for</span> <span style="color:#268bd2">url</span> <span style="color:#859900">in</span> <span style="color:#268bd2">articles</span>:  
</span></span><span style="display:flex;"><span>      <span style="color:#859900">try</span>:  
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">article</span> = <span style="color:#268bd2">Article</span>(<span style="color:#268bd2">url</span>)  
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">article</span>.<span style="color:#268bd2">download</span>()  
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">article</span>.<span style="color:#268bd2">parse</span>()  
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">docs</span>.<span style="color:#268bd2">append</span>(<span style="color:#268bd2">Document</span>(<span style="color:#268bd2">content</span>=<span style="color:#268bd2">article</span>.<span style="color:#268bd2">text</span>, <span style="color:#268bd2">meta</span>={<span style="color:#2aa198">'title'</span>: <span style="color:#268bd2">article</span>.<span style="color:#268bd2">title</span>, <span style="color:#2aa198">'url'</span>: <span style="color:#268bd2">url</span>}))  
</span></span><span style="display:flex;"><span>      <span style="color:#859900">except</span>:  
</span></span><span style="display:flex;"><span>        <span style="color:#cb4b16">print</span>(<span style="color:#2aa198">f</span><span style="color:#2aa198">"Couldn't download </span><span style="color:#2aa198">{</span><span style="color:#268bd2">url</span><span style="color:#2aa198">}</span><span style="color:#2aa198">, skipped"</span>)  
</span></span><span style="display:flex;"><span>    <span style="color:#859900">return</span> {<span style="color:#2aa198">'articles'</span>: <span style="color:#268bd2">docs</span>}
</span></span></code></pre></div><p>We now have a component that, when run, returns a list of Documents containing the contents of the (<code>last_k</code>) latest posts on Hacker News. Here we store the output in the <code>articles</code> key of the dictionary.</p>
<h2 id="pipelines-in-haystack-20">Pipelines in Haystack 2.0</h2>
<p>A pipeline is a structure that connects one component’s output to another component’s input until a final result is reached.</p>
<p>A pipeline is created with a few steps:</p>
<ol>
<li>Create a Pipeline:<br>
<code>pipeline = Pipeline()</code></li>
<li>Add components to the pipeline:<br>
<code>pipeline.add_component(instance=component_a, name=”ComponentA”)</code><br>
<code>pipeline.add_component(instance=component_b, name=”ComponentB”)</code></li>
<li>Connect an output from one component to the input of another:<br>
<code>pipeline.connect("component_a.output_a", "component_b.input_b")</code></li>
</ol>
<p>There are already enough components available in the Haystack 2.0 preview for us to build a simple RAG pipeline that uses our new<code>HackernewsNewestFetcher</code> for the retrieval augmentation step.</p>
<h3 id="building-a-rag-pipeline-to-generate-summaries-of-hacker-news-posts">Building a RAG Pipeline to Generate Summaries of Hacker News Posts</h3>
<p>To build a RAG pipeline that can create a summary for each of the latest <em>k</em> posts on Hacker News, we will use two components from the Haystack 2.0 preview:</p>
<ul>
<li>The <code>PromptBuilder</code>: This component allows us to create prompt templates using 
<a href="https://jinja.palletsprojects.com/en/3.1.x/" target="_blank" rel="noopener">Jinja</a> as our templating language.</li>
<li>The <code>OpenAIGenerator</code>: This component simply prompts the specified GPT model. We can connect the <code>PromptBuilder</code> output to this component to customize how we interact with our chosen model.</li>
</ul>
<p>First, we initialize all of the components we will need for the pipeline:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.builders.prompt_builder</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptBuilder</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.components.generators</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">OpenAIGenerator</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.utils</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Secret</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_template</span> = <span style="color:#2aa198">"""  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">You will be provided a few of the latest posts in HackerNews, followed by their URL.  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">For each post, provide a brief summary followed by the URL the full post can be found at.  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Posts:  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% f</span><span style="color:#2aa198">or article in articles %}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  {{article.content}}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  URL: {{article.meta['url']}}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% e</span><span style="color:#2aa198">ndfor %}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">"""</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_builder</span> = <span style="color:#268bd2">PromptBuilder</span>(<span style="color:#268bd2">template</span>=<span style="color:#268bd2">prompt_template</span>)  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">llm</span> = <span style="color:#268bd2">OpenAIGenerator</span>(<span style="color:#268bd2">mode</span>=<span style="color:#2aa198">"gpt-4"</span>, <span style="color:#268bd2">api_key</span>=<span style="color:#268bd2">Secret</span>.<span style="color:#268bd2">from_token</span>(<span style="color:#2aa198">'YOUR_API_KEY'</span>))  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">fetcher</span> = <span style="color:#268bd2">HackernewsNewestFetcher</span>()
</span></span></code></pre></div><p>Next, we add the components to a Pipeline:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span> = <span style="color:#268bd2">Pipeline</span>()  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"hackernews_fetcher"</span>, <span style="color:#268bd2">fetcher</span>)  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"prompt_builder"</span>, <span style="color:#268bd2">prompt_builder</span>)  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_component</span>(<span style="color:#2aa198">"llm"</span>, <span style="color:#268bd2">llm</span>)
</span></span></code></pre></div><p>And finally, we connect the components to each other:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"hackernews_fetcher.articles"</span>, <span style="color:#2aa198">"prompt_builder.articles"</span>)  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"prompt_builder"</span>, <span style="color:#2aa198">"llm"</span>)
</span></span></code></pre></div><p>Here, notice how we connect <code>hackernews_fetcher.articles</code> to <code>prompt_builder.articles</code>. This is because <code>prompt_builder</code> is expecting <code>articles</code> in its template:</p>
<pre tabindex="0"><code>Posts:  
{% for article in articles %}  
  {{article.contnet}}  
  URL: {{article.meta['url']}}  
{% endfor %}
</code></pre><p>The output and input keys do not need to have matching names. Additionally, <code>prompt_builder</code> makes <em>all</em> of the input keys available to your prompt template. We could, for example, provide a <code>documents</code> input to <code>prompt_builder</code> instead of <code>articles</code>. Then our code might look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">prompt_template</span> = <span style="color:#2aa198">"""  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">You will be provided a few of the latest posts in HackerNews, followed by their URL.  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">For each post, provide a brief summary followed by the URL the full post can be found at.  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Posts:  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% f</span><span style="color:#2aa198">or document in documents %}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  {{document.content}}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">  URL: {{document.meta['url']}}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">{</span><span style="color:#2aa198">% e</span><span style="color:#2aa198">ndfor %}  
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">"""</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span>[...]  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">connect</span>(<span style="color:#2aa198">"hackernews_fetcher.articles"</span>, <span style="color:#2aa198">"prompt_builder.documents"</span>)
</span></span></code></pre></div><p>Notice how the prompt now refers to <code>documents</code>, and the <code>connect</code> call now attaches to the corresponding <code>prompt_builder.documents</code> input.</p>
<p>Now that we have a pipeline, we can run it. Here is what I got as a response at about 22:45 CET on September 21st 🤗</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">result</span> = <span style="color:#268bd2">pipe</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">data</span>={<span style="color:#2aa198">"hackernews_fetcher"</span>:{<span style="color:#2aa198">"last_k"</span>: <span style="color:#2aa198;font-weight:bold">2</span>}})  
</span></span><span style="display:flex;"><span><span style="color:#cb4b16">print</span>(<span style="color:#268bd2">result</span>[<span style="color:#2aa198">'llm'</span>][<span style="color:#2aa198">'replies'</span>][<span style="color:#2aa198;font-weight:bold">0</span>])
</span></span></code></pre></div><p>Response:</p>
<pre tabindex="0"><code>1. "The translation world has legends of its own, but not all legends involve greatness.   
Many provide pain, confusion, or comedy, as these examples of bad game translation prove."   
- This post shares a humorous look at some examples of poor video game translations that have   
resulted in confusion and comedy. The author seeks to highlight that while translation is often   
necessary in game localization, it can sometimes yield suboptimal results.  
Link: https://legendsoflocalization.com/bad-translation/  
  
2. “Recently, I found myself returning to a compelling series of   
blog posts titled Zero-cost futures in Rust by Aaron Turon about what would   
become the foundation of Rust's async ecosystem.”   
- This post provides an in-depth analysis of the current state of Rust's   
'async' ecosystem, drawing upon the author's own experiences and Aaron Turon's   
blog series, "Zero-cost futures in Rust". The author also discusses the benefits and   
negatives of the current async ecosystem, the problems with ecosystem fragmentation,   
the state and issue of async-std, alternative runtimes, the complexities of writing async code,   
the benefits of synchronous threads over async, and the obsessiveness of Rust landscape with an   
async-first approach. The post concludes with the notion that async Rust should be used only   
when necessary and that the smaller, simpler language inside Rust (the synchronous Rust)   
should be the default mode.  
Link: https://corrode.dev/blog/async/
</code></pre><h2 id="further-improvements">Further Improvements</h2>
<p>This custom component was created as an experiment and you could certainly take it much further in a real-world application.</p>
<p>For example, our experimental component does nothing to reduce the length of the content in each article. This means that GPT-4 may struggle to give a good response, especially when setting <em>last_k</em> to a high number.</p>

      </description>
    </item>
    
    <item>
      <title>Deploying RAG to Production</title>
      <link>https://haystack.deepset.ai/blog/rag-deployment/</link>
      <pubDate>Wed, 13 Sep 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/rag-deployment/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;As data scientists, we have often mastered the art of prototyping. We can use machine learning frameworks like 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; to build, test, and fine-tune data-driven systems. We’re experts at gathering stakeholder feedback, quantifying it, and interpreting the various performance metrics. But it can get tricky for many of us when we want to move these systems into a production environment, where they become available for general use.&lt;/p&gt;
&lt;p&gt;With Haystack, developers can build complex LLM pipelines on top of their own text databases, using state-of-the-art tools: from conversational AI to semantic search and summarization. One of the most talked about architectures these days is RAG, which stands for retrieval augmented generative AI. RAG pipelines combine the power of a generative LLM with the insights contained in your data, to create truly helpful user interfaces. To learn more, check out our 
&lt;a href=&#34;https://www.deepset.ai/blog/llms-retrieval-augmentation&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;blog post on RAG&lt;/a&gt; on The Deep Dive.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-deployment/rag.png alt=&#34;Sketch of a retrieval augmented generative pipeline&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;But deploying to production is an art in itself. And it can be intimidating. There are many elements that need to work together for a successful deployment, each of which is a potential source of failure. Fortunately, here at deepset, we have a team of knowledgeable people who deploy Haystack-powered pipelines to production on a regular basis. So in this article, I’ll share what I learned from them – so that you can get your RAG system into production ASAP.&lt;/p&gt;
&lt;h2 id=&#34;from-prototype-to-production&#34;&gt;From Prototype to Production&lt;/h2&gt;
&lt;p&gt;Prototyping is the process of building versions of your system – iteratively designing, deploying, and testing them – until you arrive at a configuration that meets your needs and has the potential to generate real value in production. While this process is challenging enough, it is only the beginning of your system’s life cycle.&lt;/p&gt;
&lt;p&gt;The second important step in an applied AI project is deploying the system to production. The difference between this and a development environment, to put it bluntly, is that a lot more can go wrong. That’s because you have no control over how many people will be querying your system - and expecting it to respond quickly. You also have no (or very little) way of knowing when people will query it. So you need to plan for contingencies, and build a system that is scalable (meaning that its processing power can grow and shrink as needed) and robust (so that a high load on the system doesn’t cause it to crash).&lt;/p&gt;
&lt;p&gt;When you deploy your system to production, it is also much harder to predict what people will use it for. That’s why you need to monitor your system’s performance and react accordingly. But LLM observability and pipeline monitoring are complex topics that we will leave for our next blog post.&lt;/p&gt;
&lt;h2 id=&#34;the-use-case&#34;&gt;The Use Case&lt;/h2&gt;
&lt;p&gt;Not every machine learning powered system is the same. In fact, you could argue that they’re all different – their nature depends on the problem we’re solving, the pipeline design, the underlying data, and the scale of the project.&lt;/p&gt;
&lt;p&gt;Let us consider a mid-sized project using a RAG pipeline. The pipeline is connected to a regularly updated data store. Such a project could be used in an online news environment, for example, or in an employee-facing system that ingests company reports on a regular basis. In summary, our system has the following requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It can ingest a large batch of documents when we first build it (our existing collection of online news or company reports).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It can be updated with additional, smaller batches of documents on a regular basis (say, daily, or whenever a new document arrives).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It can be queried at any time from a user interface (probably within a browser).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While the first two points are handled by an indexing pipeline, the second is handled by a query pipeline. In Haystack, you define both in the same yaml file.&lt;/p&gt;
&lt;h2 id=&#34;moving-to-production&#34;&gt;Moving to Production&lt;/h2&gt;
&lt;p&gt;There are many tools you can use on your way to production. The exact choice depends on your needs and preferences – and if you have deployed a system before, you probably already have a preferred setup. Either way, let’s take stock of what you need for a successful deployment:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A production-ready, managed database, such as 
&lt;a href=&#34;https://opensearch.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearch&lt;/a&gt;, 
&lt;a href=&#34;https://weaviate.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Weaviate&lt;/a&gt;, or 
&lt;a href=&#34;https://www.pinecone.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Pinecone&lt;/a&gt;. The advantage of using a database managed by a third party is that they take care of all the complex stuff like database maintenance and security for you.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A server that can host your database and compute infrastructure: you will need CPUs as well as GPUs – possibly for indexing, quite definitely for inference. There are many options for cloud providers, so choose the one you’re most comfortable with. The largest are AWS, Azure, and GCP.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An orchestration tool like 
&lt;a href=&#34;https://kubernetes.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Kubernetes&lt;/a&gt; (often styled K8s), which communicates with both the server (where your data resides and the pipelines run) and the client (the user interface that sends requests through a REST API).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In most real-world projects, you’ll also want to test your application locally before deploying it to the external server. You can set up a local Kubernetes environment by using 
&lt;a href=&#34;https://k3d.io/v5.6.0/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;k3d&lt;/a&gt;. It allows you to create a lightweight Kubernetes cluster in docker on your own machine. See 
&lt;a href=&#34;https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-2/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Kristof’s article&lt;/a&gt; for detailed instructions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-deployment/cloud.png alt=&#34;Sketch of an indexing pipeline and a RAG pipeline in the cloud, with interfaces for incoming documents and user queries.&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;deploying-the-indexing-pipeline-to-production&#34;&gt;Deploying the indexing pipeline to production&lt;/h2&gt;
&lt;p&gt;Indexing is the process of adding documents to your database. How you go about indexing in production depends a bit on whether you’re working with a keyword retriever or an embedding retriever (or both, in a 
&lt;a href=&#34;https://haystack.deepset.ai/blog/hybrid-retrieval/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;hybrid retrieval&lt;/a&gt; setup). A keyword retriever is fast and doesn’t require any special hardware. An embedding retriever is a different beast.&lt;/p&gt;
&lt;p&gt;Embedding methods take more time because they have to run your documents through a Transformer-based language model. The model spits out a dense, semantically rich vector that is added to the database, making it searchable for the later retrieval step. Because this step is computationally expensive, you’ll want to speed it up using GPUs.&lt;/p&gt;
&lt;h3 id=&#34;preparing-the-database&#34;&gt;Preparing the database&lt;/h3&gt;
&lt;p&gt;During indexing, documents and their corresponding text embeddings are stored in the database’s memory. Before signing up for a managed database, it’s recommended to have at least a rough idea of how much space your documents and vectors will take up. That’s because adding more space later is a hassle – yet buying too much space and then not using it can cost you thousands of $€¥.&lt;/p&gt;
&lt;p&gt;The amount of space you need depends largely on the length of your vectors. For example, 
&lt;a href=&#34;https://cohere.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cohere&lt;/a&gt;’s giant text embeddings are five times longer than the ones we often use in production – and therefore require five times as much space. So optimizing for vector length can therefore save you a lot of money every month.&lt;/p&gt;
&lt;p&gt;Another concept you’ll encounter when setting up your managed database is “high availability.” This refers to storing your documents on more than one server in more than one physical location. This practice of redundancy ensures that your documents will still be available, even if one server is temporarily or permanently down.&lt;/p&gt;
&lt;h3 id=&#34;preprocessing-and-indexing-your-documents&#34;&gt;Preprocessing and indexing your documents&lt;/h3&gt;
&lt;p&gt;During prototyping, you have defined the indexing pipeline, which specifies how you preprocess your documents, before adding them to the database. For production, you’ll move the indexing pipeline to your cloud provider, where it will be deployed on virtual machines by Kubernetes. It’s useful to summarize all the details of your configuration – such as the credentials for your document store, the pipeline yaml itself, and rules for the scaling of pipelines and hardware – in a 
&lt;a href=&#34;https://helm.sh/docs/topics/charts/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Helm chart&lt;/a&gt;. Kubernetes will then deploy your system according to the settings specified in the chart.&lt;/p&gt;
&lt;p&gt;How you send the raw files to your external service depends largely on your application and the origin of your files. For example, you can write a script that sends batches of files to your indexing endpoint at a fixed time each day or week, or you can configure a stream that sends the files as they come in. When you expect new data to index, you can enable autoscaling in Kubernetes, which will cause it to create replicas of your indexing pipeline that can then run in parallel.&lt;/p&gt;
&lt;p&gt;Because indexing as a process is not as time-sensitive as querying, you can queue your files for indexing. Queues are used to ensure that requests can be accepted almost immediately, but processed sometime in the (near) future. To learn more about queuing with KEDA, see our 
&lt;a href=&#34;https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-1/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;series of articles about scaling indexing pipelines&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once your document embeddings are ready, the service adds them to the database, where they can now be queried.&lt;/p&gt;
&lt;h2 id=&#34;deploying-the-query-pipeline-to-production&#34;&gt;Deploying the Query Pipeline to Production&lt;/h2&gt;
&lt;p&gt;Unlike indexing, querying is extremely time-sensitive. When your users query your RAG system, they expect answers quickly – even the slightest delay can cause them to give up on your product and look elsewhere. Therefore, it is critical that your production query pipeline is always available and able to handle even a large number of queries at once. This means, among other things, that the pipeline must be able to scale on demand.&lt;/p&gt;
&lt;p&gt;Again, Kubernetes is our friend. It receives the requests through the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/rest_api&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack REST API&lt;/a&gt; endpoint and creates replicas of the query pipeline when many requests come in at once. It also ensures that these pipeline replicas have the necessary hardware resources available when needed. The queries are processed by the pipelines and the retrieval-augmented, LLM-generated responses are returned to your application, where your users can interact with them.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/rag-deployment/replicas.png alt=&#34;Sketch of multiple replicas of the RAG pipeline, receiving many user queries and communicating with the database.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The nitty-gritty technical details of scaling are handled by our orchestration tool. Additionally, model hosting services like 
&lt;a href=&#34;https://aws.amazon.com/sagemaker/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SageMaker&lt;/a&gt; or 
&lt;a href=&#34;https://huggingface.co/inference-api&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hugging Face Inference&lt;/a&gt; can be helpful to scale model inference separately. Aside from these automated solutions, we have the ability to tweak the scaling of our pipelines ourselves. To do this, it’s useful to think about the following questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;How many requests do I want each replica to handle simultaneously?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is the expected peak load?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Can we accept a higher latency to reduce costs?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Answering these questions will help you optimize your setup and can save you money – but don’t worry too much about these settings, because they can always be adjusted. First, you need to get your system into production, then you can monitor and improve it – more on that in our next blog post. 🙂&lt;/p&gt;
&lt;h2 id=&#34;from-rags-to-riches&#34;&gt;From RAGs to Riches&lt;/h2&gt;
&lt;p&gt;It’s one thing to offer prototyping with LLMs, and quite another to provide all the elements needed to bring a system to production. Only the latter can actually help your users and customers gain valuable insights in the blink of an eye.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; is the framework for AI engineers and data scientists to build and deploy production-ready systems powered by the latest in AI.&lt;/p&gt;
&lt;p&gt;Are you interested in learning more about building advanced systems with LLMs to solve real-world use cases? Perhaps you’re even working on building and deploying your own custom RAG system? We’d love to hear from you – 
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;join the Haystack Discord&lt;/a&gt;, where we talk about LLMs, retrieval augmentation, and much, much more.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Talk to YouTube Videos with Haystack Pipelines</title>
      <link>https://haystack.deepset.ai/blog/talk-to-youtube-videos-with-haystack-pipelines/</link>
      <pubDate>Fri, 08 Sep 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/talk-to-youtube-videos-with-haystack-pipelines/</guid>
      <description>
      
      
      

      
      

      <p><em>You can use this</em> 
<a href="https://colab.research.google.com/drive/1sZM5Y1NkPOy3y8HCsecsmhjImrARIVru?usp=sharing" target="_blank" rel="noopener"><em>Colab</em></a> <em>for a working example of the application described in this article.</em></p>
<p>In this article, I’ll be showing an example of how to leverage transcription models like OpenAI’s Whisper, so as to build a retrieval augmented generation (RAG) pipeline that will allow us to effectively search through video content.</p>
<p>The example application I’ll showcase is able to answer questions based on the transcript extracted from the video. I’ll use the 
<a href="https://www.youtube.com/watch?v=h5id4erwD4s" target="_blank" rel="noopener">video by Erika Cardenas</a> as an example. In the video, she talks about chunking and preprocessing documents for RAG pipelines. Once we’re done, we will be able to query a Haystack pipeline that will respond based on the contents of the video.</p>
<h2 id="transcribing-and-storing-the-video">Transcribing and Storing the Video</h2>
<p>To get started, we first need to set up an 
<a href="https://docs.haystack.deepset.ai/docs/pipelines#indexing-pipelines" target="_blank" rel="noopener">indexing pipeline</a>. These pipelines in Haystack are designed to be given files of some form (.pdf, .txt, .md and in our case, a YouTube link), and store them in a database. The indexing pipeline is also used to design and define how we would like files to be prepared. This often involves 
<a href="https://docs.haystack.deepset.ai/docs/file_converters" target="_blank" rel="noopener">file conversion</a> steps, some 
<a href="https://docs.haystack.deepset.ai/docs/preprocessor" target="_blank" rel="noopener">preprocessing</a>, and maybe also some 
<a href="https://docs.haystack.deepset.ai/docs/retriever#embedding-retrieval-recommended" target="_blank" rel="noopener">embedding</a> creation and so on.</p>
<p>The way we design the components and structure of this pipeline will also be important for another type of pipeline we will create in the next section: The RAG pipeline, also often referred to as the query or LLM pipeline too. While the indexing pipeline defines how we prepare and store data, an LLM pipeline <strong><em>uses</em></strong> said stored data. A simple example of the impact an indexing pipeline has on the RAG pipeline is that depending on the model we’re using, we may have to chunk our files to be longer or shorter.</p>
<h3 id="reusability">Reusability</h3>
<p>The idea behind Haystack pipelines is that once created, they can be re-invoked when needed. This ensures that data is treated the same way each time. In terms of indexing pipelines, this means we have a way to keep our databases for RAG pipelines always up to date. In a practical sense for this example application, when there’s a new video we want to be able to query, we re-use the same indexing pipeline and run the new video through it.</p>
<h3 id="creating-the-indexing-pipeline">Creating the Indexing Pipeline</h3>
<p>In this example, we’re using Weaviate as our vector database for storage. However, Haystack provides a number of 
<a href="https://haystack.deepset.ai/integrations?type=Document+Store" target="_blank" rel="noopener">Document Stores</a> which you can pick from.</p>
<p>First, we create our <code>WeaviateDocumentStore</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">weaviate</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">weaviate.embedded</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">EmbeddedOptions</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">WeaviateDocumentStore</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">client</span> = <span style="color:#268bd2">weaviate</span>.<span style="color:#268bd2">Client</span>(  
</span></span><span style="display:flex;"><span>  <span style="color:#268bd2">embedded_options</span>=<span style="color:#268bd2">weaviate</span>.<span style="color:#268bd2">embedded</span>.<span style="color:#268bd2">EmbeddedOptions</span>()  
</span></span><span style="display:flex;"><span>)  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_store</span> = <span style="color:#268bd2">WeaviateDocumentStore</span>(<span style="color:#268bd2">port</span>=<span style="color:#2aa198;font-weight:bold">6666</span>)
</span></span></code></pre></div><p>Next, we build the indexing pipeline. Here, our aim is to create a pipeline that will create transcripts of YouTube videos. So, we use the 
<a href="https://docs.haystack.deepset.ai/docs/whisper_transcriber" target="_blank" rel="noopener"><strong><code>WhisperTranscriber</code></strong></a> as our first component. This component uses 
<a href="https://openai.com/research/whisper" target="_blank" rel="noopener">Whisper</a> by OpenAI, an automatic speech recognition (ASR) system which can be used to transcribe audio into text. The component expects audio files, and returns transcripts in 
<a href="https://docs.haystack.deepset.ai/docs/documents_answers_labels" target="_blank" rel="noopener">Haystack Document</a> form, ready to be used in any Haystack pipeline.</p>
<p>We also include preprocessing, as well as embedding creations in our pipeline. This is because when it’s time to create the RAG pipeline, we would like to do semantic search on the indexed files.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">EmbeddingRetriever</span>, <span style="color:#268bd2">PreProcessor</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes.audio</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">WhisperTranscriber</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.pipelines</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">preprocessor</span> = <span style="color:#268bd2">PreProcessor</span>()  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">embedder</span> = <span style="color:#268bd2">EmbeddingRetriever</span>(<span style="color:#268bd2">document_store</span>=<span style="color:#268bd2">document_store</span>,   
</span></span><span style="display:flex;"><span>                              <span style="color:#268bd2">embedding_model</span>=<span style="color:#2aa198">"sentence-transformers/multi-qa-mpnet-base-dot-v1"</span>)  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">whisper</span> = <span style="color:#268bd2">WhisperTranscriber</span>(<span style="color:#268bd2">api_key</span>=<span style="color:#2aa198">'OPENAI_API_KEY'</span>)  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">whisper</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"Whisper"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"File"</span>])  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">preprocessor</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"Preprocessor"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Whisper"</span>])  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">embedder</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"Embedder"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Preprocessor"</span>])  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">document_store</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"DocumentStore"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Embedder"</span>])
</span></span></code></pre></div><p>Next, we create a helper function that extracts the audio of YouTube videos, and we can run the pipeline, for this, we install the <code>pytube</code> package 👇</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">pytube</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">YouTube</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#859900">def</span> <span style="color:#268bd2">youtube2audio</span> (<span style="color:#268bd2">url</span>: <span style="color:#cb4b16">str</span>):  
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">yt</span> = <span style="color:#268bd2">YouTube</span>(<span style="color:#268bd2">url</span>)  
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">video</span> = <span style="color:#268bd2">yt</span>.<span style="color:#268bd2">streams</span>.<span style="color:#268bd2">filter</span>(<span style="color:#268bd2">abr</span>=<span style="color:#2aa198">'160kbps'</span>).<span style="color:#268bd2">last</span>()  
</span></span><span style="display:flex;"><span>    <span style="color:#859900">return</span> <span style="color:#268bd2">video</span>.<span style="color:#268bd2">download</span>()
</span></span></code></pre></div><p>Now, we can run our indexing pipeline with a URL to a YouTube video:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">file_path</span> = <span style="color:#268bd2">youtube2audio</span>(<span style="color:#2aa198">"https://www.youtube.com/watch?v=h5id4erwD4s"</span>)  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">file_paths</span>=[<span style="color:#268bd2">file_path</span>])
</span></span></code></pre></div><h2 id="the-retrieval-augmented-generative-rag-pipeline">The Retrieval Augmented Generative (RAG) Pipeline</h2>
<p>This part is certainly the fun part. We now define our RAG pipeline. This will be the pipeline that defines <em>how</em> we query our videos. Although RAG pipelines often are built for question-answering, they can be designed for a number of other use cases. What the pipeline does in this case, is largely defined by what prompt you provide the LLM. You can find various prompts for different use cases in the 
<a href="https://prompthub.deepset.ai/" target="_blank" rel="noopener">PromptHub</a>.</p>
<h3 id="the-prompt">The Prompt</h3>
<p>For this example, we’ve gone with a commonly used style of question-answering prompts, although you can of course change this prompt to do what you want to achieve. For example, changing it to a prompt that asks for a summary might be interesting. You could also make it more general. Here we’re also informing the model that the transcripts belong to Weaviate videos.</p>
<pre tabindex="0"><code>You will be provided some transcripts from Weaviate YouTube videos.   
Please answer the query based on what is said in the videos.  
Video Transcripts: {join(documents)}  
Query: {query}  
Answer:
</code></pre><p>In Haystack, these prompts can be included in a pipeline with the 
<a href="https://docs.haystack.deepset.ai/docs/prompt_node#prompttemplates" target="_blank" rel="noopener"><code>PromptTemplate</code></a> and 
<a href="https://docs.haystack.deepset.ai/docs/prompt_node" target="_blank" rel="noopener"><code>PromptNode</code></a> components.</p>
<p>While the <code>PromptTemplate</code> is where we define the prompt and the variables the prompt expects as inputs (in our case <em>documents</em> and <em>query</em>), the <code>PromptNode</code> is really the interface with which we interact with LLMs. In this example, we’re using GPT-4 as our model of choice, but you can 
<a href="https://docs.haystack.deepset.ai/docs/prompt_node#models" target="_blank" rel="noopener">change this to use other models from Hugging Face, SageMaker, Azure</a> and so on.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptNode</span>, <span style="color:#268bd2">PromptTemplate</span>, <span style="color:#268bd2">AnswerParser</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">video_qa_prompt</span> = <span style="color:#268bd2">PromptTemplate</span>(<span style="color:#268bd2">prompt</span>=<span style="color:#2aa198">"You will be provided some transcripts from Weaviate YouTube videos. Please answer the query based on what is said in the videos.</span><span style="color:#2aa198">\n</span><span style="color:#2aa198">"</span>  
</span></span><span style="display:flex;"><span>                                        <span style="color:#2aa198">"Video Transcripts: {join(documents)}</span><span style="color:#2aa198">\n</span><span style="color:#2aa198">"</span>  
</span></span><span style="display:flex;"><span>                                        <span style="color:#2aa198">"Query: </span><span style="color:#2aa198">{query}</span><span style="color:#2aa198">\n</span><span style="color:#2aa198">"</span>  
</span></span><span style="display:flex;"><span>                                        <span style="color:#2aa198">"Answer:"</span>, <span style="color:#268bd2">output_parser</span> = <span style="color:#268bd2">AnswerParser</span>())  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_node</span> = <span style="color:#268bd2">PromptNode</span>(<span style="color:#268bd2">model_name_or_path</span>=<span style="color:#2aa198">"gpt-4"</span>, 
</span></span><span style="display:flex;"><span>                         <span style="color:#268bd2">api_key</span>=<span style="color:#2aa198">'OPENAI_KEY'</span>, 
</span></span><span style="display:flex;"><span>                         <span style="color:#268bd2">default_prompt_template</span>=<span style="color:#268bd2">video_qa_prompt</span>)
</span></span></code></pre></div><h3 id="the-pipeline">The Pipeline</h3>
<p>Finally, we define our RAG pipeline. The important thing to note here is how the <em>documents</em> input gets provided to the prompt we are using.</p>
<p>Haystack retrievers always return <code>documents</code>. Notice below how the first component to get the query is the same <code>EmbeddingRetriever</code> that we used in the indexing pipeline above. This allows us to embed the query using the same model that was used for indexing the transcript. The embeddings of the query and indexed transcripts are then used to retrieve the most relevant parts of the transcript. Since these are returned by the retriever as <strong><em>documents,</em></strong> we are able to fill in the <em>documents</em> parameter of the prompt with whatever the retriever returns:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">video_rag_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">video_rag_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">embedder</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"Retriever"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Query"</span>])  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">video_rag_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">prompt_node</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"PromptNode"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Retriever"</span>])
</span></span></code></pre></div><p>We can run the pipeline with a query. The response will be based on what Erika said in the example video we’re using 🤗</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">result</span> = <span style="color:#268bd2">video_rag_pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#2aa198">"Why do we do chunking?"</span>)
</span></span></code></pre></div><p>The result I got for this was the following:</p>
<pre tabindex="0"><code>Chunking is done to ensure that the language model is receiving the most   
relevant information and not going over the context window. It involves   
splitting up the text once it hits a certain token limit, depending on   
the model or the chunk size defined. This is especially useful in documents   
where subsequent sentences or sections may not make sense without the   
information from previous ones. Chunking can also help in providing extremely   
relevant information when making queries that are specific to titles or   
sections.
</code></pre><h2 id="further-improvements">Further Improvements</h2>
<p>In this example, we’ve used a transcription model that is able to transcribe audio into text, but it is unable to distinguish between speakers. A follow up step I would like to try is to use a model that allows for speaker distinction. This would allow me to ask questions and in the response from the model, get an understanding of who provided that answer in the video.</p>
<p>Another point I would like to make is that this pipeline, which was for demonstration purposes, uses a light-weight yet quite effective <strong>sentence-transformers</strong> model for retrieval, and the default setting for preprocessing. More could definitely be done to find out what the best embedding model for retrieval would be. And taking inspiration from Erika’s video, chunking and preprocessing of the transcribed documents could be evaluated and improved.</p>
<p>To discover more about the available pipelines and components that would help you build custom LLM applications, check out the 
<a href="https://docs.haystack.deepset.ai/" target="_blank" rel="noopener">Haystack documentation</a>.</p>

      </description>
    </item>
    
    <item>
      <title>Talk to Haystack Docs: Creating a Domain-Focused Q&amp;A RAG Pipeline with WebRetriever</title>
      <link>https://haystack.deepset.ai/blog/talk-to-haystack-docs/</link>
      <pubDate>Mon, 04 Sep 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/talk-to-haystack-docs/</guid>
      <description>
      
      
      

      
      

      <p>Ever since its introduction, WebRetriever has proven useful in the Haystack ecosystem. As its name implies, WebRetriever allows fetching documents from the Internet and channelling them into Haystack pipelines.</p>
<p>Under the hood, WebRetriever uses a search engine to look up relevant documents to retrieve from the web. Although users could customize the domain scope of the search queries even before (using the “site:” AND/OR syntax mixed with the actual query), it often felt like a workaround rather than a feature. Even worse – it created a barrier for users to exploit the WebRetriever capabilities fully. We needed to make it more intuitive and less “hacky”.</p>
<p>With the release of Haystack 1.20, WebRetriever can now limit searches to specific domains instead of searching the entire Internet. In this article, we’ll demonstrate how this small adjustment allows the easy creation of a specialized Q&A system that answers questions using documents only from a specific domain (Haystack docs, in this case).</p>
<p>Although we utilize Haystack docs as a use case in this article, we can easily envision many other use cases. Businesses can create RAG pipelines pointing WebRetriever to product-specific FAQs or troubleshooting guides from the product’s official website, thus effectively creating AI-powered assistants to answer customer queries. Students and researchers can scope WebRetriever to academic journals, arXiv, or PubMed to answer questions related to their field of study. Users can set the WebRetriever to collect product reviews or specifications from trusted websites when purchasing products. It takes a considerable amount of effort to exhaust all the potential use cases.</p>
<h2 id="the-power-of-webretriever">The Power of WebRetriever</h2>
<p>The WebRetriever is a Haystack component that extracts relevant documents from the web. It leverages an instance of WebSearch to obtain search result links; then it fetches content from the links and extracts the raw text content as a Document list.</p>
<p>The WebRetriever supports three distinct modes of operation:</p>
<p>Snippets Mode: In this mode, the WebRetriever generates a list of Document instances, where each Document represents a snippet or a segment from a web page result. It’s important to note that this mode does not involve actual web page retrieval.</p>
<p>Raw Documents Mode: In this mode, the WebRetriever generates a list of Document instances, where each Document represents an entire web page (retrieved from the search result link) devoid of any HTML and containing only the raw text content.</p>
<p>Preprocessed Documents Mode: This mode is similar to the Raw Documents Mode but includes an additional step – the raw text from each retrieved web page is divided into shorter Document instances using a specified PreProcessor.</p>
<p>Although knowledgeable users could use Google “site:” search syntax to manually hack a query, the new “allowed_domains” WebRetriever init parameter empowers users to limit the search to specific websites easily, making their Q&A focused and reliable.</p>
<h2 id="rag-pipeline-setup">RAG pipeline setup</h2>
<p>To better understand the building blocks of the “talk to the website” use case, let’s review each code section of the pipeline example [1]. Before assembling the RAG pipeline, we must set up the 
<a href="https://serper.dev" target="_blank" rel="noopener">https://serper.dev</a> API key and choose an LLM to use. It could be any LLM, but we’ll limit the choice to Anthropic Claude, OpenAI GPT models, and open-source models available on the free tier of Hugging Face Inference API.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">os</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">typing</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Dict</span>, <span style="color:#268bd2">Any</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">search_key</span> = <span style="color:#268bd2">os</span>.<span style="color:#268bd2">environ</span>.<span style="color:#268bd2">get</span>(<span style="color:#2aa198">"SERPERDEV_API_KEY"</span>)
</span></span><span style="display:flex;"><span><span style="color:#859900">if</span> <span style="color:#859900">not</span> <span style="color:#268bd2">search_key</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#859900">raise</span> <span style="color:#268bd2">ValueError</span>(<span style="color:#2aa198">"Please set the SERPERDEV_API_KEY environment variable"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">models_config</span>: <span style="color:#268bd2">Dict</span>[<span style="color:#cb4b16">str</span>, <span style="color:#268bd2">Any</span>] = {
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"openai"</span>: {<span style="color:#2aa198">"api_key"</span>: <span style="color:#268bd2">os</span>.<span style="color:#268bd2">environ</span>.<span style="color:#268bd2">get</span>(<span style="color:#2aa198">"OPENAI_API_KEY"</span>), <span style="color:#2aa198">"model_name"</span>: <span style="color:#2aa198">"gpt-3.5-turbo"</span>},
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"anthropic"</span>: {<span style="color:#2aa198">"api_key"</span>: <span style="color:#268bd2">os</span>.<span style="color:#268bd2">environ</span>.<span style="color:#268bd2">get</span>(<span style="color:#2aa198">"ANTHROPIC_API_KEY"</span>), <span style="color:#2aa198">"model_name"</span>: <span style="color:#2aa198">"claude-instant-1"</span>},
</span></span><span style="display:flex;"><span>    <span style="color:#2aa198">"hf"</span>: {<span style="color:#2aa198">"api_key"</span>: <span style="color:#268bd2">os</span>.<span style="color:#268bd2">environ</span>.<span style="color:#268bd2">get</span>(<span style="color:#2aa198">"HF_API_KEY"</span>), <span style="color:#2aa198">"model_name"</span>: <span style="color:#2aa198">"tiiuae/falcon-7b-instruct"</span>},
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Next, we define a simple prompt template to instruct LLM on how to construct the answer. Using the defined PromptTemplate, we also initialize PromptNode with additional parameters like API key, maximum answer length, streaming settings, etc.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptNode</span>, <span style="color:#268bd2">PromptTemplate</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_text</span> = <span style="color:#2aa198">"""
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Synthesize a comprehensive answer from the provided paragraphs and the given question.</span><span style="color:#2aa198">\n</span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">Focus on the question and avoid unnecessary information in your answer.</span><span style="color:#2aa198">\n</span><span style="color:#2aa198">
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">\n\n</span><span style="color:#2aa198"> Paragraphs: {join(documents)} </span><span style="color:#2aa198">\n\n</span><span style="color:#2aa198"> Question: </span><span style="color:#2aa198">{query}</span><span style="color:#2aa198"> </span><span style="color:#2aa198">\n\n</span><span style="color:#2aa198"> Answer:
</span></span></span><span style="display:flex;"><span><span style="color:#2aa198">"""</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_node</span> = <span style="color:#268bd2">PromptNode</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">model</span>[<span style="color:#2aa198">"model_name"</span>],
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">default_prompt_template</span>=<span style="color:#268bd2">PromptTemplate</span>(<span style="color:#268bd2">prompt_text</span>),
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">api_key</span>=<span style="color:#268bd2">model</span>[<span style="color:#2aa198">"api_key"</span>],
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">max_length</span>=<span style="color:#2aa198;font-weight:bold">768</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">model_kwargs</span>={<span style="color:#2aa198">"stream"</span>: <span style="color:#268bd2">stream</span>},
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p>We initialize the WebRetriever component with specific parameters like the search engine API key, allowed domains for fetching documents, and document caching. WebRetriever is now set to retrieve documents only from the domain “haystack.deepset.ai”.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">InMemoryDocumentStore</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes.retriever</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">WebRetriever</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">web_retriever</span> = <span style="color:#268bd2">WebRetriever</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">api_key</span>=<span style="color:#268bd2">search_key</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">allowed_domains</span>=[<span style="color:#2aa198">"haystack.deepset.ai"</span>],
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">top_search_results</span>=<span style="color:#2aa198;font-weight:bold">10</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">mode</span>=<span style="color:#2aa198">"preprocessed_documents"</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">top_k</span>=<span style="color:#2aa198;font-weight:bold">50</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">cache_document_store</span>=<span style="color:#268bd2">InMemoryDocumentStore</span>(),
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p>Next, we create our Haystack RAG pipeline. One by one, we add our components: WebRetriever, sampler, ranker, and finally, PromptNode as an answer generation. We add the pipeline nodes in a sequence, defining the data flow through the pipeline.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes.sampler</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">TopPSampler</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes.ranker</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">LostInTheMiddleRanker</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span> = <span style="color:#268bd2">Pipeline</span>()
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">web_retriever</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"Retriever"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Query"</span>])
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">TopPSampler</span>(<span style="color:#268bd2">top_p</span>=<span style="color:#2aa198;font-weight:bold">0.90</span>), <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"Sampler"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Retriever"</span>])
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">LostInTheMiddleRanker</span>(<span style="color:#2aa198;font-weight:bold">1024</span>), <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"LostInTheMiddleRanker"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Sampler"</span>])
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">prompt_node</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"PromptNode"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"LostInTheMiddleRanker"</span>])
</span></span></code></pre></div><p>Finally, we enter a loop that allows users to input their queries. Our example code runs these questions through the pipeline to produce answers which are then printed to the console. Super simple!</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#859900">while</span> <span style="color:#859900;font-weight:bold">True</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">user_input</span> = <span style="color:#cb4b16">input</span>(<span style="color:#2aa198">"</span><span style="color:#2aa198">\n</span><span style="color:#2aa198">Ask question (type 'exit' or 'quit' to quit): "</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#859900">if</span> <span style="color:#268bd2">user_input</span>.<span style="color:#268bd2">lower</span>() == <span style="color:#2aa198">"exit"</span> <span style="color:#859900">or</span> <span style="color:#268bd2">user_input</span>.<span style="color:#268bd2">lower</span>() == <span style="color:#2aa198">"quit"</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#859900">break</span>
</span></span><span style="display:flex;"><span>    <span style="color:#859900">if</span> <span style="color:#268bd2">stream</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#cb4b16">print</span>(<span style="color:#2aa198">"Answer:"</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">response</span> = <span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">query</span>=<span style="color:#268bd2">user_input</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#859900">if</span> <span style="color:#859900">not</span> <span style="color:#268bd2">stream</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#cb4b16">print</span>(<span style="color:#2aa198">f</span><span style="color:#2aa198">"Answer: </span><span style="color:#2aa198">{</span><span style="color:#268bd2">response</span>[<span style="color:#2aa198">'results'</span>][<span style="color:#2aa198;font-weight:bold">0</span>]<span style="color:#2aa198">}</span><span style="color:#2aa198">"</span>)
</span></span></code></pre></div><h2 id="conclusion">Conclusion</h2>
<p>Haystack’s WebRetriever enables users to quickly and effortlessly set up a highly-specialized Q&A system drawing content directly from the specified domains. Whether you are setting up an AI-powered Q&A assistant for your website, or if you are an engineer sifting through a list of arXiv publications, Haystack’s modular architecture lets you quickly set up a RAG pipeline that fits your particular use case.</p>
<p>We’ve only scratched the surface with this simple example, but we hope it encourages you to further explore Haystack RAG pipelines and components, allowing you to customize Haystack for your particular scenario. We look forward to seeing how the community utilizes WebRetriever and other Haystack components to address today’s complex information retrieval challenges.</p>
<h3 id="references">References</h3>
<p>[1] 
<a href="https://github.com/deepset-ai/haystack/blob/main/examples/talk_to_website.py" target="_blank" rel="noopener">https://github.com/deepset-ai/haystack/blob/main/examples/talk_to_website.py</a></p>

      </description>
    </item>
    
    <item>
      <title>Generative Documentation Q&amp;A with Weaviate and Haystack</title>
      <link>https://haystack.deepset.ai/blog/generative-documentation-qa-with-weaviate-and-haystack/</link>
      <pubDate>Sat, 02 Sep 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/generative-documentation-qa-with-weaviate-and-haystack/</guid>
      <description>
      
      
      

      
      

      <p><em>You can use this</em> 
<a href="https://colab.research.google.com/drive/1nO0tBVOAgo-bayfUnIqnWLZby_7zejOz?usp=sharing" target="_blank" rel="noopener"><em>Colab</em></a> <em>for a working example of the application described in this article.</em></p>
<p>Retrieval augmented generation is the golden child of LLM applications lately. The idea behind it is simple: LLMs do not know the entire world, least of all your specific world. But, with the use of retrieval techniques, we can provide the most useful pieces of information to an LLM so that it has the context with which to reply to queries that it otherwise would not have been trained to know about or answer.</p>
<p>This technique is now being used to power many search systems. In this article, we show how to build such a system using  
<a href="https://haystack.deepset.ai/" target="_blank" rel="noopener">Haystack</a>, an open-source LLM framework, and Weaviate, a vector database. Our resulting pipeline will answer queries about Haystack, and provide references to the documentation pages containing the answer.</p>
<p>A few weeks ago, a colleague and I built a custom component for Haystack: the  
<a href="https://haystack.deepset.ai/integrations/readmedocs-fetcher" target="_blank" rel="noopener"><strong><code>ReadmeDocsFetcher</code></strong></a><em>.</em>  Haystack’s design is centered around small units called components. The idea behind the framework is to provide simple building blocks that allow you to create your own custom components beyond the ones provided inherently through the Haystack project. The Haystack documentation is hosted through ReadMe, so we designed this node to fetch requested documentation pages from ReadMe and process them in a way that can be used in a full LLM pipeline.</p>
<h2 id="the-indexing-pipeline">The Indexing Pipeline</h2>
<p>Now we can get started building our pipeline. First, we create an  
<a href="https://docs.haystack.deepset.ai/docs/pipelines#indexing-pipelines" target="_blank" rel="noopener">indexing pipeline</a>  to write all the requested documentation pages on  
<a href="https://haystack.deepseet.ai/" target="_blank" rel="noopener">https://docs.haystack.deepset.ai</a>  to our Weaviate database. The nice thing about building indexing pipelines is that they can be reused. If there are any new pages, we can push them through our indexing pipeline to ensure that the dabase that we we use for our RAG pipelines is always up to date.</p>
<p>For this indexing pipeline, we make use of the custom built  <strong><code>ReadmeDocsFetcher</code></strong>. Eventually we’ll want to do embedding retrieval so that we can get the most relevant documents from the database. So for this demo, we’re using a  <strong><code>sentence-transformers</code></strong> model to create the vector representations of my documents.</p>
<p>Weaviate has a convenient feature called  
<a href="https://weaviate.io/blog/embedded-local-weaviate" target="_blank" rel="noopener">Weaviate Embedded</a>  that we can use here. It allows us to run a Weaviate database within Colab:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">weaviate</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">weaviate.embedded</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">EmbeddedOptions</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.document_stores</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">WeaviateDocumentStore</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">client</span> = <span style="color:#268bd2">weaviate</span>.<span style="color:#268bd2">Client</span>(  
</span></span><span style="display:flex;"><span>  <span style="color:#268bd2">embedded_options</span>=<span style="color:#268bd2">weaviate</span>.<span style="color:#268bd2">embedded</span>.<span style="color:#268bd2">EmbeddedOptions</span>()  
</span></span><span style="display:flex;"><span>)  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">document_store</span> = <span style="color:#268bd2">WeaviateDocumentStore</span>(<span style="color:#268bd2">port</span>=<span style="color:#2aa198;font-weight:bold">6666</span>)
</span></span></code></pre></div><p>Once we have that, we initialize all of the components we’ll need for the final indexing pipeline:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">readmedocs_fetcher_haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">ReadmeDocsFetcher</span>  
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">EmbeddingRetriever</span>, <span style="color:#268bd2">MarkdownConverter</span>, <span style="color:#268bd2">PreProcessor</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">converter</span> = <span style="color:#268bd2">MarkdownConverter</span>(<span style="color:#268bd2">remove_code_snippets</span>=<span style="color:#859900;font-weight:bold">False</span>)  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">readme_fetcher</span> = <span style="color:#268bd2">ReadmeDocsFetcher</span>(<span style="color:#268bd2">api_key</span>=<span style="color:#268bd2">readme_api_key</span>,   
</span></span><span style="display:flex;"><span>                                   <span style="color:#268bd2">markdown_converter</span>=<span style="color:#268bd2">converter</span>,   
</span></span><span style="display:flex;"><span>                                   <span style="color:#268bd2">base_url</span>=<span style="color:#2aa198">"https://docs.haystack.deepset.ai"</span>)  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">embedder</span> = <span style="color:#268bd2">EmbeddingRetriever</span>(<span style="color:#268bd2">document_store</span>=<span style="color:#268bd2">document_store</span>,   
</span></span><span style="display:flex;"><span>                              <span style="color:#268bd2">embedding_model</span>=<span style="color:#2aa198">"sentence-transformers/multi-qa-mpnet-base-dot-v1"</span>)  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">preprocessor</span> = <span style="color:#268bd2">PreProcessor</span>()
</span></span></code></pre></div><p>And then we simply build and run the pipeline. It will preprocess and create embeddings for all the documentation pages under  
<a href="https://docs.haystack.deepset.ai/" target="_blank" rel="noopener">https://docs.haystack.deepset.ai</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span> = <span style="color:#268bd2">Pipeline</span>()  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">readme_fetcher</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"ReadmeFetcher"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"File"</span>])  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">preprocessor</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"Preprocessor"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"ReadmeFetcher"</span>])  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">embedder</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"Embedder"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Preprocessor"</span>])  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span>=<span style="color:#268bd2">document_store</span>, <span style="color:#268bd2">name</span>=<span style="color:#2aa198">"DocumentStore"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Embedder"</span>])  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">indexing_pipeline</span>.<span style="color:#268bd2">run</span>()
</span></span></code></pre></div><h2 id="the-retrieval-augmented-generative-rag-pipeline">The Retrieval Augmented Generative (RAG) Pipeline</h2>
<p>Before we jump into the RAG pipeline itself, I’d like to talk about two key building blocks of the pipeline in isolation: the prompt and the LLM of choice.</p>
<p>As mentioned above, my aim here is to build a pipeline that can reference the documentation pages a reply comes from. Specifically, I want to get a URL for me to click and read in more detail. Whether a RAG pipeline can achieve this depends heavily on the instruction the LLM is provided. It also depends on whether the LLM itself is designed to consume such an instruction.</p>
<p>Here, we can say I ‘splashed out’. While you can use open-source LLMs with Haystack (from Hugging Face, hosted on SageMaker, locally deployed, the choice is really yours) I went with GPT-4. One of the main reasons for my choice is simply because, from experience, GPT-4 has been the most performant with the type of prompt (instruction) I intended to use for this application. That being said, please let me know if you have different observations 🙏</p>
<h3 id="the-prompt">The Prompt</h3>
<p>Here is the prompt we use for this demo. It asks for each retrieved document to be followed by the URL that it comes from. The URL of each document is present in the metadata of the documents that we wrote to the  <code>WeaviateDocumentStore</code>  👇</p>
<pre tabindex="0"><code>You will be provided some conetent from technical documentation,   
where each paragraph is followed by the URL that it appears in.   
Answer the query based on the provided Documentation Content. Your answer   
should reference the URLs that it was generated from.   
Documentation Content: {join(documents,   
                             delimiter=new_line,   
                             pattern='---'+new_line+'$content'+new_line+'URL: $url',   
                             str_replace={new_line: ' ', '[': '(', ']': ')'})}  
Query: {query}  
Answer:
</code></pre><p>Notice how we’re structuring the prompt so that documents (which will be provided by the retriever when we add this to the pipeline) are separated from each other, and the content is always followed by the URL it comes from. We can do this because each document that we wrote into our database has the  <code>url</code>  in its  <code>metadata</code>.</p>
<p>We use the prompt above to create a  <code>PromptTemplate</code>  called the  <code>**answer_with_references_prompt**</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptTemplate</span>, <span style="color:#268bd2">AnswerParser</span>  
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">answer_with_references_prompt</span> = <span style="color:#268bd2">PromptTemplate</span>(<span style="color:#268bd2">prompt</span> = <span style="color:#2aa198">"""You will be provided some conetent from technical documentation, where each paragraph is followed by the URL that it appears in. Answer the query based on the provided Documentation Content. Your answer should reference the URLs that it was generated from. Documentation Content: {join(documents, delimiter=new_line, pattern='---'+new_line+'$content'+new_line+'URL: $url', str_replace={new_line: ' ', '[': '(', ']': ')'})}</span><span style="color:#2aa198">\n</span><span style="color:#2aa198">Query: </span><span style="color:#2aa198">{query}</span><span style="color:#2aa198">\n</span><span style="color:#2aa198">Answer:"""</span>, <span style="color:#268bd2">output_parser</span>=<span style="color:#268bd2">AnswerParser</span>())
</span></span></code></pre></div><p>You can explore other example prompts we’ve made use of, including a similar one for referencing on the  
<a href="https://prompthub.deepset.ai/?prompt=deepset%2Fquestion-answering-with-references" target="_blank" rel="noopener">PromptHub</a>.</p>
<h3 id="the-pipeline">The Pipeline</h3>
<p>Now to simply piece this all together. First, we define a  <code>PromptNode</code>, which is the interface for interacting with an LLM.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack.nodes</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">PromptNode</span>  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">prompt_node</span> = <span style="color:#268bd2">PromptNode</span>(<span style="color:#268bd2">model_name_or_path</span>=<span style="color:#2aa198">"gpt-4"</span>,   
</span></span><span style="display:flex;"><span>                        <span style="color:#268bd2">api_key</span>=<span style="color:#2aa198">'YOUR_OPENAI_KEY'</span>,   
</span></span><span style="display:flex;"><span>                        <span style="color:#268bd2">default_prompt_template</span>=<span style="color:#268bd2">answer_with_references_prompt</span>,   
</span></span><span style="display:flex;"><span>                        <span style="color:#268bd2">max_length</span>=<span style="color:#2aa198;font-weight:bold">500</span>)
</span></span></code></pre></div><p>Finally we create our pipeline. It has two components. First, we reuse the same  <code>EmbeddingRetriever</code>  from earlier to retrieve relevant documents from the database. Second, we use the  <code>PromptNode</code>  to generate an answer based on these retrieved documents.</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span> = <span style="color:#268bd2">Pipeline</span>()  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span> = <span style="color:#268bd2">embedder</span>, <span style="color:#268bd2">name</span> = <span style="color:#2aa198">"Retriever"</span>, <span style="color:#268bd2">inputs</span> = [<span style="color:#2aa198">"Query"</span>])  
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">add_node</span>(<span style="color:#268bd2">component</span> = <span style="color:#268bd2">prompt_node</span>, <span style="color:#268bd2">name</span> = <span style="color:#2aa198">"GPT-4"</span>, <span style="color:#268bd2">inputs</span>=[<span style="color:#2aa198">"Retriever"</span>])
</span></span></code></pre></div><p>When we run it, this final pipeline will: retrieve some relevant documents, add them into the prompt we created, and then send the resulting complete prompt to GPT-4 for an answer.</p>
<p>For example:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#2aa198">"What are the optional installations of Haystack?"</span>, <span style="color:#268bd2">params</span> = {<span style="color:#2aa198">"Retriever"</span>: {<span style="color:#2aa198">"top_k"</span>: <span style="color:#2aa198;font-weight:bold">5</span>}})
</span></span></code></pre></div><p>Resulted in the following complete prompt (abridged):</p>
<pre tabindex="0"><code>You will be provided some conetent from technical documentation, where each paragraph is followed   
by the URL that it appears in. Answer the query based on the provided Documentation Content.   
Your answer should reference the URLs that it was generated from.   
Documentation Content:   
---  
", "1-0": "all-gpu", "1-1": "Install Haystack and all optional dependencies, including developer   
tools with GPU support.", "2-0": "aws", "2-1": "Install Haystack and AWS SageMaker PromptNode support",   
"3-0": "colab", "3-1": "Install Haystack and all dependencies needed to run Haystack in Google Colab.",   
"4-0": "crawler", "4-1": "Install Haystack and all web crawling tools.",  
URL: https://docs.haystack.deepset.ai/docs/installation  
---  
[...]  
URL: https://docs.haystack.deepset.ai/docs/extras  
---  
[...]  
Query: What are the optional installations of Haystack?  
Answer:
</code></pre><p>When we ran this, the reply we got was the following:</p>
<pre tabindex="0"><code>Haystack offers several optional installations to support different use cases. These include:  
1. "all-gpu" to install Haystack with all optional dependencies, including developer tools with GPU support.  
2. "aws" for AWS SageMaker PromptNode support.  
3. "colab" to install all dependencies needed to run Haystack in Google Colab.  
4. "crawler" for web crawling tools.  
5. "dev" for all development tools needed by contributors.  
6. "ocr" for all OCR tools.  
7. "docstores" for all DocumentStores.  
8. "docstores-gpu" for all DocumentStores with GPU support.  
9. "faiss" for FAISS support for the FAISSDocumentStore.  
10. "faiss-gpu" for FAISS support for the FAISSDocumentStore with GPU.  
11. "inference" for all dependencies for model inference on local machine, including torch.  
12. "opensearch" for OpenSearch support for the OpenSearchDocumentStore.  
13. "elasticsearch", "elasticsearch7", and "elasticsearch8" for Elasticsearch support for the ElasticsearchDocumentStore.  
14. "pdf" for PyMuPDF for PDFToTextConverter.  
15. "pinecone" for Pinecone support for the PineconeDocumentStore.  
16. "preprocessing" for the basic preprocessing tools, such as langdetect for language identification and nltk for precise document splitting.  
17. "file-conversion" for all the dependencies for file conversion and parsing, like python-docx, tika, markdown.  
18. "ray" for Ray support for the RayPipeline.  
19. "weaviate" for Weaviate support for the WeaviateDocumentStore.  
  
Additionally, there are other components like "AnswerToSpeech" and "DocumentToSpeech" which are not installed with Haystack core but can be installed separately.  
  
For the latest release of Haystack and all its dependencies, you can use the command 'pip install farm-haystack(all)' or 'pip install farm-haystack(all-gpu)' for GPU-enabled dependencies.  
  
URLs:   
- https://docs.haystack.deepset.ai/docs/installation  
- https://docs.haystack.deepset.ai/docs/extras
</code></pre><h2 id="further-improvements">Further Improvements</h2>
<p>So far, we have only used a single retrieval technique. This could be considerably improved with a hybrid retrieval approach, which you can also implement with Weaviate and Haystack. This, in my opinion, would provide a healthier system that is intended for documentation search. While I am able to ask fully formed questions in this setup, I may want to provide a way for a user to simply search for ‘EmbeddingRetrievers’ for example.</p>
<p>In this article, we’ve seen how to build a simple RAG setup that uses a clever prompt to get replies with references to documentation. To discover more about the available pipelines and components that would help you build custom LLM applications, check out the  
<a href="https://docs.haystack.deepset.ai/" target="_blank" rel="noopener">Haystack documentation</a>.</p>

      </description>
    </item>
    
    <item>
      <title>Enhancing RAG Pipelines in Haystack: Introducing DiversityRanker and LostInTheMiddleRanker</title>
      <link>https://haystack.deepset.ai/blog/enhancing-rag-pipelines-in-haystack/</link>
      <pubDate>Tue, 29 Aug 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/enhancing-rag-pipelines-in-haystack/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;The recent improvements in Natural Language Processing (NLP) and Long-Form Question Answering (LFQA) would have, just a few years ago, sounded like something from the domain of science fiction. Who could have thought that nowadays we would have systems that can answer complex questions with the precision of an expert, all while synthesizing these answers on the fly from a vast pool of sources? LFQA is a type of Retrieval-Augmented Generation (RAG) which has recently made significant strides, utilizing the best retrieval and generation capabilities of Large Language Models (LLMs).&lt;/p&gt;
&lt;p&gt;But what if we could refine this setup even further? What if we could optimize how RAG selects and utilizes information to enhance its performance? This article introduces two innovative components aiming to improve RAG with concrete examples drawn from LFQA, based on the latest research and our experience — the DiversityRanker and the LostInTheMiddleRanker.&lt;/p&gt;
&lt;p&gt;Consider the LLM’s context window as a gourmet meal, where each paragraph is a unique, flavorful ingredient. Just as a culinary masterpiece requires diverse, high-quality ingredients, LFQA question-answering demands a context window filled with high-quality, varied, relevant, and non-repetitive paragraphs.&lt;/p&gt;
&lt;p&gt;In the intricate world of LFQA and RAG, making the most of the LLM’s context window is paramount. Any wasted space or repetitive content limits the depth and breadth of the answers we can extract and generate. It’s a delicate balancing act to lay out the content of the context window appropriately. This article presents new approaches to mastering this balancing act, which will enhance RAG’s capacity for delivering precise, comprehensive responses.&lt;/p&gt;
&lt;p&gt;Let’s explore these exciting advancements and how they improve LFQA and RAG.&lt;/p&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;Haystack is an open-source framework providing end-to-end solutions for practical NLP builders. It supports a wide range of use cases, from question-answering and semantic document search all the way to LLM agents. Its modular design allows the integration of state-of-the-art NLP models, document stores, and various other components required in today’s NLP toolbox.&lt;/p&gt;
&lt;p&gt;One of the key concepts in Haystack is the idea of a pipeline. A pipeline represents a sequence of processing steps that a specific component executes. These components can perform various types of text processing, allowing users to easily create powerful and customizable systems by defining how data flows through the pipeline and the order of nodes that perform their processing steps.&lt;/p&gt;
&lt;p&gt;The pipeline plays a crucial role in web-based long-form question answering. It starts with a WebRetriever component, which searches and retrieves query-relevant documents from the web, automatically stripping HTML content into raw text. But once we fetch query-relevant documents, how do we make the most of them? How do we fill the LLM’s context window to maximize the quality of the answers? And what if these documents, although highly relevant, are repetitive and numerous, sometimes overflowing the LLM context window?&lt;/p&gt;
&lt;p&gt;This is where the components we’ll introduce today come into play — the DiversityRanker and the LostInTheMiddleRanker. Their aim is to address these challenges and improve the answers generated by the LFQA/RAG pipelines.&lt;/p&gt;
&lt;p&gt;The DiversityRanker enhances the diversity of the paragraphs selected for the context window. LostInTheMiddleRanker, usually positioned after DiversityRanker in the pipeline, helps to mitigate the LLM performance degradation observed when models must access relevant information in the middle of a long context window. The following sections will delve deeper into these two components and demonstrate their effectiveness in a practical use case.&lt;/p&gt;
&lt;h2 id=&#34;diversityranker&#34;&gt;DiversityRanker&lt;/h2&gt;
&lt;p&gt;The DiversityRanker is a novel component designed to enhance the diversity of the paragraphs selected for the context window in the RAG pipeline. It operates on the principle that a diverse set of documents can increase the LLM’s ability to generate answers with more breadth and depth.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/enhancing-rag-pipelines-in-haystack/thumbnail.png alt=&#34;An artistic interpretation of the DiversityRanker algorithm’s document ordering process, courtesy of MidJourney&#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;An artistic interpretation of the DiversityRanker algorithm’s document ordering process, courtesy of MidJourney. Please note that this visualization is more illustrative than precise.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;The DiversityRanker uses sentence transformers to calculate the similarity between documents. The sentence transformers library offers powerful embedding models for creating meaningful representations of sentences, paragraphs, and even whole documents. These representations, or embeddings, capture the semantic content of the text, allowing us to measure how similar two pieces of text are.&lt;/p&gt;
&lt;p&gt;DiversityRanker processes the documents using the following algorithm:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It starts by calculating the embeddings for each document and the query using a sentence-transformer model.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It then selects the document semantically closest to the query as the first selected document.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For each remaining document, it calculates the average similarity to the already selected documents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It then selects the document that is, on average, least similar to the already selected documents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This selection process continues until all documents are selected, resulting in a list of documents ordered from the document contributing the most to the overall diversity to the document that contributes the least.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A technical note to keep in mind: the DiversityRanker uses a greedy local approach to select the next document in order, which might not find the most optimal overall order for the documents. DiversityRanker focuses on diversity more than relevance, so it should be placed in the pipeline after another component like TopPSampler or another similarity ranker that focuses more on relevance. By using it after a component that selects the most relevant documents, we ensure that we select diverse documents from a pool of already relevant documents.&lt;/p&gt;
&lt;h2 id=&#34;lostinthemiddleranker&#34;&gt;LostInTheMiddleRanker&lt;/h2&gt;
&lt;p&gt;The LostInTheMiddleRanker optimizes the layout of the selected documents in the LLM’s context window. This component is a way to work around a problem identified in recent research [1] that suggests LLMs struggle to focus on relevant passages in the middle of a long context. The LostInTheMiddleRanker alternates placing the best documents at the beginning and end of the context window, making it easy for the LLM’s attention mechanism to access and use them. To understand how LostInTheMiddleRanker orders the given documents, imagine a simple example where documents consist of a single digit from 1 to 10 in ascending order. LostInTheMiddleRanker will order these ten documents in the following order: [1 3 5 7 9 10 8 6 4 2].&lt;/p&gt;
&lt;p&gt;Although the authors of this research focused on a question-answering task — extracting the relevant spans of the answer from the text — we are speculating that the LLM’s attention mechanism will also have an easier time focusing on the paragraphs in the beginning and the end of the context window when generating answers.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/enhancing-rag-pipelines-in-haystack/figure_2.png alt=&#34;A graph showing that accuracy of the LLM drops when the answer is needed to be extracted from the middle of the context&#34; class=&#34;small&#34; /&gt;
        &lt;figcaption&gt;LLMs struggle to extract answers from the middle of the context, adapted from Liu et al. (2023)[1]&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;LostInTheMiddleRanker is best positioned as the last ranker in the RAG pipeline as the given documents are already selected based on similarity (relevance) and ordered by diversity.&lt;/p&gt;
&lt;h2 id=&#34;using-the-new-rankers-in-pipelines&#34;&gt;Using the new rankers in pipelines&lt;/h2&gt;
&lt;p&gt;In this section, we’ll look into the practical use case of the LFQA/RAG pipeline, focusing on how to integrate the DiversityRanker and LostInTheMiddleRanker. We’ll also discuss how these components interact with each other and the other components in the pipeline.&lt;/p&gt;
&lt;p&gt;The first component in the pipeline is a WebRetriever which retrieves query relevant documents from the web using a programmatic search engine API (SerperDev, Google, Bing etc). The retrieved documents are first stripped of HTML tags, converted to raw text, and optionally preprocessed into shorter paragraphs. They are then, in turn passed to a TopPSampler component, which selects the most relevant paragraphs based on their similarity to the query.&lt;/p&gt;
&lt;p&gt;After TopPSampler selects the set of relevant paragraphs, they are passed to the DiversityRanker. DiversityRanker, in turn, orders the paragraphs based on their diversity, reducing the repetitiveness of the TopPSampler-ordered documents.&lt;/p&gt;
&lt;p&gt;The selected documents are then passed to the LostInTheMiddleRanker. As we previously mentioned, LostInTheMiddleRanker places the most relevant paragraphs at the beginning and the end of the context window, while pushing the worst-ranked documents to the middle.&lt;/p&gt;
&lt;p&gt;Finally, the merged paragraphs are passed to a PromptNode, which conditions an LLM to answer the question based on these selected paragraphs.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/enhancing-rag-pipelines-in-haystack/figure_3.jpg alt=&#34;An illustration of a pipeline consists of WebRetriever, TopPSampler, DiversityRanker, LostInTheMiddleRanker and PromptNode&#34;  /&gt;
        &lt;figcaption&gt;LFQA/RAG pipeline — Image by author&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;The new rankers are already merged into Haystack’s main branch and will be available in the upcoming 1.20 release slated for the end of August 2023. We included a new LFQA/RAG pipeline demo in the project’s examples folder.&lt;/p&gt;
&lt;p&gt;The demo shows how DiversityRanker and LostInTheMiddleRanker can be easily integrated into a RAG pipeline to improve the quality of the generated answers.&lt;/p&gt;
&lt;h2 id=&#34;case-study&#34;&gt;Case study&lt;/h2&gt;
&lt;p&gt;To demonstrate the effectiveness of the LFQA/RAG pipelines that include the two new rankers, we’ll use a small sample of half a dozen questions requiring detailed answers. The questions include: “What were the key events and influences that led to Renaissance; how did these developments shape modern Western culture?”, “What are the primary causes of climate change on both global and local scales?”, and more. To answer these questions well, LLMs require a wide range of historical, political, scientific, and cultural sources, making them ideal for our use case.&lt;/p&gt;
&lt;p&gt;Comparing the generated answers of the RAG pipeline with two new rankers (optimized pipeline) and a pipeline without them (non-optimized) would require complex evaluation involving human expert judgment. To simplify evaluation and to evaluate the effect of the DiversityRanker primarily, we calculated the average pairwise cosine distance of the context documents injected into the LLM context instead. We limited the context window size in both pipelines to 1024 words. By running these sample Python scripts [2], we have found that the optimized pipeline has an average 20–30% increase in pairwise cosine distance [3] for the documents injected into the LLM context. This increase in the pairwise cosine distance essentially means that the documents used are more diverse (and less repetitive), thus giving the LLM a wider and richer range of paragraphs to draw upon for its answers. We’ll leave the evaluation of LostInTheMiddleRanker and its effect on generated answers for one of our upcoming articles.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;We’ve explored how Haystack users can enhance their RAG pipelines by using two innovative rankers: DiversityRanker and LostInTheMiddleRanker.&lt;/p&gt;
&lt;p&gt;DiversityRanker ensures that the LLM’s context window is filled with diverse, non-repetitive documents, providing a broader range of paragraphs for the LLM to synthesize the answer from. At the same time, the LostInTheMiddleRanker optimizes the placement of the most relevant paragraphs in the context window, making it easier for the model to access and utilize the best-supporting documents.&lt;/p&gt;
&lt;p&gt;Our small case study confirmed the effectiveness of the DiversityRanker by calculating the average pairwise cosine distance of the documents injected into the LLM’s context window in the optimized RAG pipeline (with two new rankers) and the non-optimized pipeline (no rankers used). The results showed that an optimized RAG pipeline increased the average pairwise cosine distance by approximately 20–30%.&lt;/p&gt;
&lt;p&gt;We have demonstrated how these new rankers can potentially enhance Long-Form Question-Answering and other RAG pipelines. By continuing to invest in and expand on these and similar ideas, we can further improve the capabilities of Haystack’s RAG pipelines, bringing us closer to crafting NLP solutions that seem more like magic than reality.&lt;/p&gt;
&lt;h3 id=&#34;references&#34;&gt;References&lt;/h3&gt;
&lt;p&gt;[1] “Lost in the Middle: How Language Models Use Long Contexts” at 
&lt;a href=&#34;https://arxiv.org/abs/2307.03172&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://arxiv.org/abs/2307.03172&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[2] Script: 
&lt;a href=&#34;https://gist.github.com/vblagoje/430def6cda347c0b65f5f244bc0f2ede&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://gist.github.com/vblagoje/430def6cda347c0b65f5f244bc0f2ede&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[3] Script output (answers): 
&lt;a href=&#34;https://gist.github.com/vblagoje/738253f87b7590b1c014e3d598c8300b&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://gist.github.com/vblagoje/738253f87b7590b1c014e3d598c8300b&lt;/a&gt;&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Hybrid Document Retrieval</title>
      <link>https://haystack.deepset.ai/blog/hybrid-retrieval/</link>
      <pubDate>Tue, 22 Aug 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/hybrid-retrieval/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Document retrieval is the art of extracting relevant documents from a corpus in response to an input. Like many language-related tasks today, it can benefit greatly from the dense, semantic embeddings produced by encoder models. These models have learned to embed documents in an abstract vector space that captures their content, allowing users to phrase their queries freely in natural language, rather than trying to match the exact keywords contained in a document.&lt;/p&gt;
&lt;p&gt;Still, there are many scenarios where a more basic, keyword-based approach can outperform the semantic one. For example, keyword-based methods like BM25 have been shown to perform better in an out-of-domain setting.&lt;/p&gt;
&lt;p&gt;That’s because dense encoder models need to be trained on data, and if that data is too different from the use case at hand, they’re likely to underperform. In practice, therefore, the keyword approach can yield better results for highly technical or “niche” use cases than an encoder model that hasn’t been fine-tuned.&lt;/p&gt;
&lt;p&gt;Since both approaches have their advantages and disadvantages, it is a good idea to combine them. All you need is two retrievers and a way to merge their output. We also recommend using a ranker on top of them. As you’ll see, 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; makes it easy to put all these components together in a modular and reusable pipeline.&lt;/p&gt;
&lt;h2 id=&#34;recap-what-is-document-retrieval-and-why-do-we-care&#34;&gt;Recap: what is document retrieval and why do we care?&lt;/h2&gt;
&lt;p&gt;In large-scale NLP systems, we often work with huge collections of documents. Now, what happens when we pass a query to such a system? It’s not really feasible to run a computationally demanding language model – like those used in generative AI or extractive question answering – on the entire corpus. Doing so would be a waste of resources and time. The solution? Retrievers.&lt;/p&gt;
&lt;p&gt;Retrievers leverage different 
&lt;a href=&#34;https://www.deepset.ai/blog/art-of-finding-document-search&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;document search techniques&lt;/a&gt; to extract the right documents from a database. In document search, each document is represented as a single vector. Thus, the retrieval module only needs to process one embedding per document. This is a highly efficient way to pre-select the right documents for subsequent processing steps.&lt;/p&gt;
&lt;p&gt;Retrievers are super useful in their own right, as they drive most search applications. But they’re most often discussed in the context of composite systems or pipelines. As a pre-selection mechanism, retrievers extract related documents in response to a query. They then pass those documents on to downstream tasks like extractive QA, generative AI (in what’s known as 
&lt;a href=&#34;https://www.deepset.ai/blog/llms-retrieval-augmentation&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a RAG scenario&lt;/a&gt;) or summarization.&lt;/p&gt;
&lt;h2 id=&#34;what-types-of-retrievers-are-there&#34;&gt;What types of retrievers are there?&lt;/h2&gt;
&lt;p&gt;Retrievers can be roughly divided into two categories: on the one side, there’s the sparse, keyword-based approach à la Tf-Idf, on the other side, the dense, embedding-based approach that uses Transformers.&lt;/p&gt;
&lt;h3 id=&#34;characteristics-of-the-sparse-approach&#34;&gt;Characteristics of the sparse approach&lt;/h3&gt;
&lt;p&gt;Sparse retrievers produce vectors whose length equals the size of the vocabulary. Because every document in the corpus only contains a fraction of all the words in the corpus, these vectors are typically sparse: long, with many zeroes, and only few non-zero values. The 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/retriever#bm25-recommended&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;sparse retrieval algorithm most commonly used today is BM25&lt;/a&gt;, an improved version of the classic Tf-Idf.&lt;/p&gt;
&lt;p&gt;Sparse embedding techniques are by definition lexical: they can only represent and match those words that are part of the vocabulary. They don’t require any training, making them language- and domain-agnostic.&lt;/p&gt;
&lt;h3 id=&#34;characteristics-of-the-dense-approach&#34;&gt;Characteristics of the dense approach&lt;/h3&gt;
&lt;p&gt;The biggest difference to the sparse approach is that a dense retriever requires data and training. During training, the language model learns how to best embed documents as vectors from the data itself.&lt;/p&gt;
&lt;p&gt;Dense retrievers produce shorter vectors than their sparse counterparts. These condensed vectors consist mostly of scalar values, which 
&lt;a href=&#34;https://www.deepset.ai/blog/the-beginners-guide-to-text-embeddings&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;represent semantic features&lt;/a&gt; instead of lexical occurrences. The features, which the model acquires during training, are harder to interpret than the sparse ones.&lt;/p&gt;
&lt;p&gt;Because dense retriever models have learned to represent their training data, they’re likely to perform poorly on any use case that is out of that data’s domain. For example, an embedding model trained on Wikipedia articles may fail to handle tweets properly: the language used is too different. Similarly, a model trained on financial data will not be able to do a good job with medical reports.&lt;/p&gt;
&lt;p&gt;While it is always possible to fine-tune an existing model, this requires data, resources, and the know-how.&lt;/p&gt;
&lt;h3 id=&#34;combining-dense-and-sparse-retrievers&#34;&gt;Combining dense and sparse retrievers&lt;/h3&gt;
&lt;p&gt;To address the weaknesses of both retrievers (or, more positively, to take advantage of both), we can simply use two retrievers in our pipeline and merge their output. Such a hybrid retrieval pipeline is easy to set up in a modular framework like Haystack.&lt;/p&gt;
&lt;h2 id=&#34;a-hybrid-retrieval-pipeline-in-haystack&#34;&gt;A hybrid retrieval pipeline in Haystack&lt;/h2&gt;
&lt;p&gt;Haystack uses modular pipelines and nodes to ensure straightforward customization. A basic retriever pipeline consists of an input (the query), a retriever node, optional additional nodes, and an output.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/hybrid-retrieval/one-retriever.png alt=&#34;Sketch of a retrieval augmented pipeline with one retriever.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;But we can easily customize the pipeline to contain two retriever nodes instead of only one. We just need to remember to combine the documents retrieved by both methods in a meaningful way. To do this, we’ll first use a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/join_documents&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;JoinDocuments node&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/hybrid-retrieval/two-retrievers.png alt=&#34;Sketch of a retrieval augmented pipeline with two retrievers and one node to join their results.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Both retrievers return ranked lists of results with relevance scores attached to each document (note that, since they employ different scoring techniques, the scores are not really meaningful in a hybrid retrieval setting). You can use different methods for joining these results lists. Which one you use – and whether you add another module on top of the JoinDocuments node – depends on your use case.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Concatenation: all documents (sans duplicates) are simply appended to the final list of results. This method is sufficient if you plan to use all the results and don’t care about their order. This could be the case, for example, in an extractive question answering pipeline. Concatenation can also be used in combination with a powerful ranking model – more on that later.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reciprocal rank fusion (RRF): this formula reranks the documents returned by both retrievers, giving priority to those which appear in both results lists. Its purpose is to push the most relevant documents to the top of the list. It is useful if the order of your results is important, or if you only want to pass on a subset of your results to the next node.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Merging: documents are ranked according to the scores returned by the retrievers. This method is useful if you want to prioritize the results from one retriever over another – and if the relevance scores from retrievers are comparable. For example, if you want to join documents from two different dense retrievers because they return documents from different document stores. This option is not useful for hybrid retrieval.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Depending on your application, you now have the option to add an intermediate ranking step after merging the documents. This is the most sophisticated method for ordering documents returned by two different retrievers. For example, if your pipeline uses a generative LLM or a summarizer as the next node, you can re-rank your documents to ensure that the most relevant ones are at the top for better results.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/hybrid-retrieval/reranker.png alt=&#34;Sketch of a retrieval augmented pipeline with two retrievers, a node to join their results, and a ranker node.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Adding a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/ranker&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SentenceTransformersRanker node&lt;/a&gt; produces the most relevant results, while also adding some latency. It uses a powerful cross-encoder trained to determine the relevance of a document to a given query – similar to the dense retriever’s embedding model, but with a slightly different architecture.&lt;/p&gt;
&lt;p&gt;Unlike the embedding model used for retrieval, the ranker can only feasibly process a small number of documents, making it particularly suitable as a scoring mechanism after the retriever. Not only does it put the results of both retrievers in a more meaningful order, but it also standardizes the relevance scores of the documents, making it possible to use these scores for further downstream tasks.&lt;/p&gt;
&lt;h2 id=&#34;superior-search-with-haystack&#34;&gt;Superior search with Haystack&lt;/h2&gt;
&lt;p&gt;For a hands-on experience with hybrid retrieval, check out this 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/26_hybrid_retrieval&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tutorial from Nicola&lt;/a&gt;, one of our community members. Follow along in Colab or in your IDE to see how you can build a hybrid retrieval pipeline in just a few lines of code!&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://github.com/deepset-ai/haystack/tree/main&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; is the framework of choice for developers looking to build powerful, yet easily customizable natural language search systems that leverage state-of-the-art language models from any source.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Join our friendly Discord community&lt;/a&gt; for help with questions regarding Haystack and open source NLP in general, as well as interesting discussions around the latest LLMs. Let’s build something amazing! 🚀&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Shaping Haystack 2.0</title>
      <link>https://haystack.deepset.ai/blog/shaping-haystack-v2/</link>
      <pubDate>Mon, 14 Aug 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/shaping-haystack-v2/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;Since Haystack v1.15, we’ve been slowly introducing new components and features to Haystack in the background in preparation for Haystack 2.0 (or v2). After the work we’ve put into the new design of the Haystack API over the last few months, we’re at a point where we would love to start involving the Haystack community in our thought process and slowly gather your input and feedback. In this article, we would like to highlight where we are at for the design of the new Haystack API for v2, what we want to achieve with the new design, and what our current considerations are.&lt;/p&gt;
&lt;h2 id=&#34;-what-does-the-new-20-version-mean&#34;&gt;❓ What does the new 2.0 version mean?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Haystack v2 will be a major update to the design of Haystack nodes and pipelines.&lt;/strong&gt; We believe that the pipeline concept is a fundamental requirement and an optimal fit for building applications with LLMs. Therefore, Pipelines and Nodes will continue to be the foundation of Haystack v2. However, the general pipeline structure, Nodes API, and the connection between DocumentStore and Retrievers will change. So, this is going to be a breaking change for Haystack users.&lt;/p&gt;
&lt;h2 id=&#34;-motivation-behind-haystack-20&#34;&gt;🏆 Motivation behind Haystack 2.0&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;At deepset, we put a lot of thought and care into maintaining Haystack as a robust, user-friendly, and production-ready LLM framework.&lt;/strong&gt; As we have collected feedback from the Haystack community over the years and observed the advancements in the NLP field, such as LLMs and Agents, we see the need to update the pipeline structure with Haystack v2 to better align with our users’ needs and state-of-the-art NLP approaches.&lt;/p&gt;
&lt;p&gt;When ready, Haystack v2 will introduce many improvements, flexibility and, most importantly, it will allow Haystack users to implement customizations and extensions to Haystack much more easily. The new pipeline structure will allow for more flexible, robust, and powerful pipelines. As we change the pipeline structure, we’ll be adapting all components to the new structure, therefore, rewriting many of them. This update gives us the opportunity to enhance the pipeline structure to better make use of LLMs, improve our Agent and Memory implementations, better define the connection between the DocumentStore and Retriever, and so on.&lt;/p&gt;
&lt;h2 id=&#34;-current-status-of-haystack-20&#34;&gt;📍 Current status of Haystack 2.0&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Haystack v2 is still a work in progress.&lt;/strong&gt; We are defining the requirements for a more powerful and robust LLM framework with continuous feedback from the community, and we’re implementing the new Haystack API so that it’s aligned with the advances in NLP.&lt;/p&gt;
&lt;p&gt;Although still in beta, you can find what’s been implemented so far in the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack/tree/main/haystack/preview&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;preview&lt;/a&gt; package of the Haystack repository. To learn how and when components will be migrated, have a look at the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack/issues/5265&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Migrate Components to Pipeline v2&lt;/a&gt; roadmap item, where we keep track of issues and PRs about Haystack v2. For a detailed overview of the current state of v2, check out 
&lt;a href=&#34;https://drive.google.com/drive/folders/1DoS3NDCI_YGNFUy-QRL0XzeRT7vJRrn7&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Sara’s presentation&lt;/a&gt; about Haystack v2.&lt;/p&gt;
&lt;p&gt;Additionally, here is the complete list of proposals so far shaping the design of Haystack v2:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/deepset-ai/haystack/blob/main/proposals/text/4284-drop-basecomponent.md&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Drop BaseComponent and re-implement Pipeline&lt;/a&gt; 🟢&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/deepset-ai/haystack/blob/main/proposals/text/4370-documentstores-and-retrievers.md&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DocumentStores and Retrievers in Haystack 2.0&lt;/a&gt; 🟢&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/deepset-ai/haystack/blob/main/proposals/text/3558-embedding_retriever.md&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;New EmbeddingRetriever for Haystack 2.0&lt;/a&gt; 🟢&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/deepset-ai/haystack/blob/main/proposals/text/5390-embedders.md&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Embedders design for Haystack 2.0&lt;/a&gt; 🟢&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/deepset-ai/haystack/pull/5540&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LLM support in Haystack 2.0&lt;/a&gt; 🟡&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;-highlights-of-haystack-20&#34;&gt;⭐ Highlights of Haystack 2.0&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Pipeline &lt;strong&gt;Nodes&lt;/strong&gt; will be now called &lt;strong&gt;Components.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The new pipeline structure will provide better support for LLMs.&lt;/strong&gt; The flexible connection between components will introduce new mechanisms, such as parallel branching and looping, that extend the capabilities of pipelines. Components will control the input and output of the pipeline. Thus, components with dynamic input parameters, such as those that use prompts with variables, will easily integrate into the pipeline. Overall, these refinements will not only improve the linear workflows but also ensure that pipelines seamlessly align with the nature of LLMs.&lt;/p&gt;
&lt;p&gt;Here is what a RAG pipeline might look like in Haystack v2.👇🏼&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Keep in mind that the components are still work in progress and being discussed in the “
&lt;a href=&#34;https://github.com/deepset-ai/haystack/pull/5540&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LLM Support in Haystack 2.0&lt;/a&gt;” proposal.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/shaping-haystack-v2/rag-representation.svg alt=&#34;Representation of a RAG pipeline in Haystack v2&#34;  /&gt;
        &lt;figcaption&gt;&lt;em&gt;Representation of a RAG pipeline in Haystack v2&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Components API will change.&lt;/strong&gt; Components will define the name and the type of all of their inputs and outputs. The new API will reduce complexity and make it easier to create custom components such as 
&lt;a href=&#34;https://haystack.deepset.ai/integrations&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack integrations&lt;/a&gt; for third-party APIs and databases. The connections between components will be validated before query time, and Haystack will generate better error messages with instructions on fixing the errors.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Retrievers will be customized for DocumentStore, not for retrieval methods.&lt;/strong&gt; Each DocumentStore will have its own Retriever, highly specialized for that specific DocumentStore, handling all its requirements without being bound to a generic interface. Integrating a new DocumentStore will be easier, and the specialized Retriever will be able to adapt more quickly to the new features of the DocumentStore.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Embedder will be a separate component instead of being a part of a Retriever.&lt;/strong&gt; Retrievers won’t be responsible for creating embeddings, the new Embedder component will handle the creation of embeddings. The Retriever class will be simplified, and adding support for new embedding providers and approaches will be more straightforward.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pipeline serialization will be more flexible and optimized for humans.&lt;/strong&gt; JSON, TOML, HCL will be used as serialization formats. Serialization and deserialization of pipelines sharing the same component instance will be possible.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;-whats-next&#34;&gt;➡️ What’s next?&lt;/h2&gt;
&lt;p&gt;As we iterate on Haystack 2.0, we’ll share the design proposals in 
&lt;a href=&#34;https://github.com/deepset-ai/haystack/discussions/5568&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub Discussion: Shaping Haystack 2.0&lt;/a&gt; and start a conversation about topics where we need your input. As we share more information about Haystack 2.0, please feel free to share your feedback or concerns. You can always contact us using the comments section in the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack/discussions/5568&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub Discussion&lt;/a&gt; or the 
&lt;a href=&#34;https://discord.com/invite/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack Discord server&lt;/a&gt; to ask questions.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Question Answering in the Cockpit</title>
      <link>https://haystack.deepset.ai/blog/airbus-case-study/</link>
      <pubDate>Wed, 26 Jul 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/airbus-case-study/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Large language models (LLMs) are incredibly powerful, and we at deepset are stoked about their seemingly limitless potential. But while models like Llama 2 and GPT-4 continue to make waves, a related but less buzzworthy technology has consistently been delivering great value for a range of use cases.&lt;/p&gt;
&lt;p&gt;Based on smaller, open-source Transformer models, extractive question answering (QA) is one of the most compelling knowledge management techniques to emerge from NLP in recent years. As a result, organizations are beginning to recognize the opportunities that extractive QA can bring to large knowledge bases.&lt;/p&gt;
&lt;p&gt;One such company is Airbus. Using our open-source framework 
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt; for applied NLP, they built a complex QA system to extract answers from their extensive manuals for pilots, which are used both in training and during flight. What makes this use case particularly interesting is that the manuals contain not only text but also a lot of tables. Thankfully, modern QA systems can handle those too, using a special technique known as table question answering (Table QA).&lt;/p&gt;
&lt;p&gt;This case study describes how 
&lt;a href=&#34;https://www.airbus.com/en/innovation/industry-4-0/artificial-intelligence&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Airbus’ artificial intelligence research unit&lt;/a&gt; investigated how NLP could help human agents find solutions in critical situations, based on their technical documentation. They used Haystack to solve the complex problem of processing both text and tables. The result, a composite text and table QA system, will make work easier and more efficient for different teams in the company.&lt;/p&gt;
&lt;h2 id=&#34;the-problem&#34;&gt;The problem&lt;/h2&gt;
&lt;p&gt;Aircraft are complex machines, and flying them requires a high degree of mental and physical fitness. Airbus’ Flight Crew Operating Manual (FCOM) serves to address all the issues a pilot might encounter during training or even while flying the airplane. Questions such as “What is the standard procedure when the cockpit windshield has been cracked by a bird?” or “Under which weather conditions is it unsafe to land?” are all answered in the manual, which spans thousands of pages.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/airbus-case-study/manuals.png alt=&#34;“Eight folders with training manuals on a table.”&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Modern commercial aircraft like the Airbus A320 are actually quite good at detecting issues through the use of sensors, and handling them in an automated fashion according to protocol. But sensors can fail, and there are problems that can’t be detected by sensors, such as a passenger being sick. In those cases, it is crucial for pilots to find the solution in the minimum amount of time.&lt;/p&gt;
&lt;p&gt;Currently, Airbus pilots are using a digitized version of the manual, which they can search using basic keyword-based functionalities. On this blog, we have written extensively about keyword searches, and how they are 
&lt;a href=&#34;https://www.deepset.ai/blog/understanding-semantic-search&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;surpassed by semantic search systems&lt;/a&gt; in many respects.&lt;/p&gt;
&lt;p&gt;A major downside of the keyword-based approach is that users need to know the exact words that are used to describe an issue. Users can sometimes struggle to find the exact combination of words that will lead them to the correct result, which can waste valuable time, especially in critical situations. This meant that Airbus’ existing information retrieval system had significant potential for improvement in terms of both speed and relevance.&lt;/p&gt;
&lt;p&gt;It should be stressed that the current system works quite well for most of the experienced pilots, who have worked with it for many years, and have gotten very good at finding the information they need. But the AI team at Airbus wanted to see if it was possible to use newer, more advanced approaches to build a more intuitive system with a flatter learning curve that would return more accurate information within a shorter time.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;In stressful situations especially, pilots need to get to the info as quickly as possible to be able to react in time. We wanted to assess whether a deep learning-based system for advanced question answering could reduce the retrieval time.&lt;/em&gt;&lt;/strong&gt; — Alexandre Arnold, AI research unit, Airbus&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Like many technical documents, the FCOM contains a lot of tables. During the project’s initial phase, the Airbus team realized that tables play such an integral part in the pilots’ information extraction process that their system wouldn’t be complete without a module that could do question answering on tables. Luckily, 
&lt;a href=&#34;https://huggingface.co/docs/transformers/model_doc/tapas&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;TaPas&lt;/a&gt; — Google’s popular Table QA model — does just that. Having worked with 
&lt;a href=&#34;https://www.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset&lt;/a&gt; before and knowing of our expertise in working with TaPas-based models, the AI team at Airbus found it easy to decide on Haystack as the framework of choice for implementing a QA system for pilots that could handle both text and tables.&lt;/p&gt;
&lt;h3 id=&#34;what-does-it-mean-to-do-question-answering-on-tables&#34;&gt;What does it mean to do question answering on tables?&lt;/h3&gt;
&lt;p&gt;Tables contain a lot of information in an extremely condensed format. They differ from text in that they come in a predictable, structured form, and can typically provide answers to many different questions. For instance, we could ask “What is the maximum crosswind for takeoff when the ground is frozen?” and a human agent could, with perhaps some initial difficulties, retrieve the answer from the following table. Then we could go on to ask questions about other actions and weather conditions, all equally answerable by looking at this table alone.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/airbus-case-study/example-table.png alt=&#34;Example of a table from the pilot manuals.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Table QA models like TaPas can do much the same. They combine the deep language model’s capacity for understanding the semantics of natural language (such a model would be able to grasp that the word “frozen” in our question refers to the condition of “ice” in the table) with the additional knowledge about how to navigate a table’s rows and columns to retrieve the right cell. That combination results in a powerful technology able to extract information from thousands of tables much more quickly than even the most experienced human could ever accomplish.&lt;/p&gt;
&lt;p&gt;Even more than that, these models can do aggregations on tables — just like a SQL query, but in natural language. For instance, we could ask “What is the maximum velocity when the flaps are in the first position?” and Table QA would return “230 kt” — the larger of the two values associated with flaps lever position 1 in the table below.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/airbus-case-study/example-table-2.png alt=&#34;Another table from the manuals.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;If you want to learn more about Table QA, be sure to check out 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/15_tableqa&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our tutorial&lt;/a&gt; and 
&lt;a href=&#34;https://www.deepset.ai/blog/question-answering-on-structured-data&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;blog post&lt;/a&gt; on the topic. But of course, Airbus’ actual task is much more complex than the use cases illustrated so far. First of all, the final system wouldn’t know whether the answer to a given query could be found in a table or in a piece of text. And second, it would have to retrieve the relevant table or text from the collection of FCOM pages itself.&lt;/p&gt;
&lt;h2 id=&#34;data-annotation-and-preparation&#34;&gt;Data annotation and preparation&lt;/h2&gt;
&lt;p&gt;For a machine learning model to really shine, it has to be adapted to the specific data that it will encounter in the real world. This is especially true for technical domains like the aerospace industry. So the first step for Airbus was to annotate some training data that could be used for fine-tuning both the general language model (for handling text) and the TaPas model (for handling tables).&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://www.deepset.ai/blog/labeling-data-with-haystack-annotation-tool&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Annotation, for QA&lt;/a&gt;, consists of coming up with questions and highlighting the text passages that answer them. For instance, we could ask “How should the crew handle illness in the cabin?” and a possible answer span could be “In case of a sick passenger, first verify if there is a medical doctor on board.” The variation in wording and sentence structure makes sure that the system learns to match questions and answers on the basis of meaning, rather than the words used.&lt;/p&gt;
&lt;p&gt;To annotate the first set of data, the team organized an internal hackathon and invited operations experts with deep knowledge of the documentation — some of whom had even helped write the manual. These experts were invited to come up with questions that are representative of what a pilot would ask in a real-world scenario — like the one in the previous paragraph — and identify the answers in the manual.&lt;/p&gt;
&lt;p&gt;For the annotation of the tabular data, on the other hand, the team made use of the fact that tables follow a much more structured format than natural language. They came up with a set of 240 question templates, which they filled with different values, matching each question to its answer in a semi-automated fashion. For instance, given the tables above, one could come up with questions like “What is the maximum speed of the aircraft?” or “What is the maximum crosswind for landing?”&lt;/p&gt;
&lt;p&gt;Finally, one of the biggest sources of confusion in FCOMs is the fact that different problems and solutions can be described in very similar terms. Context is key in that case, and is often provided by headers or captions. For instance, different sections could talk about “engine failure,” but which specific engine was concerned would be clarified by the chapter or section header.&lt;/p&gt;
&lt;p&gt;Using the traditional, keyword-based approach, an experienced pilot could quickly determine which engine a text is talking about by considering additional information found on the page. However, since the language model can only parse text passages of about 200 words at a time, that additional information would have to be included elsewhere. Thankfully, 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/nodes_overview&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack’s nodes&lt;/a&gt; — the modular building blocks which make up the final NLP system — make it easy to include and retrieve such information in the form of metadata that is added to a textual or tabular data point during preprocessing.&lt;/p&gt;
&lt;h2 id=&#34;the-solution&#34;&gt;The solution&lt;/h2&gt;
&lt;p&gt;The final system uses a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/ready_made_pipelines#extractiveqapipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retriever-reader pipeline&lt;/a&gt; — a classic architecture for question answering. Because the computationally expensive question answering model itself shouldn’t be run on the entire corpus, such a pipeline uses a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/retriever&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retriever&lt;/a&gt;. It preselects the documents that to be processed by the reader, which contains the QA model. In Airbus’ case, however, the team actually used two separate retriever-reader pipelines (one for text and one for tables), which consisted of a single entry point for the query and then were joined at the end.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/airbus-case-study/pipeline.png alt=&#34;Sketch of the question answering pipeline with table and text retrievers.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;This complex setup is needed because, when a pilot asks a question, the system doesn’t know whether the answer to the query will be found in a table or in text. It therefore runs the query through both the Table QA pipeline and the “classic” text QA pipeline. The results are then joined in the final node, which picks the correct answer based on the confidence values attached to the results.&lt;/p&gt;
&lt;h2 id=&#34;results-and-outlook&#34;&gt;Results and outlook&lt;/h2&gt;
&lt;p&gt;The team at Airbus is excited about the results, especially about the performance of the table section of the pipeline: “These documents are so long and have so much information encoded in tables,” Alexandre explains. “Being able to pinpoint the right cell in a table within more than a thousand pages, and on that basis, provide the right answer in less than one second: that is a big value.”&lt;/p&gt;
&lt;p&gt;While the system in its current form may still be too experimental for critical “life and death” situations, it can already be used for many other applications, such as in engineering and pilot training.&lt;/p&gt;
&lt;p&gt;Alexandre and his team had started the project to find out whether NLP technologies could be used in highly technical domains with strict accuracy requirements — or whether the quest for “browsing documentation in a more natural way,” as Alexandre puts it, was still out of reach. After concluding this phase of the project, Alexandre is confident:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;The results are promising. While we still need to work on the robustness of such systems and the operational performance before any imaginable deployment, we now see the value and the potential of the technology. We want to make sure that we keep up to speed with the space of NLP, so that our organization can leverage it from day one once it’s really perfectly useful for operations. This is really a gold mine!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;take-your-information-retrieval-system-to-new-heights&#34;&gt;Take your information retrieval system to new heights&lt;/h2&gt;
&lt;p&gt;As an extremely active field of research, NLP is coming up with new, ingenious solutions for real-world applications all the time. You can use natural language to retrieve documents, tables, 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/retriever#multimodal-retrieval&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;and even images&lt;/a&gt;: there are fewer and fewer limits to what kind of information these systems can process.&lt;/p&gt;
&lt;p&gt;If you want to see the results of such a system in action, 
&lt;a href=&#34;https://haystack.deepset.ai/overview/quick-start&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;get started with Haystack&lt;/a&gt; today.&lt;/p&gt;
&lt;p&gt;We’re always happy to see new members join our community of applied-NLP enthusiasts. Head over to our 
&lt;a href=&#34;https://haystack.deepset.ai/community&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;community page&lt;/a&gt; to join the deepset Discord server or 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;find us on GitHub&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Good Listener: How Memory Enables Conversational Agents</title>
      <link>https://haystack.deepset.ai/blog/memory-conversational-agents/</link>
      <pubDate>Fri, 07 Jul 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/memory-conversational-agents/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Memory is what turns a powerful LLM into an empathetic interlocutor who can remember what you’ve said before. But how does it work?&lt;/p&gt;
&lt;p&gt;Today we want to lift the hood on how memory is implemented in Haystack. We’ll explain the differences between memory injection and memory as a tool and show you how to get around the context window’s length limitation by summarizing.&lt;/p&gt;
&lt;p&gt;If you want to understand how memory works computationally or start building your own conversational AI interface with Haystack, this article is for you.&lt;/p&gt;
&lt;h2 id=&#34;why-memory-matters&#34;&gt;Why memory matters&lt;/h2&gt;
&lt;p&gt;For an exchange of words to feel like a conversation, both parties require some kind of memory: they need to remember what’s been said before, and they need to be able to refer back to salient entities and concepts by using, for example, pronouns.&lt;/p&gt;
&lt;p&gt;Consider the following 
&lt;a href=&#34;https://en.wikiquote.org/wiki/Booksmart&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dialogue from the movie “Booksmart:”&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Amy:&lt;/strong&gt; Name one person whose life was so much better because they broke a couple of rules.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Molly:&lt;/strong&gt; Picasso.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Amy:&lt;/strong&gt; He broke art rules. Name a person who broke a real rule.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Molly:&lt;/strong&gt; Rosa Parks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Amy:&lt;/strong&gt; Name another one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Molly:&lt;/strong&gt; Susan B. Anthony.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Amy:&lt;/strong&gt; God dammit.&lt;/p&gt;
&lt;p&gt;This conversation only works because Amy and Molly know what was said previously. It’s clear from the context that the pronoun “he” refers back to Picasso, and when Amy says, “Name another one,” she doesn’t have to spell out that she means “another person who broke the rules.”&lt;/p&gt;
&lt;h2 id=&#34;large-language-models-and-memory&#34;&gt;Large language models and memory&lt;/h2&gt;
&lt;p&gt;By default, LLMs don’t have an inbuilt notion of memory. As far as an out-of-the-box LLM is concerned, every prompt it receives is the beginning of an entirely new interaction.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/memory-conversational-agents/no-memory.png alt=&#34;Exchange between human and AI. The human asks: “Who was Picasso? The AI answers “A Spanish artist.” The human asks: “When was he born?” The AI answers “Idk what you’re talking about”&#34;  /&gt;
        &lt;figcaption&gt;Agent without Memory&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;For a conversation to feel natural, an advanced chatbot needs to have some context beyond the current question-answer pair. This property is known as either memory or history.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/memory-conversational-agents/yes-memory.png alt=&#34;Exchange between human and AI. The human asks: “Who was Picasso? The AI answers “A Spanish artist.” The human asks: “When was he born?” The AI answers “October 25, 1881”&#34;  /&gt;
        &lt;figcaption&gt;Agent with Memory&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;To make conversational agents aware of previous exchanges, different implementations of memory exist.&lt;/p&gt;
&lt;h2 id=&#34;memory-injection-for-agents&#34;&gt;Memory injection for Agents&lt;/h2&gt;
&lt;p&gt;A 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/agent#conversational-agent&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Conversational Agent in Haystack&lt;/a&gt; is a basic chatbot that can generate answers to user queries through an LLM. Memory is passed on to the prompt along with every new query.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/memory-conversational-agents/memory-convag.png alt=&#34;A graph of a conversational agent. Lines from query and memory lead into the prompt. Arrows lead from the query and the agent’s output into memory.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;As the diagram shows, the memory stores each answer by the Agent along with the user’s query. In this implementation, memory is simply a list. Every new exchange between human and AI is appended to the list so it can be injected back into the next prompt.&lt;/p&gt;
&lt;p&gt;To illustrate this, let’s look at a dialogue with multiple exchanges. I used OpenAI’s gpt-3.5-turbo for these examples, the same model that powers ChatGPT. (For the full tutorial on how to set up a conversational agent with Haystack, have a look at 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/24_building_chat_app&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our tutorial&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;First, we import the PromptNode and initialize it with the model’s name, the 
&lt;a href=&#34;https://platform.openai.com/account/api-keys&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;API key from OpenAI&lt;/a&gt;, and the maximum length for the node’s output:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptNode&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;model_name&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;gpt-3.5-turbo&#39;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptNode&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model_name&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;MY_API_KEY&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;max_length&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;256&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can now set up our conversational agent (aka chatbot) by passing the prompt node to it. By default, this initializes a memory, which is injected back into the user’s prompt with each new call.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.agents.conversational&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationalAgent&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conversational_agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationalAgent&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let’s now ask the model a few questions to accumulate some memory:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conversational_agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Who was Pablo Picasso?&#34;&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pablo&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;renowned&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Spanish&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;artist&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;who&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;considered&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;one&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;most&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;influential&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;artists&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;20&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;th&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;century&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;He&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;known&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;co&lt;/span&gt;-&lt;span style=&#34;color:#268bd2&#34;&gt;founding&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cubist&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;movement&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt;  &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;contributions&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Surrealism&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;His&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;work&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;encompasses&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;various&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;styles&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;mediums&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;including&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;painting&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;sculpture&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ceramics&lt;/span&gt;, &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;collage&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can easily verify that the memory injection works by referring to the painter with a pronoun in our next query:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conversational_agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Where did he live?&#34;&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;born&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Malaga&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Spain&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;but&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;he&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;spent&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;most&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;adult&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;life&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;France&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;particularly&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Paris&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;He&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;also&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;had&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;periods&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;residency&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Barcelona&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;various&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;other&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;locations&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;throughout&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Europe&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Interesting! But now to the burning question:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conversational_agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;How can I learn to paint like him?&#34;&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;To&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;learn&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;paint&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;like&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;you&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;can&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;start&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;by&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;studying&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;works&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;understanding&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;artistic&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;style&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Analyze&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;color&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;form&lt;/span&gt;, &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;composition&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Experiment&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;different&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;techniques&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;mediums&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;that&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;he&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;used&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;such&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cubism&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;or&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;collage&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Practice&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;regularly&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;don&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;t be afraid to take risks with your art.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can now take a closer look at the agent’s memory by calling:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;conversational_agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;memory&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;load&lt;/span&gt;())  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Human&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Who&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pablo&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;AI&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Pablo&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;renowned&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Spanish&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;artist&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;who&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;considered&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;one&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;most&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;influential&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;artists&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;20&lt;/span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;th&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;century&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;He&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;known&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;co&lt;/span&gt;-&lt;span style=&#34;color:#268bd2&#34;&gt;founding&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cubist&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;movement&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt;  &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;contributions&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Surrealism&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;His&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;work&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;encompasses&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;various&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;styles&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;mediums&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;including&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;painting&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;sculpture&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ceramics&lt;/span&gt;, &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;collage&lt;/span&gt;.  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Human&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Where&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;did&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;he&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;live&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;AI&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;born&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Malaga&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Spain&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;but&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;he&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;spent&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;most&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;adult&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;life&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;France&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;particularly&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Paris&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;He&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;also&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;had&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;periods&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;residency&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Barcelona&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;various&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;other&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;locations&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;throughout&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Europe&lt;/span&gt;.  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Human&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;How&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;can&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;I&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;learn&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;paint&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;like&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;him&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;AI&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;To&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;learn&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;paint&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;like&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;you&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;can&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;start&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;by&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;studying&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;works&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;understanding&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;artistic&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;style&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Analyze&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;color&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;form&lt;/span&gt;, &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;composition&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Experiment&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;different&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;techniques&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;mediums&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;that&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;he&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;used&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;such&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cubism&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;or&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;collage&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Practice&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;regularly&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;don&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;t be afraid to take risks with your art.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Evidently, all our exchanges with the model have been stored in memory. It also becomes clear why, using this solution, you can quickly run into problems. As the conversation between human and AI goes on and the list of previous exchanges in the memory grows, the LLM’s context window may become too small. When the prompt starts to overflow the context window, the quality of the model’s output suffers.&lt;/p&gt;
&lt;p&gt;Cost is another factor: companies like OpenAI charge users on the basis of tokens, so you’ll want to keep your prompts short. As a solution, you can summarize your memory periodically.&lt;/p&gt;
&lt;h2 id=&#34;summarizing-memory&#34;&gt;Summarizing memory&lt;/h2&gt;
&lt;p&gt;By using a separate model to summarize your Agent’s memory at regular intervals, you keep your memory manageable. In Haystack, you can initialize a 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/agent#conversational-agent-memory&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;conversation summary memory&lt;/a&gt; that generates summaries of the last few exchanges between human and AI. In our example, we’ll simply use the same model for summarizing as for the conversational agent – so we initialize the memory with our prompt node again:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.agents.memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationSummaryMemory&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;summary_memory&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationSummaryMemory&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that this memory class uses a specific prompt template, which instructs the model to condense the conversation. We’ll see later how we can adjust those instructions ourselves. Let’s initialize the conversational agent again, this time with our summary memory:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.agents.conversational&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationalAgent&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;conversational_agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationalAgent&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;memory&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;summary_memory&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After running this agent with the same questions as before, we take a look at the memory again:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;conversational_agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;memory&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;load&lt;/span&gt;())  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pablo&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;highly&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;influential&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Spanish&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;artist&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;known&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;contributions&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;various&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;art&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;movements&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;He&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;lived&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Spain&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;France&lt;/span&gt;.  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Human&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;How&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;can&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;I&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;learn&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;paint&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;like&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;him&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;AI&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;To&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;learn&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;paint&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;like&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;you&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;can&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;start&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;by&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;studying&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;works&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;understanding&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;artistic&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;style&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Analyze&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;his&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;color&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;form&lt;/span&gt;, &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;composition&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Experiment&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;different&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;techniques&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;mediums&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;that&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;he&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;used&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;such&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cubism&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;or&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;collage&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Practice&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;regularly&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;don&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;t be afraid to take risks with your art.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can see that by using the summary, we’ve managed to condense our memory considerably. Interestingly, the model has decided to keep the last exchange almost in its entirety.&lt;/p&gt;
&lt;p&gt;By default, the node creates a summary after three exchanges, but you can change that by adjusting the summary_frequency parameter. You can even set that parameter to 1 to create a summary after every exchange. In the next example, we do exactly that, and we also tell the model to keep the memories extra short (ten words at most). To that end, we import the prompt template class and specify our requirements in a custom template, which we then pass on to the summary memory.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.agents.memory&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationSummaryMemory&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;my_template&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Create a short summary (max 10 words) of the following chat transcript by shortening and summarizing the content without losing important information:&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{chat_transcript}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Condensed Transcript:&#34;&lt;/span&gt;,)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;summary_memory&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationSummaryMemory&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;my_template&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;summary_frequency&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;How does our memory look after three exchanges with the new prompt template?&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;conversational_agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;memory&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;load&lt;/span&gt;())  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Influential&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Spanish&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;painter&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;known&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Cubism&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;innovation&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;lived&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Spain&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;France&lt;/span&gt;, &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;United&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;States&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;Study&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Picasso&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;s style, experiment with materials, take art classes.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We’ve been able to considerably shorten our memory, while preserving all the important information!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The example uses a proprietary model from OpenAI. Nevertheless, you can also use our conversation summary memory class with open-source models from the Hugging Face model hub. A good choice is 
&lt;a href=&#34;https://huggingface.co/philschmid/bart-large-cnn-samsum&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this BART-large model&lt;/a&gt;, which was designed specifically for summarizing conversations. Note, however, that older models can’t follow instructions like LLMs do. To make it work, you can use the following code:&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;summary_node&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptNode&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;philschmid/bart-large-cnn-samsum&#39;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;max_length&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;256&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;model_kwargs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;task_name&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text2text-generation&#34;&lt;/span&gt;})  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{chat_transcript}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;summary_memory&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ConversationSummaryMemory&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;summary_node&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;prompt_template&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;template&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that while the above solutions manage to keep the memory much shorter than the accumulative version we used initially, they still result in a growing memory, because the summaries are appended. This 
&lt;a href=&#34;https://github.com/deepset-ai/haystack/issues/5091&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;open issue aims to solve this problem&lt;/a&gt; by summarizing the entire memory, thus allowing you to keep it at a stable length throughout the entire conversation.&lt;/p&gt;
&lt;h2 id=&#34;memory-for-conversational-agents-with-tools&#34;&gt;Memory for Conversational Agents with Tools&lt;/h2&gt;
&lt;p&gt;The Conversational Agent lets you set up a basic chatbot functionality. But it’s nowhere near using 
&lt;a href=&#34;https://haystack.deepset.ai/blog/introducing-haystack-agents/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;all the capabilities an Agent offers&lt;/a&gt;. Agents are so powerful because they can use an LLM’s reasoning skills for dynamic solution planning: given a set of tools, agents can 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/agent#tools&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;use those tools&lt;/a&gt; at their own discretion to arrive at the optimal final answer.&lt;/p&gt;
&lt;p&gt;Agents keep track of their thinking process – the solution strategy which leads them to their final answer – in a “transcript.” If you don’t want to pass both a memory record and a transcript to the conversational agent, you can simply implement memory as another tool.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/memory-conversational-agents/memory-convag-tools.png alt=&#34;A graph of a conversational agent with tools. A line leads from the query into the prompt, which in turn is connected to a box labeled ““Conversational Agent.” Arrows lead from the agent box to and from three diamond-shaped boxes labeled “Tool1: Memory”, “Tool 2: Web Search”, “Tool 3: …”  Another arrow leads from the agent to a “final answer”. The final answer and the query again lead back to the memory tool.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;In this setup, the Agent, upon receiving a prompt, first consults the memory tool to see whether it can provide any context for answering the user’s query. The Agent’s output is then stored in memory along with the query after every iteration, as in our examples before.&lt;/p&gt;
&lt;p&gt;Memory as a tool opens up a range of possibilities. You’re free to configure it in the way that works best for you. Perhaps you want to keep track of an enormous amount of human-AI interactions in a memory database. You could then serve your memory through a sophisticated tool like an extractive QA pipeline or a summarization pipeline.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/memory-conversational-agents/memory-tools.png alt=&#34;Two different memory tools, one depicts an extractive QA pipeline, the second one a summarization pipeline, both are connected to memory databases.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;In the extractive QA pipeline solution, your conversational agent sends a query to your memory database, and the pipeline seeks to return the most relevant snippets from the memory that help contextualize the user’s query. In the summarization pipeline solution, the memory tool returns a summary of relevant documents instead. Implementing memory as these tools again helps you avoid bumping into the LLM’s token limit.&lt;/p&gt;
&lt;h2 id=&#34;dont-forget-to-follow-us&#34;&gt;Don’t forget to follow us&lt;/h2&gt;
&lt;p&gt;As a company, deepset is invested in bringing LLMs to every application. We also like sharing the latest insights on all things NLP with our community. If that sounds good to you, 
&lt;a href=&#34;https://twitter.com/deepset_ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;follow us on Twitter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Our OSS framework Haystack comes with all the tools needed to build state-of-the-art systems with LLMs. Head over to the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack GitHub repo&lt;/a&gt; to learn more.&lt;/p&gt;
&lt;p&gt;To see what other people are building with Haystack, or if you need help with your own projects, 
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;join our server on Discord&lt;/a&gt;. It’s a friendly community managed by our Devrel advocates, where you can meet others with an interest in NLP.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Share and Use Prompts with PromptHub</title>
      <link>https://haystack.deepset.ai/blog/share-and-use-prompt-with-prompthub/</link>
      <pubDate>Thu, 29 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/share-and-use-prompt-with-prompthub/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;With the release of Haystack 1.18, we’ve also officially rolled out a new prompt sharing and collaboration service: 
&lt;a href=&#34;https://prompthub.deepset.ai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PromptHub&lt;/a&gt;. This new product currently (day of release: 29 June 2023) hosts a set of prompts we at deepset have created, with instructions on how to use them with the &lt;code&gt;PromptTemplate&lt;/code&gt; and &lt;code&gt;PromptNode&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Prompts that are provided in PromptHub are all maintained in their own public repository on GitHub, 
&lt;a href=&#34;https://github.com/deepset-ai/prompthub&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;deepset-ai/prompthub&lt;/code&gt;&lt;/a&gt;.  Each prompt comes with a YAML file, housing the prompt itself, and a “prompt card” of the same name, which is a markdown file explaining the intended use case of the prompt and how to use it with a Haystack &lt;code&gt;PromptNode&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In the following sections, we will walk through how you may use prompts hosted on the PromtHub, as well as your own local prompts.&lt;/p&gt;
&lt;h2 id=&#34;using-prompts-on-prompthub-via-the-api&#34;&gt;Using Prompts on PromptHub via the API&lt;/h2&gt;
&lt;p&gt;The easiest way to use prompts from the PromptHub is to simply copy over their identifier, and add this into the &lt;code&gt;prompt&lt;/code&gt; field in a &lt;code&gt;PromptTemplate&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/share-and-use-prompt-with-prompthub/prompthub-example.png alt=&#34;Example from Prompthub&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Copy the identifier of a prompt from the PromptHub. For example, let’s try &lt;code&gt;deepset/summarization&lt;/code&gt; and add this as your &lt;code&gt;prompt&lt;/code&gt; in a &lt;code&gt;PromptTemplate&lt;/code&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;my_summarizer_prompt&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;deepset/summarization&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Each prompt on the PromptHub will have instructions on how they are intended to be used, including instructions on what the best &lt;code&gt;output_parser&lt;/code&gt; for that prompt might be.&lt;/p&gt;
&lt;p&gt;For example, the &lt;code&gt;deepset/question-answering&lt;/code&gt; prompt is best used with an &lt;code&gt;output_parser&lt;/code&gt; as described on the PromptHub.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;AnswerParser&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question_answering_template&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;deepset/question-answering&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;output_shapers&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AnswerParser&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once you’ve created your &lt;code&gt;PromptTemplate&lt;/code&gt;, you can use it in a &lt;code&gt;PromptNode&lt;/code&gt; by setting it as the &lt;code&gt;default_prompt_template&lt;/code&gt; or even setting it as the &lt;code&gt;prompt_template&lt;/code&gt; in a single &lt;code&gt;prompt()&lt;/code&gt; call.&lt;/p&gt;
&lt;p&gt;Once you’ve pulled a prompt from the hub, the file will be cached on your machine. So you do not have to worry about making an API call each time.&lt;/p&gt;
&lt;h2 id=&#34;creating-and-using-local-prompts&#34;&gt;Creating and Using Local Prompts&lt;/h2&gt;
&lt;p&gt;You always have the option to create your own prompt and provided it as the &lt;code&gt;prompt&lt;/code&gt; to a &lt;code&gt;PromptTemplate&lt;/code&gt;. For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;my_prompt_template&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Answer the following query: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{query}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;But now, you can also make use of and maintain your own local collection of prompts as YAML files.&lt;/p&gt;
&lt;p&gt;Each prompt on the PromptHub is actually a YAML file containing a short description, the name of the author, a tag describing the main task it covers, as well as the prompt text itself. For example, below is the  content of the &lt;code&gt;deepset/question-answering&lt;/code&gt; prompt.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;description&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;A simple prompt to answer a question given a set of documents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;meta&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;authors&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#2aa198&#34;&gt;deepset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;deepset/question-answering&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;tags&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- &lt;span style=&#34;color:#2aa198&#34;&gt;question-answering&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;text&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Given the context please answer the question. Context: {join(documents)};\n\nQuestion: {query}; \nAnswer:\n&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;version&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.1.1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Similarly, you can create your own files that hold prompts. For example, here’s a file called &lt;code&gt;my_yoda_prompt.yaml&lt;/code&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;description&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;A simple prompt to answer any question in the style of Yoda&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;meta&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;authors&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#2aa198&#34;&gt;deepset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;tuana/yoda-qa&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;tags&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- &lt;span style=&#34;color:#2aa198&#34;&gt;question-answering&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;text&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Answer the following query in the style of Yoda. Query: {query} \n Answer:&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;version&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.1.1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Which can be used in a &lt;code&gt;PromptTemplate&lt;/code&gt; with the following code:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;yoda_prompt&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptTemplate&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prompt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;path/to/my_yoda_prompt.yaml&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Just in case you were curious: When asked ‘What is the meaning of life’ with this prompt, GPT-4’s result was “42, the meaning of life is. But truly, in one’s own understanding and experience, find the meaning of life, one must. Hmmmm.”&lt;/p&gt;
&lt;h2 id=&#34;contributing-to-the-prompthub&#34;&gt;Contributing to the PromptHub&lt;/h2&gt;
&lt;p&gt;If you’ve created a prompt that you would like to share with the Haystack community and the world, all you have to do is create a pull request that includes your YAML file with the prompt, and a Markdown file with your prompt card explaining how to use it 🥳. Once you’ve done this, your prompt will be available to everyone with the name you give it. A good idea is to prefix it with your name, like we’ve done with &lt;code&gt;deepset/&lt;/code&gt; prompts.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>What Is a Large Language Model (LLM)?</title>
      <link>https://haystack.deepset.ai/blog/what-is-an-llm/</link>
      <pubDate>Fri, 23 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/what-is-an-llm/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;AI is here to stay. But while applications like autonomous driving and even image generation have only touched few people’s lives so far, generative AI in the form of high-quality chatbots has taken the world by storm. The informative, creative, and at times deceptively eloquent responses by the likes of ChatGPT are made possible by a novel technology known as large language models (LLMs).&lt;/p&gt;
&lt;p&gt;In this article, we’ll talk about what LLMs are and how they’re produced, what kinds of LLMs exist, and whether it’s truly just their size that sets them apart from other language models. In the end, we’ll show you how you, too, can use LLMs in Haystack, our open source framework for NLP.&lt;/p&gt;
&lt;h2 id=&#34;what-is-a-large-language-model&#34;&gt;What is a large language model?&lt;/h2&gt;
&lt;p&gt;As with most groundbreaking technologies, LLMs didn’t come about overnight. Rather, they are part of a long-term trend in natural language processing that has been happening over the last few years. Since the introduction of 
&lt;a href=&#34;https://haystack.deepset.ai/blog/the-definitive-guide-to-bertmodels&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BERT&lt;/a&gt;, state-of-the-art language models have followed the same architectural paradigm, which saw them develop an ever-better understanding of the intricacies and nuances of natural (that is, human) language.&lt;/p&gt;
&lt;p&gt;Developers took that new-found ability for processing language to the next level by building ever larger language models, which require an incredible amount of computing power during training: the LLM was born. In addition to their size (which is calculated based on their number of trainable parameters), these models also have to ingest huge amounts of data when they’re trained.&lt;/p&gt;
&lt;p&gt;






    &lt;figure&gt;
        &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/what-is-an-llm/model-sizes.png alt=&#34;Four different-sized bubbles show the growing dimensions of the LLMs BERT, BERT-large, GPT-3, and GPT-4&#34;  /&gt;
        &lt;figcaption&gt;BERT has 110M parameters, BERT-large 336M. The largest GPT-3 model has 175B. The exact size of GPT-4, OpenAI’s latest LLM, is unknown, but it is estimated at above 1 trillion.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;So technically, a large language model is simply, well, a language model that is large. In practice, though, the term LLM has taken on a more specific meaning: it is often used to refer to those large models that are able to generate coherent, human-like output in response to instructions by a user. In this article, therefore, we’ll concentrate on the models that satisfy this definition, and ignore other LLMs.&lt;/p&gt;
&lt;h2 id=&#34;how-are-llms-created&#34;&gt;How are LLMs created?&lt;/h2&gt;
&lt;p&gt;Like most modern machine learning models, LLMs are created by feeding a complex neural network architecture with data representing the domain the model needs to learn. Hence, computer vision models ingest image data, while language models need to see textual data during training. Have a look at our 
&lt;a href=&#34;https://haystack.deepset.ai/blog/what-is-a-language-model&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;entry-level discussion of language models&lt;/a&gt; to learn more.&lt;/p&gt;
&lt;h3 id=&#34;pre-training-an-llm&#34;&gt;Pre-training an LLM&lt;/h3&gt;
&lt;p&gt;In the first training phase, the LLM learns a representation of the data through text completion. Upon seeing a sequence of words, its training objective is to generate the most likely next word. Through this technique, it learns to emulate our own linguistic intuition. However – and here’s where the “large” part truly makes a difference – while small models arrive at a stage where they can form well-formed, grammatical responses in the language they’re trained on, LLMs go much further. These models have so many parameters that they can easily learn a representation of the world knowledge contained in the data.&lt;/p&gt;
&lt;p&gt;Let’s look at some examples to make that point clearer. Here’s the kind of intuition a small language model learns compared to that of an LLM:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/what-is-an-llm/lm-vs-llm.png alt=&#34;Three different prompts and outputs are shown that illustrate the different capabilities of LMs and LLMs. While a normal language model can only complete sentences, the large one can also answer the question “Who is Hamlet?” or translate the question “How are you?” into Turkish.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;When instructed correctly, the large model can complete text in a way that produces coherent and informative answers, translations, summaries, and much, much more. That is why prompting or prompt engineering plays such an important role when working with these models. Have a look at 
&lt;a href=&#34;https://haystack.deepset.ai/blog/beginners-guide-to-llm-prompting/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our article on LLM prompting&lt;/a&gt; to learn more.&lt;/p&gt;
&lt;p&gt;Evidently, an LLM doesn’t only know how language works. It also knows how the world works – at least the part of it that is represented in the training data. This training data consists of huge collections of texts gleaned from the internet (the Common Crawl corpus, for example) and other sources.&lt;/p&gt;
&lt;p&gt;So far, we only talked about the first step of training language models – be they small, medium-sized, or large. However, in the case of ChatGPT and other instruction-following LLMs like it, the developers included additional complex training steps to arrive at these models’ impressive conversational abilities.&lt;/p&gt;
&lt;h3 id=&#34;fine-tuning-an-llm&#34;&gt;Fine-tuning an LLM&lt;/h3&gt;
&lt;p&gt;To recap, a pre-trained LLM can complete text prompts, using not only linguistic intuition but also knowledge about the world. However, that text completion can still take many different forms. Imagine we input the prompt “Explain the moon landing to a 6 year old.” To the pre-trained model, the following two outputs are equally good completions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;“Many years ago, a group of people first landed on the moon…”&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;“Explain gravity to a 5 year old.”&lt;br&gt;
(Example 
&lt;a href=&#34;https://openai.com/research/instruction-following&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;adapted from OpenAI&lt;/a&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For us who know what a conversation looks like, the second answer sounds as though the model is trolling us. So during the fine-tuning phase, researchers need to teach the LLM the notion of a proper exchange of questions and answers. To that end, they create a dataset of human-generated answers in response to a set of prompts and fine-tune the model on that data. This step is called “supervised finetuning” (SFT).&lt;/p&gt;
&lt;p&gt;The LLM has now acquired a skeletal understanding of human language-based interactions. But as we all know, such interactions are not always appropriate, especially when you’re trying to emulate a neutral agent in the form of a chatbot. Because the model has ingested a large amount of the internet during training – complete with chat forums and other unvetted content – it is still prone to generating answers that are offensive, unqualified, or even harmful.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/what-is-an-llm/llm-evolution.png alt=&#34;The three stages of the instruction-following LLM’s evolution are personified by an increasingly tame dragon.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;To create a truly helpful chatbot, the developers of ChatGPT therefore introduced a second training step that uses “reinforcement learning” – a classic method from machine learning – to guide the model towards more optimal answers. Based on human rankings of different answers to the same prompt, they taught a secondary model to distinguish between good and bad outputs.&lt;/p&gt;
&lt;p&gt;So in the final step towards ChatGPT as we know it, this secondary model is used to teach the LLM to better match users’ expectations and provide answers that are informative, neutral, and politically correct. This last fine-tuning step is known as “reinforcement learning from human feedback” (RLHF).&lt;/p&gt;
&lt;h2 id=&#34;the-world-of-llms-size-vs-speed&#34;&gt;The world of LLMs: size vs. speed&lt;/h2&gt;
&lt;p&gt;While the release of ChatGPT marked the point where the general public learned about large language models and their abilities, LLMs were being developed well before that – and not just by OpenAI. Google’s LaMDA (137 billion parameters), NVIDIA and Microsoft’s MT NLG (530 billion parameters), and the Chinese BAAI’s model WuDao 2.0 (1.75 trillion parameters) all operate on the principle that bigger is better.&lt;/p&gt;
&lt;p&gt;All of these are proprietary models: some of them can be used via an API (and for a fee), while others are still entirely gatekept. This runs counter to a widespread standard in NLP, by which a language model’s trained weights are shared for fair general use. (For example, Google’s BERT and Meta’s RoBERTa were open-sourced from the start and could be used out of the box or fine-tuned by anyone.)&lt;/p&gt;
&lt;p&gt;A closed-source model, on the other hand, is even more of a black box than other neural nets: no one knows exactly which data went into these models or how their output is filtered. In the case of GPT-4, it’s not even clear how big the model actually is. What’s more, being trained on large collections of publicly available internet data, these models are only made possible by the practice of data-sharing in the first place.&lt;/p&gt;
&lt;p&gt;But the NLP community responded to the companies’ proprietary aspirations with lightning speed, training and open-sourcing its own large language models. You can take a look at the 
&lt;a href=&#34;https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Open LLM Leaderboard&lt;/a&gt; on Hugging Face or the 
&lt;a href=&#34;https://chat.lmsys.org/?arena&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Chatbot Arena&lt;/a&gt;, which hides the predictive models’ identities during experiments, to find out about the best performing models in the space.&lt;/p&gt;
&lt;p&gt;Open-source models are able to improve so much faster because they can build on each other’s successes. LLMs like Falcon, Vicuna, and Alpaca are trained transparently and on publicly available datasets. And though they are much smaller than their closed-source counterparts, the quality of these models’ output is beginning to catch up.&lt;/p&gt;
&lt;h2 id=&#34;llms--haystack--&#34;&gt;LLMs &#43; Haystack = 🚀&lt;/h2&gt;
&lt;p&gt;Naturally, LLMs have made an enormous impact on the world of NLP. Entirely new areas of research like prompt engineering and hallucinations have sprung up, generating heated debates and an increased speed of development that brings innovations every day.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack, our OSS Python framework for NLP&lt;/a&gt;, lets you build your own LLM-powered systems by incorporating agents, prompt nodes, and your large language models of choice – be they closed- or open-source – within user-facing applications.&lt;/p&gt;
&lt;p&gt;Thanks to 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack’s modular structure&lt;/a&gt;, changing one line in your code is all that’s needed to switch between models. Experiment with various models on different platforms to see which ones work best for your use case. And once the next powerful OSS LLM comes along, you can simply plug it into your Haystack pipeline and start using it right away. With Haystack, you are ready for the future 🚀&lt;/p&gt;
&lt;p&gt;Do you like learning about all things NLP? We 
&lt;a href=&#34;https://haystack.deepset.ai/community&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;host a server in Discord&lt;/a&gt; where we answer questions and organize regular discussions about interesting topics from the world of natural language processing. Make sure to stop by – and if you’re interested in the latest research about LLMs, check out the nlp-discussions channel while you’re there!&lt;/p&gt;
&lt;p&gt;We hope to welcome you to our growing community of NLP enthusiasts soon! 🙂&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>The Beginner&#39;s Guide to LLM Prompting</title>
      <link>https://haystack.deepset.ai/blog/beginners-guide-to-llm-prompting/</link>
      <pubDate>Thu, 15 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/beginners-guide-to-llm-prompting/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Large language models (LLMs) have made it possible for everyone to interact with an AI – not through code but using prompts in natural language. The fact that language now acts as an interface to complex models makes it necessary to investigate the prompts we use more closely.&lt;/p&gt;
&lt;p&gt;When used correctly, generative models can produce highly valuable results for businesses. Getting your prompts right is therefore necessary to harness LLMs’ enormous potential, especially when you’re looking to incorporate it into your product. That is why an entire industry has formed around the topic of “prompt engineering.” In this post, we’ll explain our approach to this technique and share the dos and don’ts of prompting.&lt;/p&gt;
&lt;h2 id=&#34;what-is-a-prompt&#34;&gt;What is a prompt?&lt;/h2&gt;
&lt;p&gt;A prompt is an instruction to an LLM. If you have interacted with an LLM like ChatGPT, you have used prompts. Ideally, a prompt elicits an answer that is correct, adequate in form and content, and has the right length.&lt;/p&gt;
&lt;p&gt;Essentially, prompting is about packaging your intent in a natural-language query that will cause the model to return the desired response.&lt;/p&gt;
&lt;h2 id=&#34;how-to-come-up-with-good-prompts&#34;&gt;How to come up with good prompts&lt;/h2&gt;
&lt;p&gt;Good prompts follow two basic principles: clarity and specificity. Clarity describes the use of simple, unambiguous language that avoids jargon and overly complex vocabulary. So rather than keeping your queries short and snappy, better take the long-winded way to make your point sufficiently clear to the LLM.&lt;/p&gt;
&lt;p&gt;Example of an unclear prompt:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Who won the election?&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Example of a clear prompt:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Which party won the 2023 general election in Paraguay?&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Specificity, on the other hand, refers to the need for context. Tell your model as much as it needs to know to answer your question. Sometimes, this can veer almost into storytelling territory.&lt;/p&gt;
&lt;p&gt;Example of an unspecific prompt:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Generate a list of titles for my autobiography.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Example of a specific prompt:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Generate a list of ten titles for my autobiography. The book is about my journey as an adventurer who has lived an unconventional life, meeting many different personalities and finally finding peace in gardening.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Let’s look at a few tricks to make our prompts even better.&lt;/p&gt;
&lt;h2 id=&#34;prompt-engineering-hacks&#34;&gt;Prompt engineering hacks&lt;/h2&gt;
&lt;p&gt;Prompting is not a science, and calling it “engineering” may be a slight stretch. But researchers and users of LLMs have identified some tricks that consistently improve the models’ responses.&lt;/p&gt;
&lt;h3 id=&#34;do-say-do-dont-say-dont&#34;&gt;Do say “do,” don’t say “don’t”&lt;/h3&gt;
&lt;p&gt;This point actually continues our specificity theme because “do” instructions are by nature more specific than “don’t”. So rather than telling the model what not to do, it’s usually better to specify exactly what we want it to do – if we know what that is.&lt;/p&gt;
&lt;p&gt;Let’s imagine that, in our previous example, we want to make sure that the LLM doesn’t produce overly long titles (as those models tend to do). Rather than saying:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Don&#39;t make the titles too long&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;it is better to be specific:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Each title should be between two and five words long.&lt;/code&gt;&lt;/p&gt;
&lt;h3 id=&#34;use-few-shot-prompting&#34;&gt;Use few-shot prompting&lt;/h3&gt;
&lt;p&gt;In our example, we asked for a particular item (the title for an autobiography) of a certain length (two to five words). This type of instruction without any concrete examples is known as “zero-shot prompting.”&lt;/p&gt;
&lt;p&gt;Most models, however, benefit from “few-shot prompting,” where you feed your model some examples to learn the desired pattern. In our prompt, we can add a few existing titles as examples to help the model get a feel for what we’re looking for.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Here are some examples: Long walk to freedom, Wishful drinking, I know why the caged bird sings.&lt;/code&gt;&lt;/p&gt;
&lt;h3 id=&#34;structure-your-prompt-in-a-meaningful-way&#34;&gt;Structure your prompt in a meaningful way&lt;/h3&gt;
&lt;p&gt;Elements like quotation marks, bullet points, line breaks, etc. make it easier for humans to parse a text. And guess what? The same is true for LLMs. Let’s apply that insight to our previous example and spell out the full prompt once more.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Generate a list of ten titles for my autobiography. The book is about my journey as an adventurer who has lived an unconventional life, meeting many different personalities and finally finding peace in gardening. Each title should be between two and five words long.

### Examples of great titles ###  
      
- “Long walk to freedom”  
- “Wishful drinking”  
- “I know why the caged bird sings”
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now that we’ve perfected our prompt, what does the LLM’s output look like? Here’s what ChatGPT came up with in response:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/beginners-guide-to-llm-prompting/titles.png alt=&#34;An output of a list of ten titles in the ChatGPT interface. The titles are: 1. Trails of Wanderlust: Roaming Hearts, 2. Serendipity’s Wild Ride: Life Unmapped, 3: Journeys Beyond Borders: Seeking Home, 4. Tales Unraveled: Nomadic Chronicles, 5. Discoveries in Unexpected Places, 6. From Vagabond to Green Thumb, 7. Ripples of Adventure: Tilling Tranquility, 8. Seeking Skies, Finding Roots, 9. Footprints and Seedlings: Unconventional Harmony, 10. Chasing Horizons, Cultivating Serenity&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;“Seeking Skies, Finding Roots” is hilarious and I would absolutely consider reading that book.&lt;/p&gt;
&lt;h3 id=&#34;use-leading-words&#34;&gt;Use leading words&lt;/h3&gt;
&lt;p&gt;Sometimes, the simplest techniques are the most effective. Researchers from Tokyo University have shown that telling a model to “think step by step” before its response can 
&lt;a href=&#34;https://arxiv.org/pdf/2205.11916.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;generate more accurate results&lt;/a&gt; and help the model correct its own mistakes. This method is known as using “leading words,” because we gently guide the model towards a more effective plan for problem-solving.&lt;/p&gt;
&lt;p&gt;The model is forced to break down its solution into multiple, more manageable steps rather than being allowed to just hazard a guess. Seeing how this simple trick improves the model’s performance is pretty striking. Here is an example of a math puzzle that requires just a bit of recursive thinking:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/beginners-guide-to-llm-prompting/wrong-math.png alt=&#34;ChatGPT input and output: The prompt “Add 1 and 3 and then divide by the latter and then add the former” elicits the answer “2” from the model.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The LLM confidently responds – with the wrong answer. Here’s what happens when we ask it to “think step by step:”&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/beginners-guide-to-llm-prompting/correct-math.png alt=&#34;ChatGPT input and output: The prompt “Add 1 and 3 and then divide by the latter and then add the former. Let’s think step by step:” elicits the answer “Step1: Add 1 and 3. Result: 4, Step 2: Divide by the latter. Result: 4/3 = 1.333…, Step 3: Add the former. Result: 1.333…&#43;1=2.333…. The final answer is 2.333…” from the model.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;And this time, the final answer is correct.&lt;/p&gt;
&lt;h2 id=&#34;prompting-pitfalls&#34;&gt;Prompting pitfalls&lt;/h2&gt;
&lt;p&gt;In addition to these techniques for improvement, it helps to know which elements in an instruction can throw the model off. Some people, for example, try to incorporate requests for short, one-word answers into their prompts to prevent the models from rambling.&lt;/p&gt;
&lt;p&gt;However, as the last example has shown, wordier answers might not be a bad thing when it comes to LLMs – but rather, a technique these models use to arrive at a correct response. LLMs may soon learn to keep their thought processes to themselves. In the meantime, we should let them ramble and refrain from requests for one-word answers.&lt;/p&gt;
&lt;p&gt;Another problem happens when you exceed the model’s context length. Due to the nature of these large Transformer models, an LLM’s “context window” (which is the portion of text it can process at once) comprises both the prompt and its response. If you exceed a model’s context length, its output will deteriorate.&lt;/p&gt;
&lt;p&gt;While this is a fascinating topic, it is also a fairly complex one – context lengths vary widely between models, and it isn’t even trivial to measure the length of your prompt. Just remember that if the length of your prompt becomes an issue, you should look into breaking the problem down into smaller steps.&lt;/p&gt;
&lt;h2 id=&#34;become-a-better-prompter&#34;&gt;Become a better prompter&lt;/h2&gt;
&lt;p&gt;This article leans heavily on Vladimir Blagojevic’s 
&lt;a href=&#34;https://www.youtube.com/watch?v=-TfPkX3IoUQ&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;webinar on “Prompting LLMs Using Haystack&lt;/a&gt;.” In it, Vladimir, who is a member of the core engineering team here at deepset, takes a deep dive into the art of prompting and shows how Haystack, our open source framework for NLP, offers just the right tools to get prompting right.&lt;/p&gt;
&lt;p&gt;For even more tips on how to come up with great prompts, have a look at our 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/prompt-engineering-guidelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;guidelines for prompt engineering&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Prompts also play a huge role in Agents: the latest fascinating trend that lets you harness the power of LLMs. Agents receive a sort of “super-prompt” that instructs the LLM to break its reasoning into manageable steps and delegate those to the tools it deems most capable of solving the tasks. Have a look at our 
&lt;a href=&#34;https://haystack.deepset.ai/blog/introducing-haystack-agents&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;blog post on Agents&lt;/a&gt; to learn more.&lt;/p&gt;
&lt;p&gt;Finally, you don’t always need to write your own prompts. Head over to our 
&lt;a href=&#34;https://prompthub.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PromptHub&lt;/a&gt;, where you find prompts for all kinds of applications, which you can use out of the box or tune to your liking, to get the best answers from your LLM.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Run Haystack Pipelines in production with Ray Serve</title>
      <link>https://haystack.deepset.ai/blog/run-haystack-pipelines-with-ray-serve/</link>
      <pubDate>Wed, 14 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/run-haystack-pipelines-with-ray-serve/</guid>
      <description>
      
      
      

      
      

      <p>
<a href="https://docs.ray.io/en/latest/serve/index.html" target="_blank" rel="noopener">Ray Serve</a> is a library built on top of the Ray framework for building online inference APIs. Serve is designed to be framework-agnostic, and while its simple design lets you quickly integrate pretty much any Python logic you need to deploy, building up complex inference services is still possible and straightforward.</p>
<p>
<a href="https://docs.haystack.deepset.ai/docs/pipelines" target="_blank" rel="noopener">Haystack pipelines</a> can be quite complex, but even the simpler ones consist of multiple components, which in turn might rely on different models and technologies - this aspect make them a good benchmark to test out Ray Serve’s capabilities.</p>
<h2 id="serve-a-retriever---reader-qa-pipeline">Serve a Retriever - Reader QA pipeline</h2>
<h3 id="prepare-a-set-of-documents">Prepare a set of documents</h3>
<p>Let’s begin by serving locally a Haystack pipeline that’s simple but non-trivial: a question-answering system working on an existing set of documents. For simplicity, we’ll use a Docker image running Elasticsearch and populated with a set of documents about countries and capitals that can be queried right ahead. So before we start, let’s run the image:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-docker" data-lang="docker"><span style="display:flex;"><span>docker run -p 9200:9200 -p 9300:9300 -d deepset/elasticsearch-countries-and-capitals:latest
</span></span></code></pre></div><p>The Elasticsearch instance will accept connections on <code>localhost</code> through the port <code>9200</code>.</p>
<h3 id="run-the-haystack-pipeline">Run the Haystack pipeline</h3>
<p>Haystack pipelines can be defined through Python code or a 
<a href="https://docs.haystack.deepset.ai/docs/pipelines#yaml-file-definitions" target="_blank" rel="noopener">yaml definition</a> that is eventually and transparently converted into Python code. We’ll use the latter format, so open an editor and save the following definition in a file called <code>pipeline.yml</code></p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#268bd2;font-weight:bold">version</span>: <span style="color:#2aa198">ignore</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#93a1a1;font-style:italic"># define all the building-blocks for a Pipeline</span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2;font-weight:bold">components</span>:    
</span></span><span style="display:flex;"><span>  - <span style="color:#268bd2;font-weight:bold">name</span>: <span style="color:#2aa198">DocumentStore</span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2;font-weight:bold">type</span>: <span style="color:#2aa198">ElasticsearchDocumentStore</span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2;font-weight:bold">params</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2;font-weight:bold">host</span>: <span style="color:#2aa198">localhost</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#268bd2;font-weight:bold">name</span>: <span style="color:#2aa198">Retriever</span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2;font-weight:bold">type</span>: <span style="color:#2aa198">DensePassageRetriever</span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2;font-weight:bold">params</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2;font-weight:bold">document_store</span>: <span style="color:#2aa198">DocumentStore</span>
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2;font-weight:bold">top_k</span>: <span style="color:#2aa198;font-weight:bold">5</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#268bd2;font-weight:bold">name</span>: <span style="color:#2aa198">Reader</span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2;font-weight:bold">type</span>: <span style="color:#2aa198">FARMReader</span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2;font-weight:bold">params</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2;font-weight:bold">model_name_or_path</span>: <span style="color:#2aa198">deepset/roberta-base-squad2</span>
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2;font-weight:bold">context_window_size</span>: <span style="color:#2aa198;font-weight:bold">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2;font-weight:bold">return_no_answer</span>: <span style="color:#859900;font-weight:bold">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2;font-weight:bold">pipelines</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#268bd2;font-weight:bold">name</span>: <span style="color:#2aa198">query</span>
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2;font-weight:bold">nodes</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#268bd2;font-weight:bold">name</span>: <span style="color:#2aa198">Retriever</span>
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2;font-weight:bold">inputs</span>: [<span style="color:#2aa198">Query]</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#268bd2;font-weight:bold">name</span>: <span style="color:#2aa198">Reader</span>
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2;font-weight:bold">inputs</span>: [<span style="color:#2aa198">Retriever]</span>
</span></span></code></pre></div><p>To perform a query, we need a bit of Python code to read the yaml configuration above and tell Haystack to run the resulting pipeline:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">pathlib</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Path</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">config</span> = <span style="color:#268bd2">Path</span>(<span style="color:#2aa198">"pipeline.yml"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">pipeline</span> = <span style="color:#268bd2">Pipeline</span>.<span style="color:#268bd2">load_from_yaml</span>(<span style="color:#268bd2">config</span>, <span style="color:#2aa198">"query"</span>)
</span></span><span style="display:flex;"><span><span style="color:#268bd2">answer</span> = <span style="color:#268bd2">pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#2aa198">"What is the capital of France?"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cb4b16">print</span>(<span style="color:#2aa198">f</span><span style="color:#2aa198">"Answer: </span><span style="color:#2aa198">{</span><span style="color:#268bd2">answer</span><span style="color:#2aa198">}</span><span style="color:#2aa198">"</span>)
</span></span></code></pre></div><p>As you can see, each time we want to make a question we need to rebuild the pipeline and call the <code>run</code> method - let’s make an inference service out of this and query it over HTTP.</p>
<h3 id="run-the-haystack-pipeline-through-ray-serve">Run the Haystack pipeline through Ray Serve</h3>
<p>A Ray Serve deployment looks exactly like a regular HTTP server from the caller perspective, and to provide an “ingress” endpoint the bare minimum requirement is writing a Python class similar to the following:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">@serve.deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#859900">class</span> <span style="color:#cb4b16">MinimalIngress</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#859900">async</span> <span style="color:#859900">def</span> <span style="color:#268bd2">__call__</span>(<span style="color:#cb4b16">self</span>, <span style="color:#268bd2">request</span>: <span style="color:#268bd2">Request</span>) -> <span style="color:#cb4b16">str</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#268bd2">name</span> = <span style="color:#859900">await</span> <span style="color:#268bd2">request</span>.<span style="color:#268bd2">json</span>()[<span style="color:#2aa198">"name"</span>]
</span></span><span style="display:flex;"><span>      <span style="color:#859900">return</span> <span style="color:#2aa198">f</span><span style="color:#2aa198">"Hello </span><span style="color:#2aa198">{</span><span style="color:#268bd2">name</span><span style="color:#2aa198">}</span><span style="color:#2aa198">"</span>
</span></span></code></pre></div><p>The presence of an <code>async def __call__</code> method is the only contract we need to fulfill in order to spawn a Ray Serve deployment. To create a deployment that will run a Haystack pipeline, let’s create a file called <code>capitals.py</code> and add the following code:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">pathlib</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Path</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">ray</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">serve</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">starlette.requests</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Request</span>
</span></span><span style="display:flex;"><span><span style="color:#dc322f;font-weight:bold">from</span> <span style="color:#268bd2">haystack</span> <span style="color:#dc322f;font-weight:bold">import</span> <span style="color:#268bd2">Pipeline</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">@serve.deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#859900">class</span> <span style="color:#cb4b16">HaystackQA</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#859900">def</span> <span style="color:#268bd2">__init__</span>(<span style="color:#cb4b16">self</span>):
</span></span><span style="display:flex;"><span>        <span style="color:#93a1a1;font-style:italic"># Load the pipeline from file and store it in self._pipeline</span>
</span></span><span style="display:flex;"><span>        <span style="color:#93a1a1;font-style:italic"># so we can reuse it every time __call__ is invoked.</span>
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">config</span> = <span style="color:#268bd2">Path</span>(<span style="color:#2aa198">"pipeline.yml"</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#cb4b16">self</span>.<span style="color:#268bd2">_pipeline</span> = <span style="color:#268bd2">Pipeline</span>.<span style="color:#268bd2">load_from_yaml</span>(<span style="color:#268bd2">config</span>, <span style="color:#2aa198">"query"</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#859900">async</span> <span style="color:#859900">def</span> <span style="color:#268bd2">__call__</span>(<span style="color:#cb4b16">self</span>, <span style="color:#268bd2">request</span>: <span style="color:#268bd2">Request</span>) -> <span style="color:#cb4b16">str</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">query</span>: <span style="color:#cb4b16">str</span> = <span style="color:#cb4b16">str</span>(<span style="color:#859900">await</span> <span style="color:#268bd2">request</span>.<span style="color:#268bd2">body</span>())
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">res</span> = <span style="color:#cb4b16">self</span>.<span style="color:#268bd2">_pipeline</span>.<span style="color:#268bd2">run</span>(<span style="color:#268bd2">query</span>=<span style="color:#268bd2">query</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#268bd2">answers</span> = <span style="color:#268bd2">res</span>.<span style="color:#268bd2">get</span>(<span style="color:#2aa198">"answers"</span>, [])
</span></span><span style="display:flex;"><span>        <span style="color:#859900">if</span> <span style="color:#268bd2">answers</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#93a1a1;font-style:italic"># Return the first answer</span>
</span></span><span style="display:flex;"><span>            <span style="color:#859900">return</span> <span style="color:#268bd2">answers</span>[<span style="color:#2aa198;font-weight:bold">0</span>].<span style="color:#268bd2">answer</span>
</span></span><span style="display:flex;"><span>        <span style="color:#859900">return</span> <span style="color:#2aa198">""</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#268bd2">haystack_deployment</span> = <span style="color:#268bd2">HaystackQA</span>.<span style="color:#268bd2">bind</span>()
</span></span></code></pre></div><p>A couple of things to note about the code above:</p>
<ul>
<li>We added a <code>__init__</code> method where we initialize the pipeline once and for the whole lifetime of the deployment.</li>
<li>Every class decorated with <code>@serve.deployment</code> gets a <code>bind</code> method that tells Ray Serve to attach our <code>HaystackQA</code> class to a deployment object that we name <code>haystack_deployment</code>. In a moment, when we will start the server process, Ray Serve will pick up the value of this variable.</li>
</ul>
<p>The content of the Python file <code>capitals.py</code> is now all we need to “serve” our deployment. Ray Serve can be run from Python directly, but for this example we’ll show how to start the process from the command line, which is closer to what we would do in a production use case. From a shell where the Python environment containing <code>ray</code> is active, you can just run:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>serve run capitals:haystack_deployment
</span></span></code></pre></div><p>You should see several log lines scrolling down in the shell, and if everything went well, a final message <code>Deployed Serve app successfully.</code> will tell us that the deployment is ready to accept connections - we can make a query now. We will use <code>curl</code> but really any HTTP client will work the same. From the command line:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>curl -X POST -d <span style="color:#2aa198">"What is the capital of Italy?"</span> http://localhost:8000 
</span></span></code></pre></div><p>It might take a while to get a response the very first time, while Haystack downloads the model locally, but in the end you should see the response <code>Rome</code>.</p>
<h2 id="scale-the-deployment">Scale the deployment</h2>
<p>What we did so far doesn’t look much different from putting a <code>pipeline.run()</code> call behind a REST API, but Ray Serve shines when it’s time to put our pipeline in production. For example, let’s say we want to scale our <code>HaystackQA</code> horizontally to serve multiple requests at once. All we need to do is to change the Python code like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">@serve.deployment</span>(<span style="color:#268bd2">num_replicas</span>=<span style="color:#2aa198;font-weight:bold">3</span>)  <span style="color:#93a1a1;font-style:italic"># this is the only line to change!</span>
</span></span><span style="display:flex;"><span><span style="color:#859900">class</span> <span style="color:#cb4b16">HaystackQA</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#859900">def</span> <span style="color:#268bd2">__init__</span>(<span style="color:#cb4b16">self</span>):
</span></span><span style="display:flex;"><span>			...
</span></span></code></pre></div><p>We can now stop the server and restart it with the same <code>serve run</code> command: that’s it, we know have 3 instances of our <code>HaystackQA</code> deployment as you can see from the Ray control panel running at 
<a href="http://127.0.0.1:8265/#/actors" target="_blank" rel="noopener"><code>http://127.0.0.1:8265/#/actors</code></a></p>
<p>In case we want to adapt the number of deployments to the current traffic so we can optimize costs, Ray Serve offers autoscaling out of the box. All we need to change is again the <code>@serve.deployment</code> decorator like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#268bd2">@serve.deployment</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#268bd2">autoscaling_config</span>={
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"min_replicas"</span>: <span style="color:#2aa198;font-weight:bold">1</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"initial_replicas"</span>: <span style="color:#2aa198;font-weight:bold">2</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"max_replicas"</span>: <span style="color:#2aa198;font-weight:bold">5</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#2aa198">"downscale_delay_s"</span>: <span style="color:#2aa198;font-weight:bold">30</span>,
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span><span style="color:#859900">class</span> <span style="color:#cb4b16">HaystackQA</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#859900">def</span> <span style="color:#268bd2">__init__</span>(<span style="color:#cb4b16">self</span>):
</span></span><span style="display:flex;"><span>			...
</span></span></code></pre></div><p>The <code>autoscaling_config</code> parameters are self-explanatory and the 
<a href="https://docs.ray.io/en/latest/serve/scaling-and-resource-allocation.html#autoscaling-config-parameters" target="_blank" rel="noopener">docs</a> are very helpful in detailing the logic behind that. In this case we tell Ray Serve that we want to scale up our deployment to up to 5 deployment replicas and scale down to at least 1. We also want to start with 2 replicas and downscale if there’s no activity for 30 seconds. Once again you can observe how this works behind the scenes by starting the server and looking at the dashboard on 
<a href="http://127.0.0.1:8265/#/actors" target="_blank" rel="noopener"><code>http://127.0.0.1:8265/#/actors</code></a>.</p>
<h2 id="whats-next">What’s next</h2>
<p>The deployment strategy we introduced in this article is ok for many use cases ranging from trying out Haystack to demo applications and small POCs, but close readers will note the limit of scaling up and down a whole Haystack pipeline: certain nodes of a pipeline might benefit more than others of horizontal scaling and redundancy, can we do that with Ray Serve? The answer is yes, and we’ll see in an upcoming article how we can split a pipeline into multiple deployments that can be then managed by Ray Serve with different strategies.</p>

      </description>
    </item>
    
    <item>
      <title>Generative vs. Extractive Language Models</title>
      <link>https://haystack.deepset.ai/blog/generative-vs-extractive-models/</link>
      <pubDate>Mon, 22 May 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/generative-vs-extractive-models/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Generative language models like ChatGPT have taken the world by storm. Aside from their value in areas like programming and translation, generative models’ main selling point is their ability to come up with fluent, well-formed natural language responses. Like humans, these models seem to generate their answers out of thin air, and without looking up the answers in a factual database.&lt;/p&gt;
&lt;p&gt;This sets them apart from their humbler siblings: &lt;em&gt;extractive&lt;/em&gt; language models. Both extractive and generative models are based on the breakthrough Transformer architecture that ushered in a new generation of natural-language understanding (NLU) for machines.&lt;/p&gt;
&lt;p&gt;However, extractive models differ from generative models in that they explicitly need a context to extract information from at querying time, and they return that information as direct quotations from the source, whereas generative models are trained for generating language and capable of writing new text from scratch. Compared to the fluency of their generative counterparts, extractive models like RoBERTa and ELECTRA can therefore seem much less capable.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/generative-vs-extractive-models/shakespeare.png alt=&#34;A comparison of extractive and generative models’ responses to the question ‘Who was Shakespeare?’. The generative model provides a detailed generated answer, while the extractive model gives a concise response extracted from a text.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;But it is becoming increasingly clear that generative models suffer from their own set of problems, like their size, the fact that many of them are proprietary (which isn’t ideal for everyone), and most significantly, their tendency to make things up. What’s more, extractive models have a much better track record in real-world applications — for example, in private or public semantic search engines, or in information extraction systems.&lt;/p&gt;
&lt;p&gt;So if you’re considering the use of a 
&lt;a href=&#34;https://haystack.deepset.ai/blog/what-is-a-language-model&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;modern language model&lt;/a&gt; in your own project, this article is for you. We’ll talk about the strengths and limitations of extractive and generative language models, and show you which use cases benefit most from which kind of model.&lt;/p&gt;
&lt;h2 id=&#34;the-popularity-of-generative-language-models&#34;&gt;The popularity of generative language models&lt;/h2&gt;
&lt;p&gt;In November 2022, 
&lt;a href=&#34;https://openai.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenAI&lt;/a&gt; made its latest large language model — ChatGPT — available to the general public. No coding skills were needed to use it — suddenly everyone was able to “chat” with a highly performant language model and experience for the first time how far these models have come in recent years. This has led to a wave of GPT-related content that variously explains, hypes, and warns against this technology.&lt;/p&gt;
&lt;p&gt;However, the foundation of these models are not new. GPT models are built up on 
&lt;a href=&#34;https://arxiv.org/abs/1706.03762&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Transformer architecture first introduced by Google&lt;/a&gt; in 2017 — years before ChatGPT arrived on the scene. The new Transformer architecture quickly gained widespread attention and became the foundation for numerous subsequent advancements in NLP.&lt;/p&gt;
&lt;p&gt;Generative models can have any number of parameters, but to yield the impressive results we’ve seen from ChatGPT, it seems that number needs to be almost inconceivably huge — the models in the 
&lt;a href=&#34;https://developer.nvidia.com/blog/openai-presents-gpt-3-a-175-billion-parameters-language-model/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GPT-3 series have 175 billion parameters&lt;/a&gt;, while GPT-4 is said to have more than a trillion. In terms of training data, too, more is more for generative models: not only are they trained on vast amounts of data, but they also seem to store a representation of the facts in that data (rather than just using it to build their linguistic intuition), which leads to an impression of omniscience.&lt;/p&gt;
&lt;p&gt;Closed-source language models like those by OpenAI are only accessible through an API and not always available for general download. But there are also impressive open source generative models shared widely on platforms like the 
&lt;a href=&#34;https://huggingface.co/models&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hugging Face model hub&lt;/a&gt;. Open source generative models like 
&lt;a href=&#34;https://lmsys.org/blog/2023-03-30-vicuna/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Vicuna&lt;/a&gt; and 
&lt;a href=&#34;https://open-assistant.io/chat&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenAssistant&lt;/a&gt; are catching up quickly though, generating 
&lt;a href=&#34;https://www.semianalysis.com/p/google-we-have-no-moat-and-neither&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;results that could soon dictate the state of the art&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;how-do-extractive-models-differ-from-generative-ones&#34;&gt;How do extractive models differ from generative ones?&lt;/h2&gt;
&lt;p&gt;In terms of architecture, broadly speaking, extractive models like those 
&lt;a href=&#34;https://haystack.deepset.ai/blog/the-definitive-guide-to-bertmodels&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;based on BERT&lt;/a&gt; belong to the same family of language models as their generative counterparts, since both use the Transformers. However, the main distinction lies in the usage of these models. Unlike generative models, extractive models require both a query and a specific piece of text as the foundation for their response. It then encodes the query as a high-dimensional, semantically informed vector exactly like a generative model does. But rather than generating an answer from scratch based on the given context as generative models do, the extractive model marks the section in the text document that, according to the model’s prediction, provides the best answer to the query.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/generative-vs-extractive-models/hamlet.png alt=&#34;A high-level explanation of how extractive models retrieve answers from a given document. The question ‘Who is the main female character in ‘Hamlet’?’ is asked, and the Reader component, using an extractive language model, answers with ‘Ophelia, Polonius’s daughter’ from the given document.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;So what’s the use of such a model, when you need to provide it with the document that contains the answer? Well, for one, it can be very useful when you need to extract the same kind of information from multiple documents: a classic 
&lt;a href=&#34;https://www.deepset.ai/blog/automating-information-extraction-with-question-answering&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;information extraction&lt;/a&gt; scenario. But even more significantly, applied NLP has come up with an ingenious pipeline paradigm to allow extractive models to operate on a large collection of documents, too. In a 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/01_basic_qa_pipeline#creating-the-retriever-reader-pipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Retriever-Reader pipeline&lt;/a&gt;, a faster model pre-selects the best document “candidates,” which are then given a closer read by the extractive model to find the best answer possible.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/generative-vs-extractive-models/hamlet-retriever.png alt=&#34;A high-level explanation of how an extractive question answering pipeline works. The question ‘Who is the main female character in ‘Hamlet’?’ is asked. First, the Retriever component pre-selects the best document candidates, which are then passed to the Reader component. The Reader employs an extractive model to read the documents and determine the most suitable answer. In this case, the Reader component answers with ‘Ophelia, Polonius’s daughter’.&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Their purely extractive property means the model can only provide answers that quote verbatim from a text. Abstraction, paraphrasing, and the formation of well-formed answers are not in the repertoire of this model family.&lt;/p&gt;
&lt;p&gt;Because of their extractive nature, these models have no use for storing factual information, and therefore do not necessarily benefit from having more parameters. Compared to their generative cousins, extractive models are therefore usually much smaller in size and require less training data.&lt;/p&gt;
&lt;p&gt;Their relative smallness also makes it easier to share these models. In fact, the most popular models are all available on the model hub, where everyone can download them either for direct use or for 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/02_finetune_a_model_on_your_data&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;fine-tuning on a more specific dataset&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;the-problem-with-generative-modelsand-how-to-combat-it&#34;&gt;The problem with generative models — and how to combat it&lt;/h2&gt;
&lt;p&gt;A recurring problem with generative models is that they sometimes make up information and treat it as ground truth. This behavior is commonly referred to as “hallucinations” and the main reason for this is the model’s ability to produce language rather than extracting the answer from a given text.&lt;/p&gt;
&lt;p&gt;Hallucinations can occur when the model has insufficient information to answer a question accurately, when it receives the same prompt multiple times, or when it is prompted to come up with an argument that runs counter to commonly accepted facts. The last case shows how blurry the notion of hallucinations is. After all, it is perfectly common to make up facts: in fiction, for example, in hypothetical examples used for illustrative purposes, or in writing assignments for school.&lt;/p&gt;
&lt;p&gt;A problem arises, however, when a user expects factual knowledge, and instead receives a hallucination without realizing it. Considering that generative models have been proposed for various tasks such as teaching, programming, and writing, 
&lt;a href=&#34;https://www.datanami.com/2023/01/17/hallucinations-plagiarism-and-chatgpt/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;the significant presence of hallucinated content in ChatGPT’s output&lt;/a&gt; makes it necessary to rethink their applicability in the real world.&lt;/p&gt;
&lt;p&gt;While several methods have been proposed to detect hallucinations amongst a generative model’s output, we at 
&lt;a href=&#34;https://www.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset&lt;/a&gt; often use the approach of improving the reliability of these models through retrieval augmentation.&lt;/p&gt;
&lt;h3 id=&#34;retrieval-augmentation-feed-the-ground-truth-to-your-generative-model&#34;&gt;Retrieval augmentation: feed the ground truth to your generative model&lt;/h3&gt;
&lt;p&gt;In 
&lt;a href=&#34;https://haystack.deepset.ai/blog/build-a-search-engine-with-gpt-3&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retrieval-augmented question answering&lt;/a&gt; (and other applications), we don’t rely solely on the facts contained within the parameters of the model itself. Instead, we make use of the customizable nature of modern NLP systems, and connect the generative model to a database of curated documents — much like we saw earlier in the extractive QA setup.&lt;/p&gt;
&lt;p&gt;When the model is prompted for an answer, it then uses that database as its factual basis, while still relying on its amazing ability to understand and produce natural language. As an interim step, we use a Retriever to retrieve the most relevant documents in our database and embed it into our query to the model. A system with human-like conversational capabilities and a fact-checked knowledge base: retrieval augmentation gets you the best of both worlds.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/22_pipeline_with_promptnode&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Connecting your generative model to a database&lt;/a&gt; doesn’t only combat hallucinations; it also helps with accuracy more generally. The information stored in the language model’s parameters becomes outdated quickly, but you can make sure to update the documents in your database on a regular basis.&lt;/p&gt;
&lt;p&gt;However, there are also downsides to this method when combined with a proprietary third-party model.&lt;/p&gt;
&lt;p&gt;Some companies are wary of sending the potentially sensitive data in their documents to an external service. For these organizations, it may be preferable to host the generative model locally rather than use a third-party provider.&lt;/p&gt;
&lt;h2 id=&#34;generative-or-extractive-which-one-is-better&#34;&gt;Generative or extractive, which one is better?&lt;/h2&gt;
&lt;p&gt;Now that we’ve gained a good sense of what characterizes generative and extractive language models, it’s time to look at the best use cases for each type of model. The decision depends on the character of your application itself, as well as your technical requirements and financial limitations.&lt;/p&gt;
&lt;h3 id=&#34;when-to-use-generative-models&#34;&gt;When to use generative models&lt;/h3&gt;
&lt;p&gt;Generative models are necessary when you need your application’s output to sound like a natural response that could have been produced by a human. That’s why these models are popular for &lt;strong&gt;chatbots&lt;/strong&gt;, or as &lt;strong&gt;writing aides&lt;/strong&gt; in areas like copywriting and software documentation.&lt;/p&gt;
&lt;p&gt;Another type of application that calls for generative models is any application that includes paraphrasing. As illustrated above, extractive models can only copy content from a knowledge base, while generative ones produce completely new utterances from scratch. This is indispensable in &lt;strong&gt;translation&lt;/strong&gt;, for instance (in fact, the very first Transformer was built for translation purposes), and extremely useful in 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/prompt_node#prompttemplates&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;other abstractive use cases&lt;/a&gt; like &lt;strong&gt;summarization&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Regarding the financial and technical requirements, keep in mind that, as of now, the most high-performing generative models aren’t open-source. OpenAI, for example, charges a fee for the use of its models through its API. You need to consider how this will impact your budget, especially in case of unexpected spikes in requests to your app.&lt;/p&gt;
&lt;p&gt;Some developers might also be uncomfortable with the notion that they’re not in charge of the model’s controls, or of the data used to train or fine-tune it. Retrieval augmentation can help with this final point — as long as you’re ok with sending your internal documents to an external service.&lt;/p&gt;
&lt;h3 id=&#34;when-to-use-extractive-models&#34;&gt;When to use extractive models&lt;/h3&gt;
&lt;p&gt;Extractive models are commonly used in question answering. The most typical application of this technique is thus in use cases where the user wants to receive answers verbatim from a large collection of documents. It can, for example, be used to &lt;strong&gt;extract answers&lt;/strong&gt; from technical documents in a short amount of time. As a rule of thumb, the more sensitive the application, the higher the need for &lt;strong&gt;textual faithfulness&lt;/strong&gt;, and the lower the tolerance for paraphrases.&lt;/p&gt;
&lt;p&gt;But question answering can mean much more than just that. In fact, it is a technique often used to &lt;strong&gt;extract information&lt;/strong&gt; from texts, which can then be used further downstream in an application. Let’s say you have, for instance, a collection of business reports and want to identify the main entities involved in certain operations. You can use an extractive model to 
&lt;a href=&#34;https://www.deepset.ai/blog/haystack-node-for-information-extraction&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;extract those answers from the text&lt;/a&gt;, make sure they contain only named entities such as persons, organizations, or places, and use those entities to &lt;strong&gt;populate a knowledge graph&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In short, extractive models are the way to go when you’re interested in the information itself, rather than fluent answers; when you want to use them in downstream tasks; and when your application requires textual faithfulness. Researchers have also found that extractive models 
&lt;a href=&#34;https://arxiv.org/abs/2203.07522&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;operate better on out-of-domain data&lt;/a&gt; than generative ones.&lt;/p&gt;
&lt;p&gt;In terms of technical requirements, the big difference compared to generative models is that you usually host them on your own infrastructure — giving you both full control and full responsibility over them. While these models are available on the model hub for free, it’s useful to have access to one or more GPUs for fine-tuning, and to be able to allocate resources to 
&lt;a href=&#34;https://www.deepset.ai/blog/data-centric-ai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;data-centric tasks&lt;/a&gt; like 
&lt;a href=&#34;https://www.deepset.ai/blog/labeling-data-with-haystack-annotation-tool&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;annotation&lt;/a&gt; and qualitative evaluation.&lt;/p&gt;
&lt;p&gt;Finally, keep in mind that not all Transformer models for NLP fit into the extractive and generative categories! In fact, there’s a whole barrage of language models that don’t return any answers at all, but that instead match or classify documents on the basis of their semantic content. These models, which are faster and less resource-consuming than the models discussed in this article, can be used in applications like 
&lt;a href=&#34;https://www.deepset.ai/blog/how-to-build-a-semantic-search-engine-in-python&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;semantic search&lt;/a&gt; — whether public-facing or business-internal — as well as 
&lt;a href=&#34;https://www.deepset.ai/blog/semantic-faq-search-with-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;semantic FAQ search&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;generative-and-extractive-haystack-has-it-all&#34;&gt;Generative and extractive: Haystack has it all&lt;/h2&gt;
&lt;p&gt;Whether you’re looking to build applications based on generative or extractive language models: 
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt;, our open-source framework for applied NLP, makes handling models of either family a breeze. With our composable pipelines and hands-on tutorials, you can build your own prototype in no time. To get started, why not have a look at our components for 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/reader#models&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;extractive&lt;/a&gt; and 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/prompt_node&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;generative question answering&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;Are you interested in exchanging tips and opinions on generative vs. extractive QA, large language models, or other topics in NLP with like-minded people? Come 
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;join our Discord community&lt;/a&gt; and 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;visit us on GitHub&lt;/a&gt; :)&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>How to Prevent Prompt Injections: An Incomplete Guide</title>
      <link>https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections/</link>
      <pubDate>Fri, 19 May 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;
&lt;a href=&#34;https://chat.openai.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ChatGPT&lt;/a&gt; is awesome, 
&lt;a href=&#34;https://huggingface.co/chat/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;HuggingChat&lt;/a&gt; is awesome, 
&lt;a href=&#34;https://crfm.stanford.edu/2023/03/13/alpaca.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Alpaca&lt;/a&gt; is awesome. However, if you want to &lt;strong&gt;use these models in an application,&lt;/strong&gt; for example, for your customer support, you may encounter a new problem that you should be aware of: &lt;strong&gt;Prompt injections&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections/meme.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;This guide showcases approaches to handling prompt injections. It also includes a brief overview of the first public 
&lt;a href=&#34;https://huggingface.co/datasets/deepset/prompt-injections&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;prompt injection datasets&lt;/strong&gt;&lt;/a&gt; and one of the first 
&lt;a href=&#34;https://huggingface.co/deepset/deberta-v3-base-injection&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;pre-trained prompt injection detection models&lt;/strong&gt;&lt;/a&gt; available on Hugging Face that you can use to combat attacks against your system.&lt;/p&gt;
&lt;h2 id=&#34;what-are-prompt-injections&#34;&gt;What are prompt injections?&lt;/h2&gt;
&lt;p&gt;Prompt injections are prompts that trick a generative language model into writing something the model providers clearly did not intend,for example., hate speech. 
&lt;a href=&#34;https://arxiv.org/abs/2211.09527&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;We can roughly distinguish two types of prompt injection&lt;/a&gt;: &lt;strong&gt;goal hijacking &amp; prompt leakage&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections/injection-classes.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Both types can pose serious threats to organizations or individuals trying to leverage generative AI:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Goal hijacking: AI systems, like any powerful tool, can be misused. If a model can be tricked into generating &lt;strong&gt;harmful or inappropriate content&lt;/strong&gt;, it can be weaponized against individuals or communities, leading to real-world harm. The organization hosting the model might face serious &lt;strong&gt;public backlash&lt;/strong&gt;. In extreme cases, they might even violate laws.&lt;/li&gt;
&lt;li&gt;Prompt leakage: As prompt engineering is developing into a 
&lt;a href=&#34;https://www.linkedin.com/pulse/prompt-engineering-new-skill-master-2023-or-koren/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;novel discipline&lt;/a&gt;, companies might lose &lt;strong&gt;competitive advantages&lt;/strong&gt; if their model inadvertently leaks sensitive information. Depending on the architecture of the AI system, prompt leakage could also &lt;strong&gt;violate privacy norms and laws&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;how-can-we-handle-injections&#34;&gt;How can we handle injections?&lt;/h2&gt;
&lt;p&gt;The first step to improve resilience against prompt injections is to improve the &lt;strong&gt;robustness of the internal prompt&lt;/strong&gt; that is added to the user input. Let’s look at an example from a retrieval augmented question-answering system:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections/prompt-example.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;By putting the user input into curly &lt;strong&gt;brackets&lt;/strong&gt;, separating it by additional &lt;strong&gt;delimiters,&lt;/strong&gt; and adding &lt;strong&gt;text after the input&lt;/strong&gt;, the system becomes more robust against prompt injections. Other potential safety measures according to 
&lt;a href=&#34;https://arxiv.org/abs/2211.09527&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Perez &amp; Ribeiro (2022)&lt;/a&gt; include setting a lower temperature and increasing the frequency penalty. Additionally, since elaborate prompt injections may require a lot of text to provide context, simply &lt;strong&gt;limiting the user input&lt;/strong&gt; to a reasonable maximum length makes prompt injection attacks a lot harder.&lt;/p&gt;
&lt;p&gt;However, these measures may not suffice in many cases. So, what else can we do?&lt;/p&gt;
&lt;h2 id=&#34;lets-detect-injections&#34;&gt;Let’s detect injections!&lt;/h2&gt;
&lt;p&gt;Ideally, we filter all prompt injection attempts even before being passed to our generative model. This will not only help prevent injection attacks but also save us money as a classifier model is typically a lot smaller than a generative model.&lt;/p&gt;
&lt;h3 id=&#34;building-the-dataset&#34;&gt;Building the dataset&lt;/h3&gt;
&lt;p&gt;To train an injection classifier, we first assembled a novel dataset of 662 widely varying prompts, including 263 prompt injections and 399 legitimate requests. As legitimate requests, we included various questions and keyword-based searches. We then split the data into train and test datasets.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections/dataset.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;During the first model training, we had two main learnings that helped us enrich the dataset with adversarial examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Including translations.&lt;/strong&gt; Simply switching the language of the prompt injection may cause security measures being bypassed. To prevent that, we included translations of both prompt injections as well as legitimate requests. We avoided leakage by making sure that each prompt and its respective translation always remain within the respective training or test set.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Including stacked prompts.&lt;/strong&gt; Initially, we were able to fool our first models into thinking a prompt was legitimate by combining both legitimate and injection prompts in one prompt. This was especially true if the legitimate part was a prompt that the model had seen during training. Consequently, we included adversarial examples by randomly stacking legitimate and injection prompts within the training and test set, respectively.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The dataset is available on Hugging Face: 
&lt;a href=&#34;https://huggingface.co/datasets/deepset/prompt-injections&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://huggingface.co/datasets/deepset/prompt-injections&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;training-the-model&#34;&gt;Training the model&lt;/h3&gt;
&lt;p&gt;We fine-tuned the popular state-of-the-art 
&lt;a href=&#34;https://huggingface.co/microsoft/deberta-base&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DeBERTa base&lt;/a&gt; model 
&lt;a href=&#34;https://huggingface.co/docs/transformers/training&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;using the transformers library&lt;/a&gt; and Google Colab. The resulting model achieves 99.1% accuracy on our holdout test set, only failing in one edge case.&lt;/p&gt;
&lt;p&gt;You can find and try the model on Hugging Face: 
&lt;a href=&#34;https://huggingface.co/deepset/deberta-v3-base-injection&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;https://huggingface.co/deepset/deberta-v3-base-injection&lt;/a&gt;. It’s fully open-source.&lt;/p&gt;
&lt;h2 id=&#34;embedding-the-model-in-an-ai-system&#34;&gt;Embedding the model in an AI system&lt;/h2&gt;
&lt;p&gt;Once you have a classification model, there are multiple ways you can put it into production. For example, you can use this model within the 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/14_query_classifier&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;TransformersQueryClassifier&lt;/a&gt; from Haystack as a filter in a QA AI system:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections/ai-system-1.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections/ai-system-2.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;final-remarks&#34;&gt;Final remarks&lt;/h2&gt;
&lt;p&gt;Despite its surprisingly convincing performance, using the model as a filter does not guarantee that no prompt injections pass the check. The model should be treated as an &lt;strong&gt;additional layer of security&lt;/strong&gt;, rather than as a complete solution to the problem of prompt injection.&lt;/p&gt;
&lt;p&gt;Also, the model is trained for the chosen type of legitimate prompts. If your use case requires very different prompts to pass as legitimate, you can simply replace our legitimate prompts by yours and fine-tune the model.&lt;/p&gt;
&lt;p&gt;We hope the 
&lt;a href=&#34;https://huggingface.co/deepset/deberta-v3-base-injection&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;model&lt;/a&gt; and 
&lt;a href=&#34;https://huggingface.co/datasets/deepset/prompt-injections&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataset&lt;/a&gt; will prove as valuable for your LLM projects as it has been for ours!&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-prevent-prompt-injections/smily.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Scaling NLP indexing pipelines with KEDA and Haystack — Part 1: The Application</title>
      <link>https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-1/</link>
      <pubDate>Mon, 01 May 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-1/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Large language models have been one of the most hyped technologies lately. By combining them with a vector database that acts as a long-term memory or document store, you can improve prompts with additional context. Applications that use this technique, such as &lt;strong&gt;ChatGPT Plugins&lt;/strong&gt;  and Google’s project  &lt;strong&gt;Magi&lt;/strong&gt;, are rapidly gaining popularity. For many business users, the ability to enhance a prompt with private or recent data is what makes the  &lt;strong&gt;difference between a prototype and a production-ready NLP application&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;But how do we create a service that can turn files into searchable documents? And how can we efficiently scale these applications to index millions of files?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The goal of this article is to build a system that can run use case-specific indexing pipelines for large-scale production NLP applications.&lt;/strong&gt;  It will focus on  &lt;strong&gt;how to create, deploy, and dynamically scale these applications&lt;/strong&gt;  using  
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt;  and  
&lt;a href=&#34;https://keda.sh/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;KEDA&lt;/a&gt;. The first part covers creating a Python application with Haystack. The second part details how to deploy and scale the application using  
&lt;a href=&#34;https://keda.sh/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;KEDA&lt;/a&gt;  on  
&lt;a href=&#34;https://kubernetes.io/de/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Kubernetes&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This tutorial only covers turning files into text snippets, embeddings or arbitrary preprocessed data. We will not cover the augmented LLM application itself. If you are interested in creating such applications, feel free to use this tutorial:  
&lt;a href=&#34;https://haystack.deepset.ai/blog/build-a-search-engine-with-gpt-3&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Build a Search Engine with GPT-3&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;natural-language-processing-basics&#34;&gt;Natural language processing basics&lt;/h2&gt;
&lt;p&gt;The idea behind using documents for search is to fill in the gaps in the model’s knowledge, such as the latest news. This is known as Retrieval-Augmented Generation (RAG), which means that in addition to its own knowledge, the model also uses the documents you provide to generate answers. You can find a step-by-step tutorial on how to create a search engine that uses preprocessed documents by this tutorial  
&lt;a href=&#34;https://haystack.deepset.ai/blog/build-a-search-engine-with-gpt-3&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-1/1.png alt=&#34;Indexing and query pipeline for retrieval augmented generation&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The diagram illustrates the process of passing text files (🔡) through a model (🔮) that generates vectors and stores them in a vector database, such as  
&lt;a href=&#34;https://opensearch.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Opensearch&lt;/a&gt;,  
&lt;a href=&#34;https://www.pinecone.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Pinecone&lt;/a&gt;, or  
&lt;a href=&#34;https://weaviate.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Weaviate&lt;/a&gt;. Later, these documents are fed to the model and the resulting output is provided to the user.&lt;/p&gt;
&lt;p&gt;Now, let’s discuss how to efficiently generate these documents and input them into a vector database.&lt;/p&gt;
&lt;h2 id=&#34;how-to-scale-indexing-applications&#34;&gt;How to scale indexing applications&lt;/h2&gt;
&lt;p&gt;For a proof of concept, engineers typically create embeddings on virtual machines with GPUs and add them to vector databases as a one-time job. Once in production, we will need to run these jobs  &lt;strong&gt;periodically&lt;/strong&gt;  or  &lt;strong&gt;on demand&lt;/strong&gt;  whenever a new file needs to be added to the model’s knowledge base. There are two major factors that we want to optimize:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;throughput&lt;/strong&gt;  — What is the maximum number of files that we can index within a given time period?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;latency&lt;/strong&gt;  — How long does it take for a file to become available for retrieval after it has been added?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We will use stateless indexing consumers written in Python and learn how to use  
&lt;a href=&#34;https://keda.sh/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;KEDA&lt;/a&gt;  for autoscaling on  
&lt;a href=&#34;https://kubernetes.io/de/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Kubernetes&lt;/a&gt;. This will enable horizontal scaling for throughput and on-demand startup for latency. Let’s dive into designing the architecture!&lt;/p&gt;
&lt;h2 id=&#34;architecture&#34;&gt;Architecture&lt;/h2&gt;
&lt;p&gt;The main idea behind the architecture we are going to explore is to queue up indexing requests and spin up consumers that process these tasks in batches. Each consumer receives file-by-file messages and creates documents. We use the following concepts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Batching&lt;/strong&gt;  — to improve throughput and optimize GPU utilization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Decoupling of queueing and processing&lt;/strong&gt;  — to enable stable and scalable long-running processes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Autoscaling&lt;/strong&gt;  — improves throughput by scaling based on the number of pending files.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-1/2.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;There are various technologies out there that we can use to implement this. We will use:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://aws.amazon.com/de/s3/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;AWS s3&lt;/strong&gt;&lt;/a&gt;  (a simple storage service) — for storing the raw files that should be used for search&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://aws.amazon.com/de/sqs/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;AWS SQS&lt;/strong&gt;&lt;/a&gt;  (a simple queue service) - for decoupling queueing, processing, and batching&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://kubernetes.io/de/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt;&lt;/a&gt;  — for running our containerized application&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/pipelines#indexing-pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;Indexing pipeline&lt;/strong&gt;&lt;/a&gt;  (Haystack) - “&lt;em&gt;Consumer&lt;/em&gt;” application that subscribes to a queue and pulls the files from s3&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://keda.sh/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;KEDA&lt;/strong&gt;&lt;/a&gt; (Kubernetes Event-driven Autoscaling) — for autoscaling our indexing pipelines&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can find all the code snippets and configuration files used throughout this article in  
&lt;a href=&#34;https://github.com/ArzelaAscoIi/haystack-keda-indexing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this repository&lt;/a&gt;. You can use them to recreate this project from scratch.  &lt;strong&gt;Link to repo:&lt;/strong&gt; 
&lt;a href=&#34;https://github.com/ArzelaAscoIi/haystack-keda-indexing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;https://github.com/ArzelaAscoIi/haystack-keda-indexing&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;indexing-consumer&#34;&gt;Indexing Consumer&lt;/h2&gt;
&lt;p&gt;This chapter will guide you through the process of creating an application that continuously pulls upload file notifications from an SQS queue for processing. Upon receiving a new message, the application fetches the corresponding file from S3 and converts it into multiple documents with embeddings.&lt;/p&gt;
&lt;p&gt;These  &lt;em&gt;“consumers”&lt;/em&gt;  are the core service of our architecture. We will deploy them to Kubernetes and scale them based on the queue length with KEDA.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-1/3.png alt=&#34;Single indexing consumer with AWS infrastructure&#34;  /&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you are only interested in scaling these applications using KEDA and do not need to know the details of this service, you can use the Docker image  
&lt;a href=&#34;https://hub.docker.com/repository/docker/arzelaascoli/keda-haystack-consumer/general&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;arzelaascoli/keda-haystack-consumer:latest&lt;/a&gt;  in the next section and skip writing the Python code.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We will start by creating a code snipped that receives a list of local, already downloaded, files and converts them into documents.&lt;/p&gt;
&lt;h2 id=&#34;indexing-pipeline&#34;&gt;Indexing Pipeline&lt;/h2&gt;
&lt;p&gt;Haystack provides a simple way to create pipelines using YAML files. In this example, we will use a  
&lt;a href=&#34;https://github.com/deepset-ai/templates/blob/69519af7178095d53cb5e879c8ac696d77c96aed/pipelines/GenerativeQuestionAnswering_gpt.yaml#L6&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;standard template&lt;/a&gt;, but we will simplify it slightly to suit our needs.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# pipeline yaml from Haystack: https://github.com/deepset-ai/templates/blob/69519af7178095d53cb5e879c8ac696d77c96aed/pipelines/GenerativeQuestionAnswering_gpt.yaml#L6  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;version&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;1.15.1&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;components&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Uncomment if you have running an OpenSearch instance  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# - name: DocumentStore  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#   type: OpenSearchDocumentStore  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#   params:  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     user: &lt;your-opensearch-user&gt;  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#     password: &lt;your-opensearch-password&gt;  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Retriever  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;EmbeddingRetriever  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;params&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# document_store: DocumentStore # uncomment to connect the retriever to an OpenSearch instance  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;embedding_model&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;sentence-transformers/multi-qa-mpnet-base-dot-v1  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;model_format&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;sentence_transformers  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;top_k&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;AnswerGen  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;OpenAIAnswerGenerator  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;params&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;model&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;text-davinci-003  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;api_key&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&lt;your-openai-api-key&gt;  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;max_tokens&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;200&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;temperature&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.8&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;frequency_penalty&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.1&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;presence_penalty&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.1&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;top_k&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;FileTypeClassifier  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;FileTypeClassifier  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;TextConverter  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;TextConverter  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;PDFConverter  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;PDFToTextConverter  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Preprocessor  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;PreProcessor  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;params&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;split_by&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;word  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;split_length&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;250&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;split_overlap&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;20&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;language&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;en  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;split_respect_sentence_boundary&lt;/span&gt;: &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;pipelines&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# not required in our case, but can be used to query the document store  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# with Haystack  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;query  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;nodes&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Retriever  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;inputs&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;Query]  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;AnswerGen  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;inputs&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;Retriever]  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;nodes&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;FileTypeClassifier  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;inputs&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;File]  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;TextConverter  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;inputs&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;FileTypeClassifier.output_1]  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;PDFConverter  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;inputs&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;FileTypeClassifier.output_2]  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Preprocessor  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;inputs&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;TextConverter, PDFConverter]  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Retriever  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;inputs&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;Preprocessor]  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Uncomment if you have a running a document store.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## For this tutorial, we will generate embeddings and documents,  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## but not store them in a document store to keep it generic.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## You can use any of Haystack&#39;s document stores here.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Docs: https://docs.haystack.deepset.ai/docs/document_store  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# - name: DocumentStore  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;#   inputs: [Retriever]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This template is split into an  &lt;em&gt;indexing&lt;/em&gt;  and a  &lt;em&gt;query&lt;/em&gt;  pipeline. Let’s go through the steps we run for the indexing pipeline first:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;FileTypeClassifier&lt;/strong&gt; — checks the file type for  &lt;code&gt;.txt&lt;/code&gt;  or  &lt;code&gt;.pdf&lt;/code&gt;  files&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TextConverter&lt;/strong&gt; — extracts the text from  &lt;code&gt;.txt&lt;/code&gt;  files&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PDFConverter&lt;/strong&gt;  — extracts the text from  &lt;code&gt;.pdf&lt;/code&gt;  files&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Preprocessor&lt;/strong&gt;  — splits the texts from files into smaller chunks of size 250 characters&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retriever&lt;/strong&gt; — computes embeddings for chunked texts by running the model  &lt;code&gt;sentence-transformers/multi-qa-mpnet-base-dot-v1&lt;/code&gt;  from  
&lt;a href=&#34;https://huggingface.co/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DocumentStore&lt;/strong&gt;  (disabled) — This node sends the created documents, including the vectors, to a selected database. For the purposes of this tutorial, we will skip this step. However, for a scalable system, you will need to set up a highly scalable vector database. Some suitable options may include OpenSearch, Weaviate, Qdrant, or others. Haystack supports many different  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/document_store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;document store clients&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Since we will just focus on indexing files, we will not load and run the query pipeline. Nevertheless, this pipeline just has two steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Retriever&lt;/strong&gt; — Perform a k-nearest neighbors search to find documents that match your prompt.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AnswerGen&lt;/strong&gt;  — OpenAI’s  &lt;code&gt;text-davinci-003&lt;/code&gt;  model receives a prompt, which includes the retrieved text from step 1, and generates an answer in response.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In the next step, we will load the pipeline so that we can run indexing for local files.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# pipeline.py   &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/pipeline.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;get_pipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;yaml_path&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;load_from_yaml&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;yaml_path&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline_name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;indexing&#34;&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## example usage  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# pipeline = get_pipeline(&#34;./pipelines/pipeline.yaml&#34;)  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# documents = pipeline.run(file_paths=[Path(&#34;.your-file.txt&#34;)])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now that we can generate documents for local files, we need to write the “glue code” that connects our indexing pipelines with messages from SQS and files from S3. For testing purposes, we also create a code snipped to queue files for indexing.&lt;/p&gt;
&lt;h2 id=&#34;glue-code-for-indexing-with-sqs-and-s3&#34;&gt;Glue Code for Indexing with SQS and S3&lt;/h2&gt;
&lt;p&gt;Let’s start with creating a class  &lt;code&gt;S3Client&lt;/code&gt;  that can upload and download files to Amazons s3 service. AWS offers a convenient way to communicate with resources through  
&lt;a href=&#34;https://boto3.amazonaws.com/v1/documentation/api/latest/index.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;boto3&lt;/a&gt;. We will use this library and write simple wrappers for its methods.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# aws_service.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/aws_service.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;S3Client&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;bucket_name&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;local_dir&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;s3&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;boto3&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;client&lt;/span&gt;(  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;s3&#34;&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;endpoint_url&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AWS_ENDPOINT&lt;/span&gt;,  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# allows using localstack  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;region_name&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AWS_REGION&lt;/span&gt;,  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# allows using localstack  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;bucket_name&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;bucket_name&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;local_dir&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;local_dir&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;upload_file&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;local_path&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Upload file to s3 with the same name as the file on local filesystem.  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        :param local_path: Path to file on local filesystem  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        &#34;&#34;&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;path&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;basename&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;local_path&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;s3&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;upload_file&lt;/span&gt;(  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;local_path&lt;/span&gt;),  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;bucket_name&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;ExtraArgs&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ACL&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;public-read&#34;&lt;/span&gt;},  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;download_files&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;s3_keys&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;]) -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;]:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Download files from s3 to local directory.  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        TODO: Make this async to speed up the upload process  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        TODO: delete files from local directory after processing  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        :param s3_keys: List of keys on s3  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        :returns List of paths to downloaded files  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        &#34;&#34;&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;paths&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;] = []  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;s3_key&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;s3_keys&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;s3&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get_object&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;Bucket&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;bucket_name&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;Key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;s3_key&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;path&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;basename&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;s3_key&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;local_path&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;path&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;join&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;local_dir&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;paths&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;append&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;local_path&lt;/span&gt;))  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;open&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;local_path&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;wb&#34;&lt;/span&gt;) &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;f&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#268bd2&#34;&gt;f&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Body&#34;&lt;/span&gt;].&lt;span style=&#34;color:#268bd2&#34;&gt;read&lt;/span&gt;())  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;paths&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After implementing the file upload and download functionality, we need to enable consumers to fetch pending S3 keys that are queued up for indexing. To do this, we create a  &lt;code&gt;SQSClient&lt;/code&gt; class that can publish and receive messages from the Amazon SQS service. Similar to the  &lt;code&gt;S3Client&lt;/code&gt;,  
&lt;a href=&#34;https://boto3.amazonaws.com/v1/documentation/api/latest/index.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;boto3&lt;/a&gt;  provides some simple methods for this purpose.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;SQSClient&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;queue_name&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;sqs&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;boto3&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;client&lt;/span&gt;(  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;sqs&#34;&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;endpoint_url&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AWS_ENDPOINT&lt;/span&gt;,  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# allows using localstack  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;region_name&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;AWS_REGION&lt;/span&gt;,  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# allows using localstack  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;queue_name&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;queue_name&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;queue&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;sqs&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get_queue_url&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;QueueName&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;queue_name&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;publish_key&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;key&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Publish key to SQS queue  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        :param key: s3 Key to publish  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        &#34;&#34;&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;sqs&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;send_message&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;QueueUrl&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;queue&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;QueueUrl&#34;&lt;/span&gt;], &lt;span style=&#34;color:#268bd2&#34;&gt;MessageBody&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;key&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;@contextmanager&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;fetch_keys&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Generator&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;], &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;, &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;]:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        Fetch messages from defined queue  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        :returns List of keys on s3  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;        &#34;&#34;&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# receive up to 10 messages at once from sqs  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;sqs&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;receive_message&lt;/span&gt;(  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;QueueUrl&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;queue&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;QueueUrl&#34;&lt;/span&gt;],  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;MaxNumberOfMessages&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;10&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;WaitTimeSeconds&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# get s3 keys from sqs messages  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;response&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Messages&#34;&lt;/span&gt;, [])  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;s3_keys&lt;/span&gt; = [&lt;span style=&#34;color:#268bd2&#34;&gt;message&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Body&#34;&lt;/span&gt;] &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;message&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;]  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;try&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# We are using a context manager to ensure the messages  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# are deleted from the queue after handling them was  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# successful.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;yield&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;s3_keys&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# This will be run after the message was processed  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;message&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;sqs&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;delete_message&lt;/span&gt;(  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#268bd2&#34;&gt;QueueUrl&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;queue&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;QueueUrl&#34;&lt;/span&gt;],  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#268bd2&#34;&gt;ReceiptHandle&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;message&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ReceiptHandle&#34;&lt;/span&gt;],  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                )  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Exception&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;error&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;raise&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;error&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will now connect both of these clients to a service that has two methods:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;get_files&lt;/code&gt;  - This method fetches file messages from SQS, downloads the corresponding file from S3, and returns the local path.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;upload_file&lt;/code&gt;  - This method uploads a file from a local path to S3 and publishes the key on SQS to be picked up by the indexing consumers later.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# aws_service.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/aws_service.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;AWSService&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;__init__&lt;/span&gt;( &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;queue_name&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;bucket_name&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;local_download_dir&lt;/span&gt;: &lt;span style=&#34;color:#cb4b16&#34;&gt;str&lt;/span&gt; ) -&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;sqs_client&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SQSClient&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;queue_name&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;s3_client&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;S3Client&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;bucket_name&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;local_download_dir&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;get_files&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;] | &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;sqs_client&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;fetch_keys&lt;/span&gt;() &lt;span style=&#34;color:#859900&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;paths&lt;/span&gt; = &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;s3_client&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;download_files&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;messages&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;paths&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;upload_file&lt;/span&gt;(&lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;local_path&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;s3_client&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;upload_file&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;local_path&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#cb4b16&#34;&gt;self&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;sqs_client&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;publish_key&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;local_path&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;set-up-a-local-aws-environment-with-localstack&#34;&gt;Set up a local AWS environment with LocalStack&lt;/h2&gt;
&lt;p&gt;We use  
&lt;a href=&#34;https://localstack.cloud/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LocalStack&lt;/a&gt;  to simplify the development process and avoid the need to use real AWS infrastructure. To start LocalStack, run docker-compose up in the terminal using the  &lt;code&gt;docker-compose.yaml&lt;/code&gt; file.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# docker-compose.yaml   &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file:https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/docker-compose.yaml  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;version&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;3&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;services&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;localstack&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;image&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;localstack/localstack:1.4.0  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;ports&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;127.0.0.1:4566:4566&#34;&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# LocalStack Gateway  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;127.0.0.1:4510-4559:4510-4559&#34;&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# external services port range  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;environment&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;SERVICES=s3,sqs  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_ACCESS_KEY_ID=test  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_SECRET_ACCESS_KEY=test  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;volumes&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# startup script to create a sqs queue and a s3 bucket  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/scripts/sqs_bucket_bootstrap.sh  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;./scripts:/docker-entrypoint-initaws.d/&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# startup script found here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After startup, we will run  
&lt;a href=&#34;https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/scripts/sqs_bucket_bootstrap.sh&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a script&lt;/a&gt;  that creates a  &lt;code&gt;test-queue&lt;/code&gt;  and a  &lt;code&gt;test-bucket&lt;/code&gt;  that we can use to test our application. In the next steps, we will create AWS clients to read messages from SQS and download files from s3.&lt;/p&gt;
&lt;h2 id=&#34;running-the-consumer&#34;&gt;Running the consumer&lt;/h2&gt;
&lt;p&gt;Now that we can run pipelines, upload and download files, and set up our local AWS environment, we need to write our main service. On a code level, all we need to implement is a simple loop to continuously pull messages and run indexing.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# consumer.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file:https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/consumer.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# To learn more about logging in python check out my other article  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# about structolg! https://medium.com/@ArzelaAscoli/writing-professional-python-logs-e1f31635b60b  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get_logger&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;__name__&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Initialize AWS service  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;aws_service&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;AWSService&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;SQS_QUEUE&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;S3_BUCKET&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;LOCAL_DOWNLOAD_DIR&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# load pipeline  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;get_pipeline&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;./pipelines/pipeline.yaml&#34;&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# fetch files from aws  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;files&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;List&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;] = &lt;span style=&#34;color:#268bd2&#34;&gt;aws_service&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get_files&lt;/span&gt;()  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;files&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;info&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;No files to process&#34;&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;sleep&lt;/span&gt;(&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;continue&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# process files if found  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;info&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Found files&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;files&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;files&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# run indexing for downloaded files  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;file_paths&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;files&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;info&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Processed files&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will now create an image of our application using the Haystack GPU base image, which has most of our dependencies pre-installed.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Dockerfile   &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/Dockerfile  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;FROM deepset/haystack:base-gpu-v1.15.0  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;WORKDIR /home/user  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;COPY . .  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;RUN pip3 install -r requirements.txt  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;CMD [&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;python3&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;consumer.py&#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By adding the startup configuration to docker-compose, we can not only run LocalStack but also our application in Docker by calling docker-compose up.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# docker-compose.yaml  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/docker-compose.yaml  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;...&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;consumer&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;build&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;context&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;dockerfile&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Dockerfile  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;environment&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_ENDPOINT=http://localstack:4566  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_REGION=eu-central-1  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_ACCESS_KEY_ID=test  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      - &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_SECRET_ACCESS_KEY=test&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We are now ready to test our application by uploading files and checking the logs for resulting documents.&lt;/p&gt;
&lt;h2 id=&#34;testing-with-docker-compose&#34;&gt;Testing with Docker Compose&lt;/h2&gt;
&lt;p&gt;To start our Docker Compose stack, we use the command  &lt;code&gt;docker-compose up&lt;/code&gt;. Additionally, we provide a code snippet for file uploading.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# upload.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: &lt;https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/upload.py&gt;  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;aws_service&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;AWSService&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;SQS_QUEUE&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;S3_BUCKET&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;LOCAL_DOWNLOAD_DIR&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;aws_service&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;upload_file&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;./data/test.txt&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we run python3  &lt;code&gt;upload.py&lt;/code&gt;  upload.py now, we will see the following output in our Docker console.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;haystack-keda-indexing-localstack-1  | 2023-04-22T09:55:44.981  INFO --- [   asgi_gw_1] localstack.request.aws     : AWS s3.GetObject =&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;200&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;haystack-keda-indexing-localstack-1  | 2023-04-22T09:55:45.007  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS sqs.DeleteMessage =&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;200&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;haystack-keda-indexing-consumer-1    | 2023-04-22 09:55:45 [info     ] Found files                    &lt;span style=&#34;color:#268bd2&#34;&gt;files&lt;/span&gt;=[PosixPath(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;/tmp/test.txt&#39;&lt;/span&gt;)]  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Converting files: 100%|██████████| 1/1 [00:00&lt;00:00, 81.28it/s]  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Preprocessing: 100%|██████████| 1/1 [00:00&lt;00:00, 35.10docs/s]  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Batches: 100%|██████████| 1/1 [00:03&lt;00:00,  3.18s/it]  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;haystack-keda-indexing-consumer-1    | 2023-04-22 09:55:48 [info     ] Processed files                &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;: [&lt;Document: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;content&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;this is text&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;content_type&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;text&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;score&#39;&lt;/span&gt;: None, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;meta&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;_split_id&#39;&lt;/span&gt;: 0}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;id_hash_keys&#39;&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;content&#39;&lt;/span&gt;], &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;embedding&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;&lt;embedding of shape (768,)&gt;&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;id&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;46ec22c7eafaea7c43eef7d996fd04ce&#39;&lt;/span&gt;}&gt;], &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;file_paths&#39;&lt;/span&gt;: [PosixPath(&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;/tmp/test.txt&#39;&lt;/span&gt;)], &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;root_node&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;File&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;params&#39;&lt;/span&gt;: {}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;node_id&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Retriever&#39;&lt;/span&gt;}  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;haystack-keda-indexing-localstack-1  | 2023-04-22T09:55:48.327  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS sqs.GetQueueUrl =&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;200&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We have a stateless service that processes indexing requests asynchronously and can operate in Docker Compose. We can now deploy this service to Kubernetes, configure KEDA, and scale it.&lt;/p&gt;
&lt;h2 id=&#34;conclusion-and-next-steps&#34;&gt;Conclusion and next steps&lt;/h2&gt;
&lt;p&gt;This article explains how to create a stateless consumer application that turns files into documents. We learned that we can use this application with different models, preprocessors, and file types which are defined in a single configuration file using Haystack. In order to ship this application to a production-ready deployment, we will deploy this service on Kubernetes and define custom autoscaling in  
&lt;a href=&#34;scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-2&#34;&gt;the next chapter&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Scaling NLP indexing pipelines with KEDA and Haystack — Part 2: The Deployment</title>
      <link>https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-2/</link>
      <pubDate>Mon, 01 May 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-2/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;In the first part of this article series, we discussed the power of retrieval-augmented generation. We also explored how to create a Python application that converts files into searchable documents with embeddings via  
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt;  pipelines. However, merely having a Python program that converts files into text snippets and embeddings on a single machine is not enough for a production-ready deployment.&lt;/p&gt;
&lt;p&gt;In this part, we will explore how to  &lt;strong&gt;deploy an indexing consumer to Kubernetes&lt;/strong&gt;  and  &lt;strong&gt;how to autoscale it using KEDA&lt;/strong&gt;. This will allow us to efficiently add text and embeddings to our vector database that can power a retrieval augmented LLM search engine  
&lt;a href=&#34;https://haystack.deepset.ai/blog/build-a-search-engine-with-gpt-3&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;like this&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We will use the  
&lt;a href=&#34;scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-1&#34;&gt;architecture described in the first part&lt;/a&gt;, which involves queuing files to be indexed on AWS SQS and consuming them in parallel using Haystack pipelines. Let’s start deploying and scaling!&lt;/p&gt;
&lt;h2 id=&#34;deployment-on-kubernetes-and-scaling-with-keda&#34;&gt;Deployment on Kubernetes and scaling with KEDA&lt;/h2&gt;
&lt;p&gt;In this section, we will learn how to set up KEDA on Kubernetes and configure autoscaling to scale our consumers based on the number of pending files in Kubernetes. We will use the following tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://k3d.io/v5.4.9/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;k3d&lt;/a&gt;  for creating a local Kubernetes cluster to deploy our consumers&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://keda.sh/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;KEDA&lt;/a&gt;  for scaling the consumers after deployment&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://localstack.cloud/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;localstack&lt;/a&gt;  as a local AWS cloud stack, for testing our application&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To communicate with our local Kubernetes cluster, we will use  
&lt;a href=&#34;https://kubernetes.io/de/docs/reference/kubectl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;kubectl&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;⚠️ For certain types of indexing pipelines, GPUs are required. This is particularly true when embeddings are generated on the same machine (rather than through an external inference service) or when a model is used within the pipeline. In such cases, it is necessary to have GPU pods within the Kubernetes cluster to run the model.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;installation-and-setup&#34;&gt;Installation and setup&lt;/h2&gt;
&lt;p&gt;We will need to set up a local Kubernetes cluster and deploy a list of services before we can start deploying and scaling our application.&lt;/p&gt;
&lt;h3 id=&#34;create-a-local-kubernetes-cluster&#34;&gt;Create a local Kubernetes cluster&lt;/h3&gt;
&lt;p&gt;To begin, create a new Kubernetes cluster named  &lt;code&gt;haystack-keda-cluster&lt;/code&gt;  using k3d.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k3d cluster create haystack-keda-cluster  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# check the the status via: kubectl cluster-info&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, we will create a namespace called  &lt;code&gt;indexing&lt;/code&gt;  that we will use to deploy our services.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl create namespace indexing
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;install-services--keda-and-localstack&#34;&gt;Install services — KEDA and LocalStack&lt;/h3&gt;
&lt;p&gt;To set up LocalStack, add the helm chart and install LocalStack in the  &lt;code&gt;indexing&lt;/code&gt;  namespace:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;helm repo add localstack https://localstack.github.io/helm-charts  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;helm install localstack localstack/localstack --namespace indexing
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We will repeat the same steps with KEDA.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;helm repo add kedacore https://kedacore.github.io/charts  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;helm install keda kedacore/keda --namespace indexing
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can validate the setup by running  &lt;code&gt;indexing kubectl get pods -n indexing&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-2/1.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;create-an-sqs-queue-and-an-s3-bucket&#34;&gt;Create an SQS queue and an S3 bucket&lt;/h2&gt;
&lt;p&gt;Our indexing consumers will connect to queues on LocalStack and download files from S3. Therefore, we need to create the necessary resources before deploying our application.&lt;/p&gt;
&lt;p&gt;To create a queue and a bucket, we will use  
&lt;a href=&#34;https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/scripts/sqs_bucket_bootstrap.sh&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;the same shell script&lt;/a&gt;  that we used in our development environment with Docker Compose. To run the script from within the container, pipe the script into the  &lt;code&gt;kubectl exec&lt;/code&gt;  command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cat ./scripts/sqs_bucket_bootstrap.sh | kubectl &lt;span style=&#34;color:#cb4b16&#34;&gt;exec&lt;/span&gt; -i -n indexing deployment/localstack -- /bin/bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If we fetch the logs via  &lt;code&gt;kubectl logs -f deployment/localstack -c localstack -n indexing&lt;/code&gt;, we should see that a queue and a bucket were created.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;2023-04-22T15:19:34.166  INFO --- [   asgi_gw_1] localstack.request.aws     : AWS sqs.CreateQueue =&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;200&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;2023-04-22T15:19:34.533  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS s3.CreateBucket =&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;200&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;deploying-the-indexing-consumer&#34;&gt;Deploying the indexing consumer&lt;/h2&gt;
&lt;p&gt;Now that we have LocalStack and KEDA deployed to our Kubernetes cluster, we can start deploying our indexing consumer. The indexing consumers are deployed as Kubernetes deployments by using a deployment file  &lt;code&gt;deployment-consumer.yaml&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# deployment-consumer.yaml  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: &lt;https://github.com/ArzelaAscoIi/haystack-keda-indexing/tree/main/kubernetes&gt;  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Deployment  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;apps/v1  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing-consumer  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;labels&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;k8s-app&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing-consumer  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# we want to start with 0 replicas and scale up on demand  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;replicas&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;selector&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;k8s-app&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing-consumer  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;template&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing-consumer  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;labels&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;k8s-app&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing-consumer  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;containers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing-consumer  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;command&lt;/span&gt;: [&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;python3&#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;consumer.py&#34;&lt;/span&gt;]  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# public consumer image  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;image&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;arzelaascoli/keda-haystack-consumer:latest  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;env&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# localstack configuration  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_ENDPOINT  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;value&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;http://localstack:4566  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_REGION  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;value&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;eu-central-1  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_ACCESS_KEY_ID  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;value&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;test  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_SECRET_ACCESS_KEY  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;value&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;test  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Resource estimations  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# TODO: adjust these to our needs and the load we expect  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;resources&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;requests&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;memory&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;1000Mi  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;cpu&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;750m  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;limits&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;memory&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;2500Mi  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;cpu&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;2000m&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can apply this YAML to our namespace indexing with kubectl:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f ./kubernetes/deployment-consumer.yaml --namespace indexing
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To validate that consumers can successfully start and connect to the queue, we can scale up the replicas to 1, and check the running pods.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# scale deployment   &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl scale deployment indexing-consumer --namespace=indexing --replicas=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# get pods   &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl get pods -n indexing  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# check logs   &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl logs -f deployment/indexing-consumer -c indexing-consumer -n indexing
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/scaling-nlp-indexing-pipelines-with-keda-and-haystack-part-2/2.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The system will log that no files were found to be processed:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│ 2023-04-23 15:43:14 [info     ] No files to process                                                                                                       │  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│ 2023-04-23 15:43:19 [info     ] No files to process                                                                                                       │  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│ 2023-04-23 15:43:24 [info     ] No files to process                                                                                                       │  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;│ 2023-04-23 15:43:29 [info     ] No files to process
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, we will set up autoscaling based on the length of the SQS queue to enable autoscaling and scaling to zero if no files are pending.&lt;/p&gt;
&lt;h2 id=&#34;configure-autoscaling-based-on-queue-length&#34;&gt;Configure autoscaling based on queue length&lt;/h2&gt;
&lt;p&gt;After successfully creating all the required services to index files, we can now configure KEDA to check the queue length and scale the indexing consumers accordingly.&lt;/p&gt;
&lt;p&gt;To set up a KEDA trigger on SQS, we need to configure authentication by creating a Kubernetes secrets object.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# secrets-localstack.yaml  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: &lt;https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/kubernetes/keda/secrets-localstack.yaml&gt;  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;v1  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Secret  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;aws-secrets  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespace&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;data&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;dGVzdA==&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# base64 encoded string for &#34;test&#34;  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;dGVzdA==&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# base64 encoded string for &#34;test&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This secret is then mapped via a TriggerAuthentication object to KEDA, which will use  
&lt;a href=&#34;https://keda.sh/docs/2.10/scalers/aws-sqs/#authentication-parameters&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;credential based authentication.&lt;/a&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# trigger-authentication.yaml  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: &lt;https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/kubernetes/keda/trigger-authentication.yaml&gt;  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;keda.sh/v1alpha1  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;TriggerAuthentication  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;keda-trigger-auth-aws-credentials  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;namespace&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;secretTargetRef&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;parameter&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;awsAccessKeyID&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Required.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;aws-secrets&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Required.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;key&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_ACCESS_KEY_ID&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Required.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;parameter&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;awsSecretAccessKey&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Required.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;aws-secrets&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Required.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;key&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Required.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The operator is now allowed to access LocalStacks resources, and we can create a scaled object with a &lt;code&gt;aws-sqs-queue&lt;/code&gt;  
&lt;a href=&#34;https://keda.sh/docs/2.1/scalers/aws-sqs/#trigger-specification&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;trigger&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;keda.sh/v1alpha1  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;ScaledObject  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing-consumer-scaled-object  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;spec&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;scaleTargetRef&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;apps/v1  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;kind&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;Deployment  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;indexing-consumer&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Mandatory. Must be in the same namespace as the ScaledObject  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;pollingInterval&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;30&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;cooldownPeriod&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;300&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;idleReplicaCount&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;minReplicaCount&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;maxReplicaCount&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;fallback&lt;/span&gt;: &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# defines a number of replicas to fallback to if a scaler is in an error state.  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;failureThreshold&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;replicas&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;4&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;triggers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    - &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;type&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;aws-sqs-queue  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;authenticationRef&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;name&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;keda-trigger-auth-aws-credentials  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# KEDA will use the value of the environment variable of the `consumer-file-ingestion` containers  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;queueURL&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;http://localhost:4566/000000000000/test-queue  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;queueLength&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;10&#34;&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Should roughly equal the number of messages that can be processed in 1 minute  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;awsRegion&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;eu-central-1&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;awsEndpoint&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://localstack:4566&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2;font-weight:bold&#34;&gt;scaleOnInFlight&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;false&#34;&lt;/span&gt; &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Exclude in-flight messages from the queue length calculation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After applying these three YAMLs via  &lt;code&gt;kubectl apply --f ./kubernetes/keda --namespace indexing&lt;/code&gt;, we can forward the port to allow uploading files to LocalStack.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; kubectl port-forward deployment/localstack 4566:4566 -n indexing
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can use an  
&lt;a href=&#34;https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/upload.py&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;upload script&lt;/a&gt;  to add a file  &lt;code&gt;test.txt&lt;/code&gt;  by running  &lt;code&gt;python3 upload.py&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# upload.py   &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# link to file: &lt;https://github.com/ArzelaAscoIi/haystack-keda-indexing/blob/main/upload.py&gt;  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;aws_service&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;AWSService&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;SQS_QUEUE&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;S3_BUCKET&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;LOCAL_DOWNLOAD_DIR&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;aws_service&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;upload_file&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;Path&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;./data/test.txt&#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Once the file is successfully uploaded and queued, KEDA will take care of scaling the deployment from 0 to 1 replica. Kubernetes will list an indexing-consumer pod.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                                               READY   STATUS                
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;localstack-8fc647d9d-xkrsk                         1/1     Running               
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;keda-operator-metrics-apiserver-7bcfdd7c9b-7pbkp   1/1     Running              
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;keda-operator-6857fbc758-xtc44                     1/1     Running               
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;keda-admission-webhooks-59978445df-q85jr           1/1     Running              
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;indexing-consumer-656d98db6f-psz6q                 0/1     ContainerCreating 
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After startup, the files will be fetched and removed from the queue, and indexed.&lt;/p&gt;
&lt;h2 id=&#34;conclusion-and-next-steps&#34;&gt;Conclusion and next steps&lt;/h2&gt;
&lt;p&gt;This article explains how to create a scalable application to convert text and PDF files into documents containing text and embeddings. KEDA allows for the on-demand scaling of each application. With this simple architecture, we are able to horizontally scale the creation of embeddings.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How was KEDA useful?&lt;/strong&gt;  — KEDA enabled us to scale consumers based on queue length. An alternative solution involves using horizontal pod autoscaling based on CPU usage, which would be triggered once elements are fetched from the queue. However, this approach does not allow for scaling down to zero. Since these tasks require GPUs, one idle machine that is constantly running can be expensive.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How do I deploy this without k3d?&lt;/strong&gt;  — This tutorial is applicable to any Kubernetes cluster. Simply follow the instructions provided.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What resources do I need?&lt;/strong&gt;  — When running this in production, GPU nodes are necessary for the cluster, which may require additional configuration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Is there a simple way to deploy multiple pipelines?&lt;/strong&gt;  — In one of the next articles, I will share an article on how to use the Kubernetes Operator Framework (
&lt;a href=&#34;https://kopf.readthedocs.io/en/stable/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Kopf&lt;/a&gt;), written in Python, to dynamically create these resources.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>New DocumentStore Integration for Haystack: QdrantDocumentStore</title>
      <link>https://haystack.deepset.ai/blog/qdrant-integration/</link>
      <pubDate>Tue, 18 Apr 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/qdrant-integration/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;We’re thrilled to introduce the &lt;code&gt;QdrantDocumentStore&lt;/code&gt;, the latest addition to the Haystack DocumentStore family by Qdrant. It’s an open source package that provides powerful capabilities such as efficient search, high-dimensional vector retrieval, and flexible launch options.&lt;/p&gt;
&lt;p&gt;Let’s learn about DocumentStores and how to use &lt;code&gt;QdrantDocumentStore&lt;/code&gt; with your pipelines!&lt;/p&gt;
&lt;h2 id=&#34;haystack-documentstores&#34;&gt;Haystack DocumentStores&lt;/h2&gt;
&lt;p&gt;Haystack is an end-to-end NLP framework that provides a modular approach to building state-of-the-art generative AI, QA, and semantic knowledge base search systems. A core component of most modern NLP systems is a database that can efficiently store and retrieve vast amounts of text data. Vector databases are a great way of doing this. These databases can store vector representations of text while also implementing efficient ways to retrieve them at speed. To this end, Haystack provides a set of native 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/document_store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DocumentStores&lt;/a&gt; that allow you to connect to and query your data that might be in Elasticsearch, Pinecone, Weaviate, or others. This way, you can store and maintain your data within one of these databases while simultaneously using them within your Haystack pipelines and applications.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog//images/concepts_haystack_handdrawn.png alt=&#34;image&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;new-documentstore-by-qdrant&#34;&gt;New DocumentStore by Qdrant&lt;/h2&gt;
&lt;p&gt;The latest addition to the Haystack DocumentStore family is the &lt;strong&gt;&lt;code&gt;QdrantDocumentStore&lt;/code&gt;&lt;/strong&gt;. This new DocumentStore is a first of a kind by being an external package maintained by the Qdrant team. 
&lt;a href=&#34;https://qdrant.tech/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Qdrant&lt;/a&gt; is a vector similarity search engine that provides a production-ready service with a convenient API for storing, searching, and managing vectors with an additional payload. Qdrant is optimized for high-dimensional vector search and supports various similarity metrics, making it a powerful tool for various NLP applications.&lt;/p&gt;
&lt;h3 id=&#34;advantages-of-qdrantdocumentstore&#34;&gt;Advantages of QdrantDocumentStore&lt;/h3&gt;
&lt;p&gt;Some of the advantages of using Qdrant with Haystack include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Efficient search and retrieval of high-dimensional vectors&lt;/li&gt;
&lt;li&gt;mmap support for HNSW graph storage&lt;/li&gt;
&lt;li&gt;Filtering capabilities integrated directly into the vector search phase&lt;/li&gt;
&lt;li&gt;Constraints based on metadata attributes&lt;/li&gt;
&lt;li&gt;Reducing the memory footprint up to 4 times by using 
&lt;a href=&#34;https://qdrant.tech/articles/scalar-quantization/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Scalar Quantization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Storing even a 
&lt;a href=&#34;https://qdrant.tech/articles/memory-consumption/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;million vectors with a small amount of RAM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Open source! 😍&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Qdrant provides several options to launch an instance. A local in-memory/disk-persisted mode is a good choice for test scenarios and quick experiments in Colab or Jupyter Notebook, where you plan to store only a few vectors. When you need to scale, you can easily switch to on-premise server deployment using Docker containers. If you don’t want to manage the infrastructure at all, there is also a fully managed 
&lt;a href=&#34;https://cloud.qdrant.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Qdrant Cloud&lt;/a&gt; offering with a free forever 1GB cluster. All is possible when using Haystack! 💙&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/qdrant-integration/try-develop-deploy.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Haystack is a comprehensive framework that enables developers to construct complete NLP pipelines, and Qdrant is a powerful vector similarity search engine. To ensure seamless collaboration between the two tools and create robust NLP applications, &lt;code&gt;QdrantDocumentStore&lt;/code&gt; is a separate package maintained by Qdrant, specifically designed to integrate with Haystack. By having its own release cycle, &lt;code&gt;QdrantDocumentStore&lt;/code&gt; can be updated independently from Haystack, resulting in a more streamlined and efficient workflow.&lt;/p&gt;
&lt;h2 id=&#34;how-to-use-qdrantdocumentstore&#34;&gt;How to Use QdrantDocumentStore&lt;/h2&gt;
&lt;p&gt;If you want to use the &lt;code&gt;QdrantDocumentStore&lt;/code&gt; with Haystack, you can find the source code in the 
&lt;a href=&#34;https://github.com/qdrant/qdrant-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;qdrant-haystack&lt;/a&gt; repository on GitHub. With this integration, you can take advantage of the robust search capabilities of Qdrant with the natural language processing features of Haystack, creating a powerful tool for semantic search and retrieval.&lt;/p&gt;
&lt;p&gt;To use the &lt;code&gt;QdrantDocumentStore&lt;/code&gt; with Haystack, you can install &lt;code&gt;qdrant-haystack&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install qdrant-haystack
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then, initialize &lt;code&gt;QdrantDocumentStore&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;qdrant_haystack.document_stores&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantDocumentStore&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;:memory:&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;index&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Document&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_dim&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;512&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;recreate_index&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;hnsw_config&lt;/span&gt;={&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;m&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;16&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ef_construct&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;64&lt;/span&gt;}  &lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Optional&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you’d like to enable the Scalar Quantization for your embeddings, you can just pass the proper configuration in the same way as you would do with raw QdrantClient:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;qdrant_haystack.document_stores&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;qdrant_client.http&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;models&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;QdrantDocumentStore&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;http://localhost:6333&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;index&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Document&#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_dim&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;512&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;recreate_index&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;quantization_config&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;models&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;ScalarQuantization&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#268bd2&#34;&gt;scalar&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;models&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;ScalarQuantizationConfig&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#cb4b16&#34;&gt;type&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;models&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;ScalarType&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;INT8&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;quantile&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0.99&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#268bd2&#34;&gt;always_ram&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;QdrantDocumentStore&lt;/code&gt; supports 
&lt;a href=&#34;https://qdrant.tech/documentation/collections/#create-collection&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;all the configuration properties&lt;/a&gt; available in the Qdrant Python client.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Once you have initialized the document store, you can write your documents using the &lt;code&gt;write_documents&lt;/code&gt; method. This will allow you to seamlessly integrate your vector database with Haystack and use its functionalities.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That’s it! For more examples and documentation on using Qdrant with Haystack, refer to the 
&lt;a href=&#34;https://github.com/qdrant/qdrant-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;qdrant-haystack&lt;/a&gt; GitHub repository, and if you have any questions, join Haystack community on 
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Discord&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Writing Professional Python Logs</title>
      <link>https://haystack.deepset.ai/blog/writing-professional-python-logs/</link>
      <pubDate>Thu, 13 Apr 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/writing-professional-python-logs/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;If you are a  &lt;strong&gt;Python developer&lt;/strong&gt;  who has been  &lt;strong&gt;struggling with inconsistent&lt;/strong&gt;  &lt;strong&gt;and unhelpful logs&lt;/strong&gt;, then this article is for you! In this post, we will discuss how structlog can help you write logs that not only assist you in your daily work but also enforce certain best practices to ensure consistency in logging throughout your project.&lt;/p&gt;
&lt;p&gt;We will also explore common logging challenges and how structlog can help overcome them. So whether you are a beginner or an experienced developer, keep reading to learn how structlog can revolutionize your logging process!&lt;/p&gt;
&lt;p&gt;In the past, I often neglected logs and only wrote a few when they helped me in my development environment. These logs were useful for finding bugs or validating that the code works as intended, but they were only helpful for local debugging.&lt;/p&gt;
&lt;p&gt;After learning about structlog, I now know how to write logs that not only assist me in my daily work, but also enforce certain best practices that ensure consistency in logging throughout the project.&lt;/p&gt;
&lt;p&gt;This makes parsing logs for your shipped application easier and improves your monitoring in tools like  
&lt;a href=&#34;https://www.datadoghq.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Datadog&lt;/a&gt;  or  
&lt;a href=&#34;https://grafana.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Grafana&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this article, I will share some of my experiences with  
&lt;a href=&#34;https://www.structlog.org/en/stable/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;structlog&lt;/a&gt;  and demonstrate the power of this library by walking you through an example scenario. Let’s assume that a user wants to upload files and we have already created an endpoint that accepts these requests. In the past, I might have written logs like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;logging&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logging&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;debug&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Start uploading file with name &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;. Number of retries: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;, &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logging&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;error&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Failed uploading file with name &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;. Current retry: &lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;.&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;... 
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Basically, there is nothing wrong with this code snippet, which prints the following logs.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/writing-professional-python-logs/1.png alt=&#34;Native python logger&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;These two log lines describe what is happening during a request. However, we could improve the notation of the retries, add more context and make the log lines more readable.&lt;/p&gt;
&lt;h2 id=&#34;classic-logging-challenges&#34;&gt;Classic logging challenges&lt;/h2&gt;
&lt;p&gt;Before we jump into the hands on part, let’s first understand some common logging challenges based on the example above.&lt;/p&gt;
&lt;h3 id=&#34;1-missing-contextual-information&#34;&gt;1. Missing contextual information&lt;/h3&gt;
&lt;p&gt;The log line precisely describes that a  &lt;code&gt;my_file&lt;/code&gt;  failed to upload. This information might be sufficient for a development environment where there is only one user and one client sending requests, the data is stored on the local file system, and all requests are executed sequentially.&lt;/p&gt;
&lt;p&gt;However, when encountering this kind of log in a production system, typical questions that might arise are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Who sent this request?&lt;/li&gt;
&lt;li&gt;Which organization does the user belong to?&lt;/li&gt;
&lt;li&gt;Which file store did we connect to? Is there a session identifier or something else that helps us trace down the error?&lt;/li&gt;
&lt;li&gt;Is this log line connected to a request identifier?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To obtain this information for log lines, we would need to add this extra information to each log line, and in the worst case, fetch the same value from a database multiple times to add this informational context throughout the codebase.&lt;/p&gt;
&lt;h3 id=&#34;2-transition-from-development-only-to-machine-readable-logs&#34;&gt;2. Transition from development-only to machine readable logs&lt;/h3&gt;
&lt;p&gt;Depending on the environment, you may want to have log lines that are prettified to allow humans to read them easily, but also provide them in a machine-readable format in a production system. Within our production deployment, we might want to count the total number of retries or filter for a failed file name. In this scenario, we would need to run substring matches. It would be much easier to write queries that let you filter, maybe even in an SQL-like manner for events.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;select file_name from logs where retry_count &gt; 1
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&#34;3-inconsistent-wording&#34;&gt;3. Inconsistent wording&lt;/h3&gt;
&lt;p&gt;It is common for log lines within the same Python module to look completely different from one another. Sometimes, important information is even missing. Depending on the author, log lines can look very diverse. Entities can be referred to by different names, and the values of variables may be added to the log with or without quotes, or their wording may be changed throughout the module.&lt;/p&gt;
&lt;p&gt;This makes searching for specific events difficult and time-consuming.&lt;/p&gt;
&lt;h2 id=&#34;structlog&#34;&gt;Structlog&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;
&lt;a href=&#34;https://www.structlog.org/en/stable/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Structlog&lt;/a&gt;  offers a lot of nice features that help you write logs in a faster and less painful way. It helps you to add contextual data to your logger and offers a modular configuration for parsing your log lines in a machine readable and developer friendly way.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In our case, this means that we get beautiful colored log lines within our development environment and parsed JSONs for our monitoring, add file names as contextual data and use some more convenience features. It is important to mention that although the native logger can be set up to log as JSON, structlog provides this capability out of the box.&lt;/p&gt;
&lt;h3 id=&#34;first-step--replace-pythons-native-logger-with-structlog&#34;&gt;First step — Replace python’s native logger with structlog&lt;/h3&gt;
&lt;p&gt;Let’s use structlog for the same code snippet as above. We need to obtain the logger and add the variables to our log messages. A code snippet could look like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get_logger&lt;/span&gt;()  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;debug&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Start uploading file.&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;,&lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;error&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Failed uploading file.&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;,&lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first thing we might notice is that the variables are not longer loaded to the string but rather  &lt;code&gt;kwargs&lt;/code&gt;  of the  &lt;code&gt;debug&lt;/code&gt;  and  &lt;code&gt;error&lt;/code&gt;  methods. This not only adds the values of these variables to the logger but also binds the key names to the log message. The resulting log lines will look like this:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/writing-professional-python-logs/2.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;If we compare this to the solution we had before, we can see that the file name and retries are added to the log lines by mapping key values instead of inserting the values as a raw string.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/writing-professional-python-logs/1.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;h3 id=&#34;second-step--bind-contextual-data-to-loggers&#34;&gt;Second step — Bind contextual data to loggers&lt;/h3&gt;
&lt;p&gt;Generally there are two ways of adding contextual data to loggers with structlog.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Explicitly binding variables&lt;/strong&gt;, by calling  &lt;code&gt;.debug(…)&lt;/code&gt;  . We will update the previous code snipped by first binding the  &lt;code&gt;file_name&lt;/code&gt;  to the logger and using the logger instance to print log messages.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get_logger&lt;/span&gt;()  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;log&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;bind&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;log&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;debug&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Start uploading file.&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;log&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;error&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Failed uploading file.&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This allows us to remove the file name from all the log lines and move it to the top where it is just called once. Structlog will take care of adding it to each log line.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Implicitly binding context variables&lt;/strong&gt;, by calling structlog.contextvars.bind_contextvars(…) — This function uses  
&lt;a href=&#34;https://docs.python.org/3/library/contextvars.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;context variables&lt;/a&gt;  that are used whenever the logger within the same thread prints log messages. A code snippet that prints the same log messages using context variables might look like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# /controller/file.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get_logger&lt;/span&gt;()  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;contextvars&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;bind_contextvars&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;file_name&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;debug&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Start uploading file.&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;logger&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;error&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Failed uploading file.&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retries&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;However, our scenario is  &lt;strong&gt;not the preferred use case&lt;/strong&gt;  for using context variables to add values to your logger. As we learned in the logging challenges, contextual data is often missing in the classic approach. Before processing the file, we assume that an authentication and authorization mechanism is in place to validate whether a user has access to the endpoint. In a framework like  
&lt;a href=&#34;https://fastapi.tiangolo.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FastAPI&lt;/a&gt;, this workload is handled in middleware. Since we have already fetched the user ID here to authenticate the request, let’s add this key-value pair to the logger:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# /auth/middleware.py   &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;verify_user&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;request&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Request&lt;/span&gt;) -&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;bool&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;validate&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;request&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;token&lt;/span&gt;):  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;user_id&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;get_user_id&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;request&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;token&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;contextvars&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;bind_contextvars&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;user_id&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;user_id&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This way  
&lt;a href=&#34;https://www.structlog.org/en/stable/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;structlog&lt;/a&gt;  will take care of adding this contextual information for us and we don’t need to worry about adding the user id manually.&lt;/p&gt;
&lt;h2 id=&#34;third-step--configuration-for-machine-readable-logs&#34;&gt;Third Step — Configuration for machine readable logs&lt;/h2&gt;
&lt;p&gt;After learning how to attach information to log lines using structlog, we now want to configure it to send prettified log lines in our local development environment setup and parsed JSONs for our logging service. We will follow the first and simplest solution from the structlog docs. Before we proceed, we need to understand two concepts from structlog: Preprocessors and Renderers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Preprocessors&lt;/strong&gt;  are used to extend the content of log messages. This can be done by adding timestamps, modifying fields to obfuscate user names, or dropping log lines based on defined conditions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Renderers&lt;/strong&gt;  are used to control the representation of log lines. Depending on the use case, they can display beautified and colorful log lines, or machine-readable JSONs.&lt;/p&gt;
&lt;p&gt;The ability to write custom implementations of these nodes provides a lot of extensibility.&lt;/p&gt;
&lt;p&gt;We will use default structlog preprocessors to define the content of our log lines. The following diagram illustrates how to handle log lines that use a structlog logger. After the output step, we add a renderer based on the environment that formats the log lines as human or machine readable output.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/writing-professional-python-logs/3.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Logging architecture from  
&lt;a href=&#34;https://www.structlog.org/en/stable/standard-library.html#rendering-within-structlog&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;structlog docs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We begin by configuring a set of processors that add timestamps, set the log level, and merge the context variables used in step 2.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;structlog.dev&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ConsoleRenderer&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;structlog.processors&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;JSONRenderer&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Timestamper preprocessor that to add unified timestamps to each log  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;timestamper&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;processors&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;TimeStamper&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;fmt&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;iso&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;utc&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# Structlog preprocessors  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;structlog_processors&lt;/span&gt; = [  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;stdlib&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_log_level&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;processors&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_log_level&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;contextvars&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;merge_contextvars&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;processors&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;StackInfoRenderer&lt;/span&gt;(),  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;dev&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;set_exc_info&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;timestamper&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we want to define a function that either returns structlogs  &lt;code&gt;ConsoleRenderer&lt;/code&gt;or  &lt;code&gt;JSONRenderer&lt;/code&gt;, based on the environment settings and append this renderer to the processors.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;# __init__.py  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;structlog.dev&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ConsoleRenderer&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;structlog.processors&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;JSONRenderer&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;get_renderer&lt;/span&gt;() -&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Union&lt;/span&gt;[&lt;span style=&#34;color:#268bd2&#34;&gt;JSONRenderer&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;ConsoleRenderer&lt;/span&gt;]:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    Get renderer based on the environment settings  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    :return structlog renderer  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;    &#34;&#34;&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;os&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;get&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;DEV_LOGS&#34;&lt;/span&gt;, &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;):  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ConsoleRenderer&lt;/span&gt;()  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#859900&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;JSONRenderer&lt;/span&gt;()  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#93a1a1;font-style:italic&#34;&gt;## Structlog  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;configure&lt;/span&gt;(  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;processors&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;structlog_processors&lt;/span&gt; &#43; [&lt;span style=&#34;color:#268bd2&#34;&gt;get_renderer&lt;/span&gt;()],  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;wrapper_class&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;stdlib&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;BoundLogger&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;context_class&lt;/span&gt;=&lt;span style=&#34;color:#cb4b16&#34;&gt;dict&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;logger_factory&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;structlog&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;PrintLoggerFactory&lt;/span&gt;(),  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#268bd2&#34;&gt;cache_logger_on_first_use&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;⚠️ This will only affect the log lines printed within your application and will not affect logs produced by modules used inside your application. If you also want these logs to be formatted as JSON, you need to follow  
&lt;a href=&#34;https://www.structlog.org/en/stable/standard-library.html#rendering-using-structlog-based-formatters-within-logging&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tutorial 3 of the structlog documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After setting the environment variable  &lt;code&gt;DEV_LOGS&lt;/code&gt;  to  &lt;code&gt;False&lt;/code&gt;  our log lines are printed as raw JSONs.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/writing-professional-python-logs/4.png alt=&#34;Logs parsed as dictionaries&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;This format can now be used in your monitoring system and easily be parsed and aggregated since its not longer a stream of text, but rather a stream of easy to parse JSONs.&lt;/p&gt;
&lt;p&gt;In our local environment we can still use the old colored log lines:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/writing-professional-python-logs/5.png alt=&#34;Pretified logs&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;limitations&#34;&gt;Limitations&lt;/h2&gt;
&lt;p&gt;Although structlog alone will not solve all inconsistencies with logging, it can help to use a more consistent and less error-prone workflow.&lt;/p&gt;
&lt;p&gt;To increase consistency in our codebase, we use a set of default naming conventions. While these conventions are not enforceable, they can contribute to a common understanding of how to write logs.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;&lt;entity&gt;_id&lt;/code&gt; &lt;- &lt;strong&gt;( required)&lt;/strong&gt;  every entity (e.g. file) that is processed needs to be added to the logger&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&lt;entity&gt;_name&lt;/code&gt;  &lt;-  &lt;strong&gt;(optional)&lt;/strong&gt;  every entity (e.g. file) can be added with the suffix _name for a better readability&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&lt;entity&gt;_&lt;additional_key&gt;&lt;/code&gt;  &lt;-  &lt;strong&gt;(optional)&lt;/strong&gt;  additional properties of entities are allowed to be added with the key name as a suffix&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&lt;custom_usecase_specific_field&gt;&lt;/code&gt;  ←  &lt;strong&gt;(optional)&lt;/strong&gt; Sometimes, you may need to add case-specific information. In such cases, you can use additional arbitrary keys. However, it is recommended to follow the naming conventions from 1–3, if possible.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is for sure not the only possible naming, but the one that works for us best.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;We have learned the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Using a key-value format for variables helps create uniform log messages and promotes consistent naming conventions.&lt;/li&gt;
&lt;li&gt;Contextual data can be explicitly added by binding variables to loggers or implicitly added using context variables, which enriches log lines with useful metadata.&lt;/li&gt;
&lt;li&gt;Choosing a structlog renderer allows for easy parsing of log lines into a machine-readable or prettified, human-friendly format based on the environment.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;About me:&lt;/strong&gt;  I am a developer based in Cologne, working at  
&lt;a href=&#34;https://www.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset&lt;/a&gt;. I am part of the team building “deepset Cloud,” which is powered by the open source framework  
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt;.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Introducing Agents in Haystack: Make LLMs resolve complex tasks</title>
      <link>https://haystack.deepset.ai/blog/introducing-haystack-agents/</link>
      <pubDate>Thu, 30 Mar 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/introducing-haystack-agents/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Wouldn’t it be cool if you could enter a query and an LLM makes decisions on-the-fly about what resources it needs to resolve the query? For example, that it needs to search the web and retrieve relevant resources. Or that it needs to search through your company files first. That’s now possible with Agents!&lt;/p&gt;
&lt;p&gt;With the release of 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack 1.15&lt;/a&gt;, we’re officially introducing the &lt;code&gt;Agent&lt;/code&gt; to the Haystack ecosystem. The implementation of Agents is inspired by two papers: the 
&lt;a href=&#34;https://arxiv.org/abs/2205.00445&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;MRKL Systems paper&lt;/a&gt; (pronounced ‘miracle’ 😉) and 
&lt;a href=&#34;https://arxiv.org/abs/2210.03629&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;the ReAct paper&lt;/a&gt;. If you like reading papers, I highly recommend these two. Here, I’ll explain how we’re introducing this functionality to Haystack.&lt;/p&gt;
&lt;p&gt;Let’s first talk about what an Agent is, and then see how you can start using Agents in Haystack. First, we need to briefly talk about what a ‘prompt’ is.&lt;/p&gt;
&lt;h2 id=&#34;what-is-a-prompt&#34;&gt;What is a Prompt?&lt;/h2&gt;
&lt;p&gt;Long story short, a prompt is an instruction. In the world of Natural Language Processing, these instructions can often be things like ‘Answer the given query’, or ‘Summarize the following piece of text’. In recent months, new large language models (LLMs) were made available, such as the ever improving GPT models by OpenAI (&lt;code&gt;text-davinci-003&lt;/code&gt;, &lt;code&gt;gpt-3.5-turbo&lt;/code&gt;, and GPT-4). These have shown great capability in consuming and acting on increasingly complex prompts.&lt;/p&gt;
&lt;p&gt;Haystack users may already have encountered the 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/prompt_node&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;code&gt;PromptNode&lt;/code&gt;&lt;/a&gt;, which works alongside an LLM that consumes instructions. You have the option to define your own prompt in a &lt;code&gt;PromptTemplate&lt;/code&gt;, or use one of our defaults.&lt;/p&gt;
&lt;p&gt;For example, 
&lt;a href=&#34;https://huggingface.co/spaces/deepset/should-i-follow/blob/482bee0f12c76147028acae160814a948cf6d945/utils/haystack.py#L14&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here is a prompt&lt;/a&gt; that I defined in a &lt;code&gt;PromptTemplate&lt;/code&gt; for the 
&lt;a href=&#34;https://huggingface.co/spaces/deepset/should-i-follow&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;‘Should I follow?’ demo on Hugging Face&lt;/a&gt;. Take a look if you’re interested in how to use custom prompts in Haystack!&lt;/p&gt;
&lt;p&gt;In this case, I wanted to have a system that can tell me what type of things a Twitter user has been posting about. The prompt starts like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You will be given a twitter stream belonging to a specific profile. Answer with a summary of what they’ve lately been tweeting about and in what languages. You may go into some detail about what topics they tend to like tweeting about. Please also mention their overall tone, for example: positive, negative, political, sarcastic or something else.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Why does this matter to understand Agents? Let’s have a look 👇&lt;/p&gt;
&lt;h2 id=&#34;what-is-an-agent&#34;&gt;What is an Agent?&lt;/h2&gt;
&lt;p&gt;Agents are a way to leverage this ability of LLMs to understand and act on prompts. In essence, an Agent is an LLM that has been given a very clever initial prompt. The prompt tells the LLM to break down the process of answering a complex query into a sequence of steps that are resolved one at a time.&lt;/p&gt;
&lt;p&gt;Agents become &lt;em&gt;really&lt;/em&gt; cool when we combine them with ‘experts’, introduced in the MRKL paper. Simple example: an Agent might not have the inherent capability to reliably perform mathematical calculations by itself. However, we can introduce an expert - in this case a calculator, an expert at mathematical calculations. Now, when we need to perform a calculation, the Agent can call in the expert rather than trying to predict the result itself.&lt;/p&gt;
&lt;p&gt;For example, an Agent that is asked “Who was the US president ten years ago today?”. A simplified view of the Agent’s thought process as it breaks down this question might look like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;“I have to answer the question: What is today’s date?”&lt;/li&gt;
&lt;li&gt;“Now I know todays date is 29th March 2023, I need to answer: what is 29 March 2023 minus 10 years?”&lt;/li&gt;
&lt;li&gt;“Now I need to answer the question: Who was the US president on 29 of March 2013?”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At each step, the Agent may decide to make use of an expert to come to a resolution. Notice how it is able to figure out that it first has to ask a question, then perform a calculation with the answer, then finally ask another question. Whether it is able to do any of those things depends on which experts it has access to. In Haystack, these experts are called &lt;code&gt;Tools&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;what-are-tools&#34;&gt;What are Tools?&lt;/h2&gt;
&lt;p&gt;In the world of the Agent, a Tool is an expert that the Agent can query at any time. For example, an Agent might be given a Tool that can search the web (let’s call it ‘Websearch’). If the Agent sees that there’s a need to search the web and it has the Websearch in its set of tools, it will use it.&lt;/p&gt;
&lt;p&gt;With the release of Haystack 1.15, we are making it possible to give Haystack components (the Nodes, Pipelines and more) to an Agent as Tools. For example, a Tool might simply be an &lt;code&gt;ExtractiveQAPipeline&lt;/code&gt;. This is a pipeline that has access to a DocumentStore which contains your own data and, given a query, uses an extractive question answering model to find an answer in your own data. With this Tool, your Agent can use the pipeline to query your own data whenever it sees fit.&lt;/p&gt;
&lt;p&gt;Our first release of Agents will let you use several ready-made Haystack pipelines and Nodes as Tools: the &lt;code&gt;ExtractiveQAPipeline&lt;/code&gt;, &lt;code&gt;DocumentSearchPipeline&lt;/code&gt;, the &lt;code&gt;PromptNode&lt;/code&gt;, the new &lt;code&gt;WebQAPipeline&lt;/code&gt; and more. For a full list of available Tools, check out our 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/agent#tools&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Agent documentation&lt;/a&gt;. We will soon be expanding the list of Tools available out of the box, and you will be able to create your own custom Tools as well.&lt;/p&gt;
&lt;p&gt;How does an Agent select a Tool? Each Tool comes with a &lt;code&gt;description&lt;/code&gt;. This is arguably one of the most important attributes of a Tool, as it is used by the Agent to make the Tool selection. For example, the description of a Tool that does extractive question answering on docuemtns about USA presidents could be: ‘Useful for when you need to answer questions about the presidents of the USA’.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/introducing-haystack-agents/tools.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Let’s imagine that we do actually have our own data store containing information about US presidents. We create an &lt;code&gt;ExtractiveQAPipeline&lt;/code&gt; for this data and supply it to our Agent as a Tool called ‘ExtractiveQATool’. When we ask the Agent a question, here’s what the output might look like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Question: What year was the 1st president of the USA born?&lt;/p&gt;
&lt;p&gt;Thought: Let’s think step-by-step, I first need to find out who the first president of the USA was.&lt;/p&gt;
&lt;p&gt;Tool: ExtractiveQATool&lt;/p&gt;
&lt;p&gt;Tool Input: &lt;strong&gt;Who was the first president of the USA?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Observation: &lt;strong&gt;George Washington&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Thought: Now that I know who the first president of the USA was, I can use the same tool to find out when he was born.&lt;/p&gt;
&lt;p&gt;Tool: ExtractiveQATool&lt;/p&gt;
&lt;p&gt;Tool Input: &lt;strong&gt;When was George Washington born?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Observation: &lt;strong&gt;February 22, 1732&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Thought: That’s the answer to the question.&lt;/p&gt;
&lt;p&gt;Final Answer: 1732&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Pay attention to the highlighted sections of the output. Notice how the Agent goes step-by-step to come up with an action plan. Depending on what the next step is, it is able to select a Tool (in this case our ‘ExtractiveQATool’) to perform the desired action.&lt;/p&gt;
&lt;p&gt;The options for these Tools are effectively limitless. It is up to you the user, to define your own Tools and decide which ones to give your Agent access to.&lt;/p&gt;
&lt;h2 id=&#34;what-is-the-difference-between-an-agent-and-a-pipeline&#34;&gt;What is the Difference between an Agent and a Pipeline?&lt;/h2&gt;
&lt;p&gt;Now let’s see how this differs from the Haystack Pipeline. A Pipeline is a powerful structure in Haystack. A simple pipeline might consist of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;Retriever&lt;/code&gt;: finds the most relevant documents for a query.&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;Reader&lt;/code&gt; looks at some piece of text and extracts an answer to a question from it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When we run a pipeline with a query, data will flow from one node to the next (in our example: first retrieving documents and them extracting an answer from them), until it gets to the end of the pre-defined pipeline flow and outputs a result. That’s it. That’s where our time with you comes to an end, curtains close, we’re all happy. As you can see, a pipeline is inherently a one-pass system. And this is precisely where an Agent is different (very different) to a pipeline.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/introducing-haystack-agents/pipeline.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Where a Pipeline has a clear start and end, input and result, an Agent does not (although you may define the maximum number of iterations you will allow it). The core difference between these two structures is that while a Pipeline is designed to be a single-iteration system, an Agent is a many-iterations system. The number of iterations is defined by the Agent, which decides after each iteration whether it has come to a final answer or not. (Of course, with a system in place to prevent infinite loops!)&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/introducing-haystack-agents/agents.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The Agent owes this talent to three things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;powerful LLMs that are able to consume increasingly complex instructions;&lt;/li&gt;
&lt;li&gt;a well-defined prompt that defines the lifecycle of an Agent and causes it to continue until it has reached a solution; and&lt;/li&gt;
&lt;li&gt;a set of Tools that are each good at doing something &lt;em&gt;really well.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;customizing-an-agent&#34;&gt;Customizing an Agent&lt;/h2&gt;
&lt;p&gt;You may have already noticed and started to use the &lt;code&gt;PromptNode&lt;/code&gt; in Haystack. This node is an interface with LLMs such as &lt;code&gt;text-davinci-003&lt;/code&gt; by OpenAI; the Flan T5 models by Google and so on.&lt;/p&gt;
&lt;p&gt;At its core, the Agent is a &lt;code&gt;PromptNode&lt;/code&gt; that has been given a specific prompt. By default, the Agent you will find in Haystack will be using the &lt;code&gt;PromptTemplate&lt;/code&gt; called 
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/agent#prompttemplate&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;“zero-shot-react”.&lt;/a&gt; However, you are free to create your own &lt;code&gt;PromptTemplate&lt;/code&gt; and use that as the basis of your Agent.&lt;/p&gt;
&lt;p&gt;The same goes for &lt;code&gt;Tools&lt;/code&gt;. In Haystack, you can now turn virtually any Haystack component into a Tool that can be used by an Agent. This means you have the freedom to design and create Tools with your own Pipelines and Nodes, including the PromptNode. This makes Tools incredibly flexible and will allow you to add all sorts of capabilities to your agents.&lt;/p&gt;
&lt;h2 id=&#34;an-example-of-creating-and-adding-tools&#34;&gt;An Example of Creating and Adding Tools&lt;/h2&gt;
&lt;p&gt;For a walkthrough of creating an Agent and adding Tools to it, visit our first tutorial on how to 
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/23_answering_multihop_questions_with_agents&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;‘Answer Multihop Questions with Agents’.&lt;/a&gt; But here’s the simple idea behind it:&lt;/p&gt;
&lt;p&gt;First, create a tool. Let’s assume the following pipeline:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;my_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;ExtractiveQAPipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;some_retriever&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;reader&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;some_reader&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This could become a tool as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.agents&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;my_qa_tool&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Tool&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;ExtractiveQATool&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;pipeline_or_node&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;my_pipeline&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                  &lt;span style=&#34;color:#268bd2&#34;&gt;description&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Useful for when you need to answer questions related to Yoda&#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                  &lt;span style=&#34;color:#268bd2&#34;&gt;output_variable&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answers&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that &lt;code&gt;description&lt;/code&gt; property you supply to your Tool is really important. The descriptions will be used by your Agent to make its decision about which Tool to use for the task at hand.&lt;/p&gt;
&lt;p&gt;Then, you would simply add this Tool to the set of Tools an Agent has access to. Or as I like to imagine it, you, as the all-seeing divinity in charge of the measly Agent, grant it the ability to (as per the description) use the “ExtractiveQATool” to “answer questions related to Yoda” 😊&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.agents&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PromptNode&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PromptNode&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model_name_or_path&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text-davinci-003&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;OPENAI_API_KEY&#39;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;stop_words&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Observation:&#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Agent&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;prompt_node&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;agent&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_tool&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;my_qa_tool&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We’re very excited to see how you use Agents and Tools in Haystack, and we can’t wait to start adding more capabilities and usability improvements. Join us on 
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Discord&lt;/a&gt; or follow the activity in the 
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack repo&lt;/a&gt; to see what’s next for Haystack and the Agent&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Announcing the Haystack Blog</title>
      <link>https://haystack.deepset.ai/blog/announcing-haystack-blog/</link>
      <pubDate>Fri, 03 Mar 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/announcing-haystack-blog/</guid>
      <description>
      
      
      

      
      

      <p>We are thrilled to announce the launch of our new blog, a space to learn and share! The Haystack community has been growing at an incredible speed, reaching over 7k stars on GitHub and 900+ members on Discord, and we’re always looking for ways to make our community experience even better. Haystack blog is another step in that direction. 🚀</p>
<p>You might already be familiar with our 
<a href="https://www.deepset.ai/blog" target="_blank" rel="noopener">deepset blog</a> where we share tons of great articles. While we will continue to publish new content there too, on this new blog you will find more content targeted to developers building with Haystack specifically. It will be easier and faster for the Haystack developer to find the NLP content that is most relevant to them! 🏎️</p>
<p>Our goal is to create a space where you can find articles about state-of-the-art NLP technologies, such as LLMs, open source development, building with Haystack, community updates, and more. 📚</p>
<p>In addition to providing a space for our community to access high-quality NLP content, we would also be thrilled to give you the opportunity to submit your content and contribute to the Haystack blog! We believe learning from a diverse set of backgrounds is essential and we want to make sure everyone has a chance to share their knowledge and experience. 🤩</p>
<p>So, whether you are a seasoned NLP expert or just starting your journey, we invite you to check out our new blog. We are excited to hear your feedback and to continue growing our community together. 🙌</p>
<p>Do you have questions? Join our 
<a href="https://discord.com/invite/xYvH6drSmA" target="_blank" rel="noopener">Discord community</a>!</p>

      </description>
    </item>
    
    <item>
      <title>Build a Search Engine with GPT-3</title>
      <link>https://haystack.deepset.ai/blog/build-a-search-engine-with-gpt-3/</link>
      <pubDate>Tue, 31 Jan 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/build-a-search-engine-with-gpt-3/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;If you’ve been online lately, then you’ve likely seen the excitement about OpenAI’s newest language model, ChatGPT. ChatGPT is astonishingly good at many things, including debugging code and rewriting text in whatever style you ask it. As an offshoot of GPT-3.5, a large language model (LLM) with billions of parameters, ChatGPT owes its impressive amount of knowledge to the fact that it’s seen a large portion of the internet during training — in the form of the Common Crawl corpus and other data.&lt;/p&gt;
&lt;p&gt;After decades of chatbots that couldn’t even remember what they said a moment ago, it’s understandable that people are excited by a language model that can hold a conversation and create a solid semblance of intelligence. But when it comes to the validity of answers generated by these huge models, we need to stay critical. LLMs especially are  
&lt;a href=&#34;https://www.unite.ai/preventing-hallucination-in-gpt-3-and-other-complex-language-models/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;prone to hallucinations&lt;/a&gt;: producing text that sounds sensible at first but doesn’t hold up to closer scrutiny, and presenting things as fact that are made up entirely.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://www.deepset.ai/blog/how-to-build-a-semantic-search-engine-in-python&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Semantic search engines&lt;/a&gt;  — our specialty here at deepset — are often powered by extractive question answering models. These models return snippets from the knowledge base verbatim, rather than generating text from scratch the way ChatGPT does. However, many applications can benefit from the abilities of generative LLMs. That’s why  
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt;, deepset’s open-source framework for applied natural language processing (NLP), allows you to leverage multiple GPT models in your pipeline. With this approach, you can build a GPT-powered semantic search engine that uses your own data as ground truth and bases its natural-language answers on the information it contains.&lt;/p&gt;
&lt;p&gt;You can think of Haystack as a comprehensive and highly flexible toolbox whose primary goal is to make building different flavors of NLP systems easy and fast, yet transparent. Besides providing a comfortable entry point to the  
&lt;a href=&#34;https://openai.com/api/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenAI API&lt;/a&gt;, Haystack offers all the other components you need to successfully implement an end-to-end NLP system with GPT: a vector database, a module for retrieval, and the pipeline that combines all those elements into one queryable system.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In this article, we’ll demonstrate how to build a generative question answering system that uses the GPT-3 “&lt;/strong&gt;
&lt;a href=&#34;https://beta.openai.com/docs/models/overview&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;strong&gt;davinci-00&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;3” model to present results in convincing natural language.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id=&#34;the-advent-of-large-language-models&#34;&gt;The advent of large language models&lt;/h2&gt;
&lt;p&gt;Aren’t all modern language models large? It’s true — ever since the Transformer was introduced as a baseline-beating architecture for all kinds of NLP tasks, models have grown in size. But while the  
&lt;a href=&#34;https://huggingface.co/bert-large-uncased&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;largest BERT model&lt;/a&gt;  has 336 million parameters, OpenAI’s largest  
&lt;a href=&#34;https://beta.openai.com/docs/model-index-for-researchers/models-referred-to-as-gpt-3-5&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GPT-3.5 model&lt;/a&gt;  — which ChatGPT is based on — has 520 times as many.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/build-a-search-engine-with-gpt-3/gpt-image.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;So what does GPT do with all its additional capacity? From observation, we can say that GPT is exceptionally good at understanding implication and intent. It can remember what’s been discussed earlier in the conversation, including figuring out what you’re referring to with words like “he” or “before that,” and it can tell you when your question doesn’t make sense. All of these properties account for the increased sense of actual intelligence. It also has to generate language from scratch, a much harder task than returning the right section from a corpus. But it has been shown, too, that GPT uses a lot of its parameters to store facts — actual information, which it then uses for tasks like code debugging and answering general-knowledge questions.&lt;/p&gt;
&lt;p&gt;But here’s the problem: GPT can still make massive mistakes, and these are harder to detect because it is so good at conversing and making its answers and code examples  &lt;em&gt;look&lt;/em&gt;  right. Early in December, the programming Q&amp;A forum  
&lt;a href=&#34;https://www.theverge.com/2022/12/5/23493932/chatgpt-ai-generated-answers-temporarily-banned-stack-overflow-llms-dangers&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Stack Overflow temporarily banned ChatGPT-generated solutions&lt;/a&gt;. Meanwhile, hallucinations in large language models have spawned a new  
&lt;a href=&#34;https://dl.acm.org/doi/10.1145/3571730&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;field of research&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, there is a way to use GPT models in a way that is safer and generates value. By connecting the generative model to a textual database with curated, domain-specific content — say, a corpus of product reviews, a collection of financial reports, or a database with research papers — you can combine factual accuracy with GPT’s conversational prowess. With Haystack, you can set up such a GPT-based search engine in no time. The search engine is placed on top of your textual database, and returns natural-language answers in reply to an input query.&lt;/p&gt;
&lt;h2 id=&#34;different-types-of-search-engines&#34;&gt;Different types of search engines&lt;/h2&gt;
&lt;p&gt;Semantic search engines come in different varieties and can roughly be distinguished by the type of answer they return. The answers could consist of matching documents (in document search), answer spans (in extractive QA), or newly generated answers (in generative QA).&lt;/p&gt;
&lt;h3 id=&#34;the-generativeqapipeline-haystacks-component-for-a-generative-search-engine&#34;&gt;The GenerativeQAPipeline: Haystack’s component for a generative search engine&lt;/h3&gt;
&lt;p&gt;For each of those search paradigms, Haystack offers ready-made pipelines: search engine configurations with placeholders for language models for maximum efficiency. In this tutorial, we use the GenerativeQAPipeline. It consists of a retriever (to find relevant documents) and a generator (to write text), chained together. The retriever connects to the database. Like the generator, it is often (but not necessarily) based on a Transformer model. Its task is to retrieve the documents from the database that are most likely to contain valuable information, based on a user’s input query. Our generative model then uses those documents as a factual basis to write its answer.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/build-a-search-engine-with-gpt-3/generative-qa.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;h2 id=&#34;how-to-build-a-search-engine-using-gpt-3&#34;&gt;How to build a search engine using GPT-3&lt;/h2&gt;
&lt;p&gt;Before we get started, let’s quickly talk about the tools you need to follow along.&lt;/p&gt;
&lt;h3 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;You need to install Haystack. We use version 1.13.&lt;/li&gt;
&lt;li&gt;To use the  
&lt;a href=&#34;https://openai.com/api/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenAI API&lt;/a&gt;, you need to make an account and  
&lt;a href=&#34;https://beta.openai.com/account/api-keys&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;generate an API key&lt;/a&gt;. Note that while the first queries are free, you need to pay once you reach a certain limit. (OpenAI grants you an initial budget of $18, which is more than enough to complete this tutorial and play around with the pipeline.)&lt;/li&gt;
&lt;li&gt;Here we use a small dataset of 18 Wikipedia articles about Berlin, the capital city of Germany. You can, of course, use your own dataset.&lt;/li&gt;
&lt;li&gt;Since our embeddings and answers are provided through the OpenAI API, it’s not necessary to use GPUs for this guide. However, if you want to experiment with different models, we recommend that you work in a  
&lt;a href=&#34;https://colab.research.google.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Colab notebook&lt;/a&gt;  and activate the GPU (under “Runtime -&gt; Change runtime type”).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;converting-and-preprocessing&#34;&gt;Converting and preprocessing&lt;/h3&gt;
&lt;p&gt;Before you can set up the pipeline, you need to preprocess your data and add them to the document store, or database. There are many options for document stores in Haystack. This tutorial uses FAISS, which is a vector database.&lt;/p&gt;
&lt;p&gt;Before you can feed data into the document store, you need to get it into the right format. The DocumentStore expects data to be supplied in the form of a Haystack data type called Document — a dictionary data type that stores information as a set of related fields (such as the document text and its metadata). The convert_files_to_docs function retrieves your files from a directory and converts them into Haystack documents. If you’re working with articles from Wikipedia, you can use the clean_wiki_text cleaning function that removes some Wikipedia-specific boilerplate.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.utils&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;convert_files_to_docs&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;clean_wiki_text&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;convert_files_to_docs&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;dir_path&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;DOC_DIR&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;clean_func&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;clean_wiki_text&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split_paragraphs&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Depending on your own data’s format, you may need to follow a slightly different procedure for preprocessing. With Haystack, you can extract data from websites, or convert different file formats like pdf, txt, or docx files. Have a look at our  
&lt;a href=&#34;https://haystack.deepset.ai/tutorials/08_preprocessing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;preprocessing tutorial&lt;/a&gt;  and the documentation page of the  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/file_converters&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FileConverter&lt;/a&gt;  to learn more.&lt;/p&gt;
&lt;p&gt;Many documents, including Wikipedia articles about popular topics, can be very long. You need to make sure that the documents in your database are short enough for the embedding model to properly capture their meaning. To do this, you can use the  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/preprocessor&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PreProcessor&lt;/a&gt;  to split them into shorter text snippets. We suggest a split length of 100 tokens per snippet, and an overlap of three tokens, to make sure no information gets lost.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;PreProcessor&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;preprocessor&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;PreProcessor&lt;/span&gt;(  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;clean_empty_lines&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;clean_whitespace&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;clean_header_footer&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;split_by&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;word&#34;&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;split_length&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;100&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;split_overlap&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;split_respect_sentence_boundary&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;False&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;processed_docs&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;preprocessor&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;process&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What do these processed documents look like? Let’s have a look at one of them:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;processed_docs&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;&lt;span style=&#34;color:#268bd2&#34;&gt;Document&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;content&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;The Berlin U-Bahn (German: [ˈuː baːn]; short for Untergrundbahn, &#34;underground railway&#34;) is a rapid transit system in Berlin, the capital and largest city of Germany, and a major part of the city&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\&#39;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;s public transport system. Together with the S-Bahn, a network of suburban train lines, and a tram network that operates mostly in the eastern parts of the city, it serves as the main means of transport in the capital.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;Opened in 1902, the U-Bahn serves 175 stations[1] spread across nine lines, with a total track length of 155.4 kilometres (96 miles 45 chains),[3] about 80&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;% o&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;f which is underground.[4]  Trains run&#39;&lt;/span&gt;,   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;content_type&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;text&#39;&lt;/span&gt;,   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;score&#39;&lt;/span&gt;: &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;,   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;meta&#39;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;name&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Berlin U-Bahn.txt&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;_split_id&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;},   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;embedding&#39;&lt;/span&gt;: &lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;None&lt;/span&gt;,   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;id&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;d2bf58a531b2500250650b43b1cce290&#39;&lt;/span&gt;}&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Every document has been turned into an object of the Document class, which is a dictionary that contains not only the document’s text, but also some automatically generated metadata, like which file the text came from.&lt;/p&gt;
&lt;h3 id=&#34;initializing-the-documentstore&#34;&gt;Initializing the DocumentStore&lt;/h3&gt;
&lt;p&gt;Time to set up the document store — for example, the vector-optimized FAISS database. When you initialize the document store, you need to know the length of your retriever’s document  
&lt;a href=&#34;https://www.deepset.ai/blog/the-beginners-guide-to-text-embeddings&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;vector embeddings&lt;/a&gt;  — its internal representations that it will produce for each document. Since you’ll be working with the high-dimensional text-embedding-ada-002 model from OpenAI, you need to set the vectors’ embedding_dim to 1536.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;FAISSDocumentStore&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;FAISSDocumentStore&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;faiss_index_factory_str&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Flat&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_dim&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1536&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, delete any existing documents in the database, and add the preprocessed documents that you generated earlier.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;delete_documents&lt;/span&gt;()  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;processed_docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that so far, the database only contains the plain-text documents. To add the high-dimensional vector embeddings — the representations of each document that make sense to the language model and that it can use for semantic search — you need to set up the model for retrieval.&lt;/p&gt;
&lt;h3 id=&#34;retriever&#34;&gt;Retriever&lt;/h3&gt;
&lt;p&gt;The retriever is the module that matches your query to the documents in the database, and retrieves those that it deems most likely to contain the answer. Retrievers can be keyword-based (like tf-idf and BM25), or they can encode semantic similarity through the use of Transformer-generated text vectors. In the latter case, the retriever is also used to  &lt;em&gt;index&lt;/em&gt; the documents in your database — that is, turn them into high-dimensional embeddings that the retriever can then search.&lt;/p&gt;
&lt;p&gt;You’ll be working with  
&lt;a href=&#34;https://openai.com/blog/new-and-improved-embedding-model/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenAI’s most recent retrieval model&lt;/a&gt;, text-embedding-ada-002. To initialize it in Haystack, you need to provide your OpenAI API key.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;EmbeddingRetriever&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;EmbeddingRetriever&lt;/span&gt;(  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text-embedding-ada-002&#34;&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;batch_size&lt;/span&gt; = &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;32&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;MY_API_KEY&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;max_seq_len&lt;/span&gt; = &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1024&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When you set up the retriever, you connect it directly to your document store. Now you can use the update_embeddings method to turn the raw documents in the document store into high-dimensional vectors that the retrieval model can search and compare.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;update_embeddings&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;generator&#34;&gt;Generator&lt;/h3&gt;
&lt;p&gt;You are now ready to initialize the GPT model that will generate text for you. The  
&lt;a href=&#34;https://docs.haystack.deepset.ai/reference/answer-generator-api#openaianswergenerator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenAIAnswerGenerator node&lt;/a&gt;  can use four different GPT models. You can use the highest performing GPT-3.5 model, text-davinci-003.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIAnswerGenerator&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;OpenAIAnswerGenerator&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;api_key&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;MY_API_KEY&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;text-davinci-003&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;temperature&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;.5&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;max_tokens&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;30&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We recommend increasing the max_tokens parameter from the default value of 13 to 30, so that the GPT model can produce longer sequences. We also suggest you set the temperature to .5 (default is .2), which gives the model slightly more liberty in generating its answers. The lower the temperature, the more faithful the model remains to the underlying source texts.&lt;/p&gt;
&lt;h3 id=&#34;pipeline&#34;&gt;Pipeline&lt;/h3&gt;
&lt;p&gt;Now that all the individual elements of your GPT search engine are set up, it’s time to pass them to your generative QA pipeline.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.pipelines&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;GenerativeQAPipeline&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;gpt_search_engine&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;GenerativeQAPipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And that’s it! Your GPT-powered search engine is ready for querying.&lt;/p&gt;
&lt;h2 id=&#34;querying-the-pipeline&#34;&gt;Querying the pipeline&lt;/h2&gt;
&lt;p&gt;Now you can ask your system some general questions about Berlin (or whatever other topic your dataset is about). In addition to the query itself, you can pass a few parameters to the search engine, like the number of documents that the retriever should deliver to the generator and the number of answers that should be generated (both designated “top_k”).&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What is Berlin known for?&#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;params&lt;/span&gt; = {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Retriever&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;top_k&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;5&lt;/span&gt;}, &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Generator&#34;&lt;/span&gt;: {&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;top_k&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt;}}  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;gpt_search_engine&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;params&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;params&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To print the answer generated by your pipeline, import Haystack’s handy print_answers function. It lets you determine the amount of detail that you want to see when printing the answer. Setting it to minimum will print only the answer string. So what’s the search engine’s answer to the question above?&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.utils&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;print_answers&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;print_answers&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;answer&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;details&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;minimum&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Query&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;What&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Berlin&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;known&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Answers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[ { &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;answer&#39;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Berlin is known for its diverse culture, its nightlife, its contemporary arts, and its high quality of life.&#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that this answer is generated from scratch: it’s not a quotation from any of the Wikipedia articles, but has been written based on the content in them.&lt;/p&gt;
&lt;h3 id=&#34;generated-answers-are-context-dependent&#34;&gt;Generated answers are context-dependent&lt;/h3&gt;
&lt;p&gt;Remember how we said earlier that the GPT-3 model generates its answers on the basis of the documents that it receives? You can now test that by running the generator in isolation, without the retriever. You can’t run it without any documents at all, though, so you need to pass it a single snippet. Here’s what happens if you use the snippet about the U-Bahn that was printed out above.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;generator&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;predict&lt;/span&gt;(&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What is Berlin known for?&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;documents&lt;/span&gt;=[&lt;span style=&#34;color:#268bd2&#34;&gt;processed_docs&lt;/span&gt;[&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;]])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Query&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;What&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Berlin&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;known&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Answers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[ { &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;The Berlin U-Bahn.&#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The system answers that Berlin is well known for its underground railway, because that’s all the knowledge it has from that one document.&lt;/p&gt;
&lt;p&gt;Now, go back to the full version of the search engine — the one that’s ingested your whole dataset (for example, all 18 Wikipedia articles about Berlin) — and ask it a few more questions, to get a better feeling for how your search engine operates.&lt;/p&gt;
&lt;h3 id=&#34;example-1&#34;&gt;Example 1&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Query&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;When&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;best&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;time&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;visit&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Berlin&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Answers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[ { &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34; Berlin is a great city to visit year-round, 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;but the besttime to visit is during the summer months, from June to 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; August.&#34;&#34;&#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;example-2&#34;&gt;Example 2&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Query&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Do&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;people&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Berlin&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;have&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;dialect&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Answers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[ { &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34; Yes, people from Berlin have a dialect, which is a variant   
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; of the Brandenburgish dialect.&#34;&#34;&#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;example-3&#34;&gt;Example 3&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Query&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Tell&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;me&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;about&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;some&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;interesting&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;buildings&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Berlin&lt;/span&gt;.  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Answers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[ { &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;The Berlin TV Tower is a TV Tower in Berlin, Germany. It   
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; is Berlin’s tallest structure, at a height of 1,207 feet. &#34;&#34;&#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;example-4&#34;&gt;Example 4&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Query&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;How&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;TV&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tower&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;built&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Answers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[ { &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;The TV tower was built by a team of architects between   
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; 1965 and 1969. The tower was built in a record-breaking 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; time of just 53 months,&#34;&#34;&#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;One interesting fact about GPT is that it doesn’t always return the same answers. When prompted multiple times with the same query, it will try to come up with different answers. When we worked through this tutorial ourselves, our model came up with a major hallucination, after receiving the same question multiple times. Have a look at this answer:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Query&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;How&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;was&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;TV&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;tower&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;built&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Answers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[ { &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34; The TV Tower was built by the Soviet troops by setting the   
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; contents on fire and turning the tower into a makeshift  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; chimney.&#34;&#34;&#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This answer — which was ranked lower than the correct answer — is rather absurd. But it should serve as a reminder that the output of a generative model, even when it looks like a well-formed answer, can be entirely hallucinated and should be fact-checked.&lt;/p&gt;
&lt;h3 id=&#34;example-5&#34;&gt;Example 5&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Query&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Berlin&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;good&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;place&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;clubbing&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Answers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[ { &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;Yes, Berlin is a good place for clubbing. There are many 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  nightclubs, including the Watergate, Tresor, and Berghain.&#34;&#34;&#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;comparing-generative-qa-to-extractive-qa&#34;&gt;Comparing generative QA to extractive QA&lt;/h3&gt;
&lt;p&gt;Like this generative pipeline, a search engine that’s based on extractive QA returns answers about a corpus of documents in response to a query in natural language. But because the extractive QA system extracts its answers verbatim from the document text itself, it has some restrictions compared to the GPT search engine.&lt;/p&gt;
&lt;p&gt;An extractive QA model is unable to produce the kind of conversational elements you can see in the examples above, like answering “Yes,” or repeating parts of the question in its answer. More significantly, it cannot answer questions as comprehensively as the GPT model, because it isn’t able to aggregate information from different texts.&lt;/p&gt;
&lt;p&gt;In the last example, GPT affirmed that Berlin is a good place for clubbing, before going on to list some examples of famous night clubs.&lt;/p&gt;
&lt;p&gt;But when an extractive QA model is asked the same question, it can only answer the question in an implicit manner, by extracting the text spans that it deems most relevant to the query. Here are two answers returned by such an extractive QA search engine, including the section (“context”) from which they were extracted:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Query&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;Is&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Berlin&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;good&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;place&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;clubbing&lt;/span&gt;?  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;Answers&lt;/span&gt;:  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[ { &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34; Partygoers in Germany often toast the New Year with a glass   
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; of sparkling wine&#34;&#34;&#34;&lt;/span&gt;,  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;ke place throughout the entire city. Partygoers in Germany   
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; often toast the New Year with a glass of sparkling wine.&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; Berlin is home to 44 theaters and &#34;&#34;&#34;&lt;/span&gt;},  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; { &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;answer&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Berlin’s club scene a premier nightlife venue”,  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;context&#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;&#34;&#34;rticularly those in Western and Central Europe, made 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; Berlin’s club scene a premier nightlife venue. After the   
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt; fall of the Berlin Wall in 1989, many hi&#34;&#34;&#34;&lt;/span&gt;}]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;should-you-use-generative-or-extractive-qa&#34;&gt;Should you use generative or extractive QA?&lt;/h2&gt;
&lt;p&gt;The advantages of generative versus extractive QA are clear: it has better conversational skills, produces well-formed sentences in natural language, and can aggregate knowledge from multiple sources in a single answer. But, as we have seen, it also has some downsides. You should consider using extractive over generative QA in the following cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When you want to work with smaller, open-source models. As we’ve seen, GPT models are huge, and once you’ve reached a certain limit of queries, you need to pay to use the API. The  
&lt;a href=&#34;https://huggingface.co/models&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hugging Face model hub&lt;/a&gt;, on the other hand, hosts thousands of open-source, pre-trained models that you can download for free.&lt;/li&gt;
&lt;li&gt;When you want transparency over where the model gets its information from. Extractive models don’t hallucinate. Naturally, these models can return wrong answers, too. But these are much easier to detect by checking the context from which they were extracted.&lt;/li&gt;
&lt;li&gt;When you want to use the answers in a downstream task. One of the most popular applications for extractive question answering is in  
&lt;a href=&#34;https://www.deepset.ai/blog/automating-information-extraction-with-question-answering&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;information extraction systems&lt;/a&gt;. Such systems have no use for the conversational abilities of a generative search engine. Rather, they require a search functionality that can extract factual information from large corpora quickly and reliably.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;haystack-the-main-framework-for-building-search-engines&#34;&gt;Haystack: the main framework for building search engines&lt;/h2&gt;
&lt;p&gt;Generative QA, extractive QA, translation, summarization and much, much more: with Haystack you can build the system that is best suited to solve your specific problem, using the hottest new architectures.&lt;/p&gt;
&lt;p&gt;Our application-focused approach to NLP provides you with the modular building blocks to set up your own system in the shortest amount of time. Check out the  
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack repository&lt;/a&gt;  to learn more, or have a look at our  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Want to chat about GPT-3, search engines, and all things NLP?  
&lt;a href=&#34;https://discord.com/invite/xYvH6drSmA&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Join our Discord&lt;/a&gt;!&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>The Definitive Guide to BERT Models</title>
      <link>https://haystack.deepset.ai/blog/the-definitive-guide-to-bertmodels/</link>
      <pubDate>Mon, 16 Jan 2023 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/the-definitive-guide-to-bertmodels/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Anyone who has studied natural language processing (NLP) can tell you that the state of the art moves exceptionally fast. Big players like Google, Facebook, or OpenAI employ large teams of experts to come up with new solutions that bring computers ever closer to a seemingly human-like understanding of language. This results in model architectures and other approaches quickly becoming obsolete, and what was considered cutting-edge technology six months ago may almost seem outdated today. Nevertheless, some models make such an impact that they become foundational knowledge even as they are eclipsed by their successors.&lt;/p&gt;
&lt;p&gt;One model architecture for which this is true is BERT (short for  &lt;strong&gt;B&lt;/strong&gt;idirectional  &lt;strong&gt;E&lt;/strong&gt;ncoder  &lt;strong&gt;R&lt;/strong&gt;epresentations from  &lt;strong&gt;T&lt;/strong&gt;ransformers, an unwieldy name almost certainly picked for its friendly acronym). Although the first BERT model — born late 2018 — is rarely used in its original form today, the adaptability of this model architecture in terms of tasks, languages, ​​and even sizes means that direct BERT offspring are still thriving in all sorts of fields.&lt;/p&gt;
&lt;p&gt;In the high-churn world of  
&lt;a href=&#34;https://www.deepset.ai/blog/what-is-a-language-model&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;language models&lt;/a&gt;, it can be difficult to keep up and find the best option for your project. This post aims to refresh your knowledge of BERT, provide a survey of the various models that have iterated past the BERT baseline, and help you find the right BERT-like model for you.&lt;/p&gt;
&lt;h2 id=&#34;who-is-bert&#34;&gt;Who is BERT?&lt;/h2&gt;
&lt;p&gt;Google researchers designed BERT as a general language model, adapting the Transformer architecture, which had made an enormous impact on the field of NLP just a year earlier. Aside from  
&lt;a href=&#34;https://blog.google/products/search/search-language-understanding-bert/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;improving Google’s search results&lt;/a&gt;  through its deep understanding of semantics, BERT’s main function is as a basis for specific “downstream” tasks like question answering or sentiment analysis. That’s because its ability to process written language at a near-human level greatly aids the BERT language model in solving other language-based tasks.&lt;/p&gt;
&lt;p&gt;The impression BERT made on the NLP landscape in 2018 was incredible. After it was shown in the original paper that models based on BERT’s pre-trained architecture could outperform their competitors on many different tasks, industry observers predicted that this new model paradigm would be a game changer,  
&lt;a href=&#34;https://www.kdnuggets.com/2019/02/deconstructing-bert-distilling-patterns-100-million-parameters.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;one blog post&lt;/a&gt;  even going so far as calling BERT “one model to rule them all.”&lt;/p&gt;
&lt;h3 id=&#34;what-sets-bert-apart&#34;&gt;&lt;strong&gt;What sets BERT apart?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;To understand the tremendous success of BERT, we need to dip into the history of language modeling. Before Transformers, language models were based on recurrent neural nets (RNNs). RNNs had a major edge over previous, “static”  
&lt;a href=&#34;https://www.deepset.ai/blog/the-beginners-guide-to-text-embeddings&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;text embeddings&lt;/a&gt;  like Doc2Vec in that they were able to capture the meaning of words in context.&lt;/p&gt;
&lt;p&gt;However, RNN-based language models still had serious shortcomings. They are strictly sequential, meaning that they process the input token after token, rather than all at once. Plus, RNNs perform weakly on longer sequences, prompting researchers to implement remedies such as “memory cells” in  
&lt;a href=&#34;https://en.wikipedia.org/wiki/Long_short-term_memory&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;long short-term memory (LSTM)&lt;/a&gt;  networks, and processing an input string twice: once from left to right, and once from right to left.  
&lt;a href=&#34;https://allenai.org/allennlp/software/elmo&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ELMo&lt;/a&gt;, often considered BERT’s direct predecessor, is an example of such a bidirectional LSTM model.&lt;/p&gt;
&lt;p&gt;In contrast, BERT follows the Transformer architecture introduced in the  
&lt;a href=&#34;https://arxiv.org/abs/1706.03762&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;seminal paper from 2017&lt;/a&gt;. Transformers are built on self-attention, a new core algorithm for detecting patterns, which enables the model to learn how individual tokens in an input sequence are contextually related to each other. Embeddings created by self-attention-driven Transformers are therefore highly dynamic and can easily differentiate language that machines traditionally find difficult, like homonyms or jargon words. In addition, self-attention can process input sequences at once, rather than sequentially, allowing for a faster training procedure.&lt;/p&gt;
&lt;p&gt;However, in deep learning, the model architecture itself is just one element that contributes to a model’s success — the other one is the training task and the data it uses. One of the training tasks used to help BERT acquire language is  
&lt;a href=&#34;https://towardsdatascience.com/robustly-optimized-bert-pretraining-approaches-537dc66522dd&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;“masked language modeling” (MLM)&lt;/a&gt;. During training, individual tokens in the training input are masked, and loss is calculated based on how well BERT can predict those tokens. What makes this task so useful is that the training data for it can be generated in a trivial manner: simply obscure a random token in the source sequence, and make that same token the prediction target. No expensive data annotation is required.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/the-definitive-guide-to-bertmodels/bert-masked-token.png alt=&#34;BERT struggling with a particularly tricky masked token&#34;  /&gt;
&lt;/p&gt;
&lt;h3 id=&#34;other-directions-in-transformers&#34;&gt;Other directions in Transformers&lt;/h3&gt;
&lt;p&gt;BERT’s basic architecture is not the only way to use the self-attention developments of transformers. The well-known Generative Pre-trained Transformer (GPT) family of models utilizes the same baseline self-attention modules as BERT, but includes a decoding layer capable of generating text from inputs.&lt;/p&gt;
&lt;p&gt;Keep in mind that although GPT gets a lot of public attention, BERT’s lack of a “decoder” component doesn’t mean GPT is better. BERT is a lighter model, we don’t use it for tasks that require language generation.&lt;/p&gt;
&lt;h2 id=&#34;the-bert-family&#34;&gt;The BERT family&lt;/h2&gt;
&lt;p&gt;The success of the basic BERT model has led to a large number of popular variants. Newer, more powerful models soon appeared, BERT got siblings in all kinds of languages — some were even multilingual — and researchers used BERT as a base model to fine-tune and adapt to different tasks and language domains. Recently, we’ve seen a push for making Transformer models like BERT ever smaller and more efficient.&lt;/p&gt;
&lt;p&gt;Below, we will break down some of these broad categories in the BERT family. You can explore all the different BERTs available on the  
&lt;a href=&#34;https://huggingface.co/models&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hugging Face model hub&lt;/a&gt;, and use the search and filter options to find one that fits your needs.&lt;/p&gt;
&lt;h3 id=&#34;roberta-the-overachieving-big-sister&#34;&gt;RoBERTa: the overachieving big sister&lt;/h3&gt;
&lt;p&gt;In 2019, Facebook engineers  
&lt;a href=&#34;https://arxiv.org/abs/1907.11692&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;developed a significantly more powerful version&lt;/a&gt;  of BERT using the same model architecture. By optimizing hyperparameters and feeding the model more data, they made sizable improvements to the basic BERT model while maintaining resource size and inference complexity. Another large step forward was the change over to dynamic MLM, providing varying sentence masking to increase the robustness of learning.&lt;/p&gt;
&lt;p&gt;The resulting architecture was named RoBERTa, and was quickly adopted as a standard baseline over BERT. Today RoBERTa is still commonly used as a baseline, as in our  
&lt;a href=&#34;https://huggingface.co/deepset/roberta-base-squad2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;standard question answering model for English&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;polyglottal-bert&#34;&gt;Polyglottal BERT&lt;/h3&gt;
&lt;p&gt;Wikipedia is a great resource for training sets in many languages, so it is relatively trivial to retrain basic BERT models on other languages if a large enough Wikipedia exists.&lt;/p&gt;
&lt;p&gt;At deepset, we are proud to have contributed some of the best-performing German language BERTs to the NLP community. Be sure to check out a variety of English and German language models on our  
&lt;a href=&#34;https://www.deepset.ai/models&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;website&lt;/a&gt;. German BERT is in the fine company of a large international model family that contains members such as  
&lt;a href=&#34;https://huggingface.co/aubmindlab/bert-base-arabertv02&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;AraBERT&lt;/a&gt;,  
&lt;a href=&#34;https://huggingface.co/camembert-base&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CamemBERT&lt;/a&gt;, and  
&lt;a href=&#34;https://huggingface.co/DeepPavlov/rubert-base-cased&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;RuBERT&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Many people also rely on multilingual BERTs for generalized semantic context. One of BERT’s more impressive offspring is the Multilingual BERT developed by Google, a BERT model trained on 104 different languages, which is  
&lt;a href=&#34;https://arxiv.org/pdf/1906.01502.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;capable of “zero shot” adaptation to a new language domain&lt;/a&gt;. This may be a quick and tidy solution to generalized language modeling, but when you need a specific semantic model, nothing beats training it yourself (as we demonstrated with  
&lt;a href=&#34;https://www.deepset.ai/german-bert&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;German BERT&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/the-definitive-guide-to-bertmodels/timeline.png alt=&#34;A timeline of various developments in BERT and other Transformers&#34;  /&gt;
&lt;/p&gt;
&lt;h3 id=&#34;bert-in-the-real-world-fine-tuning&#34;&gt;BERT in the real world: fine-tuning&lt;/h3&gt;
&lt;p&gt;General language models are great, not because they are particularly useful on their own, but because they provide a deep understanding of a language’s semantics that can be used in downstream tasks. The models for those tasks are produced through fine-tuning.&lt;/p&gt;
&lt;p&gt;In fine-tuning, the core architecture of a pre-trained model like BERT or RoBERTa is altered slightly to achieve a different target output. It is then trained on data specific to the downstream task: for a question answering model, that could be a dataset consisting of questions and answers, like SQuAD; for a summarization model, a dataset with long- and short-form document pairs. The advantage to these specializations is that a pre-trained base model requires much less specific data and much less training time to become performant in a given speciality.&lt;/p&gt;
&lt;p&gt;Fine-tuning is a transfer learning method that has enabled BERT models to accomplish tasks as diverse as  
&lt;a href=&#34;https://huggingface.co/unitary/toxic-bert&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;detection of “toxic” language&lt;/a&gt;,  
&lt;a href=&#34;https://huggingface.co/cahya/bert2bert-indonesian-summarization&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;summarization&lt;/a&gt;,  
&lt;a href=&#34;https://huggingface.co/dslim/bert-base-NER&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;named entity recognition&lt;/a&gt;,  
&lt;a href=&#34;https://huggingface.co/deepset/bert-base-cased-squad2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;question answering&lt;/a&gt;, and many others.&lt;/p&gt;
&lt;h3 id=&#34;how-bert-got-its-doctorate-domain-adaptation&#34;&gt;How BERT got its doctorate: domain adaptation&lt;/h3&gt;
&lt;p&gt;Language is notoriously divergent in different domains. Even when dialects aren’t formally distinguished, even a small vocabulary of jargon, terms of art, or antiquated usage can throw off the effectiveness of a language model in a particular application. Domain adaptation is the practice of gearing a general language model towards a more specialized jargon of a language.&lt;/p&gt;
&lt;p&gt;Here are some examples of BERT models that have been adapted to different domains, to be used for specific business applications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://arxiv.org/abs/1901.08746&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BioBERT&lt;/a&gt;: a BERT model retrained on a biomedical corpus, enabling applications like this  
&lt;a href=&#34;https://towardsdatascience.com/tagging-genes-and-proteins-with-biobert-c7b04fc6eb4f&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;named entity recognition&lt;/a&gt;  model, which contextualizes obscure designations in text.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://arxiv.org/pdf/1903.10676.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SciBERT&lt;/a&gt;: a similar model, but trained on over one million published articles. Performs at state of the art for computer science articles.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://arxiv.org/abs/2005.10200&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BERTweet&lt;/a&gt;: a RoBERTa model trained on 850 million tweets, seen used in analyzing public discourse around  
&lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9353682/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;novel vaping technology&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://arxiv.org/abs/1908.10063&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FinBERT&lt;/a&gt;: a model that was first adapted to the financial domain, before fine-tuning it to the particularly tricky task of sentiment analysis on news snippets from the world of finance.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;bert-today-the-quest-for-smaller-and-faster-models&#34;&gt;BERT today: the quest for smaller and faster models&lt;/h2&gt;
&lt;p&gt;A major restriction in the use of BERT (and other state-of-the-art models) today is the speed and fluidity of inference tasks. Most active development on these models is seeking to make them faster for better customer experience, or even scaled down enough to fit into smartphones and run locally.&lt;/p&gt;
&lt;p&gt;That is why a current focus in BERT research is  
&lt;a href=&#34;https://www.deepset.ai/blog/knowledge-distillation-with-haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;model distillation&lt;/a&gt;: using large, performant models to direct the training of smaller, more efficient models. The smaller, distilled model learns to imitate most of the larger model’s learning — knowledge which the smaller models would not have the bandwidth to acquire during their own training.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/the-definitive-guide-to-bertmodels/teacher-student.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;A large, pre-trained model serves as a teacher for an untrained smaller model, which is trained to mimic the teacher. The result is a compressed, less resource intensive model with similar results.&lt;/p&gt;
&lt;h2 id=&#34;whats-next-for-bert&#34;&gt;What’s next for BERT?&lt;/h2&gt;
&lt;p&gt;As our ability to shrink BERT improves, researchers are eager to expand the scope and abilities of the original model. Much of the innovation in BERT’s development is happening in changes to the training tasks that BERT is learning from.&lt;/p&gt;
&lt;p&gt;Much of current research is directed towards finding important alternative training tasks, rather than innovating BERT’s core architecture. It is thought that adding a wider variety of training tasks will especially improve BERT as a core model from which others can be fine-tuned. Such multi-task learning — a single model that can solve multiple tasks — may have better semantic generalization and therefore better performance on individual tasks.&lt;/p&gt;
&lt;p&gt;Pushing the envelope, some directions of research are developing multimedia BERT models. Using a modified BERT as a core semantic embedding, models can be trained to translate from strings to other media, as in our  
&lt;a href=&#34;https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/19_Text_to_Image_search_pipeline_with_MultiModal_Retriever.ipynb&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;semantic image search tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;If you want to learn more about BERT models through personal tinkering, our  
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack framework&lt;/a&gt;  is a seamless tool for setting up a project. You can dive in right now with this  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/answer_generator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tutorial on OpenAI’s GPT models for text generation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you’d like to ask about our tutorials around BERT or just connect with developers working at the cutting edge of NLP,  
&lt;a href=&#34;https://haystack.deepset.ai/community&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our Discord&lt;/a&gt;  is full of friendly BERTologists ready to chat.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>How to Build a Semantic Search Engine in Python</title>
      <link>https://haystack.deepset.ai/blog/how-to-build-a-semantic-search-engine-in-python/</link>
      <pubDate>Wed, 23 Nov 2022 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/how-to-build-a-semantic-search-engine-in-python/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Semantic search is the task of retrieving documents from a collection of documents (also known as a ‘corpus’) in response to a query asked in natural language. Powered by the latest Transformer language models, semantic search allows you to access the best matches from your document collection within seconds, and on the basis of meaning rather than keyword matches. As well as being helpful in its own right, semantic search also forms the basis for many complex tasks, like question answering or text summarization.&lt;/p&gt;
&lt;p&gt;Over the last decade or so, Python has become the principal language for machine learning (ML) and natural language processing (NLP). In this article, we will show you how to set up a semantic search engine in Python, placing it on top of your document collection of choice, with our open source  
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack framework&lt;/a&gt;. Thanks to Haystack’s modular setup and the availability of high-quality pre-trained language models, you’ll be able to set up your own  
&lt;a href=&#34;https://www.deepset.ai/blog/understanding-semantic-search&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;semantic search&lt;/a&gt;  system in less than twenty minutes.&lt;/p&gt;
&lt;h2 id=&#34;a-refresher-on-semantic-search&#34;&gt;A refresher on semantic search&lt;/h2&gt;
&lt;p&gt;Like all  
&lt;a href=&#34;https://www.deepset.ai/blog/what-is-a-language-model&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Transformer-based language models&lt;/a&gt;, the models used in semantic search encode text (both the documents and the query) as high-dimensional vectors or embeddings. We can then use similarity measures like cosine similarity to understand how close in meaning two vectors (and their associated texts) are. Texts that are similar in meaning are closer to each other, while unrelated texts are more distant. While illegible to humans, the  
&lt;a href=&#34;https://www.deepset.ai/blog/what-is-text-vectorization-in-nlp&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;vector-based representation&lt;/a&gt;  works very well for computers to represent meaning.&lt;/p&gt;
&lt;p&gt;The  
&lt;a href=&#34;https://www.deepset.ai/blog/what-semantic-search-can-do-for-you&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;superiority of semantic search&lt;/a&gt;  over a keyword-based approach becomes clear if we look at an example. Think of the difference between the queries “why can’t I commit changes” — a perennial problem for the novice Git user — and “why can’t I commit to changes” — a problem for the indecisive. The preposition “to” entirely changes the meaning of the query, which is impossible to detect with simple keyword matching. A semantic language model (like the one  
&lt;a href=&#34;https://blog.google/products/search/search-language-understanding-bert/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;used by Google&lt;/a&gt;) will embed the two queries in disparate locations of the vector space.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-build-a-semantic-search-engine-in-python/why-cant-I-commit.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Semantic search is great for disentangling subtleties like this. Here are some scenarios where semantic search will be particularly helpful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;When your application is searching a large or ambiguous corpus&lt;/strong&gt;. Many heterogeneous corpuses present problems for simple keyword matching. Imagine trying to compare reward programs between airlines.  
&lt;a href=&#34;https://www.southwest.com/rapidrewards/overview&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Every&lt;/a&gt;  
&lt;a href=&#34;https://www.china-airlines.com/de/en/member/membership-benefits/Index&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;company&lt;/a&gt;  
&lt;a href=&#34;https://www.flydubai.com/en/skywards/benefits&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;has&lt;/a&gt;  
&lt;a href=&#34;https://www.ryanair.com/mt/en/plan-trip/travel-extras/travel-credit&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a&lt;/a&gt;  
&lt;a href=&#34;https://www.icelandair.com/frequent-flyer/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;different&lt;/a&gt;  
&lt;a href=&#34;https://latampass.latam.com/en_us/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;term&lt;/a&gt;  for its customer currencies and rebate programs. Many of those terms borrow from common concepts like miles, points, or rewards. A semantic search engine can easily capture the similarity of these terms, where exact text matching would not.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;When your users aren’t finding what they’re looking for through keywords.&lt;/strong&gt; In keyword matching systems, good results are dependent on an exact match between corpus and query. When users are unable to find quality search results, they often painfully iterate search terms, hunting for the right combination to unlock the corpus. This is a poor user experience, and can be replaced with a more comfortable and natural semantic search model.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;When you want to offer your users a more intuitive search interface.&lt;/strong&gt; Semantic search is simply more comfortable and enjoyable for sifting through documents. Human minds work in human terms, and most people prefer it when they don’t have to adapt their language to a computer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now, let’s look at how Haystack and Python make it possible to build semantic search systems in just a few lines of code.&lt;/p&gt;
&lt;h2 id=&#34;haystack-applied-nlp-in-python&#34;&gt;Haystack: Applied NLP in Python&lt;/h2&gt;
&lt;p&gt;Haystack is our framework for applied NLP that uses a modular, mix-and-match approach to building NLP systems. These days, the highest-performing language models are huge.  
&lt;a href=&#34;https://www.deepset.ai/blog/when-and-how-to-train-a-language-model&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Training such a language model&lt;/a&gt;  yourself on your local PC is impractical, uneconomical and, frankly, often impossible. However, thanks to centralized platforms like the  
&lt;a href=&#34;https://huggingface.co/models&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hugging Face model hub&lt;/a&gt;, pre-trained models can be shared and reused by everyone.&lt;/p&gt;
&lt;p&gt;With pre-trained models being readily available, the Haystack framework offers the architecture that makes them work in practice: the Python library comes with several  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/ready_made_pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;preconfigured pipelines&lt;/a&gt;, where all you need to do is plug in the language model that works for you. This makes it easy to build prototype systems quickly, using different models. At the same time, Haystack offers all the flexibility that we’ve come to expect from Python libraries, making it extremely easy to configure and customize your own pipeline.&lt;/p&gt;
&lt;p&gt;Our pipelines are essentially nodes connected by edges. A node can hold a specific language model, which transforms its input before passing it on to the next node. Once you’ve decided on the pipeline setup that works best for your use case, you can easily iterate through different language models, to find the combination that gives you the best results. Note that whenever you change the model combination, you’ll need to initialize your pipeline from scratch.&lt;/p&gt;
&lt;h2 id=&#34;building-a-semantic-search-engine-prerequisites&#34;&gt;Building a Semantic Search Engine: Prerequisites&lt;/h2&gt;
&lt;p&gt;To build a semantic search prototype, think about three aspects in advance: what documents you want to search, the design of your pipeline, and which language models to use (often, you’ll have a number of models that you want to compare to each other).&lt;/p&gt;
&lt;p&gt;For the context of this guide, imagine that you’ve got some gardening work to do, and you’ve acquired a  
&lt;a href=&#34;https://sites.google.com/view/emanualqa/home&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;corpus of e-manuals&lt;/a&gt;  that should help you find the best tools for different tasks. Unfortunately, there are too many documents to look through manually. If you’re not very familiar with the gardening jargon, a keyword-based search would likely be only moderately successful.&lt;/p&gt;
&lt;p&gt;You can experiment with different pipeline designs for your semantic search system. First, try using the basic, ready-made  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/ready_made_pipelines#documentsearchpipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DocumentSearch pipeline&lt;/a&gt;, which consists only of one node: the  
&lt;a href=&#34;https://docs.haystack.deepset.ai/reference/retriever-api&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retriever&lt;/a&gt;, which extracts the best matches from your documents collection.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-build-a-semantic-search-engine-in-python/simple-semantic-search-engine.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The retriever selects the best documents in response to a query. Before you can use a Transformer-based retrieval model in practice, you’ll need to index your documents; that is, pre-process them and let the retrieval model turn them into vector representations, which it can compare later. In more complex pipeline architectures, the retriever often works as a “sieve” that presents a small pre-selection of documents to the next node, taking the burden of processing high amounts of texts off more computationally costly models.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;build-your-semantic-search-system-with-the-haystack-python-framework&#34;&gt;Build Your Semantic Search System With The Haystack Python Framework&lt;/h2&gt;
&lt;p&gt;We’ll be working with the  
&lt;a href=&#34;https://github.com/facebookresearch/faiss&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FAISS&lt;/a&gt;  
&lt;a href=&#34;https://haystack.deepset.ai/components/document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;document store&lt;/a&gt;  as our database, which is optimized for working with vector representations. Make sure to install Haystack (we use version 1.11) with FAISS support enabled:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pip&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;install&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;git&lt;/span&gt;&#43;&lt;span style=&#34;color:#268bd2&#34;&gt;https&lt;/span&gt;://&lt;span style=&#34;color:#268bd2&#34;&gt;github&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;com&lt;/span&gt;/&lt;span style=&#34;color:#268bd2&#34;&gt;deepset&lt;/span&gt;-&lt;span style=&#34;color:#268bd2&#34;&gt;ai&lt;/span&gt;/&lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;git&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pip&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;install&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;farm-haystack[faiss]&#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can then start by reading in and converting the files, having stored them as .txt documents locally under  &lt;em&gt;datapath&lt;/em&gt;. If you want to use the same .&lt;em&gt;txt&lt;/em&gt;  files that we use in this project, you can download them from  
&lt;a href=&#34;https://drive.google.com/drive/folders/1-gX1DlmVodP6OVRJC3WBRZoGgxPuJvvt?usp=sharing&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this Google Drive folder&lt;/a&gt;  Haystack’s  &lt;em&gt;convert_files_to_docs&lt;/em&gt;  function recognizes the files’ format automatically and converts them into objects of the  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/documents_answers_labels#document&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Document&lt;/a&gt;  class, which can then be read into the document store.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.utils&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;convert_files_to_docs&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;all_docs&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;convert_files_to_docs&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;dir_path&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;datapath&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;split_paragraphs&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;True&#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, import the  &lt;em&gt;FAISSDocumentStore&lt;/em&gt;  class, initialize the document store, and add the documents.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.document_stores.faiss&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;FAISSDocumentStore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;FAISSDocumentStore&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;faiss_index_factory_str&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;Flat&#34;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;similarity&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cosine&#34;&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;write_documents&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;all_docs&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, import the retriever class and initialize it with the retrieval model that you want to use for this task. Set the  &lt;em&gt;top_k&lt;/em&gt;  parameter — the number of documents returned by the retriever — to 3.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;EmbeddingRetriever&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;sentence-transformers/multi-qa-mpnet-base-dot-v1&#39;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;EmbeddingRetriever&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;use_gpu&lt;/span&gt;=&lt;span style=&#34;color:#859900;font-weight:bold&#34;&gt;True&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;embedding_model&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now you’ll need to update the documents in the document store by running them through the retriever model. This is also known as indexing. Since the Transformer model needs to process all the documents in your database, this step will probably take several minutes.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;document_store&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;update_embeddings&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After you’ve indexed your documents, it’s finally time to set up the pipeline by importing the class and initializing it with the retriever.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.pipelines&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSearchPipeline&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;semantic_search_pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;DocumentSearchPipeline&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;your-python-semantic-search-engine-in-action&#34;&gt;Your Python Semantic Search Engine in Action&lt;/h2&gt;
&lt;p&gt;Now that your documents have been stored and indexed in the document store, and your pipeline is set up and connected to it, it’s time to ask your questions. Keep in mind that since you’re using a document search pipeline, the results will be documents rather than answers. You could start with the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What&#39;s a good machine for cutting grass?&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prediction&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;semantic_search_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The prediction variable stores a dictionary. Have a quick look at its contents:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prediction&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;keys&lt;/span&gt;())  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;dict_keys&lt;/span&gt;([&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;root_node&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;params&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;query&#39;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;node_id&#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The  &lt;em&gt;documents&lt;/em&gt;  key is the most interesting of all of these: it contains the documents that match your query, according to the model. Look at the first 100 characters of each result:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;i&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;pred&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;enumerate&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;prediction&lt;/span&gt;[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;documents&#39;&lt;/span&gt;]):  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#cb4b16&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;i&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;pred&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;content&lt;/span&gt;[:&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;100&lt;/span&gt;] &#43; &lt;span style=&#34;color:#2aa198&#34;&gt;&#39;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;&lt;/span&gt;)  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Fast&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;clean&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;dependable&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;mowing&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Land&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pride&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Flail&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Mowers&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;are&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;perfect&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;choice&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;schools&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ot&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;All&lt;/span&gt;-&lt;span style=&#34;color:#268bd2&#34;&gt;Flex&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Mowers&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Our&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;says&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;it&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;all&lt;/span&gt;! &lt;span style=&#34;color:#268bd2&#34;&gt;Maintaining&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;large&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;area&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;grass&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;a&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;picturesque&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;setting&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;is&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt; - &lt;span style=&#34;color:#268bd2&#34;&gt;Steep&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Slopes&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt;° &lt;span style=&#34;color:#268bd2&#34;&gt;to&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;31&lt;/span&gt;° - &lt;span style=&#34;color:#268bd2&#34;&gt;High&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;production&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;mowing&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;of&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;roadways&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;parks&lt;/span&gt; - &lt;span style=&#34;color:#268bd2&#34;&gt;Steering&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;wheel&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;steered&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;These are looking great! Clearly the language model can understand that “cutting grass” is very similar to the action of “mowing.” If you remember that the underlying model is a general language model that has not been fine-tuned to the specific topic of gardening (or to the texts at hand), the potency of these models becomes evident. Now try a different query:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;What are the best tools for digging a ditch?&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;prediction&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;semantic_search_pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;query&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;question&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And print the three results again:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&gt;&gt;&gt; 0 Double Pocket Chart Stand with Storage Assembly Instructions Contents: A. Bottom Side Pole (4) B. Mi  
&gt;&gt;&gt; 1 Ideal for ditching, road grading and all-around farm use. RBT40 SERIES 65 - 100 HP -- 84&#34; 96&#34; 108&#34; Ca  
&gt;&gt;&gt; 2 Small farms and food plots ... just right for the Land Pride disc. Land Pride Disc Harrows break up th
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;These results definitely seem on topic — but the order could be improved. After all, the second result sounds like the ideal match. You could replace your retrieval model with the  
&lt;a href=&#34;https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;sentence-transformers/multi-qa-mpnet-base-dot-v1&lt;/a&gt;  model, which is much better suited to this  
&lt;a href=&#34;https://www.sbert.net/examples/applications/semantic-search/README.html#symmetric-vs-asymmetric-semantic-search&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;asymmetric semantic search&lt;/a&gt;  task. However, for the purpose of this tutorial, we’ll show you a different approach to improving your semantic search engine.&lt;/p&gt;
&lt;h2 id=&#34;expand-your-python-semantic-search-engine&#34;&gt;Expand Your Python Semantic Search Engine&lt;/h2&gt;
&lt;p&gt;As an alternative to using a more complex model, you can expand your pipeline by setting up a custom design that contains a  
&lt;a href=&#34;https://haystack.deepset.ai/pipeline_nodes/ranker&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ranker node&lt;/a&gt;  in addition to the retriever. The ranker uses its own Transformer-based language model to re-rank the documents it receives from the retriever.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-build-a-semantic-search-engine-in-python/semantic-search-with-ranker.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;First, you need to import two additional classes — the general pipeline and the ranker node — and instantiate them.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;haystack.nodes&lt;/span&gt; &lt;span style=&#34;color:#dc322f;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersRanker&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;Pipeline&lt;/span&gt;()  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;ranker_model&lt;/span&gt; = &lt;span style=&#34;color:#2aa198&#34;&gt;&#34;cross-encoder/ms-marco-MiniLM-L-12-v2&#34;&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;ranker&lt;/span&gt; = &lt;span style=&#34;color:#268bd2&#34;&gt;SentenceTransformersRanker&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;model_name_or_path&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;ranker_model&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;top_k&lt;/span&gt;=&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, add the two modules — retriever and ranker — to your pipeline object. Whenever you add a node to a pipeline, you need to give it a name by which you can refer to it. You also need to tell the pipeline how the nodes are connected to each other by specifying the origin of the input for each node. By default, the first input is always designated Query.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_node&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;retriever&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Retriever&#39;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Query&#39;&lt;/span&gt;])  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;pipeline&lt;/span&gt;.&lt;span style=&#34;color:#268bd2&#34;&gt;add_node&lt;/span&gt;(&lt;span style=&#34;color:#268bd2&#34;&gt;component&lt;/span&gt;=&lt;span style=&#34;color:#268bd2&#34;&gt;ranker&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;name&lt;/span&gt;=&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Ranker&#39;&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;inputs&lt;/span&gt;=[&lt;span style=&#34;color:#2aa198&#34;&gt;&#39;Retriever&#39;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And print the first part of the top three documents again:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#586e75;background-color:#eee8d5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Ideal&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;ditching&lt;/span&gt;, &lt;span style=&#34;color:#268bd2&#34;&gt;road&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;grading&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#cb4b16&#34;&gt;all&lt;/span&gt;-&lt;span style=&#34;color:#268bd2&#34;&gt;around&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;farm&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;use&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;RBT40&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;SERIES&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;65&lt;/span&gt; - &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;100&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;HP&lt;/span&gt; -- &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;84&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34; 96&#34;&lt;/span&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;108&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&#34; Ca  &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Small&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;farms&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;food&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;plots&lt;/span&gt; ... &lt;span style=&#34;color:#268bd2&#34;&gt;just&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;right&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;the&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Land&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pride&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;disc&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Land&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pride&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Disc&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Harrows&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;break&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;up&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;th&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&gt;&gt;&gt; &lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;2&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Double&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pocket&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Chart&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Stand&lt;/span&gt; &lt;span style=&#34;color:#859900&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Storage&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Assembly&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Instructions&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Contents&lt;/span&gt;: &lt;span style=&#34;color:#268bd2&#34;&gt;A&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Bottom&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Side&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;Pole&lt;/span&gt; (&lt;span style=&#34;color:#2aa198;font-weight:bold&#34;&gt;4&lt;/span&gt;) &lt;span style=&#34;color:#268bd2&#34;&gt;B&lt;/span&gt;. &lt;span style=&#34;color:#268bd2&#34;&gt;Mi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Adding a ranker to your semantic search engine can definitely improve your results, especially if you’re unhappy with the retriever’s results. From here on, there are no limits to your creativity. For instance, the retriever-ranker setup works well in connection with a  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/summarizer&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;summarizer node&lt;/a&gt;, in a popular  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/pipelines#example-creating-a-retriever-ranker-summarizer-pipeline&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;custom pipeline setup&lt;/a&gt;. And, of course, we would be remiss if we didn’t mention the crucial role of semantic search in most applications of open-domain  
&lt;a href=&#34;https://www.deepset.ai/blog/haystack-question-answering-at-scale&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;question answering&lt;/a&gt;  (QA).&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/how-to-build-a-semantic-search-engine-in-python/open-domain-qa.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Because question answering models are so computationally expensive, it is necessary to install a coarse “sieve” that can extract the best answer candidates to a query from a large corpus. Semantic search models like the retrieval model you used earlier have that property, and are therefore part of every open-domain question answering system. In a question-answering pipeline, the reader node contains the actual QA model, which “reads” through only the documents that the retriever has pre-selected from the larger corpus.&lt;/p&gt;
&lt;h2 id=&#34;come-join-the-haystack-community&#34;&gt;Come Join the Haystack Community&lt;/h2&gt;
&lt;p&gt;Haystack is the framework of choice for the Python programmer looking to implement a semantic search system. Moreover, anyone who wants to use the latest NLP models in their product — be it an app, a website, or something else — can do so easily through Haystack’s modular and application-oriented nature.&lt;/p&gt;
&lt;p&gt;If you want to tweak your semantic search system and use, say, a different database as your document store, have a look at our  
&lt;a href=&#34;https://docs.haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Do you have more questions or simply want to see what other people are up to in the applied NLP space? We run a vibrant  
&lt;a href=&#34;https://haystack.deepset.ai/community&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NLP community on Discord&lt;/a&gt;, which is open to everyone. Meet other open source enthusiasts and talk directly to our team members.&lt;/p&gt;
&lt;p&gt;Want to add more nodes to your pipeline, fine-tune your own models or create new datasets? Why not check out the  
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack repository on GitHub&lt;/a&gt;  — and give us a star while you’re there?&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>When and How to Train Your Own Language Model</title>
      <link>https://haystack.deepset.ai/blog/when-and-how-to-train-a-language-model/</link>
      <pubDate>Wed, 03 Aug 2022 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/when-and-how-to-train-a-language-model/</guid>
      <description>
      
      
      

      
      

      <p>Many people, when considering whether to incorporate natural language processing functionality into their product, shy away from the perceived complexity of the task. Modern-day NLP operates with huge language models that learn from large amounts of data.</p>
<p>What many beginners don’t know, however, is that the vast majority of use cases  <em>don’t require</em>  training a new language model from scratch. There are already tens of thousands of pre-trained models freely available online, which can be used out of the box.</p>
<p>Still, there are many use cases that do benefit from fine-tuning or domain adaptation, which means refining a pre-trained language model on a smaller custom dataset. In this article, we’ll guide you through the process of experimenting with different language models and understanding when to train your own models.</p>
<h2 id="recap-what-is-a-language-model">Recap: What Is a Language Model?</h2>
<p>A  
<a href="https://www.deepset.ai/blog/what-is-a-language-model" target="_blank" rel="noopener">language model</a>  is a computational, data-based representation of a natural language. Natural languages are languages that evolved from human usage (like English or Japanese), as opposed to constructed languages like those used for programming. With the help of language models, computers can process natural language.</p>
<p>A language model is not a knowledge base. Sometimes we can be misled into thinking that language models know things because they answer all our questions so eloquently. But in actuality, most language models are purely geared towards a human-like  <em>understanding</em> of language, not the memorization of facts. Their perceived knowledge comes from the database that they operate on.</p>
<p>With the help of a framework like  
<a href="https://haystack.deepset.ai/overview/intro" target="_blank" rel="noopener">Haystack</a>, language models may be placed on top of any textual database. That’s why you could deploy any language model out of the box for your own use case, without modifying or training it any further.</p>
<h2 id="working-with-pre-trained-language-models">Working with Pre-trained Language Models</h2>
<p>The  
<a href="https://huggingface.co/models" target="_blank" rel="noopener">Hugging Face model hub</a>  is the go-to location for finding pre-trained language models — that is, models which have already been trained and are ready for use. You’ll find tens of thousands of models on the model hub, which differ in language, use case and size.</p>
<h2 id="some-preliminary-considerations">Some Preliminary Considerations</h2>
<p>We usually recommend experimenting with  <em>multiple</em>  models, to find the one that is best suited to your application, your overall system pipeline and your computing resources. To find the right models for your use case, it’s helpful to think about a couple of questions in advance:</p>
<ul>
<li>Who are the prospective users and what can they expect from your system?</li>
<li>How fast does it need to be?</li>
<li>How accurate does it need to be?</li>
<li>What computing resources do you have?</li>
</ul>
<p>The last three questions are relevant because they might put some restrictions on the size of the model that you can operate with. High-performing language models are usually very large, which means that they take up space on your hard drive, are slow to train and take longer to make a prediction. In the past years, the NLP field has come up with ingenious techniques for  
<a href="https://www.deepset.ai/blog/knowledge-distillation-with-haystack" target="_blank" rel="noopener">knowledge distillation</a>  that make models  <em>smaller</em>  while retaining their  <em>prediction quality</em>.</p>
<p>Another point to consider is the shape of your data. Is it already neatly stored in some database, like  
<a href="https://www.deepset.ai/elasticsearch-integration" target="_blank" rel="noopener">Elasticsearch</a>  or  
<a href="https://www.deepset.ai/weaviate-vector-search-engine-integration" target="_blank" rel="noopener">Weaviate</a>? Or does it consist of a collection of documents in some folder on your desktop? In the latter case, you’ll probably want to perform some preprocessing. With just a few lines of code, Haystack lets you extract text from PDF or TXT files, as well as cleaning and splitting it into snippets of uniform length (have a look at  
<a href="https://haystack.deepset.ai/tutorials/08_preprocessing" target="_blank" rel="noopener">this tutorial</a>  for details.)</p>
<h2 id="experimenting-with-different-models">Experimenting with Different Models</h2>
<p>Once you’ve chosen a couple of candidate models, it’s time to plug them into your pipeline and start evaluating them. To assess how suited the models’ capabilities are to your use case, it’s a good idea to prepare a few samples from your own data and  
<a href="https://www.deepset.ai/blog/labeling-data-with-haystack-annotation-tool" target="_blank" rel="noopener">annotate them</a>.</p>
<p>The importance of  <em>curating your own datasets</em>  cannot be overstated. Machine learning models revolve entirely around data. If they’re trained on low-quality data, the models themselves won’t be worth much. Similarly, you can only evaluate the quality of a model’s predictions if you have ground-truth labels against which those predictions can be compared.</p>
<p>Our  
<a href="https://www.deepset.ai/blog/how-to-evaluate-question-answering" target="_blank" rel="noopener">evaluation mode</a>  outputs a couple of metrics that quantify a model’s prediction quality. If you’ve never evaluated an NLP model before, the  
<a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html" target="_blank" rel="noopener">F1 score</a>  is a pretty safe bet. It strikes a balance between  <em>precision</em>  (how many of the data points that the model assigned to a class actually belong to the class?) and  <em>recall</em>  (how many datapoints of a class did the model correctly assign to the class?) by taking their  
<a href="https://en.wikipedia.org/wiki/Harmonic_mean" target="_blank" rel="noopener">harmonic mean</a>, a mathematical average that works well with ratios. Now you can run your pipeline with different models and compare their F1 scores.</p>
<p>In our experience, out-of-the-box models perform reasonably well for most use cases. When it comes to  
<a href="https://www.deepset.ai/question-answering-use-case" target="_blank" rel="noopener">question answering</a>  in English, for example, most of our users are perfectly happy with the prediction quality of  
<a href="https://huggingface.co/deepset/roberta-base-squad2" target="_blank" rel="noopener">our RoBERTa-base-SQuAD model</a>. But if the scores are too low for your liking, or you’re just feeling experimental, you could go a step further and fine-tune an existing model. In that case, the original score acts as a  
<a href="https://towardsdatascience.com/baseline-models-your-guide-for-model-building-1ec3aa244b8d" target="_blank" rel="noopener">baseline</a>  against which you can compare your next generation models.</p>
<p>Sometimes, it’s hard to define what “good” really means. Is an F1 score of .75 good or bad? It depends! The best way to find out is to  <em>let your users judge</em>. While we know that involving your users at this early stage might be tricky, they can provide invaluable feedback. For example, you could set up different pipelines in  
<a href="https://www.deepset.ai/blog/announcing-deepset-cloud" target="_blank" rel="noopener">deepset Cloud</a>  and let your users provide feedback directly in the interface. That will give you a much clearer idea of what your data can accomplish in combination with a pre-trained language model, and whether you might want to fine-tune it.</p>
<h2 id="when-to-train-your-own-language-model">When to Train Your Own Language Model</h2>
<p>It will rarely be feasible to train your own Transformer-based language models from scratch. However, if you do have the data and computing resources to train a new BERT model for an  
<a href="https://www.deepset.ai/blog/nlp-resources-beyond-english" target="_blank" rel="noopener">underrepresented language</a>, be sure to share the model on the model hub, so that others can benefit from it too. You can read about our own experience training several Transformer-based language models for German  
<a href="https://www.deepset.ai/german-bert" target="_blank" rel="noopener">here</a>.</p>
<p>In most cases, however, you’ll want to look into  
<a href="https://haystack.deepset.ai/guides/domain-adaptation" target="_blank" rel="noopener">domain adaptation</a>  and/or fine-tuning. While the industry has come to use the two terms interchangeably, they originally describe two different techniques: fine-tuning means further training a general-purpose, pre-trained model to accomplish a specific task, question answering for example. Domain adaptation, on the other hand, means further training a model to better understand a domain-specific language, like legal or scientific jargon.</p>
<p>If you find that the prediction quality of the existing models is not up to scratch — either because your use case is not captured well, or your documents use a domain-specific language not represented by the likes of FinBERT, LEGAL-BERT and SciBERT — then it’s time to annotate more data and use it to subject your pre-trained models to a few more training steps.</p>
<h2 id="how-to-fine-tune-a-language-model">How to Fine-tune a Language Model</h2>
<p>If done correctly, fine-tuning can be a rewarding process. As you tweak your pre-trained model and feed it more use-case-specific data, its prediction quality will increase, at times dramatically.</p>
<p>You can fine-tune your model in Haystack with just a few lines of code. Find out how by following  
<a href="https://haystack.deepset.ai/tutorials/02_finetune_a_model_on_your_data" target="_blank" rel="noopener">our tutorial</a>. Here’s what else you should know about fine-tuning:</p>
<ol>
<li><strong>You’ll need more data.</strong>  Unlike in the process above, where we employed user-specific data to  <em>evaluate</em>  a language model, for fine-tuning you’ll also need  <em>labeled data</em>  for training. The number of datapoints required depends on your use case, the size of the models and how diverse your overall dataset is. As an alternative (or complement) to annotating data, you could look into  
<a href="https://neptune.ai/blog/data-augmentation-nlp" target="_blank" rel="noopener">data augmentation</a>.</li>
<li><strong>Fine-tuning (and model training in general) is an iterative process.</strong> Evaluate your model once it’s been trained, and try to beat that score by  
<a href="https://www.deepset.ai/blog/parameter-tweaking-get-faster-answers-from-your-haystack-pipeline" target="_blank" rel="noopener">tweaking some model parameters</a>  and training it again. To identify your ideal model settings, you’ll probably need to go through a few iterations of train-evaluate-tweak-repeat.  
<a href="https://haystack.deepset.ai/guides/domain-adaptation#finetuning" target="_blank" rel="noopener">Fine-tuning a model</a>  in Haystack is as simple as calling .train() on an initialized model.</li>
<li><strong>Pro-tip: experiment with smaller models rather than larger ones.</strong>  To reduce the time consumed by iteratively training your models during fine-tuning, you’ll want to keep the model small. We often fine-tune a smaller version of a given architecture (say, BERT-base instead of BERT-large) to find the optimal  
<a href="https://en.wikipedia.org/wiki/Hyperparameter_%28machine_learning%29" target="_blank" rel="noopener">hyperparameter</a>  settings. Once you’ve found the right settings, you can apply them to the larger model.</li>
<li><strong>Pro-tip: distill your models.</strong>  Most use cases will benefit from smaller, faster models. That’s why we are huge fans of  
<a href="https://www.deepset.ai/blog/knowledge-distillation-with-haystack" target="_blank" rel="noopener">model distillation</a>. That technique lets you “distill” the knowledge from a larger model into a smaller one. The result is a model that’s  <em>faster</em>, more environmentally friendly, and cheaper to use in production. What’s not to like?</li>
</ol>
<h2 id="the-case-for-data-labeling">The Case for Data Labeling</h2>
<p>Many people underestimate the role data labeling can play in machine learning. It’s boring, monotonous, and can seem disappointingly manual when we expect our models to perform some kind of AI magic. That’s probably why we often see teams spend way too many hours trying to squeeze more out of their language models, when that time would actually be much better spent annotating some data.</p>
<p>What  <em>really</em> makes for good models is annotated data, specifically “difficult” annotations, as they can teach your model to deal with cases that even humans find hard to handle.</p>
<p>While we’ll admit that annotation might not be the most fun work, there are tools to make the process easier for everyone. For example, the  
<a href="https://docs.haystack.deepset.ai/docs/annotation" target="_blank" rel="noopener">Haystack annotation tool</a>  provides the framework for a more streamlined process. Clear guidelines go a long way toward a well annotated and consistent dataset. It’s also valuable to engage with your own data intimately, as it will increase your understanding of the use case and why certain predictions may be hard for your model.</p>
<p>So to really drive the point home: we recommend investing in  <em>data annotation</em>  rather than model creation. Machine learning researchers have worked hard to come up with model architectures that emulate linguistic intuition faithfully, and new techniques are constantly emerging to make existing models smaller and faster. But you and your team’s expertise lies in your own data — and that is precisely the area where you can have the biggest impact on your models’ performance.</p>
<h2 id="getting-started">Getting Started</h2>
<p>Whether on a hosted platform like deepset Cloud or in your own setup, model training  <em>doesn’t have to be a burden</em>; it can actually be kind of fun. Annotate data, tweak some parameters and watch your language model as it becomes smarter and more adapted to your use case.</p>
<p>We have accompanied many different teams on their quest to make NLP work for them. Come talk to us about your project in  
<a href="https://haystack.deepset.ai/community" target="_blank" rel="noopener">our Discord community</a>, or check out  
<a href="https://github.com/deepset-ai/haystack" target="_blank" rel="noopener">the Haystack GitHub repository</a>.</p>
<p>If the process of evaluating and fine-tuning manually seems daunting and time-consuming,  
<a href="https://www.deepset.ai/deepset-cloud" target="_blank" rel="noopener">have a look at deepset Cloud</a>, our end-to-end platform for designing and deploying NLP-based solutions.</p>
<p>Finally, if you’re looking to learn more about modern NLP, make sure to  
<a href="https://landing.deepset.ai/nlp-for-product-managers" target="_blank" rel="noopener">download our free ebook</a>  “NLP for Product Managers.”</p>

      </description>
    </item>
    
    <item>
      <title>What is a Language Model?</title>
      <link>https://haystack.deepset.ai/blog/what-is-a-language-model/</link>
      <pubDate>Wed, 20 Jul 2022 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/what-is-a-language-model/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      
        
        

        

        
      
        
        

        

        
      

      &lt;p&gt;Our aim at  
&lt;a href=&#34;https://www.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset&lt;/a&gt;  is that everyone, no matter their level of technical background, can harness the power of modern natural language processing (NLP) and language models for their own use case.  
&lt;a href=&#34;https://haystack.deepset.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt;, our open-source framework, makes this a reality.&lt;/p&gt;
&lt;p&gt;When we talk to our users, we encounter common sources of confusion about NLP and machine learning. Therefore, in the upcoming blog posts, we want to explain some basic NLP concepts in understandable language. First up: language models.&lt;/p&gt;
&lt;h2 id=&#34;language-models-in-nlp&#34;&gt;Language Models in NLP&lt;/h2&gt;
&lt;p&gt;Language models take center stage in NLP. But what is a language model? To answer that question, let’s first clarify the term  &lt;em&gt;model&lt;/em&gt;  and its use in machine learning.&lt;/p&gt;
&lt;h3 id=&#34;what-is-a-machine-learning-model&#34;&gt;What is a machine learning model?&lt;/h3&gt;
&lt;p&gt;The real world is complex and confusing. Models serve to represent a particular field of interest — a domain — in simpler terms. For example, weather models are simplified representations of meteorological phenomena and their interactions. These models help us understand the weather domain better and make predictions about it.&lt;/p&gt;
&lt;p&gt;In machine learning, models are much the same. They serve mainly to predict events based on past data, which is why they’re also known as forecasting or predictive models.&lt;/p&gt;
&lt;p&gt;The data that we feed to an ML algorithm allows it to devise a model of the data’s domain. That data should represent reality most faithfully, so that the models which are based on it can approximate the real world as closely as possible.&lt;/p&gt;
&lt;h3 id=&#34;what-is-a-language-model&#34;&gt;What is a language model?&lt;/h3&gt;
&lt;p&gt;A language model is a machine learning model designed to represent the language domain. It can be used as a basis for a number of different language-based tasks, for instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://www.deepset.ai/question-answering-use-case&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Question answering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://www.deepset.ai/semantic-search-use-case&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Semantic search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/summarizer&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Summarization&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and plenty of other tasks that operate on natural language.&lt;/p&gt;
&lt;p&gt;In a domain like weather forecasting, it’s easy to see how past data helps a model to predict a future state. But how do you apply that to language? In order to understand how the concept of prediction factors into language modeling, let’s take a step back and talk about linguistic intuition.&lt;/p&gt;
&lt;h4 id=&#34;linguistic-intuition&#34;&gt;&lt;strong&gt;Linguistic intuition&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;As the speaker of a language, you have assembled an astonishing amount of knowledge about it, much of which cannot be taught explicitly. It includes judgments about grammaticality (whether or not a sentence is syntactically correct), synonymity (whether two words mean roughly the same) and sentence completion. Suppose I asked you to fill in the gap in the following sentence:&lt;/p&gt;
&lt;p&gt;“Julia is looking for ___ purse.”&lt;/p&gt;
&lt;p&gt;You’d probably say “her” or “my” or any other pronoun. Even a possessive noun phrase like “the cat Pablo’s” would work. But you wouldn’t guess something like “toothbrush” or “Las Vegas.” Why? Because of linguistic intuition.&lt;/p&gt;
&lt;h4 id=&#34;training-a-language-model&#34;&gt;&lt;strong&gt;Training a language model&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Language models seek to model linguistic intuition. That is not an easy feat. As we’ve said, linguistic intuition isn’t learned through schooling but through constant use of a language (Noam Chomsky even postulated the existence of a special “
&lt;a href=&#34;https://chomsky.info/198311__/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;language organ&lt;/a&gt;” in humans). So how can we model it?&lt;/p&gt;
&lt;p&gt;Today’s state of the art in NLP is driven by large neural networks. Neural language models like  
&lt;a href=&#34;https://en.wikipedia.org/wiki/BERT_%28language_model%29&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BERT&lt;/a&gt;  learn something akin to linguistic intuition by processing millions of data points. In machine learning, this process is known as “training.”&lt;/p&gt;
&lt;p&gt;To train a model, we need to come up with tasks that cause it to learn a representation of a given domain. For language modeling, a common task consists of  
&lt;a href=&#34;https://towardsdatascience.com/masked-language-modelling-with-bert-7d49793e5d2c&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;completing the missing word in a sentence&lt;/a&gt;, much like in our example earlier. Through this and other training tasks, a language model learns to encode the meanings of words and longer text passages.&lt;/p&gt;
&lt;p&gt;So how do you get from a computational representation of a language’s semantic properties to a model that can perform specific tasks like question answering or summarization?&lt;/p&gt;
&lt;h2 id=&#34;general-purpose-versus-domain-specific-language-models&#34;&gt;General-purpose Versus Domain-specific Language Models&lt;/h2&gt;
&lt;p&gt;General language models like BERT or its bigger sister  
&lt;a href=&#34;https://arxiv.org/abs/1907.11692&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;RoBERTa&lt;/a&gt;  require huge amounts of data to learn a language’s regularities. NLP practitioners often use Wikipedia and other freely available collections of textual data to train them. By now, BERT-like models exist for practically all the  
&lt;a href=&#34;https://www.deepset.ai/blog/nlp-resources-beyond-english&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;languages with a sufficiently large Wikipedia&lt;/a&gt;. In fact, we at deepset have produced several models for German and English, which you can check out on  
&lt;a href=&#34;https://www.deepset.ai/models&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our models page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/what-is-a-language-model/bert-family.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;So what can you do with these models? Why are they so popular? Well, BERT can be used to enhance language understanding, for example in the  
&lt;a href=&#34;https://blog.google/products/search/search-language-understanding-bert/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Google search engine&lt;/a&gt;. But arguably the biggest value of general-purpose language models is that they can serve as a basis for other language-based tasks like question answering. By exposing it to different datasets and adjusting the training objective, we can adapt a general language model to a specific use case.&lt;/p&gt;
&lt;h3 id=&#34;fine-tuning-a-language-model&#34;&gt;Fine-tuning a language model&lt;/h3&gt;
&lt;p&gt;There are many tasks that benefit from a representation of linguistic intuition. Examples of such tasks are  
&lt;a href=&#34;https://en.wikipedia.org/wiki/Sentiment_analysis&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;sentiment analysis&lt;/a&gt;,  
&lt;a href=&#34;https://en.wikipedia.org/wiki/Named-entity_recognition&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;named entity recognition&lt;/a&gt;, question answering, and others. Adapting a general-purpose language model to such a task is known as  
&lt;a href=&#34;https://huggingface.co/docs/transformers/training&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;fine-tuning&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/what-is-a-language-model/fine-tuning.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;Fine-tuning requires data specific to the task you want the model to accomplish. For instance, to fine-tune your model to the question-answering task, you need a  
&lt;a href=&#34;https://rajpurkar.github.io/SQuAD-explorer/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataset of question-answer pairs&lt;/a&gt;. Such data often needs to be created and curated manually, which makes it quite expensive to generate. On the bright side, fine-tuning requires much less data than training a general language model.&lt;/p&gt;
&lt;h3 id=&#34;where-to-look-for-models&#34;&gt;Where to look for models&lt;/h3&gt;
&lt;p&gt;Both general-purpose models and fine-tuned models can be saved and shared. The  
&lt;a href=&#34;https://huggingface.co/models&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Hugging Face model hub&lt;/a&gt;  is the most popular platform for model-sharing, with tens of thousands of models of different sizes, for different languages and use cases. Chances are high that your own use case is already covered by one of the models on the model hub.&lt;/p&gt;
&lt;p&gt;To help you find a model that might fit your needs, you can use the interface on the left side of the model hub page to filter by task, language, and other criteria. This lets you specifically look for models that have been trained for question answering, summarization, and many other tasks. Once you’ve found a suitable model, all you need to do is plug it into your  
&lt;a href=&#34;https://docs.haystack.deepset.ai/docs/pipelines&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NLP pipeline&lt;/a&gt;, connect to your database, and start experimenting.&lt;/p&gt;
&lt;h3 id=&#34;how-to-handle-domain-specific-language&#34;&gt;How to handle domain-specific language&lt;/h3&gt;
&lt;p&gt;Though we often talk about languages as if they were homogeneous entities, the reality is very far from that. There are, for example, some professional domains — like medicine or law — that use highly specialized jargon, which non-experts can barely understand. Similarly, when a general BERT model is used to process data from one of those domains, it might perform poorly — just like a person without a degree in the field.&lt;/p&gt;
&lt;p&gt;A technique called  &lt;em&gt;domain adaptation&lt;/em&gt;  provides the solution: here, the pretrained model undergoes additional training steps, this time on specialized data like legal documents or medical papers.&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/what-is-a-language-model/domain-adaptation.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;p&gt;The Hugging Face model hub contains BERT-based language models that have been adapted to the scientific, medical, legal, or financial domain. These domain-specific language models can then serve as a basis for further downstream tasks. For instance,  
&lt;a href=&#34;https://huggingface.co/StivenLancheros/roberta-base-biomedical-clinical-es-finetuned-ner-CRAFT_AugmentedTransfer_ES&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this highly specialized model&lt;/a&gt;  extracts named entities (like names for cells and proteins) from biomedical texts in English and Spanish.&lt;/p&gt;
&lt;h2 id=&#34;what-can-language-models-do&#34;&gt;What Can Language Models Do?&lt;/h2&gt;
&lt;p&gt;Language models can seem very smart. In  
&lt;a href=&#34;https://huggingface.co/spaces/Tuana/GoT-QA-Haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this demo&lt;/a&gt;, for example, we show how well  
&lt;a href=&#34;https://huggingface.co/deepset/roberta-base-squad2-distilled&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our RoBERTa model&lt;/a&gt;  can answer questions about the Game of Thrones universe. It’s important to note, though, that this language model doesn’t actually  &lt;em&gt;know&lt;/em&gt;  anything. It is just very good at extracting the right answers from documents — thanks to its mastery of human language and the fine-tuning it received on a question-answering dataset. It operates similarly to a human agent reading through documents to extract information from them, only much, much faster!&lt;/p&gt;
&lt;p&gt;Other types of language models take a completely different approach. For example, the famed  
&lt;a href=&#34;https://openai.com/blog/better-language-models/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GPT family of generative language models&lt;/a&gt;  actually  &lt;em&gt;do&lt;/em&gt;  
&lt;a href=&#34;https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;memorize information&lt;/a&gt;. They have so many parameters — billions — that they can store information picked up during training  &lt;em&gt;in addition to&lt;/em&gt;  learning the language’s regularities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;So what can a language model do? Exactly what it’s been trained to do — not more, not less. Some models are trained to extract answers from text, others to generate answers from scratch. Some are trained to summarize text, others simply learn to represent language.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If your documents don’t use highly specialized language, a pre-trained model might work just fine — no further training required. Other use cases, however, might benefit from additional training steps. In our upcoming blog post, we’ll explore in more detail how you can work with techniques like fine-tuning and domain adaptation to get the most out of language models.&lt;/p&gt;
&lt;h2 id=&#34;composable-nlp-with-haystack&#34;&gt;Composable NLP with Haystack&lt;/h2&gt;
&lt;p&gt;Modern NLP builds on decades of research and incorporates complex concepts from math and computer science. That’s why we promote a practice of  
&lt;a href=&#34;https://www.deepset.ai/hugging-face-model-hub-integration&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;em&gt;composable NLP&lt;/em&gt;&lt;/a&gt;  with Haystack, which lets users build their own NLP-based systems through a mix-and-match approach. You don’t have to be an NLP practitioner to use our framework, just as you don’t need to know anything about hardware or electricity to use a computer.&lt;/p&gt;
&lt;p&gt;Want to see how to integrate pre-trained language models into an NLP pipeline? Check out our  
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub repository&lt;/a&gt;  or sign up to  
&lt;a href=&#34;https://www.deepset.ai/deepset-cloud&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;deepset Cloud&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To learn more about NLP, make sure to  
&lt;a href=&#34;https://landing.deepset.ai/nlp-for-product-managers&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;download our free ebook&lt;/a&gt;  &lt;em&gt;NLP for Product Managers&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Finally, we’d be happy to see you on  
&lt;a href=&#34;https://haystack.deepset.ai/community&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our Discord server&lt;/a&gt;, where members of the deepset team answer questions from the community.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>What Is Text Vectorization? Everything You Need to Know</title>
      <link>https://haystack.deepset.ai/blog/what-is-text-vectorization-in-nlp/</link>
      <pubDate>Fri, 03 Dec 2021 00:00:00 +0000</pubDate>
      
      <guid>https://haystack.deepset.ai/blog/what-is-text-vectorization-in-nlp/</guid>
      <description>
      
      
      

      
      
        
        

        

        
      

      &lt;p&gt;For as long as we have had computers, there has been the question of how to represent data in a way that machines can work with. In natural language processing (NLP), we often talk about text vectorization — representing words, sentences, or even larger units of text as vectors (or “vector embeddings”). Other data types, like images, sound, and videos, may be encoded as vectors as well. But what exactly are those vectors, and how can you use them in your own applications?&lt;/p&gt;
&lt;p&gt;In this post, we’ll track the history of text vectorization in machine learning to develop a full understanding of the modern techniques. We’ll briefly look at traditional count-based methods before moving on to Word2Vec embeddings and BERT’s high-dimensional vectors. We will discuss how Transformer-based language models have brought deep semantics to text vectorization, and what that means for modern search systems. Finally, we will look at the recent and exciting trend of vector databases.&lt;/p&gt;
&lt;h2 id=&#34;count-based-text-vectorization-simple-beginnings&#34;&gt;Count-Based Text Vectorization: Simple Beginnings&lt;/h2&gt;
&lt;p&gt;In programming, a vector is a data structure that is similar to a list or an array. For the purpose of input representation, it is simply a succession of values, with the number of values representing the vector’s “dimensionality.” Vector representations contain information about the qualities of an input object. They offer a uniform format that computers can easily process.&lt;/p&gt;
&lt;h2 id=&#34;bag-of-words&#34;&gt;Bag of Words&lt;/h2&gt;
&lt;p&gt;One of the simplest vectorization methods for text is a  
&lt;a href=&#34;https://en.wikipedia.org/wiki/Bag-of-words_model&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;bag-of-words&lt;/a&gt;  (BoW) representation. A BoW vector has the length of the entire vocabulary — that is, the set of unique words in the corpus. The vector’s values represent the frequency with which each word appears in a given text passage:&lt;/p&gt;
&lt;p&gt;






    &lt;img loading=&#34;lazy&#34; https://haystack.deepset.ai/blog/what-is-text-vectorization-in-nlp/text-to-vec.png alt=&#34;&#34;  /&gt;
&lt;/p&gt;
&lt;h3 id=&#34;tf-idf&#34;&gt;TF-IDF&lt;/h3&gt;
&lt;p&gt;Weighted BoW text vectorization techniques like  
&lt;a href=&#34;https://haystack.deepset.ai/components/retriever#tf-idf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;TF-IDF&lt;/a&gt;  (short for “term frequency-inverse document frequency), on the other hand, attempt to give higher relevance scores to words that occur in fewer documents within the corpus. To that end, TF-IDF measures the frequency of a word in a text against its overall frequency in the corpus.&lt;/p&gt;
&lt;p&gt;Think of a document that mentions the word “oranges” with high frequency. TF-IDF will look at all the other documents in the corpus. If “oranges” occurs in many documents, then it is not a very significant term and is given a lower weighting in the TF-IDF text vector. If it occurs in just a few documents, however, it is considered a distinctive term. In that case, it helps characterize the document within the corpus and as such receives a higher value in the vector.&lt;/p&gt;
&lt;h2 id=&#34;bm25&#34;&gt;BM25&lt;/h2&gt;
&lt;p&gt;While more sophisticated than the simple BoW approach, TF-IDF has some shortcomings. For example, it does not address the fact that, in short documents, even just a single mention of a word might mean that the term is highly relevant.  
&lt;a href=&#34;https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BM25&lt;/a&gt;  was introduced to address this and other issues. It is an improvement over TF-IDF, in that it takes into account the length of the document. It also dampens the effect of having many occurrences of a word in a document.&lt;/p&gt;
&lt;p&gt;Because BoW methods will produce long vectors that contain many zeros, they’re often called “sparse.” In addition to being language-independent, sparse vectors are quick to compute and compare.  
&lt;a href=&#34;https://www.deepset.ai/blog/understanding-semantic-search&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Semantic search&lt;/a&gt;  systems use them for quick document retrieval.&lt;/p&gt;
&lt;p&gt;Let’s now look at a more recent encoding technique that aims to capture not just the lexical but also the semantic properties of words.&lt;/p&gt;
&lt;h2 id=&#34;word2vec-inferring-meaning-from-context&#34;&gt;Word2Vec: Inferring Meaning from Context&lt;/h2&gt;
&lt;p&gt;Words are more than just a collection of letters. As speakers of a language, we might understand what a word means and how to use it in a sentence. In short, we would understand its semantics. The sparse, count-based methods we saw above do not account for the  &lt;em&gt;meaning&lt;/em&gt; of the words or phrases that our system processes.&lt;/p&gt;
&lt;p&gt;In 2013, a team led by NLP researcher Tomáš Mikolov came up with the  
&lt;a href=&#34;https://arxiv.org/abs/1301.3781&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Word2Vec method&lt;/a&gt;, which could represent the semantic and syntactic properties of words through “word embeddings.” Word2Vec follows the idea that the meaning of words lies in their distributional properties — the contexts in which a word is used. There are two main implementations of Word2Vec (
&lt;a href=&#34;https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-cbow.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CBOW&lt;/a&gt;  and  
&lt;a href=&#34;http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;skip-gram&lt;/a&gt;). Both train a shallow neural net to represent words as feature vectors of variable length (typically 300). These vectors are  &lt;em&gt;dense&lt;/em&gt;, meaning that they consist of mostly floating point values, rather than zeros.&lt;/p&gt;
&lt;p&gt;In the high-dimensional Word2Vec embedding space, similar words lie close to each other. For example, we would expect the words “orange” and “apple” to be close, while, say, “house” or “space ship” should be further away from the pair. Semantic textual similarity is measured by way of a distance metric, typically  
&lt;a href=&#34;https://deepai.org/machine-learning-glossary-and-terms/cosine-similarity&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;cosine similarity&lt;/a&gt;. While it is virtually impossible to picture 300 dimensions in your head, the embeddings can be reduced to just two dimensions for visualization purposes, as in  
&lt;a href=&#34;https://lamyiowce.github.io/word2viz/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this interactive example&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Often, we will want to encode the meaning of longer passages, like sentences of entire documents. We could add or average the individual word vectors to produce one vector for the entire text. Another approach is the  
&lt;a href=&#34;https://arxiv.org/abs/1405.4053&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Doc2Vec&lt;/a&gt;  model, which has been trained in a manner similar to Word2Vec, only on documents rather than words.&lt;/p&gt;
&lt;p&gt;However, despite encoding semantic properties of words, the resulting text vectorization still leaves much to be desired. For one, Word2Vec encodings are rigid: a given word will always be encoded in the same way, regardless of its position in the sentence, or whether it has multiple meanings. The model also has no way to account for unknown words.&lt;/p&gt;
&lt;p&gt;Finally, Word2Vec-based methods have not had much success at representing longer text passages. Luckily, all of these problems have been timely addressed by a new technique for producing vectors: the Transformer-powered BERT.&lt;/p&gt;
&lt;h3 id=&#34;transformers-bringing-deep-semantics-to-text-vectorization&#34;&gt;Transformers: Bringing Deep Semantics to Text Vectorization&lt;/h3&gt;
&lt;p&gt;The legendary  
&lt;a href=&#34;https://arxiv.org/abs/1810.04805&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;BERT&lt;/a&gt;  (“Bidirectional Encoder Representations from Transformers”) language model by Jacob Devlin and colleagues overcomes the obstacles presented by Word2Vec-based embedding methods. BERT is able to produce contextualized word vectors by encoding a word’s position in the text in addition to the word itself. Furthermore, this approach can account for unknown words as well as words with multiple meanings.&lt;/p&gt;
&lt;p&gt;BERT’s success is based on its  
&lt;a href=&#34;https://jalammar.github.io/illustrated-transformer/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Transformer architecture&lt;/a&gt;, as well as the vast amounts of data that it uses to learn. During training, BERT “reads” the entire English-language Wikipedia and the BooksCorpus, a large collection of unpublished novels. Successors to BERT, like  
&lt;a href=&#34;https://ai.facebook.com/blog/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;RoBERTa&lt;/a&gt;, are trained on even bigger text collections. This way, Transformer-based language models learn a deep, context-aware representation of language.&lt;/p&gt;
&lt;h3 id=&#34;semantic-search-systems-better-document-retrieval-with-high-dimensional-vectors&#34;&gt;Semantic Search Systems: Better Document Retrieval with High-Dimensional Vectors&lt;/h3&gt;
&lt;p&gt;In the context of semantic search systems, we need a way to vectorize both the documents in our document store, as well as the query that we want to match them to. We’ll present two different techniques that use Transformers to accomplish that task: Sentence-BERT and Dense Passage Retrieval.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://www.sbert.net/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Sentence-BERT&lt;/a&gt;, or SBERT, uses an encoder that can convert longer passages of text into vectors. In a semantic document search system, SBERT generates one vector for each existing document. These vectors are then indexed alongside the original document in the database. At inference time, SBERT converts the new document into a vector and compares it to the vectors in the database. The more similar two vectors are, the more similar their underlying documents.&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://haystack.deepset.ai/components/retriever#dense-passage-retrieval-recommended&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dense Passage Retrieval (DPR)&lt;/a&gt;  is a popular text vectorization method in the context of question answering systems. It is based on a dual-encoder approach and can retrieve documents that are likely to contain the answer to a given question. Consider an example from the  
&lt;a href=&#34;https://arxiv.org/abs/2004.04906&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DPR paper&lt;/a&gt;, where the query “Who is the bad guy in lord of the rings?” is mapped to the sentence “Sala Baker is best known for portraying the villain Sauron in the Lord of the Rings trilogy.”&lt;/p&gt;
&lt;p&gt;As the example shows, DPR is able to pair questions and answer passages, even when they do not use the exact same words. The model understands that “bad guy” is a synonym for “villain.” This is what sets Transformer-based methods apart from keyword-matching techniques like TF-IDF and BM25. However, both SBERT and DPR do take longer during indexing time. That is because they run every document through a Transformer model, before storing it in the database alongside its high-dimensional vector representation.&lt;/p&gt;
&lt;h3 id=&#34;vectors-first-the-power-of-vector-databases&#34;&gt;Vectors First: The Power of Vector Databases&lt;/h3&gt;
&lt;p&gt;Popular document oriented databases like  
&lt;a href=&#34;https://www.elastic.co/what-is/elasticsearch&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Elasticsearch&lt;/a&gt;  and  
&lt;a href=&#34;https://opensearch.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;OpenSearch&lt;/a&gt;  let you store and search your vectorized text along with the original documents. This is useful as it allows you to combine keyword-based approaches with the more recent, Transformer-based methods. But with growing data collections and a shift from lexical, keyword-based to semantic search systems, there’s an increased need for databases that are tailored specifically to vectorized data.&lt;/p&gt;
&lt;p&gt;Lately, a new generation of databases has emerged to address that need. Vector-optimized databases were designed specifically for storing and searching vector data. Note that “data” doesn’t just mean text — other data types, such as images or videos, can be stored in a vector database as well. These databases are particularly fast at computing similarity between text vectors through optimization for “
&lt;a href=&#34;https://pub.towardsai.net/knn-k-nearest-neighbors-is-dead-fc16507eb3e?gi=456eb30da280&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;nearest neighbor&lt;/a&gt;” algorithms.&lt;/p&gt;
&lt;p&gt;Vector databases are highly scalable, allowing for quick search even when on millions of documents. Haystack supports  
&lt;a href=&#34;https://faiss.ai/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FAISS&lt;/a&gt;,  
&lt;a href=&#34;https://milvus.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Milvus&lt;/a&gt;, and  
&lt;a href=&#34;http://weaviate.com/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Weaviate&lt;/a&gt;  vector databases— and we cannot wait to see what you’ll build with them!&lt;/p&gt;
&lt;h2 id=&#34;get-started-with-haystack&#34;&gt;Get Started with Haystack&lt;/h2&gt;
&lt;p&gt;
&lt;a href=&#34;https://www.deepset.ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Haystack&lt;/a&gt;  is our flagship NLP framework enabling developers to add a variety of modern NLP tasks, such as semantic search, question answering, and summarization to their applications. Compute your own vectors, add them to the  
&lt;a href=&#34;https://haystack.deepset.ai/components/document-store&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;database of choice&lt;/a&gt;, and start building powerful natural language processing pipelines.&lt;/p&gt;
&lt;p&gt;Get started today by checking out our  
&lt;a href=&#34;https://github.com/deepset-ai/haystack&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub repository&lt;/a&gt;  or by joining our community on  
&lt;a href=&#34;https://haystack.deepset.ai/community&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Discord&lt;/a&gt;!&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>