Releases: deepset-ai/haystack-experimental
v0.13.0
🧪 New Experiments
Semantic Chunking based on Sentence Embeddings
We added a new EmbeddingBasedDocumentSplitter component that splits longer texts into chunks of semantically related sentences, so the resulting Documents are more semantically coherent. The component is initialized with a document embedder. PR #353
```python
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack_experimental.components.preprocessors import EmbeddingBasedDocumentSplitter

doc = Document(
    content="This is a first sentence. This is a second sentence. This is a third sentence. "
    "Completely different topic. The same completely different topic."
)
embedder = SentenceTransformersDocumentEmbedder()
splitter = EmbeddingBasedDocumentSplitter(
    document_embedder=embedder,
    sentences_per_group=2,
    percentile=0.95,
    min_length=50,
    max_length=1000,
)
splitter.warm_up()
result = splitter.run(documents=[doc])
```
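To make the splitting strategy concrete, here is a plain-Python sketch of the underlying idea, not the component's actual implementation: embed consecutive sentence groups, measure the cosine distance between neighbors, and split where the distance exceeds a percentile threshold. The embeddings below are made-up toy vectors.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity of two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def split_points(embeddings, percentile=0.95):
    # Distances between neighboring sentence-group embeddings
    distances = [cosine_distance(a, b) for a, b in zip(embeddings, embeddings[1:])]
    # Split where the topic shift is larger than the chosen percentile of distances
    threshold = sorted(distances)[int(percentile * (len(distances) - 1))]
    return [i + 1 for i, d in enumerate(distances) if d > threshold]

# Three similar "sentence group" embeddings, then a clear topic shift
groups = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
print(split_points(groups, percentile=0.9))  # [3]: split before the last group
```

The real component additionally enforces min_length and max_length on the resulting chunks.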
Hallucination Risk Assessment for LLM Answers
The OpenAIChatGenerator can now estimate the risk of hallucination in generated answers. You can configure a risk threshold, and the OpenAIChatGenerator will refuse to return an answer if the hallucination risk exceeds that threshold. Refer to the research paper and repo for technical details on how the risk is calculated. PR #359
👉 Try out the component here!
```python
from haystack.dataclasses import ChatMessage
from haystack_experimental.components.generators.chat.openai import HallucinationScoreConfig, OpenAIChatGenerator

llm = OpenAIChatGenerator(model="gpt-4o")
rag_result = llm.run(
    messages=[
        ChatMessage.from_user(
            text="Task: Answer strictly based on the evidence provided below.\n"
            "Question: Who won the Nobel Prize in Physics in 2019?\n"
            "Evidence:\n"
            "- Nobel Prize press release (2019): James Peebles (1/2); Michel Mayor & Didier Queloz (1/2).\n"
            "Constraints: If evidence is insufficient or conflicting, refuse."
        )
    ],
    hallucination_score_config=HallucinationScoreConfig(skeleton_policy="evidence_erase"),
)

print(f"Decision: {rag_result['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {rag_result['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {rag_result['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{rag_result['replies'][0].text}")
```
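Once the risk bound is computed, the refusal behavior reduces to a simple threshold check. The sketch below only illustrates that decision rule; the function name, parameter names, and threshold value are hypothetical, not part of the Haystack API.

```python
def hallucination_gate(risk_bound: float, threshold: float = 0.05) -> str:
    # Refuse to answer when the estimated hallucination risk exceeds the threshold
    return "REFUSE" if risk_bound > threshold else "ANSWER"

print(hallucination_gate(0.012))  # ANSWER: risk is within the configured budget
print(hallucination_gate(0.31))   # REFUSE: risk bound exceeds the threshold
```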
Multi-Query Retrieval for Query Expansion
Two newly introduced components, MultiQueryKeywordRetriever and MultiQueryEmbeddingRetriever, enable concurrent processing of multiple queries. They work best in combination with the QueryExpander component, which generates multiple queries from a single user query. You can learn more about query expansion in this Jupyter notebook. PR #358
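The idea can be sketched without Haystack: run a retriever once per expanded query, concurrently, then merge the result lists and deduplicate by document id, keeping the best score per document. This is a simplified illustration of the pattern with a made-up in-memory index, not the components' actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Tiny in-memory "index" standing in for a real document store (hypothetical data)
FAKE_INDEX = {
    "green energy sources": [("doc1", 0.9), ("doc2", 0.7)],
    "renewable power": [("doc2", 0.8), ("doc3", 0.6)],
}

def retrieve(query: str):
    # Stand-in for a single keyword or embedding retriever call
    return FAKE_INDEX.get(query, [])

def multi_query_retrieve(queries):
    # Run all queries concurrently, then merge the result lists
    with ThreadPoolExecutor() as pool:
        per_query_hits = list(pool.map(retrieve, queries))
    # Deduplicate by document id, keeping the highest score per document
    best = {}
    for hits in per_query_hits:
        for doc_id, score in hits:
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

print(multi_query_retrieve(["green energy sources", "renewable power"]))
# [('doc1', 0.9), ('doc2', 0.8), ('doc3', 0.6)]
```

Expanded queries surface documents (like doc3 above) that the original query alone would have missed.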
✅ Adopted Experiments
Full Changelog: v0.12.0...v0.13.0
v0.12.0
🧪 New Experiments
🧠 Agent Breakpoints
We’ve introduced Agent Breakpoints, a feature that allows you to pause and inspect specific stages of the Agent component's execution.
You can use this feature to:
- Place breakpoints directly on the chat_generator to debug interactions.
- Add breakpoints to the tools used by the agent to inspect tool behavior during execution.
🔧 Example Usage for Agent within Pipeline
```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools.tool import Tool
from haystack_experimental.components.agents.agent import Agent
from haystack_experimental.dataclasses.breakpoints import AgentBreakpoint, Breakpoint, ToolBreakpoint

# Tool Function
def calculate(expression: str) -> dict:
    try:
        result = eval(expression, {"__builtins__": {}})
        return {"result": result}
    except Exception as e:
        return {"error": str(e)}

# Tool Definition
calculator_tool = Tool(
    name="calculator",
    description="Evaluate basic math expressions.",
    parameters={
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "Math expression to evaluate"}
        },
        "required": ["expression"],
    },
    function=calculate,
    outputs_to_state={"calc_result": {"source": "result"}},
)

# Agent Setup
agent = Agent(
    chat_generator=OpenAIChatGenerator(),
    tools=[calculator_tool],
    exit_conditions=["calculator"],
    state_schema={
        "calc_result": {"type": int},
    },
)

debug_path = "Path to save the state"

# Breakpoint on the chat_generator of the Agent
chat_generator_breakpoint = Breakpoint("chat_generator", visit_count=0)
agent_breakpoint = AgentBreakpoint(break_point=chat_generator_breakpoint, agent_name="database_agent")

# Run the Agent
agent.warm_up()
response = agent.run(
    messages=[ChatMessage.from_user("What is 7 * (4 + 2)?")],
    break_point=agent_breakpoint,
    debug_path=debug_path,
)

# Breakpoint on the tools of the Agent
tool_breakpoint = ToolBreakpoint(component_name="tool_invoker", visit_count=0, tool_name="calculator")
agent_breakpoint = AgentBreakpoint(break_point=tool_breakpoint, agent_name="database_agent")

# Run the Agent
agent.warm_up()
response = agent.run(
    messages=[ChatMessage.from_user("What is 7 * (4 + 2)?")],
    break_point=agent_breakpoint,
    debug_path=debug_path,
)
```
📦 Breakpoints Dataclass
We’ve added a dedicated Breakpoint dataclass interface to standardize the way breakpoints are declared and managed.
- Use Breakpoint to target generic components.
- Use AgentBreakpoint to set breakpoints on the agent.
- Use ToolBreakpoint to set breakpoints on specific tools used by the agent.
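Conceptually, a breakpoint fires when its target component reaches a given visit count. The plain-Python sketch below only illustrates that visit-count semantics; the real dataclasses live in haystack_experimental.dataclasses.breakpoints, and the names here are deliberately different.

```python
from dataclasses import dataclass

@dataclass
class SimpleBreakpoint:
    component_name: str
    visit_count: int = 0  # pause on the Nth visit of the component (0 = first run)

def should_break(bp: SimpleBreakpoint, component: str, visits: int) -> bool:
    # A breakpoint fires when its target component reaches the configured visit count
    return component == bp.component_name and visits == bp.visit_count

bp = SimpleBreakpoint("chat_generator", visit_count=0)
print(should_break(bp, "chat_generator", 0))  # True: first visit of the target
print(should_break(bp, "tool_invoker", 0))    # False: a different component
```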
Related PRs
- feat: adding agents back to the experimental repo (#326)
Other Updates
v0.11.0
🧪 New Experiments
Query Expander component
We are introducing a component that generates a list of semantically similar queries to improve retrieval recall in RAG systems.
```python
from haystack.components.generators.chat.openai import OpenAIChatGenerator
from haystack_experimental.components.query import QueryExpander

expander = QueryExpander(
    chat_generator=OpenAIChatGenerator(model="gpt-4.1-mini"),
    n_expansions=3,
)

result = expander.run(query="green energy sources")
print(result["queries"])
# Output: ['alternative query 1', 'alternative query 2', 'alternative query 3', 'green energy sources']
# Note: up to 3 additional queries + 1 original query (if include_original_query=True)

# To control the total number of queries:
expander = QueryExpander(n_expansions=2, include_original_query=True)   # up to 3 total
# or
expander = QueryExpander(n_expansions=3, include_original_query=False)  # exactly 3 total
```
- feat: add QueryExpander component by @mpangrazzi in #331
🔀 New Document Routers
We're introducing two new Routers: DocumentTypeRouter and DocumentLengthRouter.
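As a rough sketch of what these routers do, documents can be routed by MIME type or by content length. This is a plain-Python illustration of the routing idea, not the components' actual API; the output edge names and the dict-based documents are made up for the example.

```python
def route_by_type(docs, mime_types):
    # Group documents by their mime_type, sending unknown types to "unclassified"
    routes = {mt: [] for mt in mime_types}
    routes["unclassified"] = []
    for doc in docs:
        routes.get(doc.get("mime_type"), routes["unclassified"]).append(doc)
    return routes

def route_by_length(docs, threshold=10):
    # Split documents into short/long buckets by content length
    short = [d for d in docs if len(d.get("content", "")) <= threshold]
    long_ = [d for d in docs if len(d.get("content", "")) > threshold]
    return {"short_documents": short, "long_documents": long_}

docs = [
    {"content": "hi", "mime_type": "text/plain"},
    {"content": "a much longer document body", "mime_type": "application/pdf"},
]
print(route_by_type(docs, ["text/plain", "application/pdf"]))
print(route_by_length(docs, threshold=10))
```

Routing like this lets downstream pipeline branches handle each document class differently, e.g. sending image-based PDFs to a vision model.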
🖼️ New Multimodal Features
We introduced several new multimodal features, mostly focused on indexing and retrieval.
A notebook will be published soon to show practical usage examples.
- multimodal support in AmazonBedrockChatGenerator
- new image Converters
- SentenceTransformersDocumentImageEmbedder: a component to compute embeddings for image-based documents
- LLMDocumentContentExtractor: a component to extract textual content from image-based documents using a vision-enabled LLM
Related PRs
- refactor: adopt pypdfium2 for PDF to image conversion by @anakin87 in #308
- feat: multimodal support in AmazonBedrockChatGenerator by @anakin87 in #307
- test: Fix mypy typing by @sjrl in #309
- feat: Add DocumentToImageContent component to help enable RAG with image Documents by @sjrl in #311
- chore: fix format for DocumentToImageContent by @anakin87 in #318
- chore: ignore type errors in Bedrock monkey patches by @anakin87 in #322
- feat: add SentenceTransformersDocumentImageEmbedder by @anakin87 in #319
- feat: Add DocumentTypeRouter by @sjrl in #321
- refactor: refactor multimodal components and utility functions by @anakin87 in #324
- fix: Fix storage of file path in ImageContent by @sjrl in #325
- refactor: Refactor converters to follow embedders directory structure by @sjrl in #333
- feat: Add normalize_embeddings to SentenceTransformersDocumentImageEmbedder to match signature of other embedders by @sjrl in #335
- feat: add DocumentLengthRouter component by @anakin87 in #334
- feat: Add ImageFileToDocument converter by @sjrl in #336
- feat: Add LLMDocumentContentExtractor to enable Vision-based LLMs to describe/convert an image into text by @sjrl in #338
- docs: add usage examples to docstrings of multimodal components by @anakin87 in #340
Other Updates
- refactor: synchronising/merging all pipeline related code with haystack main repository by @davidsbatista in #312
- chore: align Haystack experimental Hatch scripts by @anakin87 in #315
- chore: align experimental type checking with Haystack by @anakin87 in #320
- refactor: Refactor experimental Pipeline to use inheritance by @sjrl in #323
- fix: refactor code and update init_params in debug_state by @Amnah199 in #317
- chore: fix ruff linting error by @Amnah199 in #329
- fix: Fix logger message for pipeline breakpoints by @sjrl in #327
- fix: Fix validate_input becoming public method by @sjrl in #337
- Refactor serialization of breakpoints by @Amnah199 in #332
New Contributors
- @mpangrazzi made their first contribution in #331
Full Changelog: v0.10.0...v0.11.0
v0.10.0
🧪 New Experiments
🖼️ Multimodal Text Generation
We are adding support for passing images in user messages and other multimodal features.
```python
from haystack_experimental.dataclasses import ImageContent, ChatMessage
from haystack_experimental.components.generators.chat import OpenAIChatGenerator

image_url = "https://cdn.britannica.com/79/191679-050-C7114D2B/Adult-capybara.jpg"
image_content = ImageContent.from_url(image_url)

message = ChatMessage.from_user(
    content_parts=["Describe the image in short.", image_content]
)

llm = OpenAIChatGenerator(model="gpt-4o-mini")
print(llm.run([message])["replies"][0].text)
```
For the list of implemented features, see #302.
For more usage examples, check out the example: 📓 Introduction to Multimodal Text Generation.
Related PRs
- feat: ImageContent dataclass by @anakin87 in #286
- feat: Add ImageFileToImageContent and PDFToImageContent converters by @sjrl in #290
- feat: multimodal support in OpenAIChatGenerator by @anakin87 in #292
- chore: improve Image Converters pydoc config by @anakin87 in #295
- feat: add convenience class methods to ImageContent by @anakin87 in #294
- chore: move ImageContent to a separate module by @anakin87 in #296
- feat: add Jinja2 ChatMessage extension by @anakin87 in #297
- feat: ImageContent visualization by @anakin87 in #300
- feat: extend ChatPromptBuilder to support string templates by @anakin87 in #299
- chore: update README with multimodal experiment by @anakin87 in #303
- fix: move IPython import by @anakin87 in #304
- feat: ImageContent validation by @anakin87 in #305
🐛 Bug Fixes
- fix: Update `__init__.py` to use double underscore by @sjrl in #288
- fix: preserve initialization parameters in debug state when run params are not supplied by @Amnah199 in #293
✅ Adopted Experiments
- chore: update/clean up experimental by @anakin87 in #285
- chore: Remove SuperComponent and pre-made super components. Update Readme by @sjrl in #287
- chore: remove dependencies needed for MultiFileConverter by @anakin87 in #298
Other Updates
- Update issue template for adding new experiments by @bilgeyucel in #283
- docs: add missing pydocs by @dfokina in #291
Full Changelog: v0.9.0...v0.10.0
v0.9.0
🔧 Updates to Experiments
Adding breakpoints to components in a Pipeline
It's now possible to set a breakpoint at any component in any pipeline, forcing the pipeline execution to stop before that component runs and generating a JSON file with the complete state of the pipeline at that point.
Usage Examples
```python
# Setting breakpoints
pipeline.run(
    data={"input": "value"},
    breakpoints={("component_name", 0)},  # break at the first visit
    debug_path="debug_states/",
)
```
This generates a JSON file with the complete pipeline state before the next component runs, i.e. the one receiving the output of the component the breakpoint was set on.
```python
# Resuming from a saved state
state = Pipeline.load_state("debug_states/component_state.json")
pipeline.run(
    data={"input": "value"},
    resume_state=state,
)
```
🧑🍳 See an example notebook here
💬 Share your feedback in this discussion
✅ Adopted Experiments
- chore: Remove Agent after Haystack 2.12 release (#263) @julian-risch
- chore: Remove AutoMergingRetriever after Haystack 2.12 release (#265) @davidsbatista
Other Updates
- Proposal for changing internal working of Agent (#245) @sjrl
- refactor: Streamline super components input and output mapping logic (#243) @sjrl
- refactor: Small updates to Agent. Make pipeline internal, add check for warm_up (#244) @sjrl
- feat: Updates to insertion of values into State (#239) @sjrl
- feat: Add unclassified to output of MultiFileConverter (#240) @julian-risch
- feat: Enhance tool error logs and some refactoring (#235) @sjrl
Full Changelog: v0.8.0...v0.9.0
v0.8.0
🔧 Updates to Experiments
Stream ChatGenerator responses with Agent
The Agent component now allows setting a streaming callback at init and run time. This way, an Agent's response can be streamed in chunks, enabling faster feedback for developers and end users. #233
```python
agent = Agent(chat_generator=chat_generator, tools=[weather_tool])
response = agent.run([ChatMessage.from_user("Hello")], streaming_callback=streaming_callback)
```
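A streaming callback is just a callable invoked once per generated chunk. In Haystack the callback receives a StreamingChunk object; the stand-in below uses plain strings to illustrate the accumulate-and-display pattern.

```python
chunks = []

def streaming_callback(chunk: str) -> None:
    # Invoked once per generated chunk; print it immediately and keep a copy
    print(chunk, end="", flush=True)
    chunks.append(chunk)

# Simulate a ChatGenerator streaming a reply in pieces
for piece in ["Hel", "lo, ", "how can I help?"]:
    streaming_callback(piece)

full_reply = "".join(chunks)  # reassemble the complete response afterwards
```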
🐛 Bug Fixes
- We fixed a bug that prevented ComponentTool from working with Jinja2-based components (PromptBuilder, ChatPromptBuilder, ConditionalRouter, OutputAdapter). #234
- The Agent component now deserializes Tools with the right class and uses deserialize_tools_inplace. #213 #222
✅ Adopted Experiments
- chore: remove LLMMetadataExtractor by @davidsbatista in #227
- chore: Remove some missed utility functions from previous experiments by @sjrl in #232
- chore: removing async version of InMemoryDocumentStore, DocumentWriter, OpenAIChatGenerator, InMemory Retrievers by @davidsbatista in #220
- chore: remove pipeline experiments by @mathislucka in #214
🛑 Discontinued Experiments
- chore: remove evaluation harness experiment by @julian-risch in #231
Full Changelog: v0.7.0...v0.8.0
v0.7.0
🧪 New Experiments
New Agent component
The Agent component enables tool-calling functionality with provider-agnostic chat model support and can be used as a standalone component or within a pipeline.
👉 See the Agent in action: 🧑‍🍳 Build a GitHub Issue Resolver Agent
```python
from haystack.dataclasses import ChatMessage
from haystack.components.websearch import SerperDevWebSearch
from haystack_experimental.tools.component_tool import ComponentTool
from haystack_experimental.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator

web_tool = ComponentTool(
    component=SerperDevWebSearch(),
)

agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
    tools=[web_tool],
    exit_condition="text",
)

result = agent.run(
    messages=[ChatMessage.from_user("Find information about Haystack")]
)
```
Improved ComponentTool and @tool Decorator
The ComponentTool and the @tool decorator have been extended for better integration with the new Agent component.
New Ready-Made SuperComponents
Introducing new SuperComponents that bundle commonly used components and logic for indexing pipelines: MultiFileConverter, SentenceTransformersDocumentIndexer, and DocumentPreprocessor.
```python
from haystack_experimental.super_components.converters import MultiFileConverter

# Process all common file types (.csv, .docx, .html, .json, .md, .txt, .pdf, .pptx, .xlsx) with one component
converter = MultiFileConverter()
converter.run(sources=["test.txt", "test.pdf"], meta={})
```
What's Changed
- docs: add Supercomponent pydoc, delete outdated by @dfokina in #193
- docs: updating trace comparison tool README.md by @davidsbatista in #195
- chore: Create issue templates for adding, removing, moving an experiment by @julian-risch in #192
- chore: remove OpenSearch from experimental by @anakin87 in #200
- fix: fixing auto-merging tests, removing hard-coded doc ids by @davidsbatista in #202
- chore: add tool related code to prepare Agent PR by @mathislucka in #203
- feat: add file and indexing related super components by @mathislucka in #184
- docs: Add SuperComponent to catalog by @julian-risch in #190
- feat: Introduce Agent by @mathislucka in #175
- docs: add pydoc config for Agent component by @julian-risch in #208
- docs: Notebook for Agent component by @mathislucka in #204
- docs: add MultiFileConverter, SentenceTransformersDocumentIndexer, and DocumentPreprocessor to docs by @dfokina in #210
Full Changelog: v0.6.0...v0.7.0
v0.6.0
New Experiments
- New SuperComponent abstraction that allows you to wrap any pipeline in a friendly component interface and create your own super components [1]
```python
from haystack_experimental import SuperComponent

# rag_pipeline = basic RAG pipeline with retriever, prompt builder, generator and answer builder components
input_mapping = {
    "search_query": ["retriever.query", "prompt_builder.query", "answer_builder.query"]
}
output_mapping = {
    "answer_builder.answers": "final_answers"
}

wrapper = SuperComponent(
    pipeline=rag_pipeline,
    input_mapping=input_mapping,
    output_mapping=output_mapping,
)

result = wrapper.run(search_query="What is the capital of France?")
print(result["final_answers"][0])
```
- New AsyncPipeline that can schedule components to run concurrently [2]
Other Updates:
- Added a debug/tracing script to compare two pipeline runs with the old and new pipeline run logic [3]
- Changed LLMMetadataExtractor to use a ChatGenerator instead of a Generator [4]
Full Changelog: v0.5.0...v0.6.0
v0.5.0
New Experiments
- New Pipeline class with new pipeline run logic (Pipeline example)
Full Changelog: v0.4.0...v0.5.0
🧬 New Pipeline Logic
This release introduces a reimplementation of the pipeline-run logic to resolve multiple issues, improving reliability and performance. These changes will also be included in Haystack 2.10.
Fixed Issues:
- Exceptions in pipelines with two cycles: pipelines with two cycles sharing an optional edge (like in PromptBuilder) or a greedy variadic edge (e.g., in BranchJoiner) might raise exceptions. Details here.
- Incorrect execution in cycles with multiple optional or variadic edges: entry points for cycles were non-deterministic, causing components to run with unexpected inputs or multiple times. This impacted execution time and final outputs.
- Missing intermediate outputs in cycles: outputs produced within a cycle were overwritten, preventing downstream components from receiving them.
- Premature execution of lazy variadic components: components like DocumentJoiner sometimes executed before receiving all inputs, leading to repeated partial executions that affected downstream results.
- Order-sensitive behavior in add_component and connect: some of the bugs above occurred only for specific orderings of add_component and connect calls during pipeline creation, causing non-deterministic behavior in cyclic pipelines.
Am I Affected by this Change?
- Non-cyclic pipelines without lazy variadic components: no impact; your pipelines should function as before.
- Non-cyclic pipelines with lazy variadic components: check the inputs and outputs of components like DocumentJoiner for issues #4 and #5. Use LoggingTracer with content tracing to validate behavior. Component execution order now uses lexicographical sorting; rename upstream components if necessary.
- Pipelines with cycles: review your pipeline outputs as well as component inputs and outputs to ensure expected behavior, as you may encounter any of the above issues.
Share your comments in discussion #177
v0.4.0
New Experiments
- AsyncPipeline and async-enabled components (AsyncPipeline example)
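The benefit of an async pipeline can be sketched with plain asyncio: independent components are scheduled concurrently instead of one after another. This is an illustration of the scheduling idea only, not the AsyncPipeline API; the component names and delays are made up.

```python
import asyncio

async def component(name: str, delay: float) -> str:
    # Stand-in for an async-enabled component's run method (e.g. an I/O-bound call)
    await asyncio.sleep(delay)
    return f"{name} done"

async def run_pipeline():
    # Two independent branches run concurrently, so total time is ~max(delays), not their sum
    results = await asyncio.gather(
        component("retriever_a", 0.1),
        component("retriever_b", 0.1),
    )
    return results

print(asyncio.run(run_pipeline()))  # ['retriever_a done', 'retriever_b done']
```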
Full Changelog: v0.3.0...v0.4.0