FileTypeRouter handles non-existent files inconsistently #9638

@julian-risch

While working on #9573, we found that FileTypeRouter raises a FileNotFoundError for a non-existent file only when the meta parameter is passed. Without meta, the component raises no error and still classifies the file based on its extension. This behavior is inconsistent.

The reason lies in the implementation: when metadata is provided, we internally convert file paths to ByteStream objects so that the metadata can be attached to them. Without metadata, file paths are not converted, so a missing file is never detected.
To handle this consistently, one option is to raise a FileNotFoundError when there is no metadata as well. That's a breaking change. We could also add a raise_on_failure parameter: this would still be a breaking change for the case without metadata, but users could at least opt in to the previous behavior by setting raise_on_failure=False.
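
To make the proposal concrete, here is a minimal standalone sketch of extension-based routing with a consistent existence check. It does not use Haystack and is not the FileTypeRouter implementation; the function name `route_by_extension` and the `"unclassified"` key are hypothetical, and the default value of `raise_on_failure` is an assumption.

```python
import mimetypes
from pathlib import Path
from typing import Dict, List, Union


def route_by_extension(
    sources: List[Union[str, Path]],
    raise_on_failure: bool = True,  # assumption: parameter name from the proposal, default chosen here
) -> Dict[str, List[Path]]:
    """Group paths by the MIME type guessed from their extension.

    Hypothetical helper for illustration only. With raise_on_failure=True,
    a missing file raises regardless of whether metadata is involved.
    """
    routed: Dict[str, List[Path]] = {}
    for source in sources:
        path = Path(source)
        # Check existence up front so that behavior does not depend on metadata.
        if raise_on_failure and not path.is_file():
            raise FileNotFoundError(f"No such file or directory: '{path}'")
        # guess_type only inspects the name; it never touches the file system.
        mime_type, _ = mimetypes.guess_type(path.name)
        routed.setdefault(mime_type or "unclassified", []).append(path)
    return routed
```

With `raise_on_failure=False`, the sketch falls back to classifying by extension alone, which mirrors the current behavior without metadata.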

I suggest adding a deprecation warning, waiting for the next release, and then enforcing consistent behavior in the release after that.
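
The deprecation step could look roughly like the sketch below. It only uses the standard library; the helper name `warn_on_missing_sources` and the warning text are hypothetical, and the sketch only handles path-like sources, not ByteStream objects.

```python
import warnings
from pathlib import Path
from typing import List, Union


def warn_on_missing_sources(sources: List[Union[str, Path]]) -> List[Path]:
    """Emit a DeprecationWarning for paths that do not exist and return them.

    Hypothetical helper: a transitional check that keeps the current
    behavior (no exception) while announcing the upcoming change.
    """
    missing = [Path(s) for s in sources if not Path(s).is_file()]
    if missing:
        warnings.warn(
            f"Sources not found: {[str(p) for p in missing]}. "
            "A future release will raise FileNotFoundError for missing files.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this helper
        )
    return missing
```

Note that Python hides DeprecationWarning by default outside of `__main__` and test runners, so users may need `-W default` or an explicit warnings filter to see it.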

from haystack.components.routers import FileTypeRouter

router = FileTypeRouter(mime_types=[r'text/plain'])

# No meta - does not raise error
router.run(sources=["non_existent.txt"])
# → {'text/plain': [PosixPath('non_existent.txt')]}

# With meta - raises FileNotFoundError
router.run(sources=["non_existent.txt"], meta={"spam": "eggs"})
# → FileNotFoundError: [Errno 2] No such file or directory: 'non_existent.txt'

Labels: P2 (Medium priority, add to the next sprint if no P1 available)