Replies: 1 comment
-
@kbzowski I'm not sure what the limits are in V1 (I'm wondering this too), but I do a lot of S3 URI passing around. It's pretty straightforward and not particularly painful IMO once you commit to it. You can also write some middleware to automate parts of it if you want (e.g. with pydantic validators, or with decorators that wrap the Hatchet standalone task). The middleware approach looks something like this:

```python
def Register(
    description: str | None = None,
    name: str | None = None,
    desired_worker_labels: dict[str, DesiredWorkerLabel] | None = None,
    schedule_timeout: Duration = "1h",
    execution_timeout: Duration = "1h",
    retries: int = 1,
    overwrite_log_method: bool = True,
    inject_workflow_run_id: bool = True,
    **task_config,
):
    """Decorator to make a standalone experiment from a function.

    Usage:
        @ExperimentRegistry.Register(description="desc", ...)
        def my_experiment(input_spec: MyInput) -> MyOutput:
            ...
    """

    def decorator(
        fn: ExperimentFunction[TInput, TOutput],
    ) -> ExperimentStandaloneType:
        input_type = cast(type[TInput], get_type_hints(fn)["input_spec"])
        return_type = cast(type[TOutput], get_type_hints(fn)["return"])
        fn_name = fn.__name__
        fn_doc = fn.__doc__

        @cls.Include
        @hatchet.task(
            name=name or f"scythe_experiment_{fn_name}",
            description=description or f"{fn_doc}",
            input_validator=input_type,
            desired_worker_labels=desired_worker_labels,
            schedule_timeout=schedule_timeout,
            execution_timeout=execution_timeout,
            retries=retries,
            **task_config,
        )
        def task(input_: input_type, context: Context) -> return_type:  # pyright: ignore [reportInvalidTypeForm]
            """The task implementation."""
            if overwrite_log_method:
                input_.log = lambda msg: context.log(msg)
            if inject_workflow_run_id:
                input_.workflow_run_id = context.workflow_run_id
            # do some more middleware-y stuff, e.g. fetching artifacts if needed
            ...
            # call the actual function that got decorated
            output = fn(input_)
            # do some more middleware-y stuff
            output.add_scalars(input_)
            return output

        return task

    return decorator
```

Regardless, I think if you commit to using S3 (or some other storage medium) you will find that most of the functionality you want for interfacing with it can be squeezed into some relatively simple modules.
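To make the pydantic-validator idea concrete, here's a minimal sketch (the `S3Ref` name and its fields are mine, not from any library): a model that validates an `s3://` URI at construction time, so tasks exchange small references instead of multi-MB blobs.

```python
from urllib.parse import urlparse

from pydantic import BaseModel, field_validator


class S3Ref(BaseModel):
    """Pass-by-reference payload: tasks exchange this instead of raw bytes."""

    uri: str

    @field_validator("uri")
    @classmethod
    def check_s3_uri(cls, v: str) -> str:
        # Reject anything that isn't shaped like s3://bucket/key
        parsed = urlparse(v)
        if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
            raise ValueError(f"expected s3://bucket/key, got {v!r}")
        return v

    @property
    def bucket(self) -> str:
        return urlparse(self.uri).netloc

    @property
    def key(self) -> str:
        return urlparse(self.uri).path.lstrip("/")
```

Then your task input models just embed `S3Ref` fields, and malformed URIs fail at task-input validation rather than deep inside a download call.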
-
I need to pass some large images (5-50MB) between tasks. I searched the docs but only found a 4MB limit mentioned for the old V0 version.
What are the current payload size limits in V1? Any performance tips for handling large files/binary data?
I can imagine the simplest approach would be to store the data in S3 and pass URLs instead of the actual binary data, but managing the lifecycle (creating objects in one task, passing them to another, deleting them afterwards, handling errors, etc.) seems painful.
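For the deletion part specifically, I imagine something like a small context manager could contain most of the pain (a hypothetical sketch; `temp_s3_object` is a made-up name, and `client` is assumed to be a boto3-style S3 client):

```python
from contextlib import contextmanager


@contextmanager
def temp_s3_object(client, bucket: str, key: str):
    """Yield an s3:// URI for downstream tasks, then delete the object on
    exit, even if the consuming code raises."""
    try:
        yield f"s3://{bucket}/{key}"
    finally:
        # boto3-style call; swap in your storage client's delete method
        client.delete_object(Bucket=bucket, Key=key)
```

Usage would be `with temp_s3_object(s3, bucket, key) as uri: ...` around whatever kicks off the downstream task, so cleanup happens regardless of errors.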