Replies: 1 comment
-
@kbzowski I'm not sure what the limits are in V1 (I'm wondering this too), but I do a lot of S3 URI passing around. It's pretty straightforward and not particularly painful IMO once you commit to it. You can also write some middleware to automate parts of it if you want (e.g. with pydantic validators, or with decorators that wrap the Hatchet standalone task). The middleware approach looks something like this:

```python
def Register(
    description: str | None = None,
    name: str | None = None,
    desired_worker_labels: dict[str, DesiredWorkerLabel] | None = None,
    schedule_timeout: Duration = "1h",
    execution_timeout: Duration = "1h",
    retries: int = 1,
    overwrite_log_method: bool = True,
    inject_workflow_run_id: bool = True,
    **task_config,
):
    """Decorator to make a standalone experiment from a function.

    Usage:
        @ExperimentRegistry.Register(description="desc", ...)
        def my_experiment(input_spec: MyInput) -> MyOutput:
            ...
    """

    def decorator(
        fn: ExperimentFunction[TInput, TOutput],
    ) -> ExperimentStandaloneType:
        input_type = cast(type[TInput], get_type_hints(fn)["input_spec"])
        return_type = cast(type[TOutput], get_type_hints(fn)["return"])
        fn_name = fn.__name__
        fn_doc = fn.__doc__

        @cls.Include
        @hatchet.task(
            name=name or f"scythe_experiment_{fn_name}",
            description=description or f"{fn_doc}",
            input_validator=input_type,
            desired_worker_labels=desired_worker_labels,
            schedule_timeout=schedule_timeout,
            execution_timeout=execution_timeout,
            retries=retries,
            **task_config,
        )
        def task(input_: input_type, context: Context) -> return_type:  # pyright: ignore [reportInvalidTypeForm]
            """The task implementation."""
            if overwrite_log_method:
                input_.log = lambda msg: context.log(msg)
            if inject_workflow_run_id:
                input_.workflow_run_id = context.workflow_run_id
            # do some more middleware-y stuff, e.g. fetching artifacts if needed
            ...
            # call the actual function that got decorated
            output = fn(input_)
            # do some more middleware-y stuff
            output.add_scalars(input_)
            return output

        return task

    return decorator
```

Regardless, I think if you commit to using S3 (or some other storage medium) you will find that most of the functionality you want for interfacing with it can be squeezed into some relatively simple modules.
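To make the pydantic-validator idea concrete, here's a minimal sketch (the `S3Ref` name and its fields are mine, not from any library): a model that validates an `s3://` URI at construction time, so tasks exchange small references instead of multi-MB blobs.

```python
from urllib.parse import urlparse

from pydantic import BaseModel, field_validator


class S3Ref(BaseModel):
    """Pass-by-reference payload: tasks exchange this instead of raw bytes."""

    uri: str

    @field_validator("uri")
    @classmethod
    def check_s3_uri(cls, v: str) -> str:
        # Reject anything that isn't shaped like s3://bucket/key
        parsed = urlparse(v)
        if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
            raise ValueError(f"expected s3://bucket/key, got {v!r}")
        return v

    @property
    def bucket(self) -> str:
        return urlparse(self.uri).netloc

    @property
    def key(self) -> str:
        return urlparse(self.uri).path.lstrip("/")
```

Then your task input models just embed `S3Ref` fields, and malformed URIs fail at task-input validation rather than deep inside a download call.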
-
I need to pass some large images (5-50MB) between tasks. I searched the docs but only found a 4MB limit mentioned for the old V0 version.
What are the current payload size limits in V1? Any performance tips for handling large files/binary data?
I can imagine the simplest approach would be to store the data in S3 and pass URLs instead of the actual binary data, but managing the lifecycle (creating objects in one task, passing them to another, deleting them afterwards, handling errors, etc.) seems painful.
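For the deletion part specifically, I imagine something like a small context manager could contain most of the pain (a hypothetical sketch; `temp_s3_object` is a made-up name, and `client` is assumed to be a boto3-style S3 client):

```python
from contextlib import contextmanager


@contextmanager
def temp_s3_object(client, bucket: str, key: str):
    """Yield an s3:// URI for downstream tasks, then delete the object on
    exit, even if the consuming code raises."""
    try:
        yield f"s3://{bucket}/{key}"
    finally:
        # boto3-style call; swap in your storage client's delete method
        client.delete_object(Bucket=bucket, Key=key)
```

Usage would be `with temp_s3_object(s3, bucket, key) as uri: ...` around whatever kicks off the downstream task, so cleanup happens regardless of errors.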