Skip to content

Commit 0126e6e

Browse files
authored
Workflow sandbox (#164)
1 parent e436ecc commit 0126e6e

26 files changed

+2397
-51
lines changed

.github/workflows/ci.yml

Lines changed: 8 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -12,11 +12,12 @@ jobs:
1212
strategy:
1313
fail-fast: true
1414
matrix:
15-
python: ["3.7", "3.10"]
15+
# TODO(cretz): 3.10.8 is breaking Windows Rust build
16+
python: ["3.7", "3.10.7"]
1617
os: [ubuntu-latest, macos-latest, windows-latest]
1718
include:
1819
- os: ubuntu-latest
19-
python: 3.10
20+
python: 3.10.7
2021
docsTarget: true
2122
protoCheckTarget: true
2223
runs-on: ${{ matrix.os }}
@@ -32,7 +33,7 @@ jobs:
3233
- uses: Swatinem/rust-cache@v1
3334
with:
3435
working-directory: temporalio/bridge
35-
- uses: actions/setup-python@v1
36+
- uses: actions/setup-python@v4
3637
with:
3738
python-version: ${{ matrix.python }}
3839
# Needed for tests since they use external server
@@ -88,9 +89,9 @@ jobs:
8889
- uses: actions/checkout@v2
8990
with:
9091
submodules: recursive
91-
- uses: actions/setup-python@v1
92+
- uses: actions/setup-python@v4
9293
with:
93-
python-version: "3.10"
94+
python-version: "3.10.7"
9495

9596
# Install Rust locally for non-Linux (Linux uses an internal docker
9697
# command to build with cibuildwheel which uses rustup install defined
@@ -142,9 +143,9 @@ jobs:
142143
- uses: actions/checkout@v2
143144
with:
144145
submodules: recursive
145-
- uses: actions/setup-python@v1
146+
- uses: actions/setup-python@v4
146147
with:
147-
python-version: "3.10"
148+
python-version: "3.10.7"
148149

149150
# Need QEMU for ARM build on Linux
150151
- uses: docker/setup-qemu-action@v1

README.md

Lines changed: 141 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -296,6 +296,8 @@ async def create_greeting_activity(info: GreetingInfo) -> str:
296296

297297
Some things to note about the above code:
298298

299+
* Workflows run in a sandbox by default. Users are encouraged to define workflows in files with no side effects or other
300+
complicated code. See the [Workflow Sandbox](#workflow-sandbox) section for more details.
299301
* This workflow continually updates the queryable current greeting when signalled and can complete with the greeting on
300302
a different signal
301303
* Workflows are always classes and must have a single `@workflow.run` which is an `async def` function
@@ -581,6 +583,145 @@ method.
581583
Activities are just functions decorated with `@activity.defn`. Simply write different ones and pass those to the worker
582584
to have different activities called during the test.
583585

586+
#### Workflow Sandbox
587+
588+
By default workflows are run in a sandbox to help avoid non-deterministic code. If a call that is known to be
589+
non-deterministic is performed, an exception will be thrown in the workflow which will "fail the task" which means the
590+
workflow will not progress until fixed.
591+
592+
The sandbox is not foolproof and non-determinism can still occur. It is simply a best-effort way to catch bad code
593+
early. Users are encouraged to define their workflows in files with no other side effects.
594+
595+
##### How the Sandbox Works
596+
597+
The sandbox is made up of two components that work closely together:
598+
599+
* Global state isolation
600+
* Restrictions preventing known non-deterministic library calls
601+
602+
Global state isolation is performed by using `exec`. Upon workflow start, the file that the workflow is defined in is
603+
imported into a new sandbox created for that workflow run. In order to keep the sandbox performant a known set of
604+
"passthrough modules" are passed through from outside of the sandbox when they are imported. These are expected to be
605+
side-effect free on import and have their non-deterministic aspects restricted. By default the entire Python standard
606+
library, `temporalio`, and a couple of other modules are passed through from outside of the sandbox. To update this
607+
list, see "Customizing the Sandbox".
608+
609+
Restrictions preventing known non-deterministic library calls are achieved using proxy objects on modules wrapped around
610+
the custom importer set in the sandbox. Many restrictions apply at workflow import time and workflow run time, while
611+
some restrictions only apply at workflow run time. A default set of restrictions is included that prevents most
612+
dangerous standard library calls. However it is known in Python that some otherwise-non-deterministic invocations, like
613+
reading a file from disk via `open` or using `os.environ`, are done as part of importing modules. To customize what is
614+
and isn't restricted, see "Customizing the Sandbox".
615+
616+
##### Avoiding the Sandbox
617+
618+
There are three increasingly-scoped ways to avoid the sandbox. Users are discouraged from avoiding the sandbox if
619+
possible.
620+
621+
To remove restrictions around a particular block of code, use `with temporalio.workflow.unsafe.sandbox_unrestricted():`.
622+
The workflow will still be running in the sandbox, but no restrictions for invalid library calls will be applied.
623+
624+
To run an entire workflow outside of a sandbox, set `sandboxed=False` on the `@workflow.defn` decorator when defining
625+
it. This will run the entire workflow outside of the workflow which means it can share global state and other bad
626+
things.
627+
628+
To disable the sandbox entirely for a worker, set the `Worker` init's `workflow_runner` keyword argument to
629+
`temporalio.worker.UnsandboxedWorkflowRunner()`. This value is defaulted to
630+
`temporalio.worker.workflow_sandbox.SandboxedWorkflowRunner()` so by changing it to the unsandboxed runner, the sandbox
631+
will not be used at all.
632+
633+
##### Customizing the Sandbox
634+
635+
⚠️ WARNING: APIs in the `temporalio.worker.workflow_sandbox` module are not yet considered stable and may change in
636+
future releases.
637+
638+
When creating the `Worker`, the `workflow_runner` is defaulted to
639+
`temporalio.worker.workflow_sandbox.SandboxedWorkflowRunner()`. The `SandboxedWorkflowRunner`'s init accepts a
640+
`restrictions` keyword argument that is defaulted to `SandboxRestrictions.default`. The `SandboxRestrictions` dataclass
641+
is immutable and contains three fields that can be customized, but only two have notable value
642+
643+
###### Passthrough Modules
644+
645+
To make the sandbox quicker when importing known third party libraries, they can be added to the
646+
`SandboxRestrictions.passthrough_modules` set like so:
647+
648+
```python
649+
my_restrictions = dataclasses.replace(
650+
SandboxRestrictions.default,
651+
passthrough_modules=SandboxRestrictions.passthrough_modules_default | SandboxMatcher(access={"pydantic"}),
652+
)
653+
my_worker = Worker(..., runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
654+
```
655+
656+
If an "access" match succeeds for an import, it will simply be forwarded from outside of the sandbox. See the API for
657+
more details on exact fields and their meaning.
658+
659+
###### Invalid Module Members
660+
661+
`SandboxRestrictions.invalid_module_members` contains a root matcher that applies to all module members. This already
662+
has a default set which includes things like `datetime.date.today()` which should never be called from a workflow. To
663+
remove this restriction:
664+
665+
```python
666+
my_restrictions = dataclasses.replace(
667+
SandboxRestrictions.default,
668+
invalid_module_members=SandboxRestrictions.invalid_module_members_default.with_child_unrestricted(
669+
"datetime", "date", "today",
670+
),
671+
)
672+
my_worker = Worker(..., runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
673+
```
674+
675+
Restrictions can also be added by `|`'ing together matchers, for example to restrict the `datetime.date` class from
676+
being used altogether:
677+
678+
```python
679+
my_restrictions = dataclasses.replace(
680+
SandboxRestrictions.default,
681+
invalid_module_members=SandboxRestrictions.invalid_module_members_default | SandboxMatcher(
682+
children={"datetime": SandboxMatcher(use={"date"})},
683+
),
684+
)
685+
my_worker = Worker(..., runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
686+
```
687+
688+
See the API for more details on exact fields and their meaning.
689+
690+
##### Known Sandbox Issues
691+
692+
Below are known sandbox issues. As the sandbox is developed and matures, some may be resolved.
693+
694+
###### Global Import/Builtins
695+
696+
Currently the sandbox references/alters the global `sys.modules` and `builtins` fields while running workflow code. In
697+
order to prevent affecting other sandboxed code, thread locals are leveraged to only intercept these values during the
698+
workflow thread running. Therefore, technically if top-level import code starts a thread, it may lose sandbox
699+
protection.
700+
701+
###### Sandbox is not Secure
702+
703+
The sandbox is built to catch many non-deterministic and state sharing issues, but it is not secure. Some known bad
704+
calls are intercepted, but for performance reasons, every single attribute get/set cannot be checked. Therefore a simple
705+
call like `setattr(temporalio.common, "__my_key", "my value")` will leak across sandbox runs.
706+
707+
The sandbox is only a helper, it does not provide full protection.
708+
709+
###### Sandbox Performance
710+
711+
TODO: This is actively being measured; results to come soon
712+
713+
###### Extending Restricted Classes
714+
715+
Currently, extending classes marked as restricted causes an issue with their `__init__` parameters. This does not affect
716+
most users, but if there is a dependency that is, say, extending `zipfile.ZipFile` an error may occur and the module
717+
will have to be marked as pass through.
718+
719+
###### is_subclass of ABC-based Restricted Classes
720+
721+
Due to [https://bugs.python.org/issue44847](https://bugs.python.org/issue44847), classes that are wrapped and then
722+
checked to see if they are subclasses of another via `is_subclass` may fail (see also
723+
[this wrapt issue](https://github.com/GrahamDumpleton/wrapt/issues/130)).
724+
584725
### Activities
585726

586727
#### Definition

pyproject.toml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -150,13 +150,17 @@ intersphinx = [
150150
privacy = [
151151
"PRIVATE:temporalio.bridge",
152152
"PRIVATE:temporalio.types",
153+
"PRIVATE:temporalio.worker.workflow_sandbox.restrictions",
154+
"PRIVATE:temporalio.worker.workflow_sandbox.runner",
153155
"HIDDEN:temporalio.testing.activity",
154156
"HIDDEN:temporalio.testing.workflow",
155157
"HIDDEN:temporalio.worker.activity",
156158
"HIDDEN:temporalio.worker.interceptor",
157159
"HIDDEN:temporalio.worker.worker",
158160
"HIDDEN:temporalio.worker.workflow",
159161
"HIDDEN:temporalio.worker.workflow_instance",
162+
"HIDDEN:temporalio.worker.workflow_sandbox.importer",
163+
"HIDDEN:temporalio.worker.workflow_sandbox.in_sandbox",
160164
"HIDDEN:**.*_pb2*",
161165
]
162166
project-name = "Temporal Python"

temporalio/worker/replayer.py

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,7 @@
3737
from .worker import load_default_build_id
3838
from .workflow import _WorkflowWorker
3939
from .workflow_instance import UnsandboxedWorkflowRunner, WorkflowRunner
40+
from .workflow_sandbox import SandboxedWorkflowRunner
4041

4142
logger = logging.getLogger(__name__)
4243

@@ -49,7 +50,8 @@ def __init__(
4950
*,
5051
workflows: Sequence[Type],
5152
workflow_task_executor: Optional[concurrent.futures.ThreadPoolExecutor] = None,
52-
workflow_runner: WorkflowRunner = UnsandboxedWorkflowRunner(),
53+
workflow_runner: WorkflowRunner = SandboxedWorkflowRunner(),
54+
unsandboxed_workflow_runner: WorkflowRunner = UnsandboxedWorkflowRunner(),
5355
namespace: str = "ReplayNamespace",
5456
data_converter: temporalio.converter.DataConverter = temporalio.converter.default(),
5557
interceptors: Sequence[Interceptor] = [],
@@ -70,6 +72,7 @@ def __init__(
7072
workflows=list(workflows),
7173
workflow_task_executor=workflow_task_executor,
7274
workflow_runner=workflow_runner,
75+
unsandboxed_workflow_runner=unsandboxed_workflow_runner,
7376
namespace=namespace,
7477
data_converter=data_converter,
7578
interceptors=interceptors,
@@ -146,6 +149,7 @@ async def replay_workflows(
146149
workflows=self._config["workflows"],
147150
workflow_task_executor=self._config["workflow_task_executor"],
148151
workflow_runner=self._config["workflow_runner"],
152+
unsandboxed_workflow_runner=self._config["unsandboxed_workflow_runner"],
149153
data_converter=self._config["data_converter"],
150154
interceptors=self._config["interceptors"],
151155
debug_mode=self._config["debug_mode"],
@@ -235,6 +239,7 @@ class ReplayerConfig(TypedDict, total=False):
235239
workflows: Sequence[Type]
236240
workflow_task_executor: Optional[concurrent.futures.ThreadPoolExecutor]
237241
workflow_runner: WorkflowRunner
242+
unsandboxed_workflow_runner: WorkflowRunner
238243
namespace: str
239244
data_converter: temporalio.converter.DataConverter
240245
interceptors: Sequence[Interceptor]

temporalio/worker/worker.py

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,7 @@
2929
from .interceptor import Interceptor
3030
from .workflow import _WorkflowWorker
3131
from .workflow_instance import UnsandboxedWorkflowRunner, WorkflowRunner
32+
from .workflow_sandbox import SandboxedWorkflowRunner
3233

3334
logger = logging.getLogger(__name__)
3435

@@ -49,7 +50,8 @@ def __init__(
4950
workflows: Sequence[Type] = [],
5051
activity_executor: Optional[concurrent.futures.Executor] = None,
5152
workflow_task_executor: Optional[concurrent.futures.ThreadPoolExecutor] = None,
52-
workflow_runner: WorkflowRunner = UnsandboxedWorkflowRunner(),
53+
workflow_runner: WorkflowRunner = SandboxedWorkflowRunner(),
54+
unsandboxed_workflow_runner: WorkflowRunner = UnsandboxedWorkflowRunner(),
5355
interceptors: Sequence[Interceptor] = [],
5456
build_id: Optional[str] = None,
5557
identity: Optional[str] = None,
@@ -96,6 +98,8 @@ def __init__(
9698
provided, the caller is responsible for shutting it down after
9799
the worker is shut down.
98100
workflow_runner: Runner for workflows.
101+
unsandboxed_workflow_runner: Runner for workflows that opt-out of
102+
sandboxing.
99103
interceptors: Collection of interceptors for this worker. Any
100104
interceptors already on the client that also implement
101105
:py:class:`Interceptor` are prepended to this list and should
@@ -202,6 +206,7 @@ def __init__(
202206
activity_executor=activity_executor,
203207
workflow_task_executor=workflow_task_executor,
204208
workflow_runner=workflow_runner,
209+
unsandboxed_workflow_runner=unsandboxed_workflow_runner,
205210
interceptors=interceptors,
206211
build_id=build_id,
207212
identity=identity,
@@ -246,6 +251,7 @@ def __init__(
246251
workflows=workflows,
247252
workflow_task_executor=workflow_task_executor,
248253
workflow_runner=workflow_runner,
254+
unsandboxed_workflow_runner=unsandboxed_workflow_runner,
249255
data_converter=client_config["data_converter"],
250256
interceptors=interceptors,
251257
debug_mode=debug_mode,
@@ -378,6 +384,7 @@ class WorkerConfig(TypedDict, total=False):
378384
activity_executor: Optional[concurrent.futures.Executor]
379385
workflow_task_executor: Optional[concurrent.futures.ThreadPoolExecutor]
380386
workflow_runner: WorkflowRunner
387+
unsandboxed_workflow_runner: WorkflowRunner
381388
interceptors: Sequence[Interceptor]
382389
build_id: Optional[str]
383390
identity: Optional[str]

temporalio/worker/workflow.py

Lines changed: 24 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ def __init__(
4949
workflows: Sequence[Type],
5050
workflow_task_executor: Optional[concurrent.futures.ThreadPoolExecutor],
5151
workflow_runner: WorkflowRunner,
52+
unsandboxed_workflow_runner: WorkflowRunner,
5253
data_converter: temporalio.converter.DataConverter,
5354
interceptors: Sequence[Interceptor],
5455
debug_mode: bool,
@@ -70,6 +71,7 @@ def __init__(
7071
)
7172
self._workflow_task_executor_user_provided = workflow_task_executor is not None
7273
self._workflow_runner = workflow_runner
74+
self._unsandboxed_workflow_runner = unsandboxed_workflow_runner
7375
self._data_converter = data_converter
7476
# Build the interceptor classes and collect extern functions
7577
self._extern_functions: MutableMapping[str, Callable] = {}
@@ -98,6 +100,15 @@ def __init__(
98100
# Confirm name unique
99101
if defn.name in self._workflows:
100102
raise ValueError(f"More than one workflow named {defn.name}")
103+
# Prepare the workflow with the runner (this will error in the
104+
# sandbox if an import fails somehow)
105+
try:
106+
if defn.sandboxed:
107+
workflow_runner.prepare_workflow(defn)
108+
else:
109+
unsandboxed_workflow_runner.prepare_workflow(defn)
110+
except Exception as err:
111+
raise RuntimeError(f"Failed validating workflow {defn.name}") from err
101112
self._workflows[defn.name] = defn
102113

103114
async def run(self) -> None:
@@ -162,7 +173,7 @@ async def _handle_activation(
162173
# If the workflow is not running yet, create it
163174
workflow = self._running_workflows.get(act.run_id)
164175
if not workflow:
165-
workflow = await self._create_workflow_instance(act)
176+
workflow = self._create_workflow_instance(act)
166177
self._running_workflows[act.run_id] = workflow
167178
# Run activation in separate thread so we can check if it's
168179
# deadlocked
@@ -254,7 +265,7 @@ async def _handle_activation(
254265
logger.debug("Shutting down worker on eviction")
255266
asyncio.create_task(self._bridge_worker().shutdown())
256267

257-
async def _create_workflow_instance(
268+
def _create_workflow_instance(
258269
self, act: temporalio.bridge.proto.workflow_activation.WorkflowActivation
259270
) -> WorkflowInstance:
260271
# First find the start workflow job
@@ -311,13 +322,15 @@ async def _create_workflow_instance(
311322
)
312323

313324
# Create instance from details
314-
return await self._workflow_runner.create_instance(
315-
WorkflowInstanceDetails(
316-
payload_converter_class=self._data_converter.payload_converter_class,
317-
interceptor_classes=self._interceptor_classes,
318-
defn=defn,
319-
info=info,
320-
randomness_seed=start.randomness_seed,
321-
extern_functions=self._extern_functions,
322-
)
325+
det = WorkflowInstanceDetails(
326+
payload_converter_class=self._data_converter.payload_converter_class,
327+
interceptor_classes=self._interceptor_classes,
328+
defn=defn,
329+
info=info,
330+
randomness_seed=start.randomness_seed,
331+
extern_functions=self._extern_functions,
323332
)
333+
if defn.sandboxed:
334+
return self._workflow_runner.create_instance(det)
335+
else:
336+
return self._unsandboxed_workflow_runner.create_instance(det)

0 commit comments

Comments
 (0)