Skip to content

Commit cc679b9

Browse files
authored
Sandbox Improvements (#219)
Fixes #216 Fixes #210 Fixes #203 Fixes #208 Fixes #204
1 parent 6a43919 commit cc679b9

File tree

8 files changed

+375
-107
lines changed

8 files changed

+375
-107
lines changed

README.md

Lines changed: 66 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -78,7 +78,9 @@ The Python SDK is under development. There are no compatibility guarantees at th
7878
- [Sandbox is not Secure](#sandbox-is-not-secure)
7979
- [Sandbox Performance](#sandbox-performance)
8080
- [Extending Restricted Classes](#extending-restricted-classes)
81+
- [Certain Standard Library Calls on Restricted Objects](#certain-standard-library-calls-on-restricted-objects)
8182
- [is_subclass of ABC-based Restricted Classes](#is_subclass-of-abc-based-restricted-classes)
83+
- [Compiled Pydantic Sometimes Using Wrong Types](#compiled-pydantic-sometimes-using-wrong-types)
8284
- [Activities](#activities)
8385
- [Definition](#definition-1)
8486
- [Types of Activities](#types-of-activities)
@@ -672,6 +674,10 @@ workflow will not progress until fixed.
672674
The sandbox is not foolproof and non-determinism can still occur. It is simply a best-effort way to catch bad code
673675
early. Users are encouraged to define their workflows in files with no other side effects.
674676

677+
The sandbox offers a mechanism to pass through modules from outside the sandbox. By default this already includes all
678+
standard library modules and Temporal modules. **For performance and behavior reasons, users are encouraged to pass
679+
through all third party modules whose calls will be deterministic.** See "Passthrough Modules" below on how to do this.
680+
675681
##### How the Sandbox Works
676682

677683
The sandbox is made up of two components that work closely together:
@@ -718,23 +724,46 @@ future releases.
718724
When creating the `Worker`, the `workflow_runner` is defaulted to
719725
`temporalio.worker.workflow_sandbox.SandboxedWorkflowRunner()`. The `SandboxedWorkflowRunner`'s init accepts a
720726
`restrictions` keyword argument that is defaulted to `SandboxRestrictions.default`. The `SandboxRestrictions` dataclass
721-
is immutable and contains three fields that can be customized, but only two have notable value
727+
is immutable and contains three fields that can be customized, but only two have notable value. See below.
722728

723729
###### Passthrough Modules
724730

725-
To make the sandbox quicker and use less memory when importing known third party libraries, they can be added to the
726-
`SandboxRestrictions.passthrough_modules` set like so:
731+
By default the sandbox completely reloads non-standard-library and non-Temporal modules for every workflow run. To make
732+
the sandbox quicker and use less memory when importing known-side-effect-free third party modules, they can be marked
733+
as passthrough modules.
734+
735+
**For performance and behavior reasons, users are encouraged to pass through all third party modules whose calls will be
736+
deterministic.**
737+
738+
One way to pass through a module is at import time in the workflow file using the `imports_passed_through` context
739+
manager like so:
727740

728741
```python
729-
my_restrictions = dataclasses.replace(
730-
SandboxRestrictions.default,
731-
passthrough_modules=SandboxRestrictions.passthrough_modules_default | SandboxMatcher(access={"pydantic"}),
742+
# my_workflow_file.py
743+
744+
from temporalio import workflow
745+
746+
with workflow.unsafe.imports_passed_through():
747+
import pydantic
748+
749+
@workflow.defn
750+
class MyWorkflow:
751+
...
752+
```
753+
754+
Alternatively, this can be done at worker creation time by customizing the runner's restrictions. For example:
755+
756+
```python
757+
my_worker = Worker(
758+
...,
759+
workflow_runner=SandboxedWorkflowRunner(
760+
restrictions=SandboxRestrictions.default.with_passthrough_modules("pydantic")
761+
)
732762
)
733-
my_worker = Worker(..., runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
734763
```
735764

736-
If an "access" match succeeds for an import, it will simply be forwarded from outside of the sandbox. See the API for
737-
more details on exact fields and their meaning.
765+
In both of these cases, now the `pydantic` module will be passed through from outside of the sandbox instead of
766+
being reloaded for every workflow run.
738767

739768
###### Invalid Module Members
740769

@@ -749,7 +778,7 @@ my_restrictions = dataclasses.replace(
749778
"datetime", "date", "today",
750779
),
751780
)
752-
my_worker = Worker(..., runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
781+
my_worker = Worker(..., workflow_runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
753782
```
754783

755784
Restrictions can also be added by `|`'ing together matchers, for example to restrict the `datetime.date` class from
@@ -762,7 +791,7 @@ my_restrictions = dataclasses.replace(
762791
children={"datetime": SandboxMatcher(use={"date"})},
763792
),
764793
)
765-
my_worker = Worker(..., runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
794+
my_worker = Worker(..., workflow_runner=SandboxedWorkflowRunner(restrictions=my_restrictions))
766795
```
767796

768797
See the API for more details on exact fields and their meaning.
@@ -802,16 +831,39 @@ To mitigate this, users should:
802831

803832
###### Extending Restricted Classes
804833

805-
Currently, extending classes marked as restricted causes an issue with their `__init__` parameters. This does not affect
806-
most users, but if there is a dependency that is, say, extending `zipfile.ZipFile` an error may occur and the module
807-
will have to be marked as pass through.
834+
Extending a restricted class causes Python to instantiate the restricted metaclass which is unsupported. Therefore if
835+
you attempt to use a class in the sandbox that extends a restricted class, it will fail. For example, if you have a
836+
`class MyZipFile(zipfile.ZipFile)` and try to use that class inside a workflow, it will fail.
837+
838+
Classes used inside the workflow should not extend restricted classes. For situations where third-party modules need to
839+
at import time, they should be marked as pass through modules.
840+
841+
###### Certain Standard Library Calls on Restricted Objects
842+
843+
If an object is restricted, internal C Python validation may fail in some cases. For example, running
844+
`dict.items(os.__dict__)` will fail with:
845+
846+
> descriptor 'items' for 'dict' objects doesn't apply to a '_RestrictedProxy' object
847+
848+
This is a low-level check that cannot be subverted. The solution is to not use restricted objects inside the sandbox.
849+
For situations where third-party modules need to at import time, they should be marked as pass through modules.
808850

809851
###### is_subclass of ABC-based Restricted Classes
810852

811853
Due to [https://bugs.python.org/issue44847](https://bugs.python.org/issue44847), classes that are wrapped and then
812854
checked to see if they are subclasses of another via `is_subclass` may fail (see also
813855
[this wrapt issue](https://github.com/GrahamDumpleton/wrapt/issues/130)).
814856

857+
###### Compiled Pydantic Sometimes Using Wrong Types
858+
859+
If the Pydantic dependency is in compiled form (the default) and you are using a Pydantic model inside a workflow
860+
sandbox that uses a `datetime` type, it will grab the wrong validator and use `date` instead. This is because our
861+
patched form of `issubclass` is bypassed by compiled Pydantic.
862+
863+
To work around, either don't use `datetime`-based Pydantic model fields in workflows, or mark `datetime` library as
864+
passthrough (means you lose protection against calling the non-deterministic `now()`), or use non-compiled Pydantic
865+
dependency.
866+
815867
### Activities
816868

817869
#### Definition

temporalio/worker/workflow_sandbox/_importer.py

Lines changed: 102 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
import sys
1515
import threading
1616
import types
17+
import warnings
1718
from contextlib import ExitStack, contextmanager
1819
from typing import (
1920
Any,
@@ -29,6 +30,7 @@
2930
Set,
3031
Tuple,
3132
TypeVar,
33+
no_type_check,
3234
)
3335

3436
from typing_extensions import ParamSpec
@@ -107,6 +109,33 @@ def restrict_built_in(name: str, orig: Any, *args, **kwargs):
107109
)
108110
)
109111

112+
# Need to unwrap params for isinstance and issubclass. We have
113+
# chosen to do it this way instead of customize __instancecheck__
114+
# and __subclasscheck__ because we may have proxied the second
115+
# parameter which does not have a way to override. It is unfortunate
116+
# we have to change these globals for everybody.
117+
def unwrap_second_param(orig: Any, a: Any, b: Any) -> Any:
118+
a = RestrictionContext.unwrap_if_proxied(a)
119+
b = RestrictionContext.unwrap_if_proxied(b)
120+
return orig(a, b)
121+
122+
thread_local_is_inst = _get_thread_local_builtin("isinstance")
123+
self.restricted_builtins.append(
124+
(
125+
"isinstance",
126+
thread_local_is_inst,
127+
functools.partial(unwrap_second_param, thread_local_is_inst.orig),
128+
)
129+
)
130+
thread_local_is_sub = _get_thread_local_builtin("issubclass")
131+
self.restricted_builtins.append(
132+
(
133+
"issubclass",
134+
thread_local_is_sub,
135+
functools.partial(unwrap_second_param, thread_local_is_sub.orig),
136+
)
137+
)
138+
110139
@contextmanager
111140
def applied(self) -> Iterator[None]:
112141
"""Context manager to apply this restrictive import.
@@ -153,17 +182,21 @@ def _import(
153182
fromlist: Sequence[str] = (),
154183
level: int = 0,
155184
) -> types.ModuleType:
185+
# We have to resolve the full name, it can be relative at different
186+
# levels
187+
full_name = _resolve_module_name(name, globals, level)
188+
156189
# Check module restrictions and passthrough modules
157-
if name not in sys.modules:
190+
if full_name not in sys.modules:
158191
# Make sure not an entirely invalid module
159-
self._assert_valid_module(name)
192+
self._assert_valid_module(full_name)
160193

161194
# Check if passthrough
162-
passthrough_mod = self._maybe_passthrough_module(name)
195+
passthrough_mod = self._maybe_passthrough_module(full_name)
163196
if passthrough_mod:
164197
# Load all parents. Usually Python does this for us, but not on
165198
# passthrough.
166-
parent, _, child = name.rpartition(".")
199+
parent, _, child = full_name.rpartition(".")
167200
if parent and parent not in sys.modules:
168201
_trace(
169202
"Importing parent module %s before passing through %s",
@@ -174,17 +207,17 @@ def _import(
174207
# Set the passthrough on the parent
175208
setattr(sys.modules[parent], child, passthrough_mod)
176209
# Set the passthrough on sys.modules and on the parent
177-
sys.modules[name] = passthrough_mod
210+
sys.modules[full_name] = passthrough_mod
178211
# Put it on the parent
179212
if parent:
180-
setattr(sys.modules[parent], child, sys.modules[name])
213+
setattr(sys.modules[parent], child, sys.modules[full_name])
181214

182215
# If the module is __temporal_main__ and not already in sys.modules,
183216
# we load it from whatever file __main__ was originally in
184-
if name == "__temporal_main__":
217+
if full_name == "__temporal_main__":
185218
orig_mod = _thread_local_sys_modules.orig["__main__"]
186219
new_spec = importlib.util.spec_from_file_location(
187-
name, orig_mod.__file__
220+
full_name, orig_mod.__file__
188221
)
189222
if not new_spec:
190223
raise ImportError(
@@ -195,7 +228,7 @@ def _import(
195228
f"Spec for __main__ file at {orig_mod.__file__} has no loader"
196229
)
197230
new_mod = importlib.util.module_from_spec(new_spec)
198-
sys.modules[name] = new_mod
231+
sys.modules[full_name] = new_mod
199232
new_spec.loader.exec_module(new_mod)
200233

201234
mod = importlib.__import__(name, globals, locals, fromlist, level)
@@ -219,10 +252,20 @@ def _assert_valid_module(self, name: str) -> None:
219252
raise RestrictedWorkflowAccessError(name)
220253

221254
def _maybe_passthrough_module(self, name: str) -> Optional[types.ModuleType]:
222-
if not self.restrictions.passthrough_modules.match_access(
223-
self.restriction_context, *name.split(".")
255+
# If imports not passed through and name not in passthrough modules,
256+
# check parents
257+
if (
258+
not temporalio.workflow.unsafe.is_imports_passed_through()
259+
and name not in self.restrictions.passthrough_modules
224260
):
225-
return None
261+
end_dot = -1
262+
while True:
263+
end_dot = name.find(".", end_dot + 1)
264+
if end_dot == -1:
265+
return None
266+
elif name[:end_dot] in self.restrictions.passthrough_modules:
267+
break
268+
# Do the pass through
226269
with self._unapplied():
227270
_trace("Passing module %s through from host", name)
228271
global _trace_depth
@@ -409,3 +452,50 @@ def _get_thread_local_builtin(name: str) -> _ThreadLocalCallable:
409452
ret = _ThreadLocalCallable(getattr(builtins, name))
410453
_thread_local_builtins[name] = ret
411454
return ret
455+
456+
457+
def _resolve_module_name(
458+
name: str, globals: Optional[Mapping[str, object]], level: int
459+
) -> str:
460+
if level == 0:
461+
return name
462+
# Calc the package from globals
463+
package = _calc___package__(globals or {})
464+
# Logic taken from importlib._resolve_name
465+
bits = package.rsplit(".", level - 1)
466+
if len(bits) < level:
467+
raise ImportError("Attempted relative import beyond top-level package")
468+
base = bits[0]
469+
return f"{base}.{name}" if name else base
470+
471+
472+
# Copied from importlib._calc__package__
473+
@no_type_check
474+
def _calc___package__(globals: Mapping[str, object]) -> str:
475+
"""Calculate what __package__ should be.
476+
__package__ is not guaranteed to be defined or could be set to None
477+
to represent that its proper value is unknown.
478+
"""
479+
package = globals.get("__package__")
480+
spec = globals.get("__spec__")
481+
if package is not None:
482+
if spec is not None and package != spec.parent:
483+
warnings.warn(
484+
"__package__ != __spec__.parent " f"({package!r} != {spec.parent!r})",
485+
DeprecationWarning,
486+
stacklevel=3,
487+
)
488+
return package
489+
elif spec is not None:
490+
return spec.parent
491+
else:
492+
warnings.warn(
493+
"can't resolve package from __spec__ or __package__, "
494+
"falling back on __name__ and __path__",
495+
ImportWarning,
496+
stacklevel=3,
497+
)
498+
package = globals["__name__"]
499+
if "__path__" not in globals:
500+
package = package.rpartition(".")[0]
501+
return package

0 commit comments

Comments
 (0)