Dependency handling for electrons #661
Replies: 7 comments 17 replies
-
| Just noting here that this will involve changes to the DB schema. We'll need to have a broader discussion about how we're storing electron metadata in the DB and object store. | 
-
| @kessler-frost / one quick thing that's missing in this class is the dependency on local import files. This is largely possible by simply passing the imported module, which cloudpickle can pickle by value (see the cloudpickle README). | 
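To illustrate why local import files need attention: the standard pickler stores a module-level function as a reference (module name plus function name), not as code, so the receiving side must be able to import that module. As far as I know cloudpickle does the same for importable modules unless the module is registered for pickling by value (its README documents `register_pickle_by_value` for this). The sketch below uses only the standard library and `json` as a stand-in for a local module:

```python
import json
import pickle

# A module-level function is pickled by reference: the payload names the
# module and the function, and contains none of the function's code.
payload = pickle.dumps(json.dumps)

has_module_name = b"json" in payload
has_function_name = b"dumps" in payload

# Unpickling re-imports the module and looks the function up by name,
# which only works if the module exists on the receiving side.
restored = pickle.loads(payload)
same_object = restored is json.dumps
```

If `json` were a local helper file that is absent on the backend, the `pickle.loads` step is where the failure would surface.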
-
| This design doc is really well written. I just have a few questions. 
 def new_func(*args, **kwargs):
     deps_object_1.apply()
     deps_object_2.apply()

     # Keep the task's result so the wrapper can return it after teardown.
     result = task(*args, **kwargs)
     deps_object_3.apply()
     return result

(This implementation wouldn't actually work if the deps need to be installed before unpickling the task, but one might imagine introducing "dep" electrons, which would fit into the Transport Graph framework.) There are a few considerations here: readability, ease of use, and ease of reasoning about the code. The last is especially important when code doesn't run the way one expects, and error handling would be most straightforward if the user issues the setup commands explicitly. 
 | 
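One way to keep such a wrapper easy to reason about is to report a failing setup or teardown step separately from a failing task. This is a minimal sketch, assuming each dep exposes its work as a zero-argument callable; `DepsError` and `wrap_with_deps` are hypothetical names, not part of Covalent:

```python
class DepsError(RuntimeError):
    """Raised when a setup or teardown step fails (hypothetical name)."""


def _run_steps(steps, phase):
    """Run each zero-argument callable in order, tagging failures with the phase."""
    for step in steps:
        try:
            step()
        except Exception as exc:
            name = getattr(step, "__name__", repr(step))
            raise DepsError(f"{phase} step {name} failed") from exc


def wrap_with_deps(task, before=(), after=()):
    """Return a function that runs `before`, then `task`, then `after`."""
    def new_func(*args, **kwargs):
        _run_steps(before, "setup")
        result = task(*args, **kwargs)  # task errors propagate unchanged
        _run_steps(after, "teardown")
        return result
    return new_func


log = []

def set_up():
    log.append("setup")

def clean_up():
    log.append("cleanup")

def add(x, y):
    log.append("task")
    return x + y

wrapped = wrap_with_deps(add, before=[set_up], after=[clean_up])
total = wrapped(2, 3)
```

With this shape, an exception raised by `add` itself reaches the caller untouched, while a broken setup step arrives as a `DepsError` whose `__cause__` is the original exception.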
-
| Do we expect the Deps classes to be cloud-picklable? | 
-
| For the implementation, we need to distinguish between two types of deps. Some classes of Deps can be applied in a separate task before the actual task; these can be packaged in "Dep" electrons and marked as a dependency of the core electron. But others only have an effect if applied in the same session as the actual task and need special handling by the executor. To illustrate, recall that the SSH and Slurm executors run each task through a series of noninteractive SSH sessions: 
 Installing pip packages into a venv/condaenv can be done in a separate SSH session before Step 1. Other setup commands, however, only persist for one shell session. These must be run in the same session as Step 2. For example, any environment variables must be set at the beginning of Step 2 since they will be unset in the next SSH session. To apply the latter kind of deps, the  | 
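The session-scoping of environment variables can be reproduced locally. The sketch below uses separate `sh` processes as a stand-in for separate noninteractive SSH sessions (an assumption made for illustration; no SSH is involved), and `DEPS_DEMO_VAR` is a made-up variable name:

```python
import subprocess


def run_session(script):
    """Run `script` in a fresh `sh` process and return its stripped stdout."""
    completed = subprocess.run(
        ["sh", "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# Session A exports the variable, then the shell exits.
run_session("export DEPS_DEMO_VAR=1")

# Session B is a new process, so the export from session A is gone.
separate = run_session('echo "${DEPS_DEMO_VAR:-unset}"')

# Exporting and reading inside one session works as expected.
same = run_session('export DEPS_DEMO_VAR=1; echo "${DEPS_DEMO_VAR:-unset}"')
```

Here `separate` comes back as `unset` and `same` as `1`, which is why environment-style deps have to be prepended to the same command that runs the task.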
-
| Mutating the state of the executor's environment outside of the user's explicitly defined tasks can introduce some new and potentially subtle failure modes: 
 We need to figure out how to handle these in a transparent way. | 
-
| Finally closed by #1876 | 
-
Deps Design Document
Terminology
- task: a function that the user has decorated with electron
- backend environment: the environment where the task function is actually executed, e.g. an AWS instance or a Slurm cluster
- call_before and call_after: functions to be called before task and after task, respectively, in the same backend environment
- partial: a function wrapped with functools.partial, which “freezes” some of its arguments
Problems
Dependency features that are not currently available in a simple, direct way:
- call_before and call_after function executions. This basically means packing all three types of functions into one function and calling that instead.
- pip, conda, and module package dependency installation on the backend environment.
Proposal
UX - Explicit
UX - Implicit conversion to objects
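As a rough illustration of what implicit conversion could mean, the sketch below normalizes whatever the user passes into a dependency object. Everything here is an assumption made for the example: `PipDeps` is a small stand-in dataclass, and `to_pip_deps` is a hypothetical helper, not the interface proposed in this document.

```python
from dataclasses import dataclass, field


@dataclass
class PipDeps:
    """Stand-in for a pip dependency object (hypothetical shape)."""
    packages: list = field(default_factory=list)


def to_pip_deps(value):
    """Normalize user input into a PipDeps object.

    Accepts an existing PipDeps (explicit style), a single requirement
    string, an iterable of requirement strings (implicit style), or None.
    """
    if value is None:
        return PipDeps()
    if isinstance(value, PipDeps):
        return value
    if isinstance(value, str):
        return PipDeps([value])
    return PipDeps(list(value))


explicit = to_pip_deps(PipDeps(["numpy==0.23"]))
implicit = to_pip_deps(["numpy==0.23"])
single = to_pip_deps("pandas")
```

Both styles end up as equal `PipDeps` objects, so the rest of the pipeline only has to handle one type.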
Deps Class
This will be the parent class for any kind of dependency. To implement a new type of dependency, we’ll have to subclass this and override the __init__() and apply() methods.
__init__():
- Deps object with given variables.
- Deps object.
apply():
- self.* kind of variables, i.e., internal variables local to the object assigned at initialization.
- build_graph time by the electron
- Deps subclasses, or the user wants to have a custom Deps, they can easily do it without having to worry about argument management
- partial function runnable without having this Deps object available on the backend environment
Initial Deps SubClasses
PipDeps:
- pip install hence even version number can be provided as “numpy==0.23”
- requirements.txt file path can also be given
- apply() method
CondaDeps:
- environment.yml file for creating a new conda environment
ModuleDeps:
- module package manager
- module
EnvDeps:
BashDeps:
CallDeps:
- func
- func
- call_before list and call_after list of ordered executable Deps objects, e.g.: call_before=[CallDeps(a, args=(1, 2)), CallDeps(a, args=(3, 4))]
- a(1, 2) function will be run before a(3, 4)
Special Case of ImportDeps or any such composite “Deps”:
- Deps but just a proxy for UX to handle multiple package installation dependencies together
- PipDeps object
- CondaDeps object
- ModuleDeps object
- Deps object, e.g.: pip metadata field will be assigned the import_deps_object.pip value, etc.
- __new__() method of this class
Some things worth mentioning:
- apply() methods will be run every time for every electron, so if some dependencies are already present in the environment, we shouldn’t try to install or download them again. The creator of the Deps subclass should keep this in mind when writing the apply() method.
- apply() in a way where the backend environment does not need the Deps object to be there; hence there should be no covalent dependency
- apply() functions is intentionally avoided but can be implemented if need be, although some thought should be given to why exactly it is needed.
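The first two points can be sketched together: an apply() that checks what is already present before doing any work, and that hands back a partial built only from plain data. This is a sketch under assumptions: `PipDepsSketch` and `find_missing` are hypothetical names, the check uses importable module names as a stand-in for pip package names (the two can differ), and the sketch stops at detection without installing anything.

```python
import importlib.util
from functools import partial


def find_missing(modules):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


class PipDepsSketch:
    """Hypothetical stand-in for a pip-style Deps subclass."""

    def __init__(self, modules):
        self.modules = list(modules)

    def apply(self):
        # Only a plain function and a list of strings are frozen into the
        # partial, so the partial holds no reference to this object.
        return partial(find_missing, self.modules)


check = PipDepsSketch(["json", "surely_not_installed_pkg_xyz"]).apply()
missing = check()
```

A real apply() would presumably go on to install only the entries reported in `missing`, which keeps repeated runs for every electron cheap.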
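Putting the pieces of the design together, here is a minimal sketch of a parent class, a `CallDeps` subclass whose apply() returns a partial, and the packing of ordered call_before and call_after lists around a task. The class bodies, the `pack` helper, and the choice to call apply() ahead of execution are assumptions made for illustration, not the implementation that was eventually merged.

```python
from functools import partial


class Deps:
    """Parent class: subclasses override __init__() and apply()."""

    def apply(self):
        raise NotImplementedError


class CallDeps(Deps):
    """Freeze a function call so it can run later on the backend."""

    def __init__(self, func, args=(), kwargs=None):
        self.func = func
        self.args = tuple(args)
        self.kwargs = dict(kwargs or {})

    def apply(self):
        # Uses only values stored on self at initialization.
        return partial(self.func, *self.args, **self.kwargs)


def pack(task, call_before=(), call_after=()):
    """Pack ordered before-steps, the task, and after-steps into one function."""
    before = [dep.apply() for dep in call_before]  # built ahead of execution
    after = [dep.apply() for dep in call_after]

    def packed(*args, **kwargs):
        for step in before:
            step()
        result = task(*args, **kwargs)
        for step in after:
            step()
        return result

    return packed


calls = []

def a(x, y):
    calls.append((x, y))

def task(n):
    calls.append("task")
    return n * 2

packed = pack(
    task,
    call_before=[CallDeps(a, args=(1, 2)), CallDeps(a, args=(3, 4))],
    call_after=[CallDeps(a, args=(5, 6))],
)
result = packed(21)
```

The list order is the execution order: `a(1, 2)` runs before `a(3, 4)`, then the task, then `a(5, 6)`.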