Python UDFs are incompatible with elasticity #895

@senderista

The DbCreateFunction operator, which the REST API uses to register Python UDFs, stores the pickled form of the UDF in the local Postgres database of each worker that exists at registration time, but it does not record the implementation in the Myria system catalog. Workers added later therefore never receive the implementation in their local Postgres database, and queries that use a previously registered UDF will fail on those new workers.

Based on discussion with @BrandonHaynes, I think the registration API needs to be redesigned to be compatible with elasticity. It would be relatively simple to store the pickled form of each Python UDF as a file in a well-known directory on the master, with the filename corresponding to the function's registered name in the catalog (this is roughly the design we're using for Java UDFs). REEF would be responsible for copying the pickled function files to each worker on cluster startup, and each worker would register the pickled functions in its local Postgres database during its initialization stage. We could use the same approach for Postgres UDFs if we store them as script files in a well-known directory on the master.
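
To make the proposal concrete, here is a minimal sketch of the file layout and the worker initialization step. It is not Myria code: `store_udf`, `register_all_udfs`, the `.pkl` suffix, and the `register` callback (a stand-in for whatever writes the function into the worker's local Postgres database) are all hypothetical names, and the pickled UDF is treated as opaque bytes.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable

UDF_SUFFIX = ".pkl"  # hypothetical extension for pickled UDF files


def store_udf(udf_dir: Path, name: str, pickled_udf: bytes) -> Path:
    """Master side: persist one pickled UDF under its registered name."""
    udf_dir.mkdir(parents=True, exist_ok=True)
    path = udf_dir / f"{name}{UDF_SUFFIX}"
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(pickled_udf)
    tmp.replace(path)  # rename into place so readers never see a partial file
    return path


def register_all_udfs(udf_dir: Path,
                      register: Callable[[str, bytes], None]) -> int:
    """Worker side: register every UDF file found in the copied directory.

    `register` is a placeholder for the call that stores the function in
    the worker's local database. Returns the number of UDFs registered.
    """
    count = 0
    for path in sorted(udf_dir.glob(f"*{UDF_SUFFIX}")):
        register(path.stem, path.read_bytes())
        count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        udf_dir = Path(d)
        # Registration writes the file once, on the master.
        store_udf(udf_dir, "my_abs", pickle.dumps(abs))
        # A worker that joins later replays the whole directory at startup.
        local_db = {}  # stand-in for the worker's local Postgres database
        register_all_udfs(udf_dir, local_db.__setitem__)
        print(sorted(local_db))
```

Because the directory, rather than any individual worker's database, holds the authoritative copy, a worker that starts after registration sees the same set of functions as one that was present at registration time. A real implementation would also need to validate registered names before using them as filenames.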
