
Parallelisation #2

@znichollscr

Parallelisation, as usual, is tricky.

We can't use the Prefect ThreadPoolTaskRunner because its threads block, so we don't actually get any parallelisation. The DaskTaskRunner also fails, for reasons that aren't yet clear. I suspect a global netCDF lock that means only one netCDF file can be written at a time, but the error may be something else. The Dask runner also raises a bunch of warnings about futures being cleaned up prematurely, which makes me feel that something isn't quite working.
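To make the lock hypothesis concrete, here is a minimal, stdlib-only sketch of why adding threads would not help if every write goes through a single process-wide lock. `GLOBAL_WRITE_LOCK` and `fake_write` are hypothetical stand-ins: they model the suspected behaviour and are not the real netCDF/HDF5 API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a process-wide lock held by the I/O library.
# This is NOT the real netCDF/HDF5 API; it only models the suspected behaviour.
GLOBAL_WRITE_LOCK = threading.Lock()

_stats_lock = threading.Lock()
_active_writers = 0
_peak_writers = 0


def fake_write(path: str) -> str:
    """Simulate writing one file while holding the global lock."""
    global _active_writers, _peak_writers
    with GLOBAL_WRITE_LOCK:
        with _stats_lock:
            _active_writers += 1
            _peak_writers = max(_peak_writers, _active_writers)
        time.sleep(0.02)  # simulated I/O while the lock is held
        with _stats_lock:
            _active_writers -= 1
    return path


def peak_concurrent_writes(n_files: int, max_workers: int) -> int:
    """Write n_files from a thread pool and report the peak number of
    writers that were active at the same time."""
    global _peak_writers
    _peak_writers = 0
    paths = [f"output_{i}.nc" for i in range(n_files)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(fake_write, paths))
    return _peak_writers


if __name__ == "__main__":
    # Four threads, but the global lock means only one write runs at once.
    print(peak_concurrent_writes(n_files=8, max_workers=4))
```

If the hypothesis is right, the peak stays at 1 however many workers are configured, which would match the "threads are blocking" observation.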

We could potentially just use the ThreadPoolTaskRunner and roll our own parallelisation internally (e.g. each notebook run starts in a new process), but that also feels kind of yuck.
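A rough sketch of the "each notebook run starts in a new process" idea, using only the standard library. The child command here just prints its input; in practice it would execute the notebook with whatever tool the project uses (an assumption, not something decided in this issue). `run_in_new_process` and `run_all_in_parallel` are hypothetical names.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_in_new_process(config_id: int) -> str:
    """Hypothetical stand-in for running one notebook in a fresh interpreter.

    Each call starts a separate Python process, so process-global state
    (such as a library-level file lock) is not shared between runs.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"print('ran config {config_id}')"],
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return result.stdout.strip()


def run_all_in_parallel(config_ids, max_workers: int = 4) -> list:
    # The threads only wait on child processes, so blocking them is cheap;
    # the actual work runs in parallel across the child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_in_new_process, config_ids))


if __name__ == "__main__":
    for line in run_all_in_parallel(range(3)):
        print(line)
```

`pool.map` returns results in input order, so outputs line up with their configs even though the processes finish in any order.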

For now, the safest option is to use the ThreadPoolTaskRunner (the default) with a single worker.
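A minimal configuration sketch of that setup, assuming the Prefect 3.x API (`prefect.task_runners.ThreadPoolTaskRunner` with its `max_workers` argument). The `run_notebook` task and `run_notebooks` flow are hypothetical placeholders, not code from this project.

```python
from prefect import flow, task
from prefect.task_runners import ThreadPoolTaskRunner


@task
def run_notebook(notebook: str) -> str:
    # Hypothetical placeholder for the real notebook-running task.
    return notebook


# A single worker: tasks submitted from this flow run one at a time.
@flow(task_runner=ThreadPoolTaskRunner(max_workers=1))
def run_notebooks(notebooks: list[str]) -> list[str]:
    futures = run_notebook.map(notebooks)
    return [future.result() for future in futures]
```

Raising `max_workers` later is then a one-line change, once the blocking/lock issue is understood.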

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions