This repository contains a collection of different dispel4py workflows, tested with Python 3.10+.
You need to have installed our latest dispel4py (Python 3.10+) beforehand.
Follow the instructions detailed here to install dispel4py in a Conda environment. Alternatively, you can follow these instructions to build a Docker image and run dispel4py within a Docker container.
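If you just need a reminder of the Conda route, a minimal sketch looks like this (the environment name is arbitrary and the installation source is an assumption; follow the linked instructions for the exact steps):

```
conda create -n dispel4py python=3.10
conda activate dispel4py
pip install <path-or-URL-to-dispel4py>   # install dispel4py from the source given in the linked instructions
```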
- Multiprocessing mappings (multi, dyn_multi, dyn_auto_multi) do not seem to work properly on macOS (M1 chip). See below:
File "/Users/...../anaconda3/envs/py310/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
AttributeError: 'TestProducer' object has no attribute 'simple_logger'
To work around this, we recommend using our Dockerfile to create an image and then a container.
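For reference, a minimal sketch of that route (assuming the Dockerfile sits at the repository root; the image tag is arbitrary):

```
docker build -t dispel4py .
docker run -it dispel4py
```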
- You might have to use the following command to install MPI on your macOS laptop:
conda install -c conda-forge mpi4py mpich
In Linux environments, you can install MPI with:
pip install mpi4py
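To verify that mpi4py can see a working MPI installation, you can run a quick sanity check (a generic mpi4py check, not a dispel4py command):

python -c "from mpi4py import MPI; print(MPI.Get_library_version())"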
- For the mpi mapping, we need to indicate the number of processes twice, using the -n flag twice (once at the beginning and once at the end):
mpiexec -n 10 dispel4py mpi dispel4py.examples.graph_testing.pipeline_test -i 20 -n 10
- In some environments, you might need these flags for the mpi mapping:
--allow-run-as-root --oversubscribe
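For example, combined with the pipeline_test command shown above:

mpiexec --allow-run-as-root --oversubscribe -n 10 dispel4py mpi dispel4py.examples.graph_testing.pipeline_test -i 20 -n 10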
- When running workflows with the mpi mapping, you may encounter messages like "Read -1, expected 56295, errno = 1". There is no need for concern; these messages are typical and do not indicate a problem, and your workflow is still running as expected.
If you run a workflow from another directory, you just need to indicate it as <DIR1>.<DIR2>.<NAME_WORKFLOW>, without the .py extension. If you are in the workflow's directory, then you will need to use <NAME_WORKFLOW>.py. You will find detailed explanations in each workflow's README.
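For example, running the pipeline_test workflow with the simple mapping from outside its directory would look like this (paths are illustrative):

dispel4py simple dispel4py.examples.graph_testing.pipeline_test -i 10

whereas from inside the graph_testing directory itself it would be:

dispel4py simple pipeline_test.py -i 10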
The mappings of dispel4py refer to the connections between the processing elements (PEs) in a dataflow graph. Dispel4py is a Python library used for specifying and executing data-intensive workflows. In a dataflow graph, each PE represents a processing step, and the mappings define how data flows between the PEs during execution. These mappings ensure that data is correctly routed and processed through the dataflow, enabling efficient and parallel execution of tasks. We currently support the following mappings (a minimal example workflow is sketched after this list):
- Sequential:
- "simple": it executes dataflow graphs sequentially on a single process, suitable for small-scale data processing tasks.
- Parallel:
- Fixed workload distribution - supports stateful and stateless PEs:
- "mpi": it distributes dataflow graph computations across multiple nodes (distributed memory) using the Message Passing Interface (MPI).
- "multi": it runs multiple instances of a dataflow graph concurrently using multiprocessing Python library, offering parallel processing on a single machine.
- "zmq_multi": it runs multiple instances of a dataflow graph concurrently using ZMQ library, offering parallel processing on a single machine.
- "redis" : it runs multiple instances of a dataflow graph concurrently using Redis library.
- Dynamic workload distribution - supports only stateless PEs:
- "dyn_multi": it runs multiple instances of a dataflow graph concurrently using the Python multiprocessing library. Workload is assigned dynamically (but no autoscaling).
- "dyn_auto_multi": same as above, but allows autoscaling. We can indicate the number of threads to use.
- "dyn_redis": it runs multiple instances of a dataflow graph concurrently using Redis library. Workload assigned dynamically (but no autocasling).
- "dyn_auto_redis": same as above, but allows autoscaling. We can indicate the number of threads to use.
- Hybrid workload distribution - supports stateful and stateless PEs:
- "hybrid_redis": it runs multiple instances of a dataflow graph concurrently using Redis library. Hybrid approach for workloads: Stafeless PEs assigned dynamically, while Stateful PEs are assigned from the begining.
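As an illustration of the PE and mapping concepts above, here is a minimal sketch of a two-PE workflow (a producer connected to a consumer). It assumes the ProducerPE/ConsumerPE base classes and WorkflowGraph exposed by dispel4py; the class names and the module name used in the commands below are purely illustrative, so check the workflows in this repository for the exact patterns.

```python
from dispel4py.base import ConsumerPE, ProducerPE
from dispel4py.workflow_graph import WorkflowGraph


class NumberProducer(ProducerPE):
    """Toy producer PE: emits an increasing number on each iteration."""

    def __init__(self):
        ProducerPE.__init__(self)
        self.counter = 0

    def _process(self, inputs):
        # A ProducerPE has a single output; the returned value is emitted on it.
        self.counter += 1
        return self.counter


class NumberPrinter(ConsumerPE):
    """Toy consumer PE: logs every value it receives."""

    def __init__(self):
        ConsumerPE.__init__(self)

    def _process(self, data):
        self.log('received: %s' % data)


# Build the dataflow graph: connect the producer's output to the printer's input.
producer = NumberProducer()
printer = NumberPrinter()
graph = WorkflowGraph()
graph.connect(producer, ProducerPE.OUTPUT_NAME, printer, ConsumerPE.INPUT_NAME)
```

If this graph lived in a module called my_workflow (a hypothetical name), it could then be run unchanged with any of the mappings above, for example:

dispel4py simple my_workflow -i 5
dispel4py multi my_workflow -i 5 -n 4
mpiexec -n 4 dispel4py mpi my_workflow -i 5 -n 4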