I am interested in having some assets be defined in separate processes, perhaps even implemented in other languages. My understanding is that to do this, I have to use Pipes, as shown here: https://docs.dagster.io/guides/build/external-pipelines/
However, it isn't clear how (if at all) this works when we want to send/receive data from other assets.
For example, given this simple pair of assets, where the first retrieves some data and the second transforms it:
```python
import dagster as dg


@dg.asset
def raw_data() -> str:
    # Code to fetch raw data from an external source
    return "Raw data"


@dg.asset
def transformed_data(raw_data: str) -> str:
    # Transform it (declaring raw_data as an argument makes the upstream
    # dependency implicit, so deps=[raw_data] isn't needed and would
    # conflict with the input)
    return raw_data.upper()


defs = dg.Definitions(
    assets=[raw_data, transformed_data],
)
```
Suppose I want to change `transformed_data` to do the work in an external process. I'd switch it to use `pipes_subprocess_client`, right? But how would I pass the `raw_data` to that piped process and then return its stdout?
I suppose I could literally just use the Python `subprocess` module (or similar) and do something like sending the raw data as a JSON object on stdin and looking for a JSON object on stdout. But is there a "Dagster way" to do this using `pipes_subprocess_client` while keeping it simple to send and receive the data?
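For reference, the plain-`subprocess` fallback described above might look like the following stdlib-only sketch (the worker is inlined via `-c` just to keep it self-contained; `transform_externally` is a hypothetical helper name):

```python
import json
import subprocess
import sys

# Hypothetical worker process: reads a JSON object on stdin, uppercases
# its "raw" field, and writes a JSON object to stdout.
WORKER = (
    "import json, sys; "
    "d = json.load(sys.stdin); "
    'print(json.dumps({"transformed": d["raw"].upper()}))'
)


def transform_externally(raw: str) -> str:
    # Send the raw data as JSON on stdin, read the result as JSON on stdout.
    result = subprocess.run(
        [sys.executable, "-c", WORKER],
        input=json.dumps({"raw": raw}),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)["transformed"]


print(transform_externally("Raw data"))  # → RAW DATA
```

This works, but it bypasses Dagster entirely for the data hand-off, which is what prompts the question above.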