Sanity check on Dagster use case #31172
nicpottier asked this question in Q&A
I feel like I might be trying to put a square peg in a round hole here, so I want to double-check whether using Dagster for a particular use case makes any sense.
We are building a pipeline to turn PDFs into websites. That involves various steps: image extraction, text extraction, translation, upscaling, filtering, etc. At a high level, a lot of these feel like software-defined assets, and the Dagster model of dependencies and lineage makes a lot of sense. I've put together a prototype and I rather like the structure Dagster is forcing us into; it feels maintainable and easy to reason about.
...except that we are building this as a tool that can run on a variety of user-defined PDFs, and that's where things start to feel wonky. I can make the PDF path and output directory part of a config, and that's natural enough, but it gets weird with IOManagers. I can't figure out how to make an IOManager cleanly write its output to a different directory per "book". It doesn't fit neatly into partitions because the books are defined in config, and I can't access that config from the IOManager. It also feels like I need these outputs segmented somehow; otherwise I could overwrite other books' assets when I rerun things.
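A minimal sketch of the per-book segmentation described above, assuming each book could be identified by a key that the IO layer can see (in Dagster, a partition key such as a dynamic partition per book is one possible source for it). This is plain Python and deliberately does not use Dagster's API: `PerBookStore`, `book_id`, and the `<base_dir>/<book_id>/<asset_name>.pkl` layout are hypothetical. In a real IOManager, the book id would come from the output/input context rather than a method argument.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


class PerBookStore:
    """Hypothetical stand-in for an IOManager: each asset is written to
    <base_dir>/<book_id>/<asset_name>.pkl, so books never overwrite each other."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = Path(base_dir)

    def _path(self, book_id: str, asset_name: str) -> Path:
        # The book id plays the role of a partition key: one directory per book.
        return self.base_dir / book_id / f"{asset_name}.pkl"

    def handle_output(self, book_id: str, asset_name: str, obj: Any) -> Path:
        path = self._path(book_id, asset_name)
        path.parent.mkdir(parents=True, exist_ok=True)  # create the book's directory on first write
        with path.open("wb") as f:
            pickle.dump(obj, f)
        return path

    def load_input(self, book_id: str, asset_name: str) -> Any:
        # Reads only from the requested book's directory.
        with self._path(book_id, asset_name).open("rb") as f:
            return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = PerBookStore(tmp)
        store.handle_output("book_a", "extracted_text", ["page 1 text"])
        store.handle_output("book_b", "extracted_text", ["other book"])
        print(store.load_input("book_a", "extracted_text"))  # ['page 1 text']
```

Rerunning an asset for `book_b` only rewrites files under `book_b/`, which is the isolation the question is after; whether Dagster partitions are the right way to supply that key is exactly the open question here.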
I'm very much at the "I don't know what I don't know" stage with Dagster, but I'm starting to wonder whether this is the right fit. I realize Dagster is built mainly for more typical ETL workflows, which are very different, so perhaps I'm off the mark in using it at all.
For those who know Dagster better: is this a fool's errand, and should I be looking elsewhere?