How to define transformation logic for multiple datasets #267
-
Hi, I've gone through the framework and I am still trying to understand how it scales up to multiple datasets and multiple pieces of transformation logic. Sorry if this is too basic a question or if the phrasing is not clear.
Thank you!
-
Hi!
Pattern 1 (there is also a second pattern in another comment below): declare each table as a distinct dataset, specifying which Lambda to use in stageA. An example with two tables follows.
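A minimal sketch of what this could look like, assuming datasets are registered as items in a DynamoDB table; the table name (`octagon-Datasets-dev`), the attribute names (`transforms`, `stage_a_transform`), and the ARNs are illustrative assumptions, not necessarily SDLF's actual schema:

```python
import boto3

# Hypothetical sketch: register each table as its own dataset entry,
# each pointing at its own stage-A transformation Lambda.
# Table name, attribute names, and ARNs are illustrative assumptions.
dynamodb = boto3.resource("dynamodb")
datasets = dynamodb.Table("octagon-Datasets-dev")

for name, transform_arn in [
    ("table1", "arn:aws:lambda:us-east-1:111122223333:function:transform-table1"),
    ("table2", "arn:aws:lambda:us-east-1:111122223333:function:transform-table2"),
]:
    datasets.put_item(
        Item={
            "name": f"engineering-{name}",  # one dataset per table
            "pipeline": "main",
            "transforms": {"stage_a_transform": transform_arn},
        }
    )
```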
Then when defining your pipeline, make sure to process all datasets by providing the relevant event pattern:
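A hedged sketch of such an event pattern, here as an EventBridge rule matching S3 object-created events for every dataset prefix so one pipeline picks up all datasets; the rule name, bucket name, and prefixes are assumptions for illustration:

```python
import json

import boto3

# Hypothetical sketch: one EventBridge rule whose pattern matches
# object-created events for every dataset prefix. A target pointing
# at the pipeline would still be attached with put_targets.
events = boto3.client("events")
events.put_rule(
    Name="engineering-main-all-datasets",  # illustrative rule name
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": ["my-raw-bucket"]},
            "object": {"key": [{"prefix": "table1/"}, {"prefix": "table2/"}]},
        },
    }),
)
```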
Of course this means the transformation Lambdas themselves still have to be written and deployed for each table, and the number of dataset entries grows with the number of tables.
-
Pattern 2: define a single dataset (unlike pattern 1), and modify `sdlf-stageA` to update the logic deciding which Lambda to run. If you look at the `preupdate-metadata` Lambda, it fetches the Lambda ARN from DynamoDB and puts it in the outputs for use by the next step of the Step Functions state machine. What you can do is provide multiple Lambda ARNs when defining the dataset:
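A sketch of a single dataset entry carrying a map of table name to stage-A Lambda ARN; as above, the table name, attribute names, and ARNs are illustrative assumptions:

```python
import boto3

# Hypothetical sketch: one dataset entry whose stage_a_transform
# attribute is a map from table name to Lambda ARN, instead of a
# single ARN. Names are illustrative assumptions.
dynamodb = boto3.resource("dynamodb")
datasets = dynamodb.Table("octagon-Datasets-dev")

datasets.put_item(
    Item={
        "name": "engineering-mydataset",  # one dataset for all tables
        "pipeline": "main",
        "transforms": {
            "stage_a_transform": {
                "table1": "arn:aws:lambda:us-east-1:111122223333:function:transform-table1",
                "table2": "arn:aws:lambda:us-east-1:111122223333:function:transform-table2",
            }
        },
    }
)
```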
Of course, and again, this means the transformation Lambdas for each table still have to be deployed beforehand. Then in `preupdate-metadata`, make sure to get the relevant value:
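A minimal sketch of what that lookup could look like inside `preupdate-metadata`, assuming the table name is the first segment of the S3 object key; the function name and key layout are assumptions for illustration:

```python
# Hypothetical sketch: derive the table name from the S3 object key
# and pick the matching ARN from the map stored in DynamoDB.
# Assumes object keys look like "table1/2024/01/01/file.csv".
def resolve_transform_arn(transforms: dict, object_key: str):
    table_name = object_key.split("/", 1)[0]
    # Returns None if the table is not in the map.
    return transforms.get("stage_a_transform", {}).get(table_name)
```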
This assumes the table name (for example, the first segment of the S3 object key) is used as the lookup key; if a table is missing from the map, it will return `None`, so handle that case or fall back to a default. I would say this is my preferred pattern here, but it depends on what you're comfortable with and other requirements you have that I may not be aware of.