Plain Text Keys as S3 Filename and S3 files can be grouped on Multitenancy columns #68
base: master
Also updated callbacks to work with Node 4.
Also added tests for plain-text keys as filenames in S3.
Also updated to be compatible with the Node 4 runtime in Lambda.
Hi! Could you explain a little more about the use case for what you're calling a MultiTenancy column? I'm seeing that you're sending data to slightly different S3 locations? As for the clear-text S3 keys, I would advise against this: we implemented the hashed filenames as a way to add randomness to the S3 keys. Without this randomness, S3 can run into some very hard throughput limitations that can cripple the incremental backup if write loads on your DynamoDB table are above ~400 per second. See http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html for more information.
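A minimal sketch of the hashed-key idea, assuming a `<prefix>/<table>/<md5>` layout and a hypothetical `hashedS3Key` helper (not necessarily how the replicator builds its keys). Sorting attribute names before hashing keeps the digest stable regardless of attribute order.

```javascript
var crypto = require('crypto');

// Serialize a DynamoDB key (e.g. { id: { S: 'abc' } }) deterministically:
// attribute names are sorted so attribute order never changes the digest.
function canonicalKey(dynamoKey) {
  return JSON.stringify(Object.keys(dynamoKey).sort().map(function (name) {
    return [name, dynamoKey[name]];
  }));
}

// S3 key whose variable part is an MD5 hex digest of the DynamoDB key.
// Neighbouring keys such as user-0001 and user-0002 get unrelated digests,
// so writes spread across the key space instead of one hot range.
// The <prefix>/<table>/<digest> layout is an assumption for illustration.
function hashedS3Key(prefix, table, dynamoKey) {
  var digest = crypto.createHash('md5').update(canonicalKey(dynamoKey)).digest('hex');
  return [prefix, table, digest].join('/');
}
```

Because the digest is the first variable segment, keys that would sort next to each other in plain text land in unrelated parts of the key space; the trade-off is that you need the original DynamoDB key to recompute where a record lives.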
Sorry, I should have provided some context. Here is an explanation:
Both of these features could be worked around, but this just makes life a little easier. I was interested to find out whether you are keen to have these merged in (this PR hasn't been reviewed yet; it's just worth starting the conversation). The last update I made was to leverage the Node v4 runtime on Lambda.
I'm hesitant about both of these scenarios because of the potential to cause S3 throttling.
Hey Ryan, I absolutely understand your concerns regarding throttling, and that is why I've left these as opt-in options rather than on by default. The idea behind the prefix option is similar: it can also cause throttling issues. MultiTenancyColumn (bad name) can be viewed as a dynamic version of the prefix, based on the data. MD5 is a great solution to the throttling problem, but only if you don't use the prefix option. The problem it creates, though, is correlating DynamoDB keys to their S3 keys once the records have been deleted, which could be impossible if you don't know the entire key. I have had a look at the tool you linked, and it can be useful for creating an MD5 of a specified key (I'm not sure whether it would work for deleted records?). We are aiming to use this as a DR solution, one that lets us solve problems where a developer (or a security breach) accidentally deletes or updates records (or tables). In that case we would need to roll back to a point in time, as opposed to knowing the specific key(s) we need to restore. Out of interest, how reliable has the replicator tool been for you in terms of incremental backups to S3?
Abhaya
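The two opt-in options described here could look roughly like the sketch below. The option names (`prefix`, `multiTenancyColumn`, `plainTextKeys`) and the `buildS3Key` helper are hypothetical, chosen for illustration rather than taken from the PR's code.

```javascript
var crypto = require('crypto');

// Hypothetical options sketching the two opt-in features in this PR
// (names are illustrative, not the replicator's real configuration):
//   prefix             - optional static prefix
//   multiTenancyColumn - attribute whose value becomes a per-record prefix
//   plainTextKeys      - readable key values instead of an MD5 digest
// `image` is the stream record's NewImage (or OldImage for REMOVE events).
// Only S and N key attributes are handled; binary (B) keys are not.
function buildS3Key(opts, table, dynamoKey, image) {
  var names = Object.keys(dynamoKey).sort(); // fixed attribute order
  var parts = [opts.prefix, table];

  if (opts.multiTenancyColumn) {
    var attr = image && image[opts.multiTenancyColumn];
    var tenant = attr && (attr.S || attr.N);
    parts.push(tenant ? encodeURIComponent(tenant) : '_unknown');
  }

  if (opts.plainTextKeys) {
    // e.g. { pk: { S: 'a' }, sk: { N: '5' } } -> 'a/5'
    parts.push(names.map(function (n) {
      var v = dynamoKey[n];
      return encodeURIComponent(v.S !== undefined ? v.S : v.N);
    }).join('/'));
  } else {
    var canonical = JSON.stringify(names.map(function (n) { return [n, dynamoKey[n]]; }));
    parts.push(crypto.createHash('md5').update(canonical).digest('hex'));
  }

  return parts.filter(Boolean).join('/'); // drop an unset prefix
}
```

With `multiTenancyColumn` set, the tenant value precedes any hash, so all of one tenant's writes share a prefix; a busy tenant can therefore reintroduce the throttling concern raised above, which is why both options are opt-in.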
We implement versioning on the S3 bucket where incremental backups land. With this in hand, the CLI tool is capable of finding the complete history of any DynamoDB record, including deleted ones. Further, we run a separate process that routinely scans the S3 incremental backup and rolls the results into a single file. We call it a "snapshot" because it roughly represents the state of the entire table at some point in time. See https://github.com/mapbox/dynamodb-replicator/blob/master/s3-snapshot.js. These files give us the ability to roll back the entire table to a previous state, though we are more inclined to roll back individual records if need be, using S3 versioning and history.
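S3's ListObjectVersions response lists both object versions and delete markers, each with `Key`, `VersionId` and `LastModified`. A sketch of choosing, per key, the version that was current at a restore point, assuming the listing has already been fetched (pagination omitted); `versionsAsOf` is a hypothetical helper, not the replicator's CLI.

```javascript
// Pick, for every key, the object version that was current at `asOf`.
// `listing` is shaped like an S3 ListObjectVersions response:
//   { Versions: [{ Key, VersionId, LastModified }], DeleteMarkers: [...] }
// A key whose newest event at or before `asOf` is a delete marker maps to null.
function versionsAsOf(listing, asOf) {
  var cutoff = new Date(asOf).getTime();
  var latest = {}; // key -> { time, versionId, deleted }

  function consider(entry, deleted) {
    var t = new Date(entry.LastModified).getTime();
    if (t > cutoff) return; // written after the restore point
    var current = latest[entry.Key];
    if (!current || t > current.time) {
      latest[entry.Key] = { time: t, versionId: entry.VersionId, deleted: deleted };
    }
  }

  (listing.Versions || []).forEach(function (v) { consider(v, false); });
  (listing.DeleteMarkers || []).forEach(function (d) { consider(d, true); });

  var result = {};
  Object.keys(latest).forEach(function (key) {
    result[key] = latest[key].deleted ? null : latest[key].versionId;
  });
  return result;
}
```

A `null` entry means the record had been deleted by then; a key that is absent from the result did not exist yet.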
👌 We love it. We've yet to encounter any evidence of data being dropped from the DynamoDB stream --> Lambda --> S3 pipeline.
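A sketch of the translation step in that DynamoDB stream --> Lambda --> S3 pipeline, assuming the stream view type includes new images. Stream records carry an `eventName` of `INSERT`, `MODIFY` or `REMOVE`; on a versioned bucket, the resulting delete leaves a delete marker rather than erasing history, which is what makes the record-level rollback described above possible. `toS3Operations` and `keyFor` are hypothetical names.

```javascript
// Map DynamoDB stream records to S3 operations. INSERT and MODIFY write the
// item's NewImage (requires a stream view type that includes new images);
// REMOVE deletes the object. keyFor turns a record's Keys into an S3 key.
function toS3Operations(records, keyFor) {
  return records.map(function (record) {
    var change = record.dynamodb;
    var key = keyFor(change.Keys);
    if (record.eventName === 'REMOVE') {
      return { action: 'delete', key: key };
    }
    return { action: 'put', key: key, body: JSON.stringify(change.NewImage) };
  });
}
```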
…eadme.
Making changes for cross-account permissions
Allowing the app to be packaged with custom env config
Renaming config files to be more forms-specific
Naming packages does nothing for Lambda
Updating README to reflect additional config + scripts
Improvements to PowerShell script
Add a canned ACL for the S3 upload so that we don't have cross-account permission issues.
Remove baked config.
Add ACL permission.
Alter package script
correct Markdown formatting
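The cross-account commit above adds a canned ACL to the S3 upload. A minimal sketch of the upload parameters, assuming the AWS SDK for JavaScript (v2) `putObject` call; `putParams` is a hypothetical helper. The `bucket-owner-full-control` canned ACL grants the bucket owner full control of objects uploaded by another account.

```javascript
// Build putObject parameters for a cross-account upload. The canned ACL
// 'bucket-owner-full-control' lets the bucket-owning account read and manage
// objects written by another account's Lambda role.
// putParams is a hypothetical helper, not code from this PR.
function putParams(bucket, key, item) {
  return {
    Bucket: bucket,
    Key: key,
    Body: JSON.stringify(item),
    ACL: 'bucket-owner-full-control'
  };
}
```

The returned object would be passed as `s3.putObject(putParams(bucket, key, item), callback)`.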
Some features I needed for my project. I would love to hear your thoughts, and whether you are interested in these features.