Replies: 1 comment
-
"Might be worth trying" -> A PR with a proposal is a better way to discuss such things; by preparing a PR proposal you might find out more limitations/issues, or discover that it is actually easy.
-
Hello!

I've been trying to help my colleagues set up `SqlToS3Operator` recently, and together we've found that this operator can't handle big tables correctly. It uses the `get_pandas_df` method to read the full table into RAM first, then uploads it to S3, optionally in multiple files if the `max_rows_per_file` argument is provided. The problem is that this logic is not suitable for big tables, but it can be fixed with relatively little effort.
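To make the issue concrete, here is roughly what happens today (a simplified illustration based on my reading of the operator, not its actual source; `query` stands for whatever SQL the operator was configured with):

```python
hook = operator._get_hook()          # a DbApiHook instance
df = hook.get_pandas_df(sql=query)   # the entire result set is materialised as one in-memory DataFrame
# only afterwards is df optionally split into max_rows_per_file-sized pieces and uploaded to S3
```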
Given that the `SqlToS3Operator._get_hook()` method is designed to return a `DbApiHook` instance, and that the latter has a `get_pandas_df_by_chunks` method, isn't it only natural to use that method instead of `get_pandas_df` when `max_rows_per_file` is specified for the `SqlToS3Operator`?
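For illustration, here is a minimal sketch of what the chunked path could look like inside the operator's execute logic. It relies on the operator's existing `query`, `parameters`, and `max_rows_per_file` attributes; the `_upload_dataframe_to_s3` helper is purely hypothetical and just stands in for whatever per-file serialisation and upload the operator already does:

```python
from airflow.providers.common.sql.hooks.sql import DbApiHook


def execute_sketch(self, context):
    """Hypothetical chunked variant of SqlToS3Operator.execute (not the actual source)."""
    sql_hook: DbApiHook = self._get_hook()

    if self.max_rows_per_file:
        # Stream the result set: each iteration holds at most max_rows_per_file rows in RAM.
        chunks = sql_hook.get_pandas_df_by_chunks(
            sql=self.query,
            parameters=self.parameters,
            chunksize=self.max_rows_per_file,
        )
        for file_no, df in enumerate(chunks):
            self._upload_dataframe_to_s3(df, suffix=f"_{file_no}")  # hypothetical upload helper
    else:
        # Current behaviour: the whole table is materialised in memory at once.
        df = sql_hook.get_pandas_df(sql=self.query, parameters=self.parameters)
        self._upload_dataframe_to_s3(df)
```

Falling back to `get_pandas_df` when `max_rows_per_file` is not set would keep the change fully backwards-compatible.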