-
Notifications
You must be signed in to change notification settings - Fork 39
Description
Firstly, thanks for a great package, caqti
has proved to be super-useful for handling database interactions in ocaml.
I'm currently trying to work out how to efficiently extract large amounts of data (multiple GB) from a postgresql database, which I then want to fold over using lwt
. Unfortunately, naively using a single SELECT
query and fold_s
does not work, as caqti
loads all of the data in memory, and causes my machine to run out of RAM).
I have had a look through the postgresql documentation, and it seems like there are two ways to do this (if you can think of others I would be happy to hear them):
- Break the query into multiple smaller queries using
OFFSET
/LIMIT
batching, potentially with an extra query at the start to count the rows that will be returned - Making use of postgresql's single row mode in order to stream through the results. This is supported by the ocaml postgres package, but isn't currently used by
caqti
as far as I can tell
I could potentially implement the OFFSET
/LIMIT
behaviour in my application, but that would mean that I could no-longer make use of the fold_s
helper function, which is very useful IMO. Similarly, it probably does not make sense for caqti
to implement this sort of batching, as the API does not provide enough detail about the SQL statements being executed to make sure that adding OFFSET
and LIMIT
parameters would not conflict.
Do you think it would be worth adding the single row mode behaviour to caqti
when using the postgres driver - either by default or (more likely) as an option?
I would be happy to put in a PR to add this function.