Retrieving large amounts of data from a database #24

@leamingrad

Firstly, thanks for a great package: Caqti has proved to be super useful for handling database interactions in OCaml.

I'm currently trying to work out how to efficiently extract large amounts of data (multiple GB) from a PostgreSQL database, which I then want to fold over using Lwt. Unfortunately, naively using a single SELECT query and fold_s does not work, as Caqti loads all of the data into memory and causes my machine to run out of RAM.

I have had a look through the PostgreSQL documentation, and it seems like there are two ways to do this (if you can think of others I would be happy to hear them):

  • Breaking the query into multiple smaller queries using OFFSET/LIMIT batching, potentially with an extra query at the start to count the rows that will be returned
  • Making use of PostgreSQL's single-row mode in order to stream through the results. This is supported by the OCaml postgresql package, but isn't currently used by Caqti as far as I can tell
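For reference, the second option can be used against the postgresql package directly. This is only a hedged sketch (the query and the stream_rows helper are hypothetical, and the method names should be checked against the postgresql-ocaml documentation), but it shows the basic shape: issue the query asynchronously, switch the connection into single-row mode, then consume one result per row until get_result returns None.

```ocaml
(* Sketch: streaming rows with postgresql-ocaml's single-row mode.
   [stream_rows] and the SQL are illustrative, not from Caqti. *)
let stream_rows (conn : Postgresql.connection) (f : string array -> unit) =
  conn#send_query "SELECT id, payload FROM items";
  conn#set_single_row_mode;
  let rec loop () =
    match conn#get_result with
    | None -> ()  (* all results consumed *)
    | Some result ->
      (match result#status with
       | Postgresql.Single_tuple ->
         (* Exactly one row per result in single-row mode. *)
         f (result#get_tuple 0);
         loop ()
       | Postgresql.Tuples_ok ->
         (* Final, empty result that terminates the row stream. *)
         loop ()
       | _ -> failwith result#error)
  in
  loop ()
```

Each row arrives as its own result object, so memory use stays bounded regardless of the size of the result set, which is what makes this attractive for Caqti's fold functions.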

I could potentially implement the OFFSET/LIMIT behaviour in my application, but that would mean that I could no longer make use of the fold_s helper function, which is very useful IMO. Similarly, it probably does not make sense for Caqti to implement this sort of batching, as the API does not provide enough detail about the SQL statements being executed to guarantee that adding OFFSET and LIMIT parameters would not conflict.
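To illustrate what the application-side workaround looks like, here is a hedged sketch of OFFSET/LIMIT batching that recovers a fold-like interface on top of Caqti. The table, columns, batch size, and the fold_batched helper are all hypothetical; the calls follow the Caqti 1.x Lwt API (Caqti_request.collect, Db.collect_list) but embed the paging into the SQL by hand, which is exactly the part that Caqti itself could not safely do for arbitrary user queries.

```ocaml
(* Sketch: application-level OFFSET/LIMIT batching over Caqti.
   Only [batch_size] rows are resident in memory at a time. *)
open Lwt.Infix

let batch_size = 10_000

(* Hypothetical paged query over an [items] table. A stable ORDER BY
   is required for OFFSET/LIMIT paging to be deterministic. *)
let select_page =
  Caqti_request.collect
    Caqti_type.(tup2 int int)
    Caqti_type.(tup2 int string)
    "SELECT id, payload FROM items ORDER BY id LIMIT ? OFFSET ?"

(* Fold [f] over every row, fetching one batch at a time. *)
let fold_batched (module Db : Caqti_lwt.CONNECTION) f init =
  let rec loop offset acc =
    Db.collect_list select_page (batch_size, offset) >>= function
    | Error _ as e -> Lwt.return e
    | Ok [] -> Lwt.return (Ok acc)          (* no more rows *)
    | Ok rows ->
      Lwt_list.fold_left_s f acc rows >>= fun acc ->
      loop (offset + batch_size) acc
  in
  loop 0 init
```

The downside, beyond losing fold_s, is that each batch re-plans and re-scans from the start of the ordering, so OFFSET paging gets slower as the offset grows; single-row mode avoids that entirely.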

Do you think it would be worth adding the single-row mode behaviour to Caqti when using the PostgreSQL driver, either by default or (more likely) as an option?

I would be happy to put in a PR to add this functionality.
