Hi @js2702, thanks for the feedback! I'll try to give some insight into the design decisions leading to the current architecture.
The `id` column name is currently hardcoded, so there isn't a direct way to configure it per table. A smaller feature that might work for your use case is if we add support for "column aliases" on the client. That way you could sync the data using the standard `id` column, and alias it back to your prefixed name (e.g. `f_id`) on the client.
We currently store the data as raw JSON strings, since that is what the client works with, and that requires the least amount of parsing and re-serialization throughout the sync process. For the most part the data is kept as a string all the way from bucket storage on the service to storage on the client, until the data is queried on the client - we're not even doing JSON parsing on it when syncing. We are considering adding BSON as an option - this is what MongoDB uses for "objects". The main advantage would be for supporting binary data though, rather than reducing storage size or parsing overhead.
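To make that concrete, a single sync operation looks roughly like the sketch below. The field names are approximate rather than the exact wire/storage format; the point is that `data` stays a raw JSON string.

```ts
// Rough sketch of a single sync operation as stored/transferred.
// Field names here are approximate - the key point is that `data` is a
// JSON-encoded string, only parsed when the row is queried on the client.
const opEntry = {
  op_id: '100001',
  op: 'PUT',
  object_type: 'food',
  object_id: 'a1b2c3',
  data: '{"f_id":"a1b2c3","id":"a1b2c3","name":"Apple","created_at":"2024-01-15T12:34:56.789Z"}',
};
```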
The "column aliases" idea from above could actually also work here - that would allow you to alias to shorter column names in sync rules, and back to the full column names on the client.
We're using the ISO 8601 format by default since it's simple to inspect and work with. You can, however, convert to unix epoch in your sync rules - see the example here. Of course, the number is still stored inside a JSON string, so it will still take more than 8 bytes.
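For reference, a hedged sketch of what that could look like in sync rules. Table and column names are taken from the `food` example in this thread, and `unixepoch()` is an assumed function name based on the SQLite-style functions available in sync rules - check the linked example for the exact supported syntax.

```yaml
# Hedged sketch: convert a timestamp column to unix epoch in a sync rules
# data query. Table/column names are illustrative; unixepoch() is an assumed
# function name - see the linked example for the exact supported form.
bucket_definitions:
  by_user:
    parameters: SELECT request.user_id() AS user_id
    data:
      - SELECT f_id AS id, name, unixepoch(created_at) AS created_at FROM food WHERE owner_id = bucket.user_id
```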
Storing the data directly as "objects" in the MongoDB storage database won't reduce the storage overhead significantly - it would still include the column names for each row, and it would add overhead to convert to JSON on each sync request.

For the 17 GB you're currently seeing - is that the "storage size" (often compressed), or the data size (uncompressed)? With MongoDB you can enable snappy or zstd compression, which can significantly reduce storage size, especially for JSON data like this (there's a config sketch at the end of this comment).

Other notes

Note that you can also use Postgres for bucket storage as an alternative, if you prefer that over MongoDB. It will have most of the same storage size trade-offs though, and is currently not quite as optimized as our MongoDB implementation.

We are also planning on adding support for S3 or equivalent object storage, to offload the bulk data storage (with MongoDB/Postgres still used for the real-time updates). That could be significantly cheaper and scale better when working with large amounts of data. We don't have specific timelines for this yet though.
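On the compression point: a minimal mongod.conf sketch for enabling zstd block compression (this is standard MongoDB/WiredTiger configuration, not specific to PowerSync):

```yaml
# Standard mongod.conf snippet - applies zstd block compression to newly
# created collections (snappy is MongoDB's default). Existing collections
# keep whatever compressor they were created with.
storage:
  wiredTiger:
    collectionConfig:
      blockCompressor: zstd
```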
-
Hello, we are currently testing PowerSync through a private beta with some of our users. We have onboarded 3,800 enrolled users so far, and they are using ~17 GB of storage. Seeing as we have north of 300K users, we would like to find some way to optimize this storage.
As this was a migration from an existing app that was only using SQLite, we had to make some compromises.
1. Sync op ids.
For example, id columns are duplicated because we usually name id columns with a prefix for the table: in the `food` table, the id column would be `f_id`. We have a large codebase with joins and multiple queries which could start getting conflicts if we were to use the "id" name. That's why we opted to duplicate the id column in the sync rules, keeping the original column too (roughly as in the sketch below).

Would it be possible to have some kind of mapping in PowerSync where we could specify the name of the PowerSync op id for a particular table?
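A simplified sketch of what that duplication looks like in our sync rules (reduced to the `food` example; illustrative only):

```yaml
# Illustrative sync rules snippet: f_id is selected twice, once aliased to the
# required "id" column and once under its original name, so existing client
# queries that reference f_id keep working.
bucket_definitions:
  global_food:
    data:
      - SELECT f_id AS id, f_id, name, created_at FROM food
```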
2. Column names
Exploring the bucket storage in MongoDB in our local development setup, we've noticed that it stores the full row as a JSON string. That includes the column names, which adds up quickly on large buckets like the ones in our domain. For instance, taking the `data` string field for an op in our example, the full data field takes 377 bytes, and 166 of those bytes are column names (14 columns, not counting the double quotes).
An idea would be to encode the column names to single characters in a transparent way. In our example this would reduce the column names from 166 bytes to only 14 bytes (roughly as sketched below).
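Purely as an illustration of the idea - these are not real PowerSync structures, just the same row with shortened keys and a per-table map to restore them:

```ts
// Illustrative only: the same row encoded with full column names vs.
// single-character aliases, plus a per-table map to restore the originals.
const full = '{"f_id":"a1b2c3","name":"Apple","created_at":"2024-01-15T12:34:56.789Z"}';
const short = '{"a":"a1b2c3","b":"Apple","c":"2024-01-15T12:34:56.789Z"}';
const columnMap: Record<string, string> = { a: 'f_id', b: 'name', c: 'created_at' };

// Decoding restores the original column names transparently on read.
const decoded = Object.fromEntries(
  Object.entries(JSON.parse(short)).map(([k, v]) => [columnMap[k], v]),
);
console.log(full.length, short.length, decoded); // column-name characters drop from 18 to 3 here
```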
3. Dates
We have a few timestamp columns in some of our large tables. Currently each date uses the ISO text format, so it takes about 24 bytes. If we were to use milliseconds since the epoch encoded as a 64-bit integer (int8), it would take only 8 bytes and still cover any practical date (JavaScript's Date type alone goes up to September 13, 275760 AD).
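A quick illustration of the size difference, assuming the millisecond-precision ISO form:

```ts
// Size comparison between the two encodings (illustrative).
const iso = new Date().toISOString(); // e.g. "2026-02-12T08:30:15.123Z"
console.log(iso.length);              // 24 characters, i.e. ~24 bytes as text
const epochMs = Date.parse(iso);      // milliseconds since the Unix epoch
// As a binary 64-bit integer this needs 8 bytes; embedded in a JSON string it
// still serializes as ~13 digits of text, so the win depends on the storage format.
console.log(epochMs);
```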
4. Extra
I'm not familiar with MongoDB, but wouldn't using an object in MongoDB for the data field take less space? Assuming you can read the column/field names in the same way.
Hopefully some of these optimizations are feasible.
Keep up the great work!