Switch to using v2 schema for importer #5583
Replies: 12 comments
-
In v1, we initially used a lot of user-defined types but in later versions removed most of those. So you could use v1 against a Postgres database, then export the latest version of that schema for use in BigQuery. You could use v2, but it's pre-alpha and we break it all the time. Plus it's specific to Citus, so it wouldn't work without tweaking. Just set the … We also already export the data to BigQuery via our …
-
@steven-sheehy I've had a lot of trouble with … I'm still seeing too many user-defined types in the schema for it to be usable in BigQuery. Right now, I have a user-defined type in almost every table, thereby making every table error out in BigQuery (the most common one being …).
-
True, some tables still use user-defined types even in the latest version. You can adjust anything that uses entity_id or hbar_tinybars to be bigint to make it work. That should not require any code changes since they're compatible. We also use enum types, which I'm not sure how BigQuery would handle.
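If it helps, here is a minimal sketch of that adjustment, assuming entity_id and hbar_tinybars are Postgres domains over bigint as suggested above. The table and column names in the ALTER statement are only illustrative, not an exhaustive list:

```sql
-- Find every column that still uses one of the user-defined domain types.
SELECT table_name, column_name, domain_name
FROM information_schema.columns
WHERE domain_name IN ('entity_id', 'hbar_tinybars')
ORDER BY table_name, column_name;

-- Convert an affected column to plain bigint (repeat for each row returned above).
-- The table/column names here are hypothetical examples, not the full schema.
ALTER TABLE transaction
    ALTER COLUMN entity_id TYPE bigint,
    ALTER COLUMN charged_tx_fee TYPE bigint;
```

Since both domains wrap bigint, the stored values don't change; only the declared column type does.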
-
@steven-sheehy BigQuery is not a fan. Do you have any recommendations on the best way to do this? I'm currently running a Kubernetes cluster on GCP with the hedera-mirror helm chart and a long custom values.yaml file. I could write a script that runs a series of …
-
Hi @josneville, for the BigQuery schema errors, the schema described here should help solve your problem with the common user-defined types, such as … Can you reference that and see if it helps with all the user-defined types in BigQuery?
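For illustration only (this is not the project's published BigQuery schema), a table on the BigQuery side can simply declare those columns as INT64, assuming both user-defined types reduce to 64-bit integers. The project/dataset/table names below are placeholders:

```sql
-- Illustrative BigQuery DDL; the real published schema should take precedence.
-- Assumes entity_id and hbar_tinybars map to plain 64-bit integers.
CREATE TABLE `my_project.my_dataset.transaction` (
  consensus_timestamp INT64 NOT NULL,
  entity_id           INT64,
  charged_tx_fee      INT64
);
```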
-
@josneville Regarding the connection errors and only 1/10th of the transactions going through, can you post a snippet of the importer log output?
-
@edwin-greene This is the main error I see with the importer when I use it in pubsub mode:
…
On the topic, I receive about 40-ish requests a second, far below what I need speed-wise to ingest live data.
-
@edwin-greene I've tried increasing …
-
@josneville Please see the Pub/Sub batching settings here: https://github.com/hashgraph/hedera-mirror-node/blob/main/hedera-mirror-importer/src/test/resources/config/application-pubsub.yml
Enabling batching and adjusting the batch settings to your environment/setup may help resolve the connection problems you have been experiencing. There is an open issue to move the batch settings to the main application.yml.
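As a rough sketch of the kind of settings that file adjusts, assuming the importer picks up Spring Cloud GCP's standard publisher batching properties; the exact keys and values below are assumptions, so verify them against the linked application-pubsub.yml and your Spring Cloud GCP version:

```yaml
# Assumed Spring Cloud GCP publisher batching settings; verify the exact keys
# against the linked application-pubsub.yml before relying on them.
spring:
  cloud:
    gcp:
      pubsub:
        publisher:
          batching:
            enabled: true
            # Flush a batch once it holds this many messages...
            element-count-threshold: 100
            # ...or once it reaches this many bytes...
            request-byte-threshold: 1000000
            # ...or after this many seconds, whichever comes first.
            delay-threshold-seconds: 1
```

Larger count/byte thresholds trade a little latency for fewer, bigger publish calls, which tends to help when individual publish requests are failing or being throttled.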
-
@edwin-greene That makes sense. We decided to stick with the Postgres download for now. Do you or @steven-sheehy have recommendations for speeding up syncing the mirror node to live data (starting with historical data first)? If cost weren't a restriction, how would you guys recommend scaling up this node to download faster? It's currently downloading 1-2 days' worth of data each day; does that sound about right?
-
Pub/Sub publishing speed was recently improved. Make sure you are on version 0.75 or greater, with batch settings as described in the application-pubsub.yml linked above.
-
@josneville Converted this issue to a discussion since it seems to have strayed from its original purpose into a bunch of separate questions. Historical syncing is known to be slow, as we have not optimized for that path. There is a ticket that notes some ways that should be able to dramatically speed it up. If you end up attempting that, we'd greatly appreciate any documentation you can contribute toward solving that ticket. 🙏🏻
-
Problem
I'm trying to build a Google Datastream from the Postgres database in the helm chart to a BigQuery database. I'm running into an issue where a lot of the column types in Postgres aren't recognized by BigQuery, since they are user-defined types (entity_id, hbar_tinybars, etc.). This leaves a lot of columns unable to be imported into BigQuery.
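For context, a minimal sketch of the mismatch, assuming (as suggested in the replies above) that these user-defined types are Postgres domains wrapping bigint. The table below is an illustrative fragment, not the real schema:

```sql
-- Hypothetical v1-style definitions: the domains are just bigint underneath,
-- but Datastream/BigQuery sees the domain name rather than a type it knows.
CREATE DOMAIN entity_id AS bigint;
CREATE DOMAIN hbar_tinybars AS bigint;

CREATE TABLE transaction_example (
    consensus_timestamp bigint NOT NULL,
    entity_id           entity_id,      -- user-defined type, not mapped downstream
    charged_tx_fee      hbar_tinybars   -- user-defined type, not mapped downstream
);
```

Declaring such columns as plain bigint avoids the issue, which is what the Solution below asks about.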
Solution
I noticed that there is a v2 schema provided in the code, under hedera-mirror-node/hedera-mirror-importer/bin/main/db/migration/v2/V2.0.0__create_tables.sql, that uses native Postgres types as opposed to user-defined types. Is it possible to use this schema to initialize the database instead of the v1 schema, via flags, environment variables, or some kubeconfig?
Alternatives
No response