
BigQuery

Alexander Diemand edited this page Aug 3, 2023 · 1 revision

Note

To start querying data, you need a Google Cloud project. In case you don't have a Google Cloud project yet:

NB: If you don't run queries from your own project, you will get the error: "Access Denied: Project iog-data-analytics: User does not have bigquery.jobs.create permission in project iog-data-analytics."
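The error above is raised because the query job is being created in `iog-data-analytics` rather than in your own project. The sketch below separates the two roles: the table is fully qualified with the project that (judging from the error message) hosts the data, while the job runs, and is billed, in a project you pass in. `build_query_job` and the dataset, table and column names are hypothetical placeholders, not names confirmed by this page.

```python
# Sketch: read data owned by one project while creating (and paying for)
# the query job in your own project. The dataset, table and column names
# passed in are placeholders, not the real names used in this BigQuery project.
DATA_PROJECT = "iog-data-analytics"  # assumption: the project hosting the dataset


def build_query_job(billing_project: str, dataset: str, table: str,
                    columns: list[str]) -> dict[str, str]:
    """Return the project to run the job in and a fully qualified query."""
    if billing_project == DATA_PROJECT:
        # Creating the job in DATA_PROJECT is what triggers
        # "User does not have bigquery.jobs.create permission".
        raise ValueError("create the query job in your own Google Cloud project")
    if not columns:
        raise ValueError("list the columns you need; BigQuery bills by the columns it reads")
    column_list = ", ".join(columns)
    # The table is qualified with DATA_PROJECT, so it is read from there
    # even though the job itself runs in billing_project.
    query = f"SELECT {column_list} FROM `{DATA_PROJECT}.{dataset}.{table}`"
    return {"project": billing_project, "query": query}
```

With the `google-cloud-bigquery` library, the returned values would be used roughly as `bigquery.Client(project=job["project"]).query(job["query"]).result()`. Listing columns explicitly instead of `SELECT *` matters because BigQuery's on-demand billing counts the bytes of every column a query reads.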

About BigQuery

According to its official description, BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze data with built-in features such as machine learning, geospatial analysis, and business intelligence. Its serverless architecture lets you retrieve the data you need with SQL queries, with no infrastructure to manage.

Motivation

Cardano’s on-chain data has grown considerably over the last few months, so syncing the whole history of the blockchain takes correspondingly longer. Running a node and a Db-sync process (which maps the on-chain data to a relational database) now requires more time and a more powerful machine; we estimate these costs at around $200 per month. Because on-chain data can be considered immutable once enough blocks have been added on top of it, it can be copied to BigQuery and queried there at very low cost.

Google BigQuery makes it easier to look up data without running specialized software, and with Google Data Studio you can create advanced visualizations and dashboards based on this data.

Costs

The cost of querying data from BigQuery is paid by the user who initiates the query. BigQuery charges in proportion to the amount of data the query processes (scans).

At the time of writing, the cost was $5 per terabyte (TB) of data processed. See the BigQuery pricing page for current rates.
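To make the pricing concrete, here is a minimal cost estimator. It assumes the billing unit is 2^40 bytes (Google's pricing page lists on-demand rates per TiB) and takes the $5 rate quoted above as a default; it ignores the monthly free tier and BigQuery's per-query minimums. `estimate_query_cost_usd` is a hypothetical helper, not part of any BigQuery API.

```python
# Assumption: the pricing unit is 2**40 bytes (Google lists on-demand rates
# per TiB); the free tier and per-query minimums are ignored.
BYTES_PER_TIB = 2 ** 40


def estimate_query_cost_usd(bytes_processed: int, usd_per_tib: float = 5.0) -> float:
    """Approximate on-demand cost of one query from the bytes it processes."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / BYTES_PER_TIB * usd_per_tib


# A query scanning 50 GiB at $5 per TiB:
# 50 * 2**30 / 2**40 = 50 / 1024 TiB, times $5 = $0.244140625
example_cost = estimate_query_cost_usd(50 * 2 ** 30)
```

The bytes a query will process can be obtained before paying for it with a dry run (in the Python client, `bigquery.QueryJobConfig(dry_run=True)`; the resulting job exposes `total_bytes_processed`). For comparison, at $5 per TiB the roughly $200 per month estimated above for running a node and Db-sync would pay for about 40 TiB of processed data.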

Note: the data tables in this project are divided by epoch number, so queries can select only the epochs they need. This reduces the amount of data processed and therefore the cost.
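A query only benefits from the epoch split if it filters on the epoch with bounds BigQuery can evaluate up front. The sketch below builds such a query. It assumes, without confirmation from this page, that the tables are partitioned on an integer column named `epoch_no` (the name Db-sync uses in its `block` table); `epoch_range_query` and the table and column names in the usage are hypothetical.

```python
# Hypothetical helper. Assumption (not confirmed by this page): the table is
# partitioned on an integer column named `epoch_no`, as in Db-sync's schema.
def epoch_range_query(table: str, columns: list[str],
                      start_epoch: int, end_epoch: int) -> str:
    """Build a query that only touches epochs start_epoch..end_epoch."""
    for value in (start_epoch, end_epoch):
        # The epoch bounds are inlined into the SQL, so accept plain ints only.
        # Table and column names are assumed to come from trusted code.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            raise ValueError("epochs must be non-negative integers")
    if start_epoch > end_epoch:
        raise ValueError("start_epoch must not exceed end_epoch")
    column_list = ", ".join(columns)
    # Filtering the partitioning column with constant bounds lets BigQuery
    # skip every partition outside the range (partition pruning).
    return (f"SELECT {column_list} FROM `{table}` "
            f"WHERE epoch_no BETWEEN {start_epoch} AND {end_epoch}")


sql = epoch_range_query("my-project.my_dataset.blocks", ["block_no", "epoch_no"], 400, 402)
```

Comparing the dry-run byte counts of this query and of the same query without the `WHERE` clause is a quick way to check how much the epoch filter saves.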

Notes

There are several things you should note when working with this BigQuery project:

  • the data is refreshed from Db-sync every two hours.
  • the data only extends to approximately 20 blocks below the current block height in Db-sync. This margin keeps blocks that could still be rolled back after a chain fork out of the dataset.
  • the data is organized by epoch numbers. This allows limiting queries to one or several epochs worth of data, which results in a lower cost per query.
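Taken together, the refresh interval and the 20-block margin bound how fresh the data can be. The sketch below does that arithmetic. Beyond the two figures from the notes, it assumes Cardano mainnet's average of one block per ~20 seconds (1-second slots, active slot coefficient 0.05) and ignores Db-sync's own lag behind the node and the duration of the export; both function names are hypothetical.

```python
# Sketch of how stale the BigQuery copy can be. Assumptions: a refresh every
# 2 hours and a 20-block safety margin (from the notes above), plus an average
# Cardano mainnet block time of ~20 s. Db-sync's lag and export time are ignored.
def worst_case_staleness_s(refresh_interval_s: int = 2 * 60 * 60,
                           safety_blocks: int = 20,
                           avg_block_time_s: int = 20) -> int:
    """Rough upper estimate of how long a new block takes to appear in BigQuery."""
    # A block first needs `safety_blocks` blocks on top of it, then has to wait
    # for the next refresh, which can be up to one full interval away.
    return refresh_interval_s + safety_blocks * avg_block_time_s


def last_exported_block(dbsync_tip_height: int, safety_blocks: int = 20) -> int:
    """Highest block height expected in BigQuery right after a refresh."""
    return max(dbsync_tip_height - safety_blocks, 0)
```

With the defaults, `worst_case_staleness_s()` gives 7,600 seconds, roughly 2 hours and 7 minutes, so anything newer than that is better fetched from Db-sync or a node directly.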