I have a Google Cloud Storage bucket that contains a Parquet data lake structured as below:
bucket/deviceid/message/yyyy/mm/dd/xyz.parquet
I aim to use BigQuery to create external tables to enable serverless SQL queries from this bucket (with the end goal of visualizing the data in Grafana dashboards). To do so I wish to leverage partitioning.
As is evident, my data is not hive-partitioned. In my previous setup with Amazon Athena + Grafana, I got around this via my AWS Glue job, where I could define my partitioning logic as below:
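To illustrate the idea (this is not my actual Glue code, just a sketch of the logic), the partitioning amounts to deriving partition columns from each object key:

```python
import re
from datetime import date

# Hypothetical sketch of the partitioning logic: derive partition values from
# an object key laid out as deviceid/message/yyyy/mm/dd/file.parquet.
KEY_PATTERN = re.compile(
    r"^(?P<device_id>[^/]+)/(?P<message>[^/]+)/"
    r"(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/[^/]+\.parquet$"
)

def partition_values(object_key: str):
    """Return (device_id, message, date_created) parsed from the key, or None."""
    m = KEY_PATTERN.match(object_key)
    if not m:
        return None
    return (
        m.group("device_id"),
        m.group("message"),
        date(int(m.group("y")), int(m.group("m")), int(m.group("d"))),
    )
```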
Is there an equivalent way to define custom partitioning in Google BigQuery? I'm OK with creating my external tables via a Python script, as I will need to do so at scale anyway.
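For the scripted part, a minimal sketch of what I have in mind (dataset, bucket, and table names are placeholders; this builds plain external-table DDL strings without any partitioning, to be submitted via the BigQuery client):

```python
# Hypothetical sketch: generate CREATE EXTERNAL TABLE DDL per device/message
# pair. Dataset, bucket, and naming scheme are placeholders, not a real setup.
DDL_TEMPLATE = """CREATE EXTERNAL TABLE `{dataset}.tbl_{device}_{message}`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://{bucket}/{device}/{message}/*']
);"""

def external_table_ddl(dataset: str, bucket: str, device: str, message: str) -> str:
    """Build the DDL statement for one device/message external table."""
    return DDL_TEMPLATE.format(
        dataset=dataset, bucket=bucket, device=device, message=message
    )
```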
In the ideal scenario, I would be able to perform my partition pruning similarly to Amazon Athena, where I can add a filter as below:
WHERE date_created between '2020-04-22' and '2021-01-19'
I tried achieving something similar on-the-fly in my BigQuery SQL, but from what I can tell this does not effectively prune partitions and will therefore not be of practical use:
SELECT
  PARSE_DATE('%Y/%m/%d', REGEXP_EXTRACT(_FILE_NAME, r'.*/(\d{4}/\d{2}/\d{2})/.*')) AS event_date,
  COUNT(AccelerationX) AS cnt_accx
FROM
  cssdataset2.tbl_0BFD7754_CAN9_ImuData
WHERE
  PARSE_DATE('%Y/%m/%d', REGEXP_EXTRACT(_FILE_NAME, r'.*/(\d{4}/\d{2}/\d{2})/.*')) BETWEEN '2023-06-22' AND '2023-06-22'
GROUP BY
  event_date
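The date extraction itself does behave as intended; it can be sanity-checked locally (the sample URI below is made up):

```python
import re

# Locally sanity-check the regex applied to _FILE_NAME in the query above.
FILE_NAME_RE = r'.*/(\d{4}/\d{2}/\d{2})/.*'

# Hypothetical URI following the bucket/deviceid/message/yyyy/mm/dd/ layout.
sample = "gs://bucket/0BFD7754/CAN9_ImuData/2023/06/22/xyz.parquet"
extracted = re.match(FILE_NAME_RE, sample).group(1)
print(extracted)  # 2023/06/22
```

So the issue is not the extraction but the fact that the filter runs over _FILE_NAME at query time rather than being used for pruning.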
I have also considered restructuring my data into hive partitions à la year=yyyy/month=mm/day=dd/. But even then I seem unable to get something similar to the Athena filter in my SQL queries, which seems vital for a Grafana plugin integration with time series data, since we would otherwise have no simple, dynamic way to prune irrelevant dates from the queries. Is the only practical way of achieving this to use e.g. a /dt=yyyy-mm-dd/ folder structure?
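If restructuring turns out to be the only route, the re-layout itself is mechanical. A sketch of the key mapping (pure function; the actual GCS copy step is omitted):

```python
import re

# Sketch of the key rewrite from deviceid/message/yyyy/mm/dd/file.parquet
# to a deviceid/message/dt=yyyy-mm-dd/file.parquet hive layout.
OLD_KEY = re.compile(r"^([^/]+)/([^/]+)/(\d{4})/(\d{2})/(\d{2})/([^/]+)$")

def to_hive_key(key: str) -> str:
    """Map an old-layout object key to the dt=-partitioned layout."""
    m = OLD_KEY.match(key)
    if not m:
        raise ValueError(f"unexpected key layout: {key}")
    device, message, y, mo, d, fname = m.groups()
    return f"{device}/{message}/dt={y}-{mo}-{d}/{fname}"
```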
hi @MatinF, sorry, i do not know the answer to the question. as this seems to be a how-to-do-x question, and not a bug report or a feature request, i recommend asking about it in the grafana community, at https://community.grafana.com/tag/bigquery .
if i misunderstood, and this is something that the grafana bigquery plugin could/should directly support, just reply please with more details, and we can reopen the issue.