Skip to content

DT-1701: Improve Snapshot by Query error message #1974

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

snf2ye
Copy link
Contributor

@snf2ye snf2ye commented May 27, 2025

Jira ticket: https://broadworkbench.atlassian.net/browse/DT-1701

Addresses

It would be nice to give users a more helpful error message if they did not correctly format their query for snapshot by query.

After a very long debugging session, we found that a user did not "correctly" format their query for a snapshot by query error message. It turns out, if you do not properly qualify both column and table names, then the alias generated for the table doesn't match up with the alias generated for the column. This left the user with an BQ does not recognize table with alias "alias123" error message. Instead, we should provide users with an actionable error message with instructions for how to format their table and column names in the query.

For example,
Select column1 FROM tdrDataset1.table1 where column1 = true will not successfully get translated into a BigQuery query.
The query would end up getting translated to something like the following:
Select alias1.column1 FROM tdrDataset1.table1 AS alias2 where alias1.column1 = true
Which is invalid because alias1 and alias2 do not line up.

Instead, the user should provide something like the following:
Select tdrDataset1.table1.column1 FROM tdrDataset1.table1 where tdrDataset1.table1.column1 = true

There is an example of this sort of query in the terra support docs, but we can help by providing a more direct error message.
https://support.terra.bio/hc/en-us/articles/4407243163291-How-to-create-snapshots-in-TDR-GCP
image

Summary of changes

  • Throw an error if the database or table name is null when generate an alias during query translation.

Testing Strategy

  • unit tests

Copy link

@snf2ye snf2ye changed the title Sh/add error handling by query Improve Snapshot by Query error message May 27, 2025
@snf2ye snf2ye marked this pull request as ready for review May 27, 2025 16:49
@snf2ye snf2ye requested a review from a team as a code owner May 27, 2025 16:49
@snf2ye snf2ye requested review from pshapiro4broad and rjohanek and removed request for a team May 27, 2025 16:49
Copy link
Contributor

@rjohanek rjohanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great!!!! thanks

@snf2ye snf2ye changed the title Improve Snapshot by Query error message DT-1701: Improve Snapshot by Query error message May 27, 2025
@@ -16,6 +17,12 @@ public BigQueryVisitor(Map<String, DatasetModel> datasetMap) {
}

public String generateAlias(String datasetName, String tableName) {
if (datasetName == null || tableName == null) {
Copy link
Member

@pshapiro4broad pshapiro4broad May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's possible for datasetName to be null. It's possible for ctx.dataset_name() to be null but since the name is always prefixed by prefixDatasetName() before it gets here, it will never be null in this method. If it's possible that it is null in practice, we need to do this check in the caller before the prefix is added.

If this check is done in the caller, then we have Table_exprContext or Column_exprContext which would let us include the specific part of the query in the error message, which we could use to make a more helpful error message.

assertThrows(
InvalidQueryException.class,
() -> parsedQuery.translateSql(bqVisitor),
"All column and table names must be qualified with a dataset/table name. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I wouldn't assert the entire text, or any of it unless it includes something specific based on the method parameters.

@@ -16,6 +17,12 @@ public BigQueryVisitor(Map<String, DatasetModel> datasetMap) {
}

public String generateAlias(String datasetName, String tableName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this is public is a bit confusing in that it has no meaning outside this class. It looks like we need it to perform string matching in tests, if so it should have @VisibleForTesting.

Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the null tableName case but, if datasetName can be null, that case isn't fixed. I would either come up with a test case for null datasetName and fix it, or remove that check for null.

Doing the check in the caller does lead to some duplication but it would be nice to have the parser context, which includes the user's source text and could be a big help in tracking down where in the query the problem is.

String testQuery, boolean queryHasQualifiedNames) {
BigQueryVisitor bqVisitor = new BigQueryVisitor(datasetMap);
Query parsedQuery = Query.parse(testQuery);
if (!queryHasQualifiedNames) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a small thing but you could flip the if branches and remove the ! since there's no need to order the branches in a particular way.

@@ -16,6 +17,12 @@ public BigQueryVisitor(Map<String, DatasetModel> datasetMap) {
}

public String generateAlias(String datasetName, String tableName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't need any state in BigQueryVisitor so it can be changed to static

Suggested change
public String generateAlias(String datasetName, String tableName) {
public static String generateAlias(String datasetName, String tableName) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants