Skip to content

DT-1701: Improve Snapshot by Query error message #1974

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from
7 changes: 7 additions & 0 deletions src/main/java/bio/terra/grammar/google/BigQueryVisitor.java
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import bio.terra.common.PdaoConstant;
import bio.terra.grammar.DatasetAwareVisitor;
import bio.terra.grammar.SQLParser;
import bio.terra.grammar.exception.InvalidQueryException;
import bio.terra.model.DatasetModel;
import bio.terra.model.SnapshotModel;
import bio.terra.service.snapshotbuilder.query.TableNameGenerator;
Expand All @@ -16,6 +17,12 @@ public BigQueryVisitor(Map<String, DatasetModel> datasetMap) {
}

public String generateAlias(String datasetName, String tableName) {
if (datasetName == null || tableName == null) {
Copy link
Member

@pshapiro4broad pshapiro4broad May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's possible for datasetName to be null. It's possible for ctx.dataset_name() to be null but since the name is always prefixed by prefixDatasetName() before it gets here, it will never be null in this method. If it's possible that it is null in practice, we need to do this check in the caller before the prefix is added.

If this check is done in the caller, then we have Table_exprContext or Column_exprContext which would let us include the specific part of the query in the error message, which we could use to make a more helpful error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, thank you so much!
It turns out that we don't even let an unqualified table name even get to the translate step: it fails on parsing the query. So, I've moved the logic to the column call and removed the check for the dataset. I've added a test for parsing an unqualified dataset, so we should be alerted if we try to update the grammar to no longer require qualified table names.

throw new InvalidQueryException(
"All column and table names must be qualified with a dataset/table name. "
+ "Please ensure that your query uses the format `dataset.table.column` for columns and `dataset.table` for tables. "
+ "For example, use `my_dataset.my_table.my_column` instead of just `my_column` and `my_dataset.my_table` instead of just `my_table`.");
}
return "alias" + Math.abs(Objects.hash(datasetName, tableName));
}

Expand Down
37 changes: 37 additions & 0 deletions src/test/java/bio/terra/grammar/GrammarTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import static org.hamcrest.CoreMatchers.hasItems;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;
import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.equalToCompressingWhiteSpace;
import static org.junit.jupiter.api.Assertions.assertThrows;
Expand Down Expand Up @@ -237,6 +238,42 @@ void testBqTranslate() {
+ "`.x = 'string'"));
}

@ParameterizedTest
@MethodSource
void testRequiredQualifiedNameOnQueryTranslation(
String testQuery, boolean queryHasQualifiedNames) {
BigQueryVisitor bqVisitor = new BigQueryVisitor(datasetMap);
Query parsedQuery = Query.parse(testQuery);
if (!queryHasQualifiedNames) {
assertThrows(
InvalidQueryException.class,
() -> parsedQuery.translateSql(bqVisitor),
"All column and table names must be qualified with a dataset/table name. "
+ "Please ensure that your query uses the format `dataset.table.column` for columns and `dataset.table` for tables. "
+ "For example, use `my_dataset.my_table.my_column` instead of just `my_column` and `my_dataset.my_table` instead of just `my_table`.");
} else {
String bqDatasetName = PdaoConstant.PDAO_PREFIX + "dataset";
String tableName = "table";
String translated = parsedQuery.translateSql(bqVisitor);
String aliasedTableName = bqVisitor.generateAlias(bqDatasetName, tableName);
assertThat(
"query translates to valid bigquery syntax",
translated,
containsString(aliasedTableName));
}
}

static Stream<Arguments> testRequiredQualifiedNameOnQueryTranslation() {
return Stream.of(
Arguments.of(
"SELECT dataset.table.datarepo_row_id FROM dataset.table WHERE dataset.table.x = 'string'",
true),
Arguments.of(
"SELECT datarepo_row_id FROM dataset.table WHERE dataset.table.x = 'string'", false),
Arguments.of("SELECT * FROM dataset.table WHERE dataset.table.x = 'string'", true),
Arguments.of("SELECT datarepo_row_id FROM dataset.table WHERE x = 'string'", false));
}

@Test
void testBqTranslateTwoTables() {
BigQueryVisitor bqVisitor = new BigQueryVisitor(datasetMap);
Expand Down