Add test validation for schema comparison 50-row limit bug (Issue #1973) #1984
This PR addresses Issue #1973 by adding tests that validate the reported bug in which schema comparison dashboards display only 50 rows even when the compared tables have more than 50 columns.
Problem
Users reported that when performing schema comparison on tables with more than 50 columns, the reconciliation dashboard only shows 50 columns in the schema details, despite validation occurring for all columns. This suggests a limitation in either the data persistence layer or dashboard display logic.
Solution
Added three targeted test cases to `tests/integration/reconcile/test_schema_compare.py` to systematically validate this bug.
Test Cases Added
- `test_schema_compare_exactly_50_columns`: baseline test with exactly 50 columns to confirm normal operation
- `test_schema_compare_51_columns_edge_case`: edge case with 51 columns to identify the exact threshold at which the bug manifests
- `test_schema_compare_large_column_count_bug_validation`: clear reproduction case with 60 columns

Expected Behavior
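If the bug exists: the 50-column baseline shows all 50 columns, while the 51- and 60-column cases show only 50 rows in the schema details.
When the bug is fixed: every case shows one schema-details row per column (50, 51, and 60 rows respectively).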
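The expected behavior for the three column counts can be sketched in plain Python. This is a hypothetical illustration only: `make_schema`, `schema_details_rows`, and the `row_cap` parameter are stand-ins invented here (not the real `schema_fixture_factory` or `mock_spark` fixtures), and `row_cap=50` merely simulates the reported truncation rather than reproducing the actual code path.

```python
# Hypothetical sketch: make_schema and schema_details_rows are illustrative
# stand-ins, not functions from the reconcile package.
from typing import Optional

COLUMN_COUNTS = (50, 51, 60)  # baseline, edge case, reproduction case


def make_schema(num_columns: int) -> list[tuple[str, str]]:
    """Build (column_name, data_type) pairs, loosely mimicking a schema fixture."""
    return [(f"col_{i}", "string") for i in range(num_columns)]


def schema_details_rows(
    schema: list[tuple[str, str]], row_cap: Optional[int] = None
) -> list[dict]:
    """Return one schema-details row per column.

    row_cap simulates the reported symptom: None means no truncation
    (fixed behavior), 50 mimics output capped at 50 rows (buggy behavior).
    """
    rows = [
        {"source_column": name, "data_type": dtype, "is_valid": True}
        for name, dtype in schema
    ]
    return rows if row_cap is None else rows[:row_cap]


def all_columns_present(num_columns: int, row_cap: Optional[int] = None) -> bool:
    """True when every column in the schema appears in the schema-details rows."""
    return len(schema_details_rows(make_schema(num_columns), row_cap)) == num_columns


if __name__ == "__main__":
    for n in COLUMN_COUNTS:
        print(n, "buggy:", all_columns_present(n, row_cap=50), "fixed:", all_columns_present(n))
```

With the simulated cap, only the 50-column baseline keeps every column; without it, all three cases do. The core assertion a real test would make is the same: the number of schema-details rows equals the number of columns.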
Root Cause Analysis
Investigation points to the data persistence pipeline in `recon_capture.py` as the likely location of the issue. That pipeline:
- uses `collect_list()` to aggregate column data
- uses `explode()` to convert the aggregated data back into individual rows

Implementation Details
- Uses the `schema_fixture_factory` and `mock_spark` fixtures

These tests provide a foundation for confirming that the bug exists, validating any future fix, and preventing regressions.
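To help isolate where rows could be lost, the `collect_list()` / `explode()` round trip described under Root Cause Analysis can be modeled in plain Python. This is not the `recon_capture.py` code and does not use Spark (in PySpark both functions live in `pyspark.sql.functions`); the grouping key `recon_table_id` is a hypothetical name chosen for illustration. The sketch only shows that an aggregate-then-explode round trip should, in principle, return exactly one row per input column, so a test asserting row counts on each side of that step could confirm or rule it out as the source of truncation.

```python
# Plain-Python analogue of a Spark collect_list() / explode() round trip.
# Not the recon_capture.py implementation; "recon_table_id" is a hypothetical key.
from collections import defaultdict


def collect_list(rows: list[dict], key: str) -> dict:
    """Group rows by `key`, gathering the remaining fields into a list."""
    grouped: dict = defaultdict(list)
    for row in rows:
        grouped[row[key]].append({k: v for k, v in row.items() if k != key})
    return dict(grouped)


def explode(grouped: dict, key: str) -> list[dict]:
    """Flatten each aggregated list back into one row per element."""
    return [{key: group, **item} for group, items in grouped.items() for item in items]


def round_trip_is_lossless(num_columns: int) -> bool:
    """Check that aggregating and then exploding preserves every column row."""
    rows = [
        {"recon_table_id": 1, "source_column": f"col_{i}"} for i in range(num_columns)
    ]
    return explode(collect_list(rows, "recon_table_id"), "recon_table_id") == rows


if __name__ == "__main__":
    for n in (50, 51, 60):
        print(n, round_trip_is_lossless(n))
```

If the equivalent check against the real pipeline preserves all 60 rows, the truncation would have to come from a later stage, such as the dashboard query or display layer.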