Copilot AI commented Sep 3, 2025

This PR addresses Issue #1973 by adding comprehensive tests to validate the reported bug where schema comparison dashboards only display 50 rows even when tables have more than 50 columns.

Problem

Users reported that when performing schema comparison on tables with more than 50 columns, the reconciliation dashboard only shows 50 columns in the schema details, despite validation occurring for all columns. This suggests a limitation in either the data persistence layer or dashboard display logic.

Solution

Added three targeted test cases to tests/integration/reconcile/test_schema_compare.py to systematically validate this bug:

Test Cases Added

  1. test_schema_compare_exactly_50_columns - Baseline test with exactly 50 columns to confirm normal operation
  2. test_schema_compare_51_columns_edge_case - Edge case test with 51 columns to identify the exact threshold where the bug manifests
  3. test_schema_compare_large_column_count_bug_validation - Clear reproduction case with 60 columns (see the sketch after this list)
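The following is a minimal sketch of what such a parametrized test could look like; it is not the code added in this PR. The `build_schema_rows` helper and the stubbed persistence round-trip are hypothetical stand-ins for the repository's real comparison pipeline and its `schema_fixture_factory`/`mock_spark` fixtures.

```python
import pytest


def build_schema_rows(column_count: int) -> list[dict]:
    """Hypothetical helper: one schema-comparison result row per column of a wide table."""
    return [
        {"column_name": f"col_{i}", "source_datatype": "int", "databricks_datatype": "int"}
        for i in range(column_count)
    ]


@pytest.mark.parametrize("column_count", [50, 51, 60])
def test_schema_compare_row_count_matches_column_count(column_count):
    rows = build_schema_rows(column_count)
    # In the real tests this is where the schema comparison runs and its persisted
    # output is read back from the recon tables; here that round-trip is stubbed out.
    persisted = list(rows)
    # The bug in issue #1973 would surface here: the persisted/displayed row count
    # should always equal the number of columns, never be capped at 50.
    assert len(persisted) == column_count
```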

Expected Behavior

If the bug exists:

  • 50-column test should pass (✅)
  • 51-column test should fail with descriptive error: "BUG DETECTED: Expected 51 rows but got 50" (❌)
  • 60-column test should fail with error: "BUG CONFIRMED: Expected 60 rows but got 50" (❌)

When the bug is fixed:

  • All tests should pass, returning row counts equal to the number of columns (✅)
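The "BUG DETECTED" / "BUG CONFIRMED" messages quoted above would typically come from an assertion along these lines (a sketch only; the variable names and exact wording in the test file may differ):

```python
expected_rows = 51  # number of columns in the test table (hypothetical value)
actual_rows = 50    # rows actually returned by the persistence/dashboard layer

assert actual_rows == expected_rows, (
    f"BUG DETECTED: Expected {expected_rows} rows but got {actual_rows}"
)
```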

Root Cause Analysis

Investigation points to the data persistence pipeline in recon_capture.py as the likely source of the issue:

  1. Schema results are stored using Spark's collect_list() to aggregate column data
  2. Dashboard queries use explode() to convert aggregated data back to individual rows
  3. There may be a default limit in this aggregation/explode process that caps results at 50 rows (see the sketch below)
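Neither collect_list() nor explode() imposes a row cap on its own, which is what the hypothesis above needs to rule in or out. The following self-contained PySpark sketch (not the actual recon_capture.py code; table and column names are invented) round-trips 60 per-column results through that aggregate-then-explode pattern:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[1]").appName("explode-roundtrip").getOrCreate()

# Simulate 60 per-column schema-comparison results for a single table.
rows = [("orders", f"col_{i}", "match") for i in range(60)]
df = spark.createDataFrame(rows, ["table_name", "column_name", "status"])

# Step 1: aggregate the per-column results into an array, as the persistence layer is described to do.
aggregated = df.groupBy("table_name").agg(
    F.collect_list(F.struct("column_name", "status")).alias("schema_details")
)

# Step 2: explode the array back into one row per column, as the dashboard query is described to do.
exploded = aggregated.select("table_name", F.explode("schema_details").alias("detail"))

# All 60 rows survive the round-trip, so a 50-row cap would have to come from
# somewhere else in the pipeline (for example a LIMIT in the dashboard query
# or a display setting), not from collect_list()/explode() themselves.
assert exploded.count() == 60
```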

Implementation Details

  • Tests follow existing patterns using schema_fixture_factory and mock_spark fixtures
  • Added comprehensive debug output for troubleshooting (see the sketch after this list)
  • Enhanced error messages clearly identify when the 50-row limit bug is detected
  • No changes to existing production code - only test additions for validation
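As an illustration of the debug-output and error-message bullets above, a failure helper along these lines could report which columns went missing when the cap is hit. This is a hypothetical sketch, assuming the result rows are dicts keyed by column_name as in the earlier sketch; it is not the code added in this PR.

```python
import pytest


def assert_row_count(persisted_rows: list[dict], expected_count: int) -> None:
    """Hypothetical helper: fail with enough context to debug a 50-row cap."""
    actual = len(persisted_rows)
    if actual != expected_count:
        present = {row["column_name"] for row in persisted_rows}
        expected = {f"col_{i}" for i in range(expected_count)}
        missing = sorted(expected - present)
        pytest.fail(
            f"BUG DETECTED: Expected {expected_count} rows but got {actual}; "
            f"first missing columns: {missing[:5]}"
        )
```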

These tests provide a foundation for confirming that the bug exists, validating any future fix, and preventing regressions.



Copilot AI and others added 2 commits September 3, 2025 04:42
…mparison (issue #1973)

Co-authored-by: sundarshankar89 <72757199+sundarshankar89@users.noreply.github.com>
…essages for issue #1973

Co-authored-by: sundarshankar89 <72757199+sundarshankar89@users.noreply.github.com>
Copilot AI changed the title from "[WIP] write a small test to validate the bug in test_schema_compare.py @databrickslabs/lakebridge/issues/1973 Raise only PRs do not apply changes directly to main" to "Add test validation for schema comparison 50-row limit bug (Issue #1973)" on Sep 3, 2025
Copilot finished work on behalf of sundarshankar89 September 3, 2025 04:45

github-actions bot commented Sep 3, 2025

✅ 21/21 passed, 1m7s total

Running from acceptance #2115
