HPCC-35100 XRef should locate dir-per-part files during scanDirectory #20529
base: candidate-9.12.x
Pull Request Overview
Optimizes directory scanning by moving dir-per-part file detection into addFile during initial scan, removing the later merge logic and reducing per-item logging. Adds final heartbeat completion statistics and suppresses previous debug logging blocks.
- Moved dir-per-part detection from post-processing (mergeDirPerPartDirs) into on-the-fly file addition (getFile/addFile changes)
- Added finishHeartbeat to emit aggregate stats and replaced end-of-scan messages
- Removed mergeDirPerPartDirs logic and disabled several debug logging sections
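
To illustrate the idea of detecting dir-per-part files while each file is added, rather than in a later merge pass, here is a minimal standalone sketch. It is not the code from this PR: `PartInfo`, `parseUnsigned`, and `classifyPart` are hypothetical names, and the sketch assumes that part files are named `name._N_of_M` and that a dir-per-part file sits in a subdirectory whose name equals its part number `N`.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical record for one scanned part file (not a type from the HPCC code base).
struct PartInfo
{
    std::string logicalDir;  // directory with any per-part subdirectory stripped
    std::string baseName;    // file name without the "._N_of_M" suffix
    unsigned partNum = 0;
    unsigned numParts = 0;
    bool dirPerPart = false;
};

// Parse a string made only of decimal digits.
// Returns nullopt for empty, over-long (more than 9 digits), or non-digit input.
static std::optional<unsigned> parseUnsigned(const std::string &s)
{
    if (s.empty() || s.size() > 9)
        return std::nullopt;
    unsigned value = 0;
    for (char c : s)
    {
        if (c < '0' || c > '9')
            return std::nullopt;
        value = value * 10 + static_cast<unsigned>(c - '0');
    }
    return value;
}

// Classify a file at the moment it is seen by the scan.
// 'dir' is the containing directory without a trailing slash (assumption).
// Returns nullopt if the name does not follow the assumed "._N_of_M" pattern.
std::optional<PartInfo> classifyPart(const std::string &dir, const std::string &fileName)
{
    const std::size_t mark = fileName.rfind("._");
    if (mark == std::string::npos || mark == 0)
        return std::nullopt;
    const std::size_t ofPos = fileName.find("_of_", mark + 2);
    if (ofPos == std::string::npos)
        return std::nullopt;

    const auto part = parseUnsigned(fileName.substr(mark + 2, ofPos - (mark + 2)));
    const auto total = parseUnsigned(fileName.substr(ofPos + 4));
    if (!part || !total || *part == 0 || *part > *total)
        return std::nullopt;

    PartInfo info;
    info.baseName = fileName.substr(0, mark);
    info.partNum = *part;
    info.numParts = *total;
    info.logicalDir = dir;

    // Treat the file as dir-per-part only when the last path component
    // is a number equal to the part number taken from the file name.
    const std::size_t slash = dir.find_last_of('/');
    const std::string leaf = (slash == std::string::npos) ? dir : dir.substr(slash + 1);
    const auto leafNum = parseUnsigned(leaf);
    if (leafNum && *leafNum == *part)
    {
        info.dirPerPart = true;
        info.logicalDir = (slash == std::string::npos) ? std::string() : dir.substr(0, slash);
    }
    return info;
}
```

With this shape, every part of a file maps to the same `logicalDir` and `baseName` as soon as it is scanned, so no second pass over the directory tree is needed to merge per-part directories. A scope that happens to have a numeric name would look the same to this check, so a real implementation would need more context than the sketch uses.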
I'm not sure about the prepareFileAndReturnLock structure. Should the logic be left in addFile to avoid having to return a critical block? Should I break up the dir-per-part detection to make it clearer? Copilot seemed to have a lot of the same complaints about the new code, so I don't think this is currently the best way to approach it.
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-35100
Jirabot Action Result:
The optimization marginally improved the runtime. I have included the timings because I thought it was strange that the directory scan phase sped up slightly despite the extra workload, while the orphan scan sped up only slightly despite seemingly doing a lot less work.
Type of change:
Checklist:
Smoketest:
Testing: