Description
While working on the new bridge lidar code, we used sf.progress_bar (or a similarly named function) and found it was actually blocking MP from working. A pretty stunning find. We noticed it by putting at least one print statement inside an MP function, with the HUC number or something similar, to help validate whether MP was really working. The key is that something ordered is being fed into the MP functions, which helps us understand where we are in the processing order. That is a very valuable piece of info that should be used in every MP everywhere.
For example, data/bridges/make_dem_dif_for_bridges.py has an MP block in it:
```python
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor(max_workers=number_jobs) as executor:
    executor_dict = {}
    for dem_file in dem_files:
        ...

    # Send the executor to the progress bar and wait for all tasks to finish
    sf.progress_bar_handler(executor_dict, "Making HUC8/6 Diff Raster files")
```
When we sorted the "dem_files" object and put a print line inside the called function, we found that it really was not using multiprocessing.
sf.progress_bar_handler is in src/shared_functions.
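A minimal sketch of the ordered-print check, assuming a hypothetical per-HUC worker named process_huc (the real function called in make_dem_dif_for_bridges.py is not shown here). It sorts the inputs, submits every task before waiting on any of them, prints start/finish lines tagged with the worker PID, and uses no progress bar at all, which mirrors the "handler commented out" case:

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def process_huc(huc):
    """Hypothetical stand-in for the real per-HUC work function."""
    # Tag start and finish with the HUC and worker PID so the console shows
    # whether tasks overlap across processes.
    print(f"- starting HUC {huc} (pid {os.getpid()})", flush=True)
    time.sleep(0.2)  # placeholder for the real raster processing
    print(f"- finishing HUC {huc} (pid {os.getpid()})", flush=True)
    return huc


def run_all(hucs, number_jobs):
    # Sort the inputs so the "starting" lines appear in a roughly predictable order.
    with ProcessPoolExecutor(max_workers=number_jobs) as executor:
        # Submit everything first; nothing inside this comprehension waits on a future.
        executor_dict = {executor.submit(process_huc, huc): huc for huc in sorted(hucs)}
    # Leaving the "with" block waits for all tasks; collect results in submission order.
    return [future.result() for future in executor_dict]


if __name__ == "__main__":
    run_all(["14000000", "01000000", "13000000", "12000000", "15000000"], number_jobs=4)
```

If MP is working, "starting" lines for up to number_jobs HUCs should appear before the first "finishing" line, and several different PIDs should show up.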
How did we know?
If MP was working, we would see output lines something like this:
- starting HUC 01000000
- starting HUC 12000000
- starting HUC 13000000
- starting HUC 14000000
- finishing HUC 01000000
- finishing HUC 14000000
- finishing HUC 12000000
- starting HUC 15000000
Notice how the starts and finishes interleave. The HUCs won't necessarily start in exact order, and they especially won't finish in order.
BUT... even with MP supposedly in place, if you see this:
- starting HUC 01000000
- finishing HUC 01000000
- starting HUC 12000000
- finishing HUC 12000000
- starting HUC 13000000
- finishing HUC 13000000
- starting HUC 14000000
- finishing HUC 14000000
Let it process more than just the 4 in the example above to be sure. But if you see one start and finish, then another start and finish, and so on, something is wrong with MP.
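The "one start, one finish" rule can also be checked mechanically from a saved log. A minimal sketch, assuming each task prints lines containing the literal text "starting HUC" and "finishing HUC" as in the examples; max_concurrency and looks_serialized are hypothetical helper names. It counts how many tasks are in flight at once and reports the peak:

```python
def max_concurrency(log_lines):
    """Return the peak number of tasks running at the same time.

    Assumes every task prints one line containing "starting HUC" when it
    begins and one containing "finishing HUC" when it ends. All other lines
    (progress bars, warnings, etc.) are ignored.
    """
    running = 0
    peak = 0
    for line in log_lines:
        if "starting HUC" in line:
            running += 1
            peak = max(peak, running)
        elif "finishing HUC" in line:
            running -= 1
    return peak


def looks_serialized(log_lines):
    """True when no two tasks ever overlapped, i.e. MP is effectively off."""
    return max_concurrency(log_lines) <= 1
```

Counting by text rather than by HUC ID keeps the check simple. With very short tasks even a healthy pool can show a low peak, which is another reason to let it process more than a handful of HUCs before drawing conclusions.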
With make_dem_dif_for_bridges, I noticed that MP was not really working and the progress bar was printing oddly.
When I commented out sf.progress_bar_handler, MP started working and the script was flying.
So... what happened here? I'm not sure, but it sure points to a problem either with using MP and status bars together or with sf.progress_bar_handler itself.
Performance and tool run times have a direct impact on costs, and costs are getting a lot of visibility these days, so this is very important. It can literally mean the difference between 4 hours and 40 hours, and that is with just -j 10. MP can have a 30x or 40x impact (i.e., -j 30 or -j 40).
The task:
- What happened here?
- Is this a problem between MP and tqdm?
- Is this a problem with sf.progress_bar_handler itself, or with how we use it?
- sf.progress_bar_handler is plugged in at a number of places. Are those MP calls not working either?
- There are other variants of MP working with tqdm that don't go through sf.progress_bar_handler. Are they working?
- Do we even really need status bars? Maybe, maybe not. Maybe an ordered print statement is good enough?
And what do we do about it?
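If an ordered print turns out to be good enough, progress can be reported without sf.progress_bar_handler. A minimal sketch, not the project's implementation: run_with_ordered_progress and square are hypothetical names, and executor_cls is an assumed parameter that defaults to a process pool but lets a quick test swap in threads. The important part is that all tasks are submitted before any result is awaited; calling .result() on each future right after submitting it would make the pool run one task at a time. If a bar is still wanted, the same loop can be wrapped as tqdm(cf.as_completed(futures), total=total).

```python
import concurrent.futures as cf


def square(x):
    """Hypothetical stand-in for a per-HUC task."""
    return x * x


def run_with_ordered_progress(items, number_jobs, func=square, executor_cls=cf.ProcessPoolExecutor):
    """Run func over items in parallel and print one line per completed task."""
    results = {}
    with executor_cls(max_workers=number_jobs) as executor:
        # 1. Submit every task up front (in sorted order) so the pool can run them concurrently.
        futures = {executor.submit(func, item): item for item in sorted(items)}
        total = len(futures)
        # 2. Only then wait, reporting each task as it completes, in whatever order that is.
        for count, future in enumerate(cf.as_completed(futures), start=1):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # One failed task is reported but does not stop the others.
                print(f"[{count}/{total}] FAILED {item}: {exc!r}", flush=True)
            else:
                print(f"[{count}/{total}] finished {item}", flush=True)
    return results


if __name__ == "__main__":
    print(run_with_ordered_progress([3, 1, 2], number_jobs=2))
```

Because the counter comes from as_completed, the "[n/total]" lines always count up even though the items finish out of order, which gives a status readout without a separate progress-bar layer.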
As mentioned, with differences this dramatic (10x, 30x, 40x), this has a major impact on duration and ultimately on costs, and sometimes a major impact on testing or partial dev testing as well.
It might not seem like High Priority, but in my opinion it is, especially in places like "synthesize_test_cases". Reviewing each tool that uses it will help sort out the priority of the fix for that tool. Reviewing synthesize_test_cases, and possibly some places in post-processing, is especially important.