@RobHanna-NOAA RobHanna-NOAA commented Sep 8, 2025

Going into the FIM 6.0 release, we planned to get new usgs_rating_curve files. We then found a CatFIM problem whose fix changed shared functions that get_usgs_rating_curves depends on, and a quick test after the CatFIM change showed that getting usgs rating curves was broken. We deferred the fix until now. We also wanted to add multiprocessing because the script took over 32 hours to run. Multiprocessing has now been added and brings this duration down drastically.

A few other related updates were made:

  • Added use of the new shared functions run_with_mp and setup_mp_file_logger. Implementing that system here required a few updates and adjustments:
    • It originally required child mp tasks to return either True or False. This script needed its tasks to return a dataframe.
    • Originally, an mp task's True / False return meant success or fail, and on a fail the caller could optionally shut down the entire script. This script has three statuses: success, fail but continue, or fail and shut down the script. Each task needed to be able to decide whether its fail was "acceptable" or "catastrophic", so a "status" return code system was added to handle the three scenarios that each child mp task can return.
    • Forced shutdowns of process pools and thread queues are well known to be tricky. Many factors affect when and how a catastrophic fail plays out, and the right strategy for a full shutdown depends on which code objects are in use. In our case, it was a bit trickier than normal because of the combination of the process pool, tqdm and the screen queue. After lots of testing and experimentation, we now have a stronger system that reduces the risk of the app hanging, orphaned threads or memory leaks. Note: It is not perfect and never will be. CTRL-C is now handled better, but as with all of our scripts, product wide, whenever you use CTRL-C to abort (and you may have to press it a number of times), you need to close your docker container and restart it to fully clean up.
    • With the change of returns for run_with_mp and its tasks, we went back and adjusted all current scripts that use this system. Minor adjustments were made to all three scripts: make_dem_dif_for_bridges.py, pull_osm_roads.py and download_fema_nfhl.py. All three were stub tested to ensure their small code adjustments were fine.
  • Another change: The file was renamed from rating_curve_get_usgs_curves.py to get_usgs_rating_curves.py. Some other files had notes changed to reflect the updated file name.
  • get_usgs_rating_curves.py picked up a major logging upgrade, including more detail on what failed and when, and which record IDs were being processed at the time of the failure.
  • A bug was found and fixed in tools_shared_functions.py.
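
The three-status idea described above can be sketched roughly as follows. This is only a minimal illustration, not the actual run_with_mp code: TaskStatus, fetch_site, run_tasks and the executor_cls parameter are hypothetical names, and a plain dict stands in for the dataframe that a real task would return.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
from enum import Enum


class TaskStatus(Enum):
    """Hypothetical status codes a child task can return."""
    SUCCESS = "success"
    FAIL_CONTINUE = "fail_continue"  # acceptable fail: record it and keep going
    FAIL_STOP = "fail_stop"          # catastrophic fail: stop the whole run


def fetch_site(site_id):
    """Hypothetical child task returning (status, payload)."""
    if site_id == "bad":
        return TaskStatus.FAIL_CONTINUE, None
    if site_id == "fatal":
        return TaskStatus.FAIL_STOP, None
    # A dict stands in for the dataframe a real task would build.
    return TaskStatus.SUCCESS, {"site": site_id, "rows": 3}


def run_tasks(task, items, max_workers=2, executor_cls=ProcessPoolExecutor):
    """Run task over items, collect payloads, and stop early on FAIL_STOP."""
    results, skipped = [], []
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(task, item): item for item in items}
        for future in as_completed(futures):
            status, payload = future.result()
            if status == TaskStatus.SUCCESS:
                results.append(payload)
            elif status == TaskStatus.FAIL_CONTINUE:
                skipped.append(futures[future])
            else:
                # Cancel tasks that have not started yet, then abort the run.
                pool.shutdown(wait=False, cancel_futures=True)
                raise RuntimeError(f"catastrophic fail on {futures[future]}")
    return results, skipped


if __name__ == "__main__":
    done, skipped_items = run_tasks(fetch_site, ["01010000", "bad", "01011000"])
    print(len(done), "succeeded;", skipped_items, "skipped")
```

Passing ThreadPoolExecutor as executor_cls is a convenient way to stub test the status handling without spawning processes. The sketch does not attempt the tqdm, screen queue or CTRL-C handling discussed above.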

A full new valid set of usgs_rating_curve data files was created. It will be copied to all enviros, and bash_variables.env has been updated to reflect the new set.

Renamed files:

  • data\usgs\rating_curve_get_usgs_curves.py was renamed to data\usgs\get_usgs_rating_curves.py

Files updated related to file name change (all just note updates):

  • data\nws\preprocess_ahps_nws.py, data\usgs\preprocess_ahps_usgs.py, src\src_adjust_usgs_rating_trace.py, tools\fimr_to_benchmark.py, tools\generate_nws_lid.py and tools\rating_curve_comparison.py

Changes

  • data

    • bridges\make_dem_dif_for_bridges.py: Updated for the new return values of the run_with_mp shared function.
    • nfhl\download_fema_nfhl.py: Updated for the new return values of the run_with_mp shared function.
    • roads\pull_osm_roads.py: Updated for the new return values of the run_with_mp shared function.
    • usgs\get_usgs_rating_curves.py: as described above.
  • pyproject.toml: Updated linting rules doc to reflect the new get_usgs_rating_curves.py file name.

  • src\bash_variables.env: Updated with the path to the new usgs rating curve files.

  • src\utils\shared_functions: In addition to the run_with_mp updates mentioned above, setup_mp_file_logger was updated to create an error file that helps identify errors that occur. Log entries at both the error and critical levels still show up in the regular log, and are also copied into the new "error" log files to bring them to attention. A new function named setup_file_logger was added; it is nearly identical to the existing setup_mp_file_logger but is for non-multiprocessing usage, with a few critical differences. There is likely a way to merge the two. A couple of small new logging-related utility functions were also added.

  • tools\tools_shared_functions.py: Fixed a bug where the code assumed a particular https response node was always returned.
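
The "regular log plus error log" behavior described for the logger setup can be approximated with two handlers on one standard-library logger. This is a hedged sketch only: make_file_logger and the file naming are hypothetical, and the real setup_mp_file_logger / setup_file_logger functions may work differently.

```python
import logging
from pathlib import Path


def make_file_logger(name, log_dir):
    """Hypothetical helper: one logger that writes to two files."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records out of the root logger

    fmt = logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")

    # Regular log: receives every record, including errors.
    main_handler = logging.FileHandler(log_dir / f"{name}.log")
    main_handler.setLevel(logging.DEBUG)

    # Error log: receives a copy of ERROR and CRITICAL records only.
    error_handler = logging.FileHandler(log_dir / f"{name}_errors.log")
    error_handler.setLevel(logging.ERROR)

    for handler in (main_handler, error_handler):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

With this arrangement, a call such as logger.error(...) lands in both files, while logger.info(...) lands only in the regular log, so the error file stays short enough to scan after a long run.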

Closes: 1578 and 1596


Testing

A very large number of tests were performed during development.

  • make_dem_dif_for_bridges.py, download_fema_nfhl.py and pull_osm_roads.py were tested to ensure their mp changes were valid. Granted, minor test adjustments were made to other parts of the code in those files to stub test the new changes.
  • A full new dataset of usgs rating curve files was created and distributed.
  • A wide range of scenarios and forced exception types were used to help validate the exception handling of the upgraded run_with_mp function.

Deployment Plan (For FIM developers use)

  • Does the change impact inputs, docker or python packages?
    • Yes
    • No (if no, skip the rest of the Deployment Plan section)

This PR has new usgs_rating_curve files and a bash_variables update. The new files need to be copied to all enviros.

  • If you are not a FIM dev team member: Please let us know what you need and we can help with it.

  • If you are a FIM Dev team member:

    • Please work with the DevOps team and do not just go ahead and do it without some co-ordination.

    • Copy where you can, assign where you can not, and it is your responsibility to ensure it is done. Please ensure it is completed before the PR is merged.

    • Does it have new or updated python packages, or Pipfile, Pipfile.lock or Dockerfile changes? DevOps can help or take care of it if you want. We just need to know if it is required.

      • Yes
      • No
    • Require new or adjusted data inputs? Does it have a way to version (folder or file dates)?

      • No
      • Yes
        • Require a new pre-clip set or any other data reloads, such as DEMs, osm, etc. (i.e., prerequisite data reloads upstream of your input changes)?
          • Yes
          • No
        • Have the inputs been copied to, or do they already exist in, all five enviros:
          • FIM EFS
          • FIM S3
          • ESIP
          • Dev1
  • Please use caution in removing older versions unless they are at least two versions old. Confirm with DevOps if cleanup might be involved.

  • If new or updated data sets, has the FIM code, including running fim_pipeline.sh, been updated and tested with the new/adjusted data? You can dev test against subsets if you like.

    • Yes

Notes to DevOps Team or others:

Please add any notes that will help us make sure it is all done correctly. Do not put actual server names or full true paths, just shortcut paths like 'efs..../inputs/' or 'dev1....inputs', etc.


Issuer Checklist (For developer use)

You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask, we're here to help! These items are what we are going to look for before merging your code.

  • Informative and human-readable title, using the format: [_pt] PR: <description>
  • Links are provided if this PR resolves an issue, or depends on another PR
  • If submitting a PR to the dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
  • Changes are limited to a single goal (no scope creep)
  • The feature branch you're submitting as a PR is up to date (merged) with the latest dev branch
  • pre-commit hooks were run locally
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • CHANGELOG updated with template version number, e.g. 4.x.x.x
  • Add yourself as an assignee in the PR as well as the FIM Technical Lead

Merge Checklist (For Technical Lead use only)

  • Update CHANGELOG with latest version number and merge date
  • Update the Citation.cff file to reflect the latest version number in the CHANGELOG
  • If applicable, update README with major alterations

@RobHanna-NOAA RobHanna-NOAA marked this pull request as ready for review September 15, 2025 18:35
@RobHanna-NOAA RobHanna-NOAA changed the title from "[1pt] PR: WIP - Add multiproc to get usgs rating curves script" to "[1pt] PR: Add multiproc to get usgs rating curves script" Sep 15, 2025
@CarsonPruitt-NOAA CarsonPruitt-NOAA merged commit 7b5c424 into dev Sep 19, 2025
1 check passed
@CarsonPruitt-NOAA CarsonPruitt-NOAA deleted the dev-get-usgs branch September 19, 2025 17:57

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

  • [5pt] Bug Fix rating_curve_get_usgs_rating.py and re-run full release
  • [5pt] Add multiprocessing to USGS data download script

2 participants