@carlosrinc

Refactored logging_utils.py with improved directory creation, error handling for wandb.init, and a configurable log file path.
Refactored dist_checkpoint_utils.py with improved path handling, directory creation, and error handling.
Added unit tests for logging_utils.py and dist_checkpoint_utils.py to verify the refactorings.

This commit introduces several improvements to training/utils/logging_utils.py and training/utils/dist_checkpoint_utils.py.

In logging_utils.py:

  • Replaced os.system("mkdir -p ...") with os.makedirs(..., exist_ok=True) for safer directory creation.
  • Added error handling for wandb.init() to catch potential initialization failures.
  • Made the loguru log file path configurable via arguments, defaulting to "logs/file_{time}.log".
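
As a rough illustration of the logging_utils.py changes listed above, the pattern could look something like the sketch below. This is not the code from the commit: the helper names prepare_log_file and safe_init are hypothetical, and the initializer is passed in as a callable so that the sketch runs without wandb or loguru installed.

```python
import os
from typing import Any, Callable, Optional

# Default taken from the PR description; loguru expands "{time}" itself.
DEFAULT_LOG_PATH = "logs/file_{time}.log"


def prepare_log_file(log_path: str = DEFAULT_LOG_PATH) -> str:
    """Create the parent directory of log_path (hypothetical helper)."""
    log_dir = os.path.dirname(log_path)
    if log_dir:
        # Replaces os.system("mkdir -p ..."): no shell involved, and
        # exist_ok=True makes repeated calls harmless.
        os.makedirs(log_dir, exist_ok=True)
    return log_path


def safe_init(init_fn: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Call an initializer such as wandb.init and survive a failure.

    Hypothetical helper: returns None instead of raising, so training
    can continue without experiment tracking.
    """
    try:
        return init_fn(**kwargs)
    except Exception as exc:  # broad on purpose: failure modes vary
        print(f"Tracker initialization failed, continuing without it: {exc}")
        return None
```

In real use, one would presumably pass wandb.init as init_fn and hand the returned path to loguru's logger.add; whether the commit structures it this way is an assumption.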

In dist_checkpoint_utils.py:

  • Replaced os.system("mkdir -p ...") with os.makedirs(..., exist_ok=True).
  • Refactored path joining to reduce redundancy and improve readability.
  • Enhanced error handling in load_checkpoint by catching more specific exceptions (e.g., FileNotFoundError) and providing clearer messages.
  • Corrected path construction in load_stream_dataloader_state_dict to use existing variables.
  • Added basic error handling for saving dataset state dicts.
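
The checkpoint changes listed above follow a similar pattern, sketched below under stated assumptions. The function names, the checkpoint_{step} directory layout, and the use of JSON instead of the project's real serialization are all hypothetical; the sketch only shows building the path once, creating directories with os.makedirs, and catching specific exceptions.

```python
import json
import os
from typing import Any, Optional

STATE_FILE = "state.json"  # hypothetical file name


def checkpoint_dir(root: str, step: int) -> str:
    """Build the checkpoint directory path in one place (assumed layout)."""
    return os.path.join(root, f"checkpoint_{step}")


def save_state(root: str, step: int, state: Any) -> Optional[str]:
    """Write state to disk; return the file path, or None on failure."""
    ckpt_dir = checkpoint_dir(root, step)
    os.makedirs(ckpt_dir, exist_ok=True)  # instead of os.system("mkdir -p ...")
    path = os.path.join(ckpt_dir, STATE_FILE)
    try:
        with open(path, "w") as f:
            json.dump(state, f)
    except (OSError, TypeError) as exc:
        # OSError: disk or permission problems; TypeError: unserializable state.
        print(f"Could not save state to {path}: {exc}")
        return None
    return path


def load_state(root: str, step: int) -> Optional[Any]:
    """Read state back; return None with a clear message if unavailable."""
    path = os.path.join(checkpoint_dir(root, step), STATE_FILE)
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"No checkpoint found at {path}")
        return None
    except json.JSONDecodeError as exc:
        print(f"Checkpoint at {path} is corrupted: {exc}")
        return None
```

Catching FileNotFoundError separately lets the caller distinguish "nothing to resume from" from a damaged file, which is the kind of clearer message the description refers to.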

Unit tests have been added for both modules to cover the new functionality and the error-handling paths.
