-
Notifications
You must be signed in to change notification settings - Fork 0
feat: add function for getting full data name, combined data source and dataset name (used in readii_2_roqc) #127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nd dataset name (used in readii_2_roqc)
WalkthroughA new utility function, Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant utils.dataset_config
participant utils.__init__
Caller->>utils.__init__: get_full_data_name(config)
utils.__init__->>utils.dataset_config: get_full_data_name(config)
utils.dataset_config->>utils.dataset_config: (If config is Path) loadImageDatasetConfig(config)
utils.dataset_config->>utils.dataset_config: (If config is dict) use directly
utils.dataset_config->>utils.dataset_config: (Else) log and raise TypeError
utils.dataset_config-->>utils.__init__: Return "DATA_SOURCE_DATASET_NAME"
utils.__init__-->>Caller: Return result
Poem
Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/readii/utils/dataset_config.py (2)
7-7
: Improve type annotation and add return type.The type annotation can be simplified and the return type should be specified for better code clarity.
-def get_full_data_name(config: Union[Dict | Path]): +def get_full_data_name(config: Dict | Path) -> str:
8-8
: Enhance function documentation.The docstring is too brief and doesn't specify parameters, return value, or potential exceptions.
- """Combine DATA_SOURCE and DATASET_NAME config variables into a single string.""" + """Combine DATA_SOURCE and DATASET_NAME config variables into a single string. + + Parameters + ---------- + config : Dict | Path + Either a dictionary containing dataset configuration with 'DATA_SOURCE' + and 'DATASET_NAME' keys, or a Path to a dataset configuration file. + + Returns + ------- + str + Combined data name in format: "{DATA_SOURCE}_{DATASET_NAME}" + + Raises + ------ + TypeError + If config is neither a Dict nor a Path. + ValueError + If required keys are missing from the configuration. + """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/readii/utils/__init__.py
(1 hunks)src/readii/utils/dataset_config.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/readii/utils/dataset_config.py (1)
src/readii/io/loaders/general.py (1)
loadImageDatasetConfig
(19-61)
src/readii/utils/__init__.py (1)
src/readii/utils/dataset_config.py (1)
get_full_data_name
(7-20)
🔇 Additional comments (1)
src/readii/utils/__init__.py (1)
2-2
: LGTM! Proper integration of new function.The import and export of the new
get_full_data_name
function is correctly implemented, following Python package conventions by adding it to the__all__
list.Also applies to: 6-6
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/readii/utils/dataset_config.py (2)
7-7
: Simplify the type hint syntax.The
Union[Dict | Path]
syntax is redundant. In Python 3.10+, you can use the simplerDict | Path
syntax directly.-def get_full_data_name(config: Union[Dict | Path]): +def get_full_data_name(config: Dict | Path):Additionally, consider being more specific about the Dict type:
-def get_full_data_name(config: Dict | Path): +def get_full_data_name(config: Dict[str, str] | Path):
8-8
: Enhance the docstring with parameter and return information.The current docstring is too brief. Consider adding parameter descriptions, return type, and usage examples for better documentation.
- """Combine DATA_SOURCE and DATASET_NAME config variables into a single string.""" + """Combine DATA_SOURCE and DATASET_NAME config variables into a single string. + + Parameters + ---------- + config : Dict[str, str] | Path + Either a dictionary containing dataset configuration with 'DATA_SOURCE' and + 'DATASET_NAME' keys, or a Path to a YAML configuration file. + + Returns + ------- + str + Combined data name in the format "{DATA_SOURCE}_{DATASET_NAME}". + + Raises + ------ + TypeError + If config is neither a Dict nor a Path. + ValueError + If required keys 'DATA_SOURCE' or 'DATASET_NAME' are missing from config. + """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (1)
src/readii/utils/dataset_config.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/readii/utils/dataset_config.py (1)
src/readii/io/loaders/general.py (1)
loadImageDatasetConfig
(19-61)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Unit-Tests (windows-latest, py312)
- GitHub Check: Unit-Tests (windows-latest, py311)
- GitHub Check: Unit-Tests (windows-latest, py310)
- GitHub Check: Unit-Tests (macos-14, py312)
- GitHub Check: Unit-Tests (macos-14, py310)
- GitHub Check: Unit-Tests (macos-latest, py312)
- GitHub Check: Unit-Tests (macos-latest, py310)
- GitHub Check: Unit-Tests (ubuntu-latest, py312)
🔇 Additional comments (2)
src/readii/utils/dataset_config.py (2)
1-5
: LGTM! Imports are properly structured.The imports are well-organized and include all necessary dependencies for the function implementation.
20-27
: Excellent error handling implementation!The error handling for missing dictionary keys is well-implemented with proper logging and exception chaining. This addresses the concerns from previous review comments effectively.
match type(config): | ||
case Path(): | ||
dataset_config = loadImageDatasetConfig(config.stem, config.parent) | ||
case Dict(): | ||
dataset_config = config | ||
case _: | ||
message = "Error getting full data name: config must be of type Path or Dict." | ||
logger.debug(message) | ||
raise TypeError(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify the assumption about Path naming convention.
The implementation assumes that config.stem
(filename without extension) corresponds to the dataset name expected by loadImageDatasetConfig
. This assumption should be documented or validated.
Consider adding validation or documentation about the expected file naming convention:
🏁 Script executed:
#!/bin/bash
# Description: Check if there are examples of how dataset config files are named
# to verify the assumption that filename stem equals dataset name
# Search for calls to loadImageDatasetConfig to understand naming patterns
ast-grep --pattern 'loadImageDatasetConfig($_, $_)'
# Look for YAML config files to understand naming conventions
fd -e yaml -e yml | head -10
Length of output: 552
Add validation and documentation for dataset config file naming
Please ensure that when passing a Path
to get_dataset_config
(in src/readii/utils/dataset_config.py
), the filename stem matches both the expected dataset name and its parent directory. Although tests use matching names (e.g. tests/4D-Lung/4D-Lung.yaml
→ stem “4D-Lung”), this assumption should be explicit and enforced.
• File: src/readii/utils/dataset_config.py
Lines: 10–18
Suggested changes:
match type(config):
case Path():
+ # Enforce filename stem == parent directory name == dataset name
+ if config.parent.name != config.stem:
+ raise ValueError(
+ f"Dataset config file stem '{config.stem}' must match its parent directory name "
+ f"'{config.parent.name}'"
+ )
dataset_config = loadImageDatasetConfig(config.stem, config.parent)
case Dict():
dataset_config = config
Additionally, update the function’s docstring (or README) to state:
When providing a file path, the config YAML filename (without extension) must match the dataset name and its enclosing directory.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
match type(config): | |
case Path(): | |
dataset_config = loadImageDatasetConfig(config.stem, config.parent) | |
case Dict(): | |
dataset_config = config | |
case _: | |
message = "Error getting full data name: config must be of type Path or Dict." | |
logger.debug(message) | |
raise TypeError(message) | |
match type(config): | |
case Path(): | |
# Enforce filename stem == parent directory name == dataset name | |
if config.parent.name != config.stem: | |
raise ValueError( | |
f"Dataset config file stem '{config.stem}' must match its parent directory name " | |
f"'{config.parent.name}'" | |
) | |
dataset_config = loadImageDatasetConfig(config.stem, config.parent) | |
case Dict(): | |
dataset_config = config | |
case _: | |
message = "Error getting full data name: config must be of type Path or Dict." | |
logger.debug(message) | |
raise TypeError(message) |
🤖 Prompt for AI Agents
In src/readii/utils/dataset_config.py around lines 10 to 18, the code assumes
that when a Path is passed as config, its filename stem matches the dataset name
and parent directory for loadImageDatasetConfig. To fix this, add validation to
check that the Path's stem equals its parent directory name and raise an error
if not. Also, update the function's docstring to explicitly document that the
config YAML filename (without extension) must match the dataset name and its
enclosing directory.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #127 +/- ##
==========================================
- Coverage 76.08% 75.51% -0.57%
==========================================
Files 41 42 +1
Lines 2003 2026 +23
==========================================
+ Hits 1524 1530 +6
- Misses 479 496 +17 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary by CodeRabbit