Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions pixi.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions src/readii/utils/__init__.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
from .logging_config import logger
from .dataset_config import get_full_data_name

__all__ = [
'logger',
'get_full_data_name'
]
27 changes: 27 additions & 0 deletions src/readii/utils/dataset_config.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
from pathlib import Path
from typing import Dict, Union

from readii.io.loaders import loadImageDatasetConfig
from readii.utils import logger

def get_full_data_name(config: Union[Dict | Path]):
"""Combine DATA_SOURCE and DATASET_NAME config variables into a single string."""

match type(config):
case Path():
dataset_config = loadImageDatasetConfig(config.stem, config.parent)
case Dict():
dataset_config = config
case _:
message = "Error getting full data name: config must be of type Path or Dict."
logger.debug(message)
raise TypeError(message)

Check warning on line 18 in src/readii/utils/dataset_config.py

View check run for this annotation

Codecov / codecov/patch

src/readii/utils/dataset_config.py#L10-L18

Added lines #L10 - L18 were not covered by tests
Comment on lines +10 to +18
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify the assumption about Path naming convention.

The implementation assumes that config.stem (filename without extension) corresponds to the dataset name expected by loadImageDatasetConfig. This assumption should be documented or validated.

Consider adding validation or documentation about the expected file naming convention:


🏁 Script executed:

#!/bin/bash
# Description: Check if there are examples of how dataset config files are named
# to verify the assumption that filename stem equals dataset name

# Search for calls to loadImageDatasetConfig to understand naming patterns
ast-grep --pattern 'loadImageDatasetConfig($_, $_)'

# Look for YAML config files to understand naming conventions
fd -e yaml -e yml | head -10

Length of output: 552


Add validation and documentation for dataset config file naming

Please ensure that when passing a Path to get_dataset_config (in src/readii/utils/dataset_config.py), the filename stem matches both the expected dataset name and its parent directory. Although tests use matching names (e.g. tests/4D-Lung/4D-Lung.yaml → stem “4D-Lung”), this assumption should be explicit and enforced.

• File: src/readii/utils/dataset_config.py
Lines: 10–18

Suggested changes:

     match type(config):
         case Path():
+            # Enforce filename stem == parent directory name == dataset name
+            if config.parent.name != config.stem:
+                raise ValueError(
+                    f"Dataset config file stem '{config.stem}' must match its parent directory name "
+                    f"'{config.parent.name}'"
+                )
             dataset_config = loadImageDatasetConfig(config.stem, config.parent)
         case Dict():
             dataset_config = config

Additionally, update the function’s docstring (or README) to state:

When providing a file path, the config YAML filename (without extension) must match the dataset name and its enclosing directory.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
match type(config):
case Path():
dataset_config = loadImageDatasetConfig(config.stem, config.parent)
case Dict():
dataset_config = config
case _:
message = "Error getting full data name: config must be of type Path or Dict."
logger.debug(message)
raise TypeError(message)
match type(config):
case Path():
# Enforce filename stem == parent directory name == dataset name
if config.parent.name != config.stem:
raise ValueError(
f"Dataset config file stem '{config.stem}' must match its parent directory name "
f"'{config.parent.name}'"
)
dataset_config = loadImageDatasetConfig(config.stem, config.parent)
case Dict():
dataset_config = config
case _:
message = "Error getting full data name: config must be of type Path or Dict."
logger.debug(message)
raise TypeError(message)
🤖 Prompt for AI Agents
In src/readii/utils/dataset_config.py around lines 10 to 18, the code assumes
that when a Path is passed as config, its filename stem matches the dataset name
and parent directory for loadImageDatasetConfig. To fix this, add validation to
check that the Path's stem equals its parent directory name and raise an error
if not. Also, update the function's docstring to explicitly document that the
config YAML filename (without extension) must match the dataset name and its
enclosing directory.


try:
data_source = dataset_config['DATA_SOURCE']
dataset_name = dataset_config['DATASET_NAME']
return f"{data_source}_{dataset_name}"
except KeyError as e:
message = f"Missing required key in dataset configuration: {e}"
logger.error(message)
raise ValueError(message) from e

Check warning on line 27 in src/readii/utils/dataset_config.py

View check run for this annotation

Codecov / codecov/patch

src/readii/utils/dataset_config.py#L20-L27

Added lines #L20 - L27 were not covered by tests
Loading