-
Notifications
You must be signed in to change notification settings - Fork 0
feat: fix loaders some more #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request introduces modifications to the Changes
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
tests/io/loaders/test_features.py (1)
8-9
: Consider making the test data path configurableThe hardcoded path might cause issues in different environments or CI/CD pipelines. Consider using a pytest fixture that gets the path from an environment variable or configuration file.
@pytest.fixture def extracted_feature_4D_lung(): - return Path("tests/4D-lung/results/features/") + test_data_dir = Path(os.getenv("TEST_DATA_DIR", "tests/4D-lung/results/features/")) + if not test_data_dir.exists(): + pytest.skip("Test data directory not found") + return test_data_dirsrc/readii/io/loaders/features.py (2)
50-50
: Consider using a more specific glob patternThe current glob pattern
*.csv
might include unrelated CSV files. Consider using a more specific pattern that matches your feature file naming convention.- feature_file_list = sorted(extracted_feature_dir.glob("*.csv")) + feature_file_list = sorted(extracted_feature_dir.glob("*_features.csv")) # adjust pattern to match your naming convention
Line range hint
1-15
: Add type hints for local variablesWhile function parameters have type hints, local variables like
feature_file_list
andmatching_files
lack them. Consider adding type hints for better code maintainability.+from typing import List + def loadFeatureFilesFromImageTypes(...): + feature_file_list: List[Path] = [] + matching_files: List[Path] = []
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (2)
src/readii/io/loaders/features.py
(2 hunks)tests/io/loaders/test_features.py
(1 hunks)
tests/io/loaders/test_features.py
Outdated
def test_loadFeatureFilesFromImageTypes(extracted_feature_4D_lung): | ||
feature_sets = loadFeatureFilesFromImageTypes(extracted_feature_4D_lung, image_types=["original"], drop_labels=False) | ||
assert isinstance(feature_sets, dict), "Return should be a dictionary." | ||
assert len(feature_sets) == 1, "Should only have one feature file per image type." | ||
assert "original" in feature_sets, "Not finding the specified image type feature file (original) or keys are not correctly named." No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Expand test coverage with additional test cases
The current test only covers the happy path with a single image type. Consider adding:
- Test with multiple image types
- Test with non-existent image types
- Test with drop_labels=True
- Test error conditions (invalid directory, invalid files)
Example additional test:
def test_loadFeatureFilesFromImageTypes_multiple_types(extracted_feature_4D_lung):
feature_sets = loadFeatureFilesFromImageTypes(
extracted_feature_4D_lung,
image_types=["original", "wavelet"],
drop_labels=True
)
assert len(feature_sets) == 2
assert all(key in feature_sets for key in ["original", "wavelet"])
def test_loadFeatureFilesFromImageTypes_invalid_type(extracted_feature_4D_lung):
feature_sets = loadFeatureFilesFromImageTypes(
extracted_feature_4D_lung,
image_types=["nonexistent"],
drop_labels=False
)
with pytest.raises(ValueError):
assert not feature_sets
feature_file_path = matching_files[0] | ||
# Remove the image type file from the list of feature files | ||
feature_file_list.remove(image_type_feature_file) | ||
feature_file_list.remove(feature_file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid modifying the feature_file_list during iteration
Modifying the list while processing it could lead to race conditions or missed files. Consider using a set or maintaining a separate list of processed files.
- feature_file_path = matching_files[0]
- # Remove the image type file from the list of feature files
- feature_file_list.remove(feature_file_path)
+ feature_file_path = matching_files[0]
+ # Keep track of processed files in a separate set
+ processed_files.add(feature_file_path)
Also, consider adding this at the beginning of the function:
processed_files = set()
except Exception as e: | ||
logger.warning(f"Error loading {image_type} feature csv files from {extracted_feature_dir}: {e}") | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve error handling consistency
The current error handling logs a warning and then raises the exception. This is inconsistent with other error handling in the function. Consider either:
- Logging an error and raising a specific exception, or
- Logging a warning and continuing to the next iteration
- except Exception as e:
- logger.warning(f"Error loading {image_type} feature csv files from {extracted_feature_dir}: {e}")
- raise e
+ except Exception as e:
+ logger.error(f"Failed to process {image_type} feature files: {e}")
+ raise ValueError(f"Error processing {image_type} feature files: {e}") from e
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
except Exception as e: | |
logger.warning(f"Error loading {image_type} feature csv files from {extracted_feature_dir}: {e}") | |
raise e | |
except Exception as e: | |
logger.error(f"Failed to process {image_type} feature files: {e}") | |
raise ValueError(f"Error processing {image_type} feature files: {e}") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/io/loaders/test_features.py (2)
1-6
: Consider removing unused Path import if not needed for future testsThe
Path
import is currently unused in the code. However, if it's intended for upcoming tests as part of theos
topathlib
transition, consider adding a TODO comment explaining its future use.-from pathlib import Path +# TODO: Path import will be used in upcoming pathlib transition tests +from pathlib import Path🧰 Tools
🪛 Ruff (0.8.2)
5-5:
pathlib.Path
imported but unusedRemove unused import:
pathlib.Path
(F401)
9-19
: Enhance fixture robustness and maintainabilityWhile the fixture is well-structured with session scope and reproducible data generation, consider these improvements:
@pytest.fixture(scope="session") -def original_feature_file_path(tmp_path_factory): +def original_feature_file_path(tmp_path_factory: pytest.TempPathFactory) -> Path: # Create a 10x10 matrix with random float values between 0 and 1 random_matrix = np.random.default_rng(seed=10).random((10,10)) # Convert to dataframe and name the columns feature1, feature2, etc. random_feature_df = pd.DataFrame(random_matrix, columns=[f"feature_{i+1}" for i in range(10)]) # Save the dataframe to a temporary file feature_file_path = tmp_path_factory.mktemp("feature") / "features_original.csv" random_feature_df.to_csv(feature_file_path) + assert feature_file_path.exists(), "Failed to create feature file" return feature_file_path
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/io/loaders/test_features.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/io/loaders/test_features.py
5-5: pathlib.Path
imported but unused
Remove unused import: pathlib.Path
(F401)
🔇 Additional comments (2)
tests/io/loaders/test_features.py (2)
21-25
: Expand test coverage beyond the happy path
The current test only verifies basic dictionary structure but doesn't validate:
- The actual content of loaded features
- Error cases
- Multiple image types
- The
drop_labels
parameter behavior
The previous review comment suggesting additional test cases is still applicable. Would you like me to help implement the suggested test cases?
16-16
: Verify if the feature file naming pattern is documented
The test assumes a specific file naming pattern features_original.csv
. Let's verify if this pattern is documented or configurable.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #98 +/- ##
==========================================
+ Coverage 41.66% 43.76% +2.09%
==========================================
Files 33 33
Lines 1452 1451 -1
==========================================
+ Hits 605 635 +30
+ Misses 847 816 -31 ☔ View full report in Codecov by Sentry. |
Didn't test updating from os to pathlib enough, but should be good now. Added basic test, needs to be expanded upon.
Summary by CodeRabbit
New Features
Bug Fixes
Tests