-
Notifications
You must be signed in to change notification settings - Fork 0
Fix continuous RTSTRUCT index bug #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ze_readii_outputs
…mpy for seaborn install to work
…o plot them as a heatmap
…nds dataframe based on a column name
WalkthroughThis pull request introduces several changes to the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (10)
src/readii/analyze/correlation.py (6)
6-6: Remove unused importissymmetricThe
issymmetricfunction fromscipy.linalgis imported but not used in the code. Removing unused imports helps keep the code clean and maintainable.Apply this diff to remove the unused import:
-import pandas as pd -from typing import Optional -import matplotlib.pyplot as plt -import seaborn as sns -import numpy as np -from scipy.linalg import issymmetric +import pandas as pd +from typing import Optional +import matplotlib.pyplot as plt +import seaborn as sns +import numpy as np🧰 Tools
🪛 Ruff (0.8.0)
6-6:
scipy.linalg.issymmetricimported but unusedRemove unused import:
scipy.linalg.issymmetric(F401)
45-46: Formatifstatements according to PEP 8The
ifstatements on lines 45 and 46 have multiple statements on one line, which is against PEP 8 guidelines. Eachifstatement should be on its own line with an indented block to improve readability.Apply this diff to correct the formatting:
-if vertical_feature_name: vertical_feature_name = f"_{vertical_feature_name}" -if horizontal_feature_name: horizontal_feature_name = f"_{horizontal_feature_name}" +if vertical_feature_name: + vertical_feature_name = f"_{vertical_feature_name}" +if horizontal_feature_name: + horizontal_feature_name = f"_{horizontal_feature_name}"🧰 Tools
🪛 Ruff (0.8.0)
45-45: Multiple statements on one line (colon)
(E701)
46-46: Multiple statements on one line (colon)
(E701)
58-59: Chain exceptions to preserve tracebackWhen re-raising an exception inside an
exceptblock, it's good practice to useraise ... from eto maintain the original traceback and provide more context.Apply this diff to chain the exceptions:
except Exception as e: - raise ValueError(f"Error calculating correlation matrix: {e}") + raise ValueError(f"Error calculating correlation matrix: {e}") from e🧰 Tools
🪛 Ruff (0.8.0)
59-59: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
82-83: Correct typo in docstringThere's a typo in the docstring: "matrixto" should be "matrix to".
Apply this diff to fix the typo:
triangle : str, optional - Which triangle half of the matrixto plot. The default is "lower". + Which triangle half of the matrix to plot. The default is "lower".
112-112: Fix typo in commentThere's a typo in the comment: "visisble" should be "visible".
Apply this diff to correct the typo:
else: # The entire correlation matrix will be visisble in the plot - # The entire correlation matrix will be visisble in the plot + # The entire correlation matrix will be visible in the plot mask = None
44-44: Fix typo in commentThere's a typo in the comment: "beginnging" should be "beginning".
Apply this diff to fix the typo:
# Add _ to beginnging of feature names if they are not blank so they can be used as suffixes -# Add _ to beginnging of feature names if they are not blank so they can be used as suffixes +# Add _ to beginning of feature names if they are not blank so they can be used as suffixestests/test_correlation.py (1)
3-3: Remove unused importsThe following imports are unused and can be removed to clean up the code:
plotCorrelationHeatmapfromreadii.analyze.correlation(line 3)collectionsmodule (line 7)osmodule (line 9)Apply this diff to remove the unused imports:
from readii.analyze.correlation import ( getFeatureCorrelations, - plotCorrelationHeatmap, ) import pytest -import collections import pandas as pd -import osAlso applies to: 7-7, 9-9
🧰 Tools
🪛 Ruff (0.8.0)
3-3:
readii.analyze.correlation.plotCorrelationHeatmapimported but unusedRemove unused import:
readii.analyze.correlation.plotCorrelationHeatmap(F401)
src/readii/data/process.py (1)
37-39: Use logging instead of print statements for error messagesUsing the
loggingmodule instead ofApply this diff to use logging:
import pandas as pd from typing import Optional +import logging def dropUpToFeature(dataframe: DataFrame, feature_name: str, keep_feature_name_column: Optional[bool] = False): # ... [existing code] ... except KeyError: - print(f"Feature {feature_name} was not found as a column in dataframe. No columns dropped.") + logging.error(f"Feature {feature_name} was not found as a column in dataframe. No columns dropped.") return dataframeDon't forget to configure the logging level in your main application.
src/readii/io/loaders/general.py (1)
5-5: Remove unused importsThe imports
Optional,Dict, andUnionfromtypingare not used and can be removed to clean up the code.Apply this diff to remove the unused imports:
import os import pandas as pd import yaml -from typing import Optional, Dict, Union +from typing import AnyNote: Replace with
Anyif needed elsewhere.🧰 Tools
🪛 Ruff (0.8.0)
5-5:
typing.Optionalimported but unusedRemove unused import
(F401)
5-5:
typing.Dictimported but unusedRemove unused import
(F401)
5-5:
typing.Unionimported but unusedRemove unused import
(F401)
src/readii/io/loaders/features.py (1)
62-62: Remove unused variableeThe exception variable
eis assigned but never used in theexceptblock.Apply this diff to remove the unused variable:
except Exception as e: - print(f"{feature_file_path} does not have the labels {labels_to_drop} to drop.") + print(f"{feature_file_path} does not have the labels {labels_to_drop} to drop.")Alternatively, if you want to include the exception message:
except Exception as e: print(f"{feature_file_path} does not have the labels {labels_to_drop} to drop. Error: {e}")🧰 Tools
🪛 Ruff (0.8.0)
62-62: Local variable
eis assigned to but never usedRemove assignment to unused variable
e(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
pixi.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
pyproject.toml(2 hunks)src/readii/__init__.py(0 hunks)src/readii/analyze/correlation.py(1 hunks)src/readii/data/process.py(1 hunks)src/readii/io/loaders/features.py(1 hunks)src/readii/io/loaders/general.py(1 hunks)src/readii/loaders.py(1 hunks)tests/test_correlation.py(1 hunks)
💤 Files with no reviewable changes (1)
- src/readii/init.py
🧰 Additional context used
🪛 Ruff (0.8.0)
tests/test_correlation.py
3-3: readii.analyze.correlation.plotCorrelationHeatmap imported but unused
Remove unused import: readii.analyze.correlation.plotCorrelationHeatmap
(F401)
7-7: collections imported but unused
Remove unused import: collections
(F401)
9-9: os imported but unused
Remove unused import: os
(F401)
src/readii/io/loaders/general.py
5-5: typing.Optional imported but unused
Remove unused import
(F401)
5-5: typing.Dict imported but unused
Remove unused import
(F401)
5-5: typing.Union imported but unused
Remove unused import
(F401)
34-34: Use a context manager for opening files
(SIM115)
src/readii/analyze/correlation.py
6-6: scipy.linalg.issymmetric imported but unused
Remove unused import: scipy.linalg.issymmetric
(F401)
45-45: Multiple statements on one line (colon)
(E701)
46-46: Multiple statements on one line (colon)
(E701)
59-59: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
src/readii/io/loaders/features.py
10-10: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
12-12: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
62-62: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
🔇 Additional comments (2)
src/readii/loaders.py (1)
75-79: LGTM! The fix addresses the IndexError by preventing float indices.
The change to set continuous=False in StructureSetToSegmentation initialization is the correct approach to prevent float indices that were causing the IndexError with HNSCC data.
Let's verify if this change might affect other test cases:
✅ Verification successful
The continuous=False parameter change is safe to apply
Based on the codebase analysis:
- The
loadRTSTRUCTSITKfunction is only used in feature extraction and segmentation loading - Existing tests for RTSTRUCT loading in
test_loaders.pyverify the basic functionality without any specific index-related assertions - The change ensures integer indices which is safer for array indexing operations
- No other usage of
StructureSetToSegmentationexists outside ofloaders.py
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for test files that might be affected by this change
rg -l "loadRTSTRUCTSITK" tests/
Length of output: 64
Script:
#!/bin/bash
# Check the test file content to see how loadRTSTRUCTSITK is used
rg -A 10 "loadRTSTRUCTSITK" tests/test_feature_extraction.py
Length of output: 253
Script:
#!/bin/bash
# Look for any other usage of StructureSetToSegmentation in the codebase
rg -l "StructureSetToSegmentation" src/
Length of output: 61
Script:
#!/bin/bash
# Check if there are any other test files related to RTSTRUCT or segmentation
rg -l "RTSTRUCT|segmentation" tests/
Length of output: 237
Script:
#!/bin/bash
# Check test_loaders.py for any specific RTSTRUCT loading tests
rg -A 10 "RTSTRUCT|loadRTSTRUCTSITK" tests/test_loaders.py
Length of output: 1060
Script:
#!/bin/bash
# Check if there are any existing test cases that verify segmentation indices
rg -B 2 -A 5 "continuous|indices|index" tests/test_loaders.py
Length of output: 63
pyproject.toml (1)
35-36: Please clarify the need for new dependencies.
The addition of numpy and seaborn dependencies appears unrelated to the RTSTRUCT index bug fix. Could you explain:
- Why these dependencies are needed?
- Are they part of other changes not mentioned in the PR description?
Let's check if these dependencies are used in new files:
✅ Verification successful
Dependencies are needed for new correlation analysis functionality
The new dependencies are justified as they support newly added correlation analysis features in this PR:
numpyandseabornare used in the newsrc/readii/analyze/correlation.pymodule which implements correlation matrix calculations and heatmap visualizations- The PR appears to be larger in scope than just an RTSTRUCT bug fix, adding several new modules including correlation analysis functionality
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for numpy and seaborn usage
echo "Files using numpy:"
rg -l "import numpy"
echo "Files using seaborn:"
rg -l "import seaborn"
Length of output: 370
Script:
#!/bin/bash
# Check git diff for these files to see if they are part of this PR
git diff origin/main...HEAD -- src/readii/negative_controls.py src/readii/image_processing.py src/readii/analyze/correlation.py tests/test_negative_controls.py
# Also check if these are new files
git ls-files --others --exclude-standard -- src/readii/negative_controls.py src/readii/image_processing.py src/readii/analyze/correlation.py tests/test_negative_controls.py
Length of output: 6760
Script:
#!/bin/bash
# Check PR description and commits for context
git log --format=%B origin/main...HEAD
# Check other new files in this PR
git diff --name-status origin/main...HEAD
Length of output: 483
| @pytest.mark.parametrize( | ||
| "features", | ||
| [ | ||
| ("nsclc_radiomic_features"), | ||
| ("lung4D_radiomic_features") | ||
| ], | ||
| "correlation_method", | ||
| [ | ||
| ("pearson"), | ||
| ("spearman"), | ||
| ("random") | ||
| ] | ||
| ) | ||
| def test_getFeatureCorrelations(features, correlation_method, request): | ||
| """Test getting correlation matrix for a set of features""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct parameterization in pytest.mark.parametrize
The current usage of @pytest.mark.parametrize is incorrect, which may lead to test failures.
Here's how you can fix the parameterization:
-@pytest.mark.parametrize(
- "features",
- [
- ("nsclc_radiomic_features"),
- ("lung4D_radiomic_features")
- ],
- "correlation_method",
- [
- ("pearson"),
- ("spearman"),
- ("random")
- ]
-)
+@pytest.mark.parametrize("features", ["nsclc_radiomic_features", "lung4D_radiomic_features"])
+@pytest.mark.parametrize("correlation_method", ["pearson", "spearman", "random"])Adjust the test function accordingly:
def test_getFeatureCorrelations(features, correlation_method, request):
"""Test getting correlation matrix for a set of features"""
features = request.getfixturevalue(features)
- correlation_method = request.getfixturevalue(correlation_method)Since correlation_method is a direct parameter, you don't need to use request.getfixturevalue for it.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @pytest.mark.parametrize( | |
| "features", | |
| [ | |
| ("nsclc_radiomic_features"), | |
| ("lung4D_radiomic_features") | |
| ], | |
| "correlation_method", | |
| [ | |
| ("pearson"), | |
| ("spearman"), | |
| ("random") | |
| ] | |
| ) | |
| def test_getFeatureCorrelations(features, correlation_method, request): | |
| """Test getting correlation matrix for a set of features""" | |
| @pytest.mark.parametrize("features", ["nsclc_radiomic_features", "lung4D_radiomic_features"]) | |
| @pytest.mark.parametrize("correlation_method", ["pearson", "spearman", "random"]) | |
| def test_getFeatureCorrelations(features, correlation_method, request): | |
| """Test getting correlation matrix for a set of features""" | |
| features = request.getfixturevalue(features) |
| ("pearson"), | ||
| ("spearman"), | ||
| ("random") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Handle expected exceptions for invalid correlation methods
The correlation method "random" is not valid and will raise a ValueError. The test should expect this exception to ensure proper error handling.
Modify the test function to handle invalid methods:
def test_getFeatureCorrelations(features, correlation_method, request):
"""Test getting correlation matrix for a set of features"""
features = request.getfixturevalue(features)
if correlation_method in ["pearson", "spearman", "kendall"]:
actual = getFeatureCorrelations(vertical_features=features,
horizontal_features=features,
method=correlation_method)
assert isinstance(actual, pd.DataFrame), \
"Wrong return type, expect a pandas DataFrame"
assert actual.shape[0] == 2 * features.shape[1], \
"Wrong return size, should be the same as the number of features"
else:
with pytest.raises(ValueError):
getFeatureCorrelations(vertical_features=features,
horizontal_features=features,
method=correlation_method)This ensures that the test suite properly verifies both valid and invalid input scenarios.
Committable suggestion skipped: line range outside the PR's diff.
| except Exception as e: | ||
| print(f"An error occurred: {e}") | ||
| return None No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider removing general except clause or provide specific handling
Catching a general exception and returning None can obscure unexpected errors and make debugging difficult.
Apply this diff to remove the general except clause or handle specific exceptions:
except KeyError:
print(f"Feature {feature_name} was not found as a column in dataframe. No columns dropped.")
return dataframe
- except Exception as e:
- print(f"An error occurred: {e}")
- return NoneAlternatively, specify the exceptions you expect:
except KeyError:
print(f"Feature {feature_name} was not found as a column in dataframe. No columns dropped.")
return dataframe
except SomeSpecificException as e:
print(f"An error occurred: {e}")
return NoneCommittable suggestion skipped: line range outside the PR's diff.
| # Check if config file exists | ||
| if os.path.exists(config_file_path): | ||
| # Load the config file | ||
| config = yaml.safe_load(open(config_file_path, "r")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use context manager when opening files
It's recommended to use a context manager (with statement) when opening files to ensure they are properly closed.
Apply this diff to use a context manager:
# Load the config file
-config = yaml.safe_load(open(config_file_path, "r"))
+with open(config_file_path, "r") as file:
+ config = yaml.safe_load(file)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| config = yaml.safe_load(open(config_file_path, "r")) | |
| with open(config_file_path, "r") as file: | |
| config = yaml.safe_load(file) |
🧰 Tools
🪛 Ruff (0.8.0)
34-34: Use a context manager for opening files
(SIM115)
|
|
||
|
|
||
| def loadFeatureFilesFromImageTypes(extracted_feature_dir:str, | ||
| image_types:Optional[list]=['original'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid mutable default arguments
Using mutable default arguments like lists can lead to unexpected behavior because default arguments are evaluated only once at function definition time.
Change the default arguments to None and initialize within the function:
def loadFeatureFilesFromImageTypes(extracted_feature_dir: str,
- image_types: Optional[list] = ['original'],
- drop_labels: Optional[bool] = True,
- labels_to_drop: Optional[list] = ["patient_ID", "survival_time_in_years", "survival_event_binary"]) -> Dict[str, pd.DataFrame]:
+ image_types: Optional[list] = None,
+ drop_labels: Optional[bool] = True,
+ labels_to_drop: Optional[list] = None) -> Dict[str, pd.DataFrame]:
"""Function to load in all the extracted imaging feature sets from a directory and return them as a dictionary of dataframes."""
+ if image_types is None:
+ image_types = ['original']
+ if labels_to_drop is None:
+ labels_to_drop = ["patient_ID", "survival_time_in_years", "survival_event_binary"]Also applies to: 12-12
🧰 Tools
🪛 Ruff (0.8.0)
10-10: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
HNSCC data was generating the following error:
IndexError: [-65.00013333] index is out of bounds for image sized (123, 512, 512, 1).This was caused by the
med-imagetoolsclassStructureSettoSegmentationinitialization setting thecontinuousargument toTrue, which allowed for the RTSTRUCT physical points to be converted into indices that were floats instead of integers.Setting
continuous = FalseinloadRTSTRUCTSITKshould solve this issue.Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests