Skip to content

Conversation

strixy16
Copy link
Collaborator

@strixy16 strixy16 commented Dec 6, 2024

HNSCC data was generating the following error:
IndexError: [-65.00013333] index is out of bounds for image sized (123, 512, 512, 1).

This was caused by the med-imagetools class StructureSettoSegmentation initialization setting the continuous argument to True, which allowed for the RTSTRUCT physical points to be converted into indices that were floats instead of integers.

Setting continuous = False in loadRTSTRUCTSITK should solve this issue.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new functions for calculating feature correlations and visualizing them as heatmaps.
    • Added functionality to load imaging feature sets from a specified directory.
    • Implemented a method to manipulate DataFrames by dropping columns up to a specified feature.
    • Added support for loading dataset configurations from YAML files.
  • Bug Fixes

    • Enhanced the segmentation processing in the loadRTSTRUCTSITK function for improved index handling.
  • Tests

    • Added unit tests for correlation analysis functions to ensure accuracy and reliability.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 6, 2024

Walkthrough

This pull request introduces several changes to the readii project, primarily adding new dependencies to the pyproject.toml file and introducing new functions across multiple modules. The dependencies numpy and seaborn are added to enhance functionality. Additionally, new functions for analyzing feature correlations and processing data are implemented, along with corresponding unit tests to validate these functionalities. Minor formatting adjustments are also made in existing files without altering their logic.

Changes

File Path Change Summary
pyproject.toml Added dependencies: numpy = "1.26.4.*" and seaborn = ">=0.13.2,<0.14" under [tool.pixi.dependencies].
src/readii/__init__.py Removed a blank line after the version declaration (__version__ = "1.18.0").
src/readii/analyze/correlation.py Added functions: getFeatureCorrelations and plotCorrelationHeatmap for correlation analysis.
src/readii/data/process.py Added function: dropUpToFeature for DataFrame manipulation.
src/readii/io/loaders/features.py Added function: loadFeatureFilesFromImageTypes to load imaging feature sets from a directory.
src/readii/io/loaders/general.py Added functions: loadImageDatasetConfig and loadFileToDataFrame for configuration and data loading.
src/readii/loaders.py Updated loadRTSTRUCTSITK function to include a new parameter continuous=False.
tests/test_correlation.py Introduced unit tests for correlation analysis, including fixtures for test data.

Possibly related PRs

🐇 In the meadow where bunnies play,
New tools have come to brighten the day.
With numpy and seaborn, our data will dance,
Correlations found, giving insights a chance.
Let's hop with joy, for the code is now bright,
Analyzing features, oh what a delight! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (10)
src/readii/analyze/correlation.py (6)

6-6: Remove unused import issymmetric

The issymmetric function from scipy.linalg is imported but not used in the code. Removing unused imports helps keep the code clean and maintainable.

Apply this diff to remove the unused import:

-import pandas as pd
-from typing import Optional
-import matplotlib.pyplot as plt
-import seaborn as sns
-import numpy as np
-from scipy.linalg import issymmetric

+import pandas as pd
+from typing import Optional
+import matplotlib.pyplot as plt
+import seaborn as sns
+import numpy as np
🧰 Tools
🪛 Ruff (0.8.0)

6-6: scipy.linalg.issymmetric imported but unused

Remove unused import: scipy.linalg.issymmetric

(F401)


45-46: Format if statements according to PEP 8

The if statements on lines 45 and 46 have multiple statements on one line, which is against PEP 8 guidelines. Each if statement should be on its own line with an indented block to improve readability.

Apply this diff to correct the formatting:

-if vertical_feature_name: vertical_feature_name = f"_{vertical_feature_name}"
-if horizontal_feature_name: horizontal_feature_name = f"_{horizontal_feature_name}"
+if vertical_feature_name:
+    vertical_feature_name = f"_{vertical_feature_name}"
+if horizontal_feature_name:
+    horizontal_feature_name = f"_{horizontal_feature_name}"
🧰 Tools
🪛 Ruff (0.8.0)

45-45: Multiple statements on one line (colon)

(E701)


46-46: Multiple statements on one line (colon)

(E701)


58-59: Chain exceptions to preserve traceback

When re-raising an exception inside an except block, it's good practice to use raise ... from e to maintain the original traceback and provide more context.

Apply this diff to chain the exceptions:

 except Exception as e:
-    raise ValueError(f"Error calculating correlation matrix: {e}")
+    raise ValueError(f"Error calculating correlation matrix: {e}") from e
🧰 Tools
🪛 Ruff (0.8.0)

59-59: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


82-83: Correct typo in docstring

There's a typo in the docstring: "matrixto" should be "matrix to".

Apply this diff to fix the typo:

         triangle : str, optional
-            Which triangle half of the matrixto plot. The default is "lower".
+            Which triangle half of the matrix to plot. The default is "lower".

112-112: Fix typo in comment

There's a typo in the comment: "visisble" should be "visible".

Apply this diff to correct the typo:

         else:
             # The entire correlation matrix will be visisble in the plot
-            # The entire correlation matrix will be visisble in the plot
+            # The entire correlation matrix will be visible in the plot
             mask = None

44-44: Fix typo in comment

There's a typo in the comment: "beginnging" should be "beginning".

Apply this diff to fix the typo:

 # Add _ to beginnging of feature names if they are not blank so they can be used as suffixes
-# Add _ to beginnging of feature names if they are not blank so they can be used as suffixes
+# Add _ to beginning of feature names if they are not blank so they can be used as suffixes
tests/test_correlation.py (1)

3-3: Remove unused imports

The following imports are unused and can be removed to clean up the code:

  • plotCorrelationHeatmap from readii.analyze.correlation (line 3)
  • collections module (line 7)
  • os module (line 9)

Apply this diff to remove the unused imports:

 from readii.analyze.correlation import (
     getFeatureCorrelations,
-    plotCorrelationHeatmap,
 )

 import pytest
-import collections
 import pandas as pd
-import os 

Also applies to: 7-7, 9-9

🧰 Tools
🪛 Ruff (0.8.0)

3-3: readii.analyze.correlation.plotCorrelationHeatmap imported but unused

Remove unused import: readii.analyze.correlation.plotCorrelationHeatmap

(F401)

src/readii/data/process.py (1)

37-39: Use logging instead of print statements for error messages

Using the logging module instead of print statements provides better control over how messages are displayed and recorded.

Apply this diff to use logging:

 import pandas as pd
 from typing import Optional
+import logging

 def dropUpToFeature(dataframe: DataFrame,
                     feature_name: str,
                     keep_feature_name_column: Optional[bool] = False):
     # ... [existing code] ...
     except KeyError:
-        print(f"Feature {feature_name} was not found as a column in dataframe. No columns dropped.")
+        logging.error(f"Feature {feature_name} was not found as a column in dataframe. No columns dropped.")
         return dataframe

Don't forget to configure the logging level in your main application.

src/readii/io/loaders/general.py (1)

5-5: Remove unused imports

The imports Optional, Dict, and Union from typing are not used and can be removed to clean up the code.

Apply this diff to remove the unused imports:

 import os
 import pandas as pd 
 import yaml

-from typing import Optional, Dict, Union
+from typing import Any

Note: Replace with Any if needed elsewhere.

🧰 Tools
🪛 Ruff (0.8.0)

5-5: typing.Optional imported but unused

Remove unused import

(F401)


5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Union imported but unused

Remove unused import

(F401)

src/readii/io/loaders/features.py (1)

62-62: Remove unused variable e

The exception variable e is assigned but never used in the except block.

Apply this diff to remove the unused variable:

 except Exception as e:
-    print(f"{feature_file_path} does not have the labels {labels_to_drop} to drop.")
+    print(f"{feature_file_path} does not have the labels {labels_to_drop} to drop.")

Alternatively, if you want to include the exception message:

 except Exception as e:
     print(f"{feature_file_path} does not have the labels {labels_to_drop} to drop. Error: {e}")
🧰 Tools
🪛 Ruff (0.8.0)

62-62: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3bcaa28 and fe56257.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • pyproject.toml (2 hunks)
  • src/readii/__init__.py (0 hunks)
  • src/readii/analyze/correlation.py (1 hunks)
  • src/readii/data/process.py (1 hunks)
  • src/readii/io/loaders/features.py (1 hunks)
  • src/readii/io/loaders/general.py (1 hunks)
  • src/readii/loaders.py (1 hunks)
  • tests/test_correlation.py (1 hunks)
💤 Files with no reviewable changes (1)
  • src/readii/init.py
🧰 Additional context used
🪛 Ruff (0.8.0)
tests/test_correlation.py

3-3: readii.analyze.correlation.plotCorrelationHeatmap imported but unused

Remove unused import: readii.analyze.correlation.plotCorrelationHeatmap

(F401)


7-7: collections imported but unused

Remove unused import: collections

(F401)


9-9: os imported but unused

Remove unused import: os

(F401)

src/readii/io/loaders/general.py

5-5: typing.Optional imported but unused

Remove unused import

(F401)


5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Union imported but unused

Remove unused import

(F401)


34-34: Use a context manager for opening files

(SIM115)

src/readii/analyze/correlation.py

6-6: scipy.linalg.issymmetric imported but unused

Remove unused import: scipy.linalg.issymmetric

(F401)


45-45: Multiple statements on one line (colon)

(E701)


46-46: Multiple statements on one line (colon)

(E701)


59-59: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

src/readii/io/loaders/features.py

10-10: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


12-12: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


62-62: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🔇 Additional comments (2)
src/readii/loaders.py (1)

75-79: LGTM! The fix addresses the IndexError by preventing float indices.

The change to set continuous=False in StructureSetToSegmentation initialization is the correct approach to prevent float indices that were causing the IndexError with HNSCC data.

Let's verify if this change might affect other test cases:

✅ Verification successful

The continuous=False parameter change is safe to apply

Based on the codebase analysis:

  • The loadRTSTRUCTSITK function is only used in feature extraction and segmentation loading
  • Existing tests for RTSTRUCT loading in test_loaders.py verify the basic functionality without any specific index-related assertions
  • The change ensures integer indices which is safer for array indexing operations
  • No other usage of StructureSetToSegmentation exists outside of loaders.py
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for test files that might be affected by this change
rg -l "loadRTSTRUCTSITK" tests/

Length of output: 64


Script:

#!/bin/bash
# Check the test file content to see how loadRTSTRUCTSITK is used
rg -A 10 "loadRTSTRUCTSITK" tests/test_feature_extraction.py

Length of output: 253


Script:

#!/bin/bash
# Look for any other usage of StructureSetToSegmentation in the codebase
rg -l "StructureSetToSegmentation" src/

Length of output: 61


Script:

#!/bin/bash
# Check if there are any other test files related to RTSTRUCT or segmentation
rg -l "RTSTRUCT|segmentation" tests/

Length of output: 237


Script:

#!/bin/bash
# Check test_loaders.py for any specific RTSTRUCT loading tests
rg -A 10 "RTSTRUCT|loadRTSTRUCTSITK" tests/test_loaders.py

Length of output: 1060


Script:

#!/bin/bash
# Check if there are any existing test cases that verify segmentation indices
rg -B 2 -A 5 "continuous|indices|index" tests/test_loaders.py

Length of output: 63

pyproject.toml (1)

35-36: Please clarify the need for new dependencies.

The addition of numpy and seaborn dependencies appears unrelated to the RTSTRUCT index bug fix. Could you explain:

  1. Why these dependencies are needed?
  2. Are they part of other changes not mentioned in the PR description?

Let's check if these dependencies are used in new files:

✅ Verification successful

Dependencies are needed for new correlation analysis functionality

The new dependencies are justified as they support newly added correlation analysis features in this PR:

  • numpy and seaborn are used in the new src/readii/analyze/correlation.py module which implements correlation matrix calculations and heatmap visualizations
  • The PR appears to be larger in scope than just an RTSTRUCT bug fix, adding several new modules including correlation analysis functionality
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for numpy and seaborn usage
echo "Files using numpy:"
rg -l "import numpy" 
echo "Files using seaborn:"
rg -l "import seaborn"

Length of output: 370


Script:

#!/bin/bash
# Check git diff for these files to see if they are part of this PR
git diff origin/main...HEAD -- src/readii/negative_controls.py src/readii/image_processing.py src/readii/analyze/correlation.py tests/test_negative_controls.py

# Also check if these are new files
git ls-files --others --exclude-standard -- src/readii/negative_controls.py src/readii/image_processing.py src/readii/analyze/correlation.py tests/test_negative_controls.py

Length of output: 6760


Script:

#!/bin/bash
# Check PR description and commits for context
git log --format=%B origin/main...HEAD

# Check other new files in this PR
git diff --name-status origin/main...HEAD

Length of output: 483

Comment on lines +20 to +34
@pytest.mark.parametrize(
"features",
[
("nsclc_radiomic_features"),
("lung4D_radiomic_features")
],
"correlation_method",
[
("pearson"),
("spearman"),
("random")
]
)
def test_getFeatureCorrelations(features, correlation_method, request):
"""Test getting correlation matrix for a set of features"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Correct parameterization in pytest.mark.parametrize

The current usage of @pytest.mark.parametrize is incorrect, which may lead to test failures.

Here's how you can fix the parameterization:

-@pytest.mark.parametrize(
-        "features",
-        [
-            ("nsclc_radiomic_features"),
-            ("lung4D_radiomic_features")
-        ],
-        "correlation_method",
-        [
-            ("pearson"),
-            ("spearman"),
-            ("random")
-        ]
-)
+@pytest.mark.parametrize("features", ["nsclc_radiomic_features", "lung4D_radiomic_features"])
+@pytest.mark.parametrize("correlation_method", ["pearson", "spearman", "random"])

Adjust the test function accordingly:

 def test_getFeatureCorrelations(features, correlation_method, request):
     """Test getting correlation matrix for a set of features"""
     features = request.getfixturevalue(features)
-    correlation_method = request.getfixturevalue(correlation_method)

Since correlation_method is a direct parameter, you don't need to use request.getfixturevalue for it.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@pytest.mark.parametrize(
"features",
[
("nsclc_radiomic_features"),
("lung4D_radiomic_features")
],
"correlation_method",
[
("pearson"),
("spearman"),
("random")
]
)
def test_getFeatureCorrelations(features, correlation_method, request):
"""Test getting correlation matrix for a set of features"""
@pytest.mark.parametrize("features", ["nsclc_radiomic_features", "lung4D_radiomic_features"])
@pytest.mark.parametrize("correlation_method", ["pearson", "spearman", "random"])
def test_getFeatureCorrelations(features, correlation_method, request):
"""Test getting correlation matrix for a set of features"""
features = request.getfixturevalue(features)

Comment on lines +28 to +30
("pearson"),
("spearman"),
("random")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Handle expected exceptions for invalid correlation methods

The correlation method "random" is not valid and will raise a ValueError. The test should expect this exception to ensure proper error handling.

Modify the test function to handle invalid methods:

 def test_getFeatureCorrelations(features, correlation_method, request):
     """Test getting correlation matrix for a set of features"""
     features = request.getfixturevalue(features)
     if correlation_method in ["pearson", "spearman", "kendall"]:
         actual = getFeatureCorrelations(vertical_features=features,
                                         horizontal_features=features,
                                         method=correlation_method)
         assert isinstance(actual, pd.DataFrame), \
             "Wrong return type, expect a pandas DataFrame"
         assert actual.shape[0] == 2 * features.shape[1], \
             "Wrong return size, should be the same as the number of features"
     else:
         with pytest.raises(ValueError):
             getFeatureCorrelations(vertical_features=features,
                                    horizontal_features=features,
                                    method=correlation_method)

This ensures that the test suite properly verifies both valid and invalid input scenarios.

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +41 to +43
except Exception as e:
print(f"An error occurred: {e}")
return None No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider removing general except clause or provide specific handling

Catching a general exception and returning None can obscure unexpected errors and make debugging difficult.

Apply this diff to remove the general except clause or handle specific exceptions:

         except KeyError:
             print(f"Feature {feature_name} was not found as a column in dataframe. No columns dropped.")
             return dataframe
-        except Exception as e:
-            print(f"An error occurred: {e}")
-            return None

Alternatively, specify the exceptions you expect:

         except KeyError:
             print(f"Feature {feature_name} was not found as a column in dataframe. No columns dropped.")
             return dataframe
         except SomeSpecificException as e:
             print(f"An error occurred: {e}")
             return None

Committable suggestion skipped: line range outside the PR's diff.

# Check if config file exists
if os.path.exists(config_file_path):
# Load the config file
config = yaml.safe_load(open(config_file_path, "r"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use context manager when opening files

It's recommended to use a context manager (with statement) when opening files to ensure they are properly closed.

Apply this diff to use a context manager:

 # Load the config file
-config = yaml.safe_load(open(config_file_path, "r"))
+with open(config_file_path, "r") as file:
+    config = yaml.safe_load(file)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
config = yaml.safe_load(open(config_file_path, "r"))
with open(config_file_path, "r") as file:
config = yaml.safe_load(file)
🧰 Tools
🪛 Ruff (0.8.0)

34-34: Use a context manager for opening files

(SIM115)



def loadFeatureFilesFromImageTypes(extracted_feature_dir:str,
image_types:Optional[list]=['original'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid mutable default arguments

Using mutable default arguments like lists can lead to unexpected behavior because default arguments are evaluated only once at function definition time.

Change the default arguments to None and initialize within the function:

 def loadFeatureFilesFromImageTypes(extracted_feature_dir: str,
-                                   image_types: Optional[list] = ['original'], 
-                                   drop_labels: Optional[bool] = True, 
-                                   labels_to_drop: Optional[list] = ["patient_ID", "survival_time_in_years", "survival_event_binary"]) -> Dict[str, pd.DataFrame]:
+                                   image_types: Optional[list] = None, 
+                                   drop_labels: Optional[bool] = True, 
+                                   labels_to_drop: Optional[list] = None) -> Dict[str, pd.DataFrame]:
     """Function to load in all the extracted imaging feature sets from a directory and return them as a dictionary of dataframes."""
+    if image_types is None:
+        image_types = ['original']
+    if labels_to_drop is None:
+        labels_to_drop = ["patient_ID", "survival_time_in_years", "survival_event_binary"]

Also applies to: 12-12

🧰 Tools
🪛 Ruff (0.8.0)

10-10: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

@strixy16 strixy16 closed this Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

1 participant