-
Notifications
You must be signed in to change notification settings - Fork 0
feat: add CorrelationWriter #96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (1)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including You can disable this status message by setting the WalkthroughThe pull request introduces a new Changes
Sequence DiagramsequenceDiagram
participant User
participant CorrelationWriter
participant DataFrame
participant FileSystem
User->>CorrelationWriter: save(correlation_df)
CorrelationWriter->>CorrelationWriter: validate input
alt Invalid DataFrame
CorrelationWriter-->>User: CorrelationWriterValidationError
else Valid DataFrame
CorrelationWriter->>FileSystem: check file existence
alt File exists and overwrite=False
CorrelationWriter-->>User: CorrelationWriterIOError
else File can be saved
CorrelationWriter->>FileSystem: save DataFrame
FileSystem-->>CorrelationWriter: file saved
CorrelationWriter-->>User: return file path
end
end
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/readii/io/writers/correlation_writer.py (2)
53-74
: Clarify the docstring to explicitly reflect support for both CSV and Excel formats.
The docstring for the save method states “Save the correlation dataframe to a .csv file.” Even though the code also supports “.xlsx,” the docstring might cause confusion.Consider updating the docstring to clearly indicate possible file extensions:
- """Save the correlation dataframe to a .csv file. + """Save the correlation dataframe to either .csv or .xlsx.
105-113
: Consider replacing match-case if Python 3.10 support is uncertain.
The match-case syntax requires Python 3.10+. If you anticipate older environments, an if-elif chain might be more compatible. Otherwise, this approach is fine.- match out_path.suffix: - case ".csv": - correlation_df.to_csv(out_path, index=True, index_label="") - case ".xlsx": - correlation_df.to_excel(out_path, index=True, index_label="") - case _: - ... + suffix = out_path.suffix + if suffix == ".csv": + correlation_df.to_csv(out_path, index=True, index_label="") + elif suffix == ".xlsx": + correlation_df.to_excel(out_path, index=True, index_label="") + else: + ...tests/io/writers/test_correlation_writer.py (2)
5-5
: Remove the unused import 'pathlib.Path'.
It’s imported but never referenced, generating a linter warning (F401).- from pathlib import Path
🧰 Tools
🪛 Ruff (0.8.2)
5-5:
pathlib.Path
imported but unusedRemove unused import:
pathlib.Path
(F401)
7-7
: Remove unused import CorrelationWriterError.
This import is not referenced in the test suite, generating a linter warning (F401).- from readii.io.writers.correlation_writer import CorrelationWriter, CorrelationWriterValidationError, CorrelationWriterError, CorrelationWriterIOError + from readii.io.writers.correlation_writer import CorrelationWriter, CorrelationWriterValidationError, CorrelationWriterIOError🧰 Tools
🪛 Ruff (0.8.2)
7-7:
readii.io.writers.correlation_writer.CorrelationWriterError
imported but unusedRemove unused import:
readii.io.writers.correlation_writer.CorrelationWriterError
(F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/readii/io/writers/correlation_writer.py
(1 hunks)tests/io/writers/test_correlation_writer.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/io/writers/test_correlation_writer.py
5-5: pathlib.Path
imported but unused
Remove unused import: pathlib.Path
(F401)
7-7: readii.io.writers.correlation_writer.CorrelationWriterError
imported but unused
Remove unused import: readii.io.writers.correlation_writer.CorrelationWriterError
(F401)
🔇 Additional comments (1)
src/readii/io/writers/correlation_writer.py (1)
99-101
: Ensure correlation DataFrame columns and index checks align with domain requirements.
Currently, the method enforces that the columns and index must be identical, which often implies a square and symmetric correlation matrix. If there's a scenario where a user might want rectangular or partial correlation data, this strict check may be too restrictive.
Would you like to confirm that this enforcement is absolutely required in your domain, or should we make this validation more flexible?
be63b3d
to
7b64e94
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (8)
src/readii/io/writers/correlation_writer.py (4)
11-27
: Enhance exception docstrings with more detailsWhile the exception hierarchy is well-structured, the docstrings could be more descriptive by including:
- Specific scenarios when each exception is raised
- Example error messages
- How to handle or prevent these exceptions
30-30
: Update class docstring to match its purposeThe current docstring mentions "plot figure files" but this class handles correlation data files.
Suggested docstring:
"""Class for managing file writing with customizable paths and filenames for correlation data files."""
105-112
: Simplify file format handlingSince we validate file extensions in
__post_init__
, the match statement's default case is unreachable. Consider simplifying:- match out_path.suffix: - case ".csv": - correlation_df.to_csv(out_path, index=True, index_label="") - case ".xlsx": - correlation_df.to_excel(out_path, index=True, index_label="") - case _: - msg = f"Invalid file extension {out_path.suffix}. Must be one of {self.VALID_EXTENSIONS}." - raise CorrelationWriterValidationError(msg) + if out_path.suffix == ".csv": + correlation_df.to_csv(out_path, index=True, index_label="") + else: # must be .xlsx due to __post_init__ validation + correlation_df.to_excel(out_path, index=True, index_label="")
121-131
: Enhance example usage with actual correlation dataConsider adding a complete working example that demonstrates creating and saving correlation data:
if __name__ == "__main__": # pragma: no cover import numpy as np import pandas as pd # Create sample correlation data data = pd.DataFrame(np.random.rand(5, 5), columns=[f'feature_{i}' for i in range(5)]) corr_df = data.corr() writer = CorrelationWriter( root_directory=Path("examples"), filename_format="{DatasetName}_correlations.csv", overwrite=True, create_dirs=True ) # Save correlation matrix writer.save(corr_df, DatasetName="example")tests/io/writers/test_correlation_writer.py (4)
7-7
: Remove unused importThe
CorrelationWriterError
exception is imported but never used in the tests.-from readii.io.writers.correlation_writer import CorrelationWriter, CorrelationWriterValidationError, CorrelationWriterError, CorrelationWriterIOError +from readii.io.writers.correlation_writer import CorrelationWriter, CorrelationWriterValidationError, CorrelationWriterIOError🧰 Tools
🪛 Ruff (0.8.2)
7-7:
readii.io.writers.correlation_writer.CorrelationWriterError
imported but unusedRemove unused import:
readii.io.writers.correlation_writer.CorrelationWriterError
(F401)
9-17
: Consider adding more test cases to random_feature_correlations fixtureThe fixture could be parameterized to provide different types of correlation matrices:
@pytest.fixture(params=[10, 100]) # Test with different matrix sizes def random_feature_correlations(request): size = request.param random_matrix = np.random.default_rng(seed=10).random((size, size)) random_df = pd.DataFrame(random_matrix, columns=[f"feature_{i+1}" for i in range(size)]) return getFeatureCorrelations(random_df, random_df)
35-41
: Enhance validation in test_save_valid_correlationConsider adding more assertions to verify the saved correlation data:
def test_save_valid_correlation(corr_writer, request, correlation_df): """Test saving a valid correlation dataframe.""" correlation_df = request.getfixturevalue(correlation_df) out_path = corr_writer.save(correlation_df, CorrelationType="Pearson") assert out_path.exists() + # Verify saved content + saved_df = pd.read_csv(out_path, index_col=0) + pd.testing.assert_frame_equal(saved_df, correlation_df)
54-59
: Add tests for additional edge casesConsider adding tests for:
- Saving correlation matrices with non-numeric data
- Handling NaN values in correlation data
- Very large correlation matrices
- Invalid file paths
Example test case:
@pytest.mark.parametrize("invalid_data", [ pd.DataFrame({'A': ['a', 'b'], 'B': [1, 2]}), # non-numeric pd.DataFrame({'A': [1, np.nan], 'B': [np.nan, 2]}), # NaN values ]) def test_save_correlation_with_invalid_data(corr_writer, invalid_data): """Test saving correlation data with invalid values.""" with pytest.raises(CorrelationWriterValidationError): corr_writer.save(invalid_data, CorrelationType="Pearson")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (2)
src/readii/io/writers/correlation_writer.py
(1 hunks)tests/io/writers/test_correlation_writer.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/io/writers/test_correlation_writer.py
5-5: pathlib.Path
imported but unused
Remove unused import: pathlib.Path
(F401)
7-7: readii.io.writers.correlation_writer.CorrelationWriterError
imported but unused
Remove unused import: readii.io.writers.correlation_writer.CorrelationWriterError
(F401)
🔇 Additional comments (1)
src/readii/io/writers/correlation_writer.py (1)
45-52
: LGTM!
The post-initialization validation is well-implemented with proper error handling.
VALID_EXTENSIONS: ClassVar[list[str]] = ( | ||
".csv", | ||
".xlsx" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix type annotation for VALID_EXTENSIONS
The type hint list[str]
doesn't match the tuple literal. Either change the type hint to tuple[str, ...]
or convert the tuple to a list.
- VALID_EXTENSIONS: ClassVar[list[str]] = (
+ VALID_EXTENSIONS: ClassVar[tuple[str, ...]] = (
".csv",
".xlsx"
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
VALID_EXTENSIONS: ClassVar[list[str]] = ( | |
".csv", | |
".xlsx" | |
) | |
VALID_EXTENSIONS: ClassVar[tuple[str, ...]] = ( | |
".csv", | |
".xlsx" | |
) |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #96 +/- ##
==========================================
+ Coverage 40.31% 41.66% +1.35%
==========================================
Files 32 33 +1
Lines 1399 1452 +53
==========================================
+ Hits 564 605 +41
- Misses 835 847 +12 ☔ View full report in Codecov by Sentry. |
…on-writer' into katys/add-correlation-writer
Created CorrelationWriter class for the analyze portion of the pipeline.
I think there will eventually be a FeatureSetWriter that this should probably inherit from, but I need the Correlation one now for Aerts.
Summary by CodeRabbit
New Features
CorrelationWriter
class for managing the writing of correlation data to files with customizable paths and filenames.Bug Fixes
Tests
CorrelationWriter
class, covering various scenarios for saving correlation data.