Feature/cluster 1 #252
…or accuracy output
…nd improve structure
…, update tests for consistency
…d appending to documents
…for functionality
…, update LDA model methods; add tests for topic printing
…matting; update tests to validate output structure
…ng top documents by topic; update tests for new functionality
…and add corresponding tests
… modify plot method to use instance data if no DataFrame is provided
…izing document word counts by dominant topic; update tests accordingly
…gic; add topics fixture for testing
…; include corresponding test
…ion; reduce size and max words
…lusterDocs; enhance document processing capabilities
… visualize modules; enhance test readability in test files
Pull Request Overview
This PR adds new test suites and updates core modules to enhance data visualization, file reading, neural network modeling with PyTorch, and clustering functionality.
- Added/updated tests for visualization, NLP and numerical output.
- Updated file reading to accept different input types and refactored ML modules to use PyTorch.
- Introduced clustering extensions and updated packaging configuration.
Reviewed Changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_visualize.py | Added tests for QRVisualize plotting functions. |
| tests/test_readfiles.py | Updated tests to use a single filename instead of a list. |
| tests/test_num.py | Adjusted assertion to match updated output capitalization. |
| tests/test_nlp.py | Minor improvements in fixtures and assertions. |
| test.py | Added a standalone test for spaCy based NLP processing. |
| src/qrmine/visualize.py | Implements various plotting methods using matplotlib and wordcloud; includes wordcloud function. |
| src/qrmine/readfiles.py | Refactored read_file function to support file, folder, and URL inputs. |
| src/qrmine/mlqrmine.py | Replaces Keras-based NN with PyTorch implementation for predictions and evaluation. |
| src/qrmine/content.py | Minor addition: exposing tokens property for filtering processed tokens. |
| src/qrmine/cluster.py | New file added for clustering operations using LDA and topic representations. |
| src/qrmine/\_\_init\_\_.py | Updated to include new modules for clustering and visualization. |
| pyproject.toml | Updated project metadata and dependencies. |
| notes/*.md | Added/updated documentation notes on pip-tools and conda environment setup. |
Files not reviewed (2)
- setup.cfg: Language not supported
- src/qrmine/resources/df_dominant_topic.csv: Language not supported
| # if input is a folder name | ||
| elif isinstance(input, str): | ||
| import os |
Copilot AI, Apr 30, 2025
The type-check conditions in read_file are redundant since all inputs are checked as 'str'. Consider distinguishing file, folder, and URL cases with dedicated checks (e.g. os.path.isfile, os.path.isdir, or URL pattern matching) to ensure the correct branch is executed.
| # if input is a folder name | |
| elif isinstance(input, str): | |
| import os | |
| # Check if input is a folder | |
| elif os.path.isdir(input): |
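The dedicated checks suggested in the comment above could look roughly like the sketch below. It is not the PR's `read_file`. `classify_input` is a hypothetical helper, and the rule that only `http`/`https` strings count as URLs is an assumption made for illustration.

```python
import os
import tempfile
from urllib.parse import urlparse


def classify_input(source: str) -> str:
    """Return 'url', 'folder', 'file', or 'unknown' for a string input.

    Hypothetical helper: the name and return values are illustrative only.
    """
    parsed = urlparse(source)
    # Assumption: only http(s) strings with a host are treated as URLs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Check the filesystem so each string reaches exactly one branch.
    if os.path.isdir(source):
        return "folder"
    if os.path.isfile(source):
        return "file"
    return "unknown"


# Small demonstration using a temporary folder and file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "interview.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("sample text")
    kinds = [
        classify_input(folder),
        classify_input(path),
        classify_input("https://example.com/data.txt"),
    ]

print(kinds)
```

Because every branch tests a different property of the string, a repeated `isinstance(input, str)` check can no longer shadow the folder or URL cases.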
| height=180, | ||
| max_words=5, | ||
| colormap="tab10", | ||
| color_func=lambda *args, **kwargs: cols[i], |
Copilot AI, Apr 30, 2025
The lambda in the WordCloud instantiation captures 'i', which is undefined at that point. Capture the current index explicitly (e.g. using a default parameter like lambda *args, i=i, **kwargs: cols[i]) to fix the reference.
| color_func=lambda *args, **kwargs: cols[i], | |
| color_func=lambda *args, i=i, **kwargs: cols[i], |
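The difference between the two lambda forms can be seen without `wordcloud` at all. This is a minimal sketch, assuming the lambdas are created inside a loop over `i`. `make_color_funcs` and `palette` are hypothetical names, and the call signature only imitates how a color function might be invoked.

```python
def make_color_funcs(cols):
    """Build color functions in a loop, with and without default-argument capture."""
    late, bound = [], []
    for i in range(len(cols)):
        # Late binding: `i` is looked up when the lambda is called.
        late.append(lambda *args, **kwargs: cols[i])
        # Default-argument capture: `i=i` is evaluated when the lambda is created.
        bound.append(lambda *args, i=i, **kwargs: cols[i])
    return late, bound


palette = ["red", "green", "blue"]  # hypothetical stand-in for the PR's `cols`
late, bound = make_color_funcs(palette)

# Call every function after the loop has finished.
late_colors = [f("word", font_size=10) for f in late]
bound_colors = [f("word", font_size=10) for f in bound]

print(late_colors)   # ['blue', 'blue', 'blue']
print(bound_colors)  # ['red', 'green', 'blue']
```

Note that `i=i` is evaluated at the point where the lambda is defined, so this form only works if `i` already exists there. If the lambda is created before the loop that assigns `i`, the default-argument version raises `NameError`, whereas the late-binding version resolves `i` at call time.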
…; upgrade filelock and jinja2 versions
…ate tox.ini to eliminate redundant dependencies
No description provided.