Preprocessing #11
You can change file pyproject.toml, correcting (dfd27e7 to e0cff35)
Pull Request Overview
Adds a preprocessing module to set up tasks for the EM algorithm, including component number estimators, classifiers, and random parameter initialization.
- Introduces new dependencies and utility types
- Implements Preprocessing orchestration and Parameterization
- Adds AIC/BIC criteria, four component number methods, and four classifiers
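For reference, the two criteria named above are conventionally defined as AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, n is the sample size, and L is the maximized likelihood; lower values are preferred under both. The sketch below is a minimal standard-library illustration for a single normal component. The function names are hypothetical and are not taken from the mpest code in this PR.

```python
import math


def normal_loglikelihood(samples, mean, std):
    """Log-likelihood of the samples under a normal distribution N(mean, std**2)."""
    n = len(samples)
    squared_error = sum((x - mean) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * std**2) - squared_error / (2 * std**2)


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_samples):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_samples) - 2 * loglik


# Illustrative data only; not taken from the PR.
samples = [0.1, -0.4, 0.3, 1.2, -0.9, 0.5, 0.0, -0.2]
mean = sum(samples) / len(samples)
# Maximum-likelihood estimate of the standard deviation (divides by n, not n - 1).
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))

ll = normal_loglikelihood(samples, mean, std)
print(aic(ll, 2), bic(ll, 2, len(samples)))
```

BIC penalizes each parameter by ln n instead of 2, so it is the stricter of the two once n is 8 or more.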
Reviewed Changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 10 comments.
File | Description |
---|---|
pyproject.toml | Added kneed dependency required by Elbow method |
mpest/preprocessing/utils.py | Introduced Model union type for supported distributions |
mpest/preprocessing/preprocessing.py | Added Preprocessing class orchestrating EM task creation |
mpest/preprocessing/parameterization.py | Implemented random initialization of mixture parameters |
mpest/preprocessing/criterions/bayesian_criterion.py | Added BIC implementation |
mpest/preprocessing/criterions/akaike_criterion.py | Added AIC implementation |
mpest/preprocessing/criterions/abstract_criterion.py | Defined base logic for information criteria |
mpest/preprocessing/criterions/__init__.py | Exported criterion classes |
mpest/preprocessing/components_number/xmeans.py | Implemented X-Means estimator |
mpest/preprocessing/components_number/silhouette.py | Implemented Silhouette method estimator |
mpest/preprocessing/components_number/peaks.py | Implemented Peaks method estimator |
mpest/preprocessing/components_number/elbow.py | Implemented Elbow method estimator |
mpest/preprocessing/components_number/abstract_estimator.py | Defined abstract estimator interface |
mpest/preprocessing/components_number/__init__.py | Exported component number estimators |
mpest/preprocessing/components_classifier/kstest.py | Added KS-test based classifier |
mpest/preprocessing/components_classifier/icclassifier.py | Added IC-based classifier |
mpest/preprocessing/components_classifier/constant.py | Added constant classifier for edge cases |
mpest/preprocessing/components_classifier/assymetry.py | Added skewness-based classifier |
mpest/preprocessing/components_classifier/abstract_classifier.py | Defined abstract classifier interface |
mpest/preprocessing/components_classifier/__init__.py | Exported classifier classes |
Comments suppressed due to low confidence (3)
mpest/preprocessing/parameterization.py:14
- The method name get_componet_parameters has a typo; consider renaming to get_component_parameters for clarity.
def get_componet_parameters(model: Model, samples: Samples) -> np.ndarray:
mpest/preprocessing/criterions/abstract_criterion.py:19
- The method name _loglikehood is misspelled; consider renaming to _loglikelihood.
def _loglikehood(
mpest/preprocessing/components_classifier/assymetry.py:12
- The class name Assymetry is misspelled; consider renaming to Asymmetry to match the standard term.
class Assymetry(AComponentsClassifier):
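If any of these misspelled names has already been imported by other code, a plain rename would break those callers. One common option is to keep the old spelling as a deprecated alias for a transition period. The following is only a sketch under that assumption: the class body is a stand-in, and whether mpest needs such an alias at all is not stated in the PR.

```python
import warnings


class Asymmetry:
    """Stand-in for the skewness-based classifier (real body omitted)."""


class Assymetry(Asymmetry):
    """Deprecated alias that keeps the old spelling importable."""

    def __init__(self, *args, **kwargs):
        # Warn at the caller's location so the outdated import is easy to find.
        warnings.warn(
            "Assymetry is deprecated; use Asymmetry instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = Assymetry()

print(isinstance(legacy, Asymmetry))  # True
print(caught[0].category.__name__)    # DeprecationWarning
```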
@@ -0,0 +1,5 @@
"""Module wich contains supported distributions"""
Fix the typo in the module docstring: replace "wich" with "which".
Suggested change:
- """Module wich contains supported distributions"""
+ """Module which contains supported distributions"""
Copilot uses AI. Check for mistakes.
@@ -0,0 +1,38 @@
"""Module wich contains the method of task creation for EM algorithm"""
Fix the typo in the module docstring: replace "wich" with "which".
Suggested change:
- """Module wich contains the method of task creation for EM algorithm"""
+ """Module which contains the method of task creation for EM algorithm"""
@@ -0,0 +1,45 @@
"""Module wich contains a method for random parameterization of distributions"""
Fix the typo in the module docstring: replace "wich" with "which".
Suggested change:
- """Module wich contains a method for random parameterization of distributions"""
+ """Module which contains a method for random parameterization of distributions"""
@property
def name(self):
    return "Bayesian_Infromation_Criterion"
Fix the typo in the name property: replace "Infromation" with "Information".
Suggested change:
-     return "Bayesian_Infromation_Criterion"
+     return "Bayesian_Information_Criterion"
@property
def name(self):
    return "Akaike_Infromation_Criterion"
Fix the typo in the name property: replace "Infromation" with "Information".
Suggested change:
-     return "Akaike_Infromation_Criterion"
+     return "Akaike_Information_Criterion"
def estimate(self, samples: Samples) -> int:
    samples = samples.reshape(-1, 1)
    k_range = range(1, self.kmax + 2)  # possible components: [2, kmax]
[nitpick] The comment incorrectly states the range; update it to reflect range(1, kmax+2) or adjust the code to match the intended range.
Suggested change:
-     k_range = range(1, self.kmax + 2)  # possible components: [2, kmax]
+     k_range = range(1, self.kmax + 2)  # possible components: [1, kmax + 1]
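The mismatch between the code and its comment can be checked directly. This is a small illustration that uses a hypothetical kmax of 5 in place of self.kmax; which of the two ranges is actually intended is not stated in the PR.

```python
kmax = 5  # hypothetical value standing in for self.kmax

# What the code in the PR iterates over.
k_range = range(1, kmax + 2)
print(list(k_range))  # [1, 2, 3, 4, 5, 6]

# What the original comment describes, i.e. the closed interval [2, kmax].
intended = range(2, kmax + 1)
print(list(intended))  # [2, 3, 4, 5]
```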
X-Means method
-----
The class docstring mentions "X-Means method" but this is the Constant Classifier; update the description accordingly.
Suggested change:
- X-Means method
- -----
+ Constant Classifier
+ -----
+ This classifier always predicts the same model for all inputs.
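To make the suggested docstring concrete, a classifier of this kind can be sketched in a few lines. The class name, the classify method, and the "Gaussian" label below are all hypothetical; the actual interface defined by AComponentsClassifier is not shown in this review.

```python
class ConstantClassifier:
    """Illustrative only: always returns the model it was constructed with."""

    def __init__(self, model):
        self.model = model

    def classify(self, samples):
        # The samples are ignored by design: the answer never depends on the data.
        return self.model


clf = ConstantClassifier("Gaussian")
print(clf.classify([0.1, 0.2, 0.3]))  # Gaussian
print(clf.classify([]))               # Gaussian
```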
class Peaks(AComponentsNumber):
    """Peaks Method with Emperical Density"""
Fix the typo: replace "Emperical" with "Empirical".
Suggested change:
-     """Peaks Method with Emperical Density"""
+     """Peaks Method with Empirical Density"""
"""TODO"""
[nitpick] Replace the placeholder docstring with a brief module description.
Suggested change:
- """TODO"""
+ """This module provides various components and utilities for classifying and preprocessing data.
+ It includes abstract base classes, specific classifiers, and statistical tests such as:
+ - AComponentsClassifier: Abstract base class for component classifiers.
+ - Assymetry: Handles asymmetry-related classification tasks.
+ - CModel: Represents a constant model for classification.
+ - ICClassifier: Implements an information criterion-based classifier.
+ - KSTest: Performs the Kolmogorov-Smirnov test for statistical analysis.
+ """
@@ -0,0 +1,8 @@
"""Module which represents method estimating components number and abstract classe"""
Fix the typo: replace "classe" with "classes".
Suggested change:
- """Module which represents method estimating components number and abstract classe"""
+ """Module which represents method estimating components number and abstract classes"""
The module for sample preprocessing for the EM algorithm includes:
-> Elbow method
-> Silhouette method
-> Peaks method
-> XMeans method
-> Constant Classifier
-> Assymetry Classifier
-> KSTest Classifier
-> ICClassifier