Preprocessing #11

Open
wants to merge 14 commits into
base: main

@Mmorgenn Mmorgenn commented Dec 21, 2024

This module preprocesses samples for the EM algorithm. It includes:

  • Methods for estimating the number of components of a mixture:

-> Elbow method
-> Silhouette method
-> Peaks method
-> XMeans method

  • Mixture component classifiers:

-> Constant Classifier
-> Assymetry Classifier
-> KSTest Classifier
-> ICClassifier

  • Information criterion (AIC, BIC)
  • Random initialisation of parameters of the mixture of distributions
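
For readers unfamiliar with the two information criteria listed above, the following is a minimal sketch of the textbook definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, applied to a single Gaussian fit. The function names and the sample data are assumptions made for illustration; they are not taken from this PR's code.

```python
import math


def gaussian_loglikelihood(samples, mean, std):
    """Log-likelihood of the samples under a normal distribution N(mean, std^2)."""
    n = len(samples)
    squared_error = sum((x - mean) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * std**2) - squared_error / (2 * std**2)


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_samples):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * loglik


# Hypothetical data: fit one Gaussian by maximum likelihood, then score the fit.
samples = [1.8, 2.1, 2.4, 1.9, 2.0, 2.2, 1.7, 2.3]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
loglik = gaussian_loglikelihood(samples, mean, std)

print("AIC:", aic(loglik, n_params=2))
print("BIC:", bic(loglik, n_params=2, n_samples=len(samples)))
```

Both criteria penalise the parameter count k; BIC's penalty grows with the sample size, so it tends to prefer simpler models on larger samples.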

@Mmorgenn Mmorgenn changed the title Components number estimators сomponents number estimators Dec 21, 2024
@Mmorgenn Mmorgenn changed the title сomponents number estimators Сomponents number estimators Dec 21, 2024
@toxakaz
Collaborator

toxakaz commented Jan 10, 2025

You can edit pyproject.toml and update the [tool.pylint] section to disable the too-many-positional-arguments pylint warning. This will be done in the future anyway.

[tool.pylint]
disable = "too-few-public-methods, too-many-locals, line-too-long, too-many-positional-arguments, fixme"

@Mmorgenn Mmorgenn force-pushed the components_number branch from dfd27e7 to e0cff35 Compare April 4, 2025 15:50
@Mmorgenn Mmorgenn changed the title Сomponents number estimators Preprocessing May 12, 2025
@iraedeus iraedeus requested review from Copilot and iraedeus May 25, 2025 04:42

@Copilot Copilot AI left a comment

Pull Request Overview

Adds a preprocessing module to set up tasks for the EM algorithm, including component number estimators, classifiers, and random parameter initialization.

  • Introduces new dependencies and utility types
  • Implements Preprocessing orchestration and Parameterization
  • Adds AIC/BIC criteria, four component number methods, and four classifiers

Reviewed Changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Added kneed dependency required by Elbow method |
| mpest/preprocessing/utils.py | Introduced Model union type for supported distributions |
| mpest/preprocessing/preprocessing.py | Added Preprocessing class orchestrating EM task creation |
| mpest/preprocessing/parameterization.py | Implemented random initialization of mixture parameters |
| mpest/preprocessing/criterions/bayesian_criterion.py | Added BIC implementation |
| mpest/preprocessing/criterions/akaike_criterion.py | Added AIC implementation |
| mpest/preprocessing/criterions/abstract_criterion.py | Defined base logic for information criteria |
| mpest/preprocessing/criterions/__init__.py | Exported criterion classes |
| mpest/preprocessing/components_number/xmeans.py | Implemented X-Means estimator |
| mpest/preprocessing/components_number/silhouette.py | Implemented Silhouette method estimator |
| mpest/preprocessing/components_number/peaks.py | Implemented Peaks method estimator |
| mpest/preprocessing/components_number/elbow.py | Implemented Elbow method estimator |
| mpest/preprocessing/components_number/abstract_estimator.py | Defined abstract estimator interface |
| mpest/preprocessing/components_number/__init__.py | Exported component number estimators |
| mpest/preprocessing/components_classifier/kstest.py | Added KS-test based classifier |
| mpest/preprocessing/components_classifier/icclassifier.py | Added IC-based classifier |
| mpest/preprocessing/components_classifier/constant.py | Added constant classifier for edge cases |
| mpest/preprocessing/components_classifier/assymetry.py | Added skewness-based classifier |
| mpest/preprocessing/components_classifier/abstract_classifier.py | Defined abstract classifier interface |
| mpest/preprocessing/components_classifier/__init__.py | Exported classifier classes |
Comments suppressed due to low confidence (3)

mpest/preprocessing/parameterization.py:14

  • The method name get_componet_parameters has a typo; consider renaming to get_component_parameters for clarity.
def get_componet_parameters(model: Model, samples: Samples) -> np.ndarray:

mpest/preprocessing/criterions/abstract_criterion.py:19

  • The method name _loglikehood is misspelled; consider renaming to _loglikelihood.
def _loglikehood(

mpest/preprocessing/components_classifier/assymetry.py:12

  • The class name Assymetry is misspelled; consider renaming to Asymmetry to match the standard term.
class Assymetry(AComponentsClassifier):

@@ -0,0 +1,5 @@
"""Module wich contains supported distributions"""

Copilot AI May 25, 2025

Fix the typo in the module docstring: replace "wich" with "which".

Suggested change
- """Module wich contains supported distributions"""
+ """Module which contains supported distributions"""

Copilot uses AI. Check for mistakes.

@@ -0,0 +1,38 @@
"""Module wich contains the method of task creation for EM algorithm"""

Copilot AI May 25, 2025

Fix the typo in the module docstring: replace "wich" with "which".

Suggested change
- """Module wich contains the method of task creation for EM algorithm"""
+ """Module which contains the method of task creation for EM algorithm"""

@@ -0,0 +1,45 @@
"""Module wich contains a method for random parameterization of distributions"""

Copilot AI May 25, 2025

Fix the typo in the module docstring: replace "wich" with "which".

Suggested change
- """Module wich contains a method for random parameterization of distributions"""
+ """Module which contains a method for random parameterization of distributions"""


    @property
    def name(self):
        return "Bayesian_Infromation_Criterion"

Copilot AI May 25, 2025

Fix the typo in the name property: replace "Infromation" with "Information".

Suggested change
- return "Bayesian_Infromation_Criterion"
+ return "Bayesian_Information_Criterion"


    @property
    def name(self):
        return "Akaike_Infromation_Criterion"

Copilot AI May 25, 2025

Fix the typo in the name property: replace "Infromation" with "Information".

Suggested change
- return "Akaike_Infromation_Criterion"
+ return "Akaike_Information_Criterion"


    def estimate(self, samples: Samples) -> int:
        samples = samples.reshape(-1, 1)
        k_range = range(1, self.kmax + 2) # possible components: [2, kmax]

Copilot AI May 25, 2025

[nitpick] The comment incorrectly states the range; update it to reflect range(1, kmax+2) or adjust code to match the intended range.

Suggested change
- k_range = range(1, self.kmax + 2) # possible components: [2, kmax]
+ k_range = range(1, self.kmax + 2) # possible components: [1, kmax + 1]
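
The mismatch between the code and its comment can be checked directly. This is a small sketch in which `kmax = 5` is a hypothetical value standing in for `self.kmax`; it shows which candidates each spelling of the range actually produces. Which bounds the author intended is not stated in the PR.

```python
kmax = 5  # hypothetical upper bound, standing in for self.kmax

# The range as written in the PR: Python ranges exclude the stop value,
# so this yields every integer from 1 to kmax + 1 inclusive.
candidates = list(range(1, kmax + 2))
print(candidates)  # [1, 2, 3, 4, 5, 6]

# A range that would match the original comment "[2, kmax]" instead.
commented = list(range(2, kmax + 1))
print(commented)  # [2, 3, 4, 5]
```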

Comment on lines +10 to +11
X-Means method
-----

Copilot AI May 25, 2025

The class docstring mentions "X-Means method" but this is the Constant Classifier; update the description accordingly.

Suggested change
- X-Means method
- -----
+ Constant Classifier
+ -----
+ This classifier always predicts the same model for all inputs.
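
To illustrate what "always predicts the same model" means, here is a minimal sketch of a constant classifier. The class name `ConstantClassifier`, the `classify` method, and the string standing in for a model are all hypothetical; the PR's actual interface is defined in abstract_classifier.py and may differ.

```python
class ConstantClassifier:
    """Hypothetical stand-in: ignores the samples and always returns one model."""

    def __init__(self, model):
        self._model = model

    def classify(self, samples):
        # The samples are deliberately unused; the answer is fixed at construction.
        return self._model


clf = ConstantClassifier("Gaussian")  # a string stands in for a real model object
print(clf.classify([0.1, 0.5, 0.9]))
print(clf.classify([]))
```

A classifier like this is mainly useful as a baseline or when the component family is already known.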



class Peaks(AComponentsNumber):
    """Peaks Method with Emperical Density"""

Copilot AI May 25, 2025

Fix the typo: replace "Emperical" with "Empirical".

Suggested change
- """Peaks Method with Emperical Density"""
+ """Peaks Method with Empirical Density"""
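
The PR page does not show how the Peaks estimator is implemented. As a rough illustration of the general idea, the sketch below estimates the number of components by counting local maxima of a histogram, which is one simple empirical density estimate. The function name, the `bins` parameter, and the sample data are assumptions, and this naive version is sensitive to the bin count and to noise.

```python
def count_histogram_peaks(samples, bins=10):
    """Count local maxima of a fixed-width histogram of the samples."""
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return 1  # all samples identical: a single peak

    # Build the histogram; the maximum value is clamped into the last bin.
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1

    # Pad with zeros so edge bins can be peaks, then collapse plateaus
    # so a flat top is counted once.
    padded = [0] + counts + [0]
    collapsed = [padded[0]]
    for c in padded[1:]:
        if c != collapsed[-1]:
            collapsed.append(c)

    return sum(
        1
        for i in range(1, len(collapsed) - 1)
        if collapsed[i - 1] < collapsed[i] > collapsed[i + 1]
    )


# Hypothetical sample with two well-separated clusters.
two_clusters = [0.0, 0.1, 0.2, 0.1, 0.15, 5.0, 5.1, 4.9, 5.05, 5.2]
print(count_histogram_peaks(two_clusters, bins=5))
```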

Comment on lines +1 to +2
"""TODO"""


Copilot AI May 25, 2025

[nitpick] Replace the placeholder docstring with a brief module description.

Suggested change
- """TODO"""
+ """This module provides various components and utilities for classifying and preprocessing data.
+ It includes abstract base classes, specific classifiers, and statistical tests such as:
+ - AComponentsClassifier: Abstract base class for component classifiers.
+ - Assymetry: Handles asymmetry-related classification tasks.
+ - CModel: Represents a constant model for classification.
+ - ICClassifier: Implements an information criterion-based classifier.
+ - KSTest: Performs the Kolmogorov-Smirnov test for statistical analysis.
+ """

@@ -0,0 +1,8 @@
"""Module which represents method estimating components number and abstract classe"""

Copilot AI May 25, 2025

Fix the typo: replace "classe" with "classes".

Suggested change
- """Module which represents method estimating components number and abstract classe"""
+ """Module which represents method estimating components number and abstract classes"""
