Skip to content

Mahdiyar-Tagharobi/Logistic_Regression

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 

Repository files navigation

Logistic Regression simple code

This Python code implements a Logistic Regression model for handwritten digit classification using the scikit-learn library. The code performs the following steps:

  1. Data Loading:

    • Loads the digits dataset from scikit-learn, containing images of handwritten digits (0-9) and their corresponding labels.
    • Prints the shapes of the data and target arrays, indicating the number of images and labels.
    • Displays the first five images and their labels using Matplotlib.
  2. Data Splitting:

    • Splits the data into training and testing sets using train_test_split.
    • The training set is used to train the model, and the testing set is used to evaluate its performance.
    • The test size is set to 23% using test_size=0.23, and a random state of 2 is used for reproducibility using random_state=2.
    • Prints the shapes of the training and testing sets for clarity.
  3. Model Training:

    • Creates an instance of the LogisticRegression classifier from scikit-learn.
    • Sets the maximum number of iterations (max_iter) to 1000 for the optimization algorithm.
    • Trains the model on the training data using the fit method.
  4. Model Evaluation:

    • Predicts the class label for the first element of the test set using predict.
    • Compares the predicted label with the actual label to assess initial performance.
    • Predicts the class labels for the first 10 elements of the test set.
    • Calculates the model's accuracy using the score method.
    • Generates a confusion matrix using metrics.confusion_matrix to visualize the model's performance across all classes.
    • Creates a heatmap of the confusion matrix using Seaborn for better interpretation.
  5. Visualization of Misclassified Digits:

    • Identifies the indices of misclassified digits in the testing set.
    • Plots the first four misclassified digits using Matplotlib, along with their predicted and actual labels.

Now, What is the Logistic Regression?

Logistic regression is a statistical method for classification problems with binary dependent variables (0 or 1). It estimates the probability of an event occurring based on a set of independent variables. In this case, the model predicts the probability that a given image represents a specific handwritten digit (0-9).

Logistic Regression Applications:

  • Medical diagnosis (e.g., predicting cancer risk based on medical history)
  • Spam email filtering
  • Customer churn prediction (identifying customers likely to cancel service)
  • Fraud detection in financial transactions
  • Sentiment analysis of text data (classifying text as positive, negative, or neutral)

Logistic Regression Advantages:

  • Easy to interpret: Coefficients in the model provide insights into the relationship between features and the target variable.
  • Computationally efficient: Training and prediction are relatively fast compared to some other models.
  • Works well with high-dimensional data: Can handle a large number of features without significant performance degradation.
  • Good baseline model: Often used as a baseline for evaluating the performance of more complex models.

Logistic Regression Disadvantages:

  • Limited to binary classification problems: Cannot handle multi-class problems directly (requires special techniques like one-vs-rest).
  • Assumes linear relationship between features and target: If the relationship is non-linear, logistic regression may not perform well.
  • Sensitive to outliers: Can be affected by outliers in the data.

About

This repository is used to show logistic regression projects

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published