🛒 FastText-(Meta)-Ecommerce-Category-Classification

This project demonstrates how to perform text classification on e-commerce product descriptions using FastText.

📊 Dataset

The dataset used in this project contains e-commerce item descriptions categorized into four classes:

🏠 Household
🖥️ Electronics
🧥 Clothing and Accessories
📚 Books

Dataset source: Kaggle - E-commerce Text Classification

🔧 Data Preparation

Loading the Data

We use pandas to load and inspect the dataset:

import pandas as pd

df = pd.read_csv("ecommerce_dataset.csv", names=["category", "description"], header=None)
print(df.shape)
df.head(3)

Output:

(50425, 2)
   category                                        description
0  Household  Paper Plane Design Framed Wall Hanging Motivat...
1  Household  SAF 'Floral' Framed Painting (Wood, 30 inch x ...
2  Household  SAF 'UV Textured Modern Art Print Framed' Pain...

Preparing Labels for FastText

FastText expects labels to be prefixed with __label__. We create a new column combining the label and description:

df['category'] = '__label__' + df['category'].astype(str)
df['category_description'] = df['category'] + ' ' + df['description']
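FastText reads a label as a single whitespace-delimited token, so a category name that contains spaces has to be collapsed into one token before the prefix is added. The `__label__clothing_accessories` prediction shown later suggests some normalization of this kind was applied, but that step is not shown here. The sketch below is one possible way to do it, not necessarily the one used in this project: `to_fasttext_label` and `to_fasttext_line` are hypothetical helper names, and the raw category string "Clothing & Accessories" is an assumed spelling.

```python
import re

def to_fasttext_label(category: str) -> str:
    """Turn a raw category name into a single FastText label token (hypothetical helper)."""
    # Collapse every run of non-alphanumeric characters into one underscore
    token = re.sub(r"[^0-9A-Za-z]+", "_", category.strip()).strip("_")
    return "__label__" + token

def to_fasttext_line(category: str, description: str) -> str:
    """Build one training line: the label token, a space, then the text."""
    return f"{to_fasttext_label(category)} {description}"

# Assumed raw category spelling, for illustration only
print(to_fasttext_label("Clothing & Accessories"))
print(to_fasttext_line("Books", "Think and Grow Rich"))
```

With pandas, the same helper could be applied through `df['category'].map(to_fasttext_label)` before the description is appended.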

🧹 Text Preprocessing

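We preprocess the text data using regular expressions to:

Replace punctuation (every character other than word characters, whitespace, and apostrophes) with a space
Collapse repeated spaces into one
Trim leading and trailing whitespace and convert text to lowercase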

import re

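def preprocess(text):
    # Replace anything that is not a word character, whitespace, or apostrophe with a space
    text = re.sub(r'[^\w\s\']', ' ', text)
    # Collapse runs of spaces into a single space
    text = re.sub(' +', ' ', text)
    return text.strip().lower()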

df['category_description'] = df['category_description'].map(preprocess)
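To see what the function does to a single line, the sketch below runs it on a made-up example (the sample string is illustrative, not taken from the dataset). Because `\w` matches underscores, the `__label__` prefix survives, while the label name is lowercased along with the rest of the text.

```python
import re

def preprocess(text):
    # Same steps as above: punctuation to spaces, collapse spaces, trim, lowercase
    text = re.sub(r"[^\w\s']", " ", text)
    text = re.sub(" +", " ", text)
    return text.strip().lower()

# Made-up sample line, for illustration only
sample = "__label__Books  Think & Grow Rich: Deluxe Edition!"
cleaned = preprocess(sample)
print(cleaned)  # __label__books think grow rich deluxe edition
```

Note that the second pattern collapses spaces only, so tabs and newlines inside a description pass through unchanged.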

💾 Generating CSV for FastText

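We split the data into training and test sets, holding out 20% of the rows for testing (the 10,085 test examples reported in the results are 20% of the 50,425 rows), then write each set to a plain-text file with one labeled example per line:

from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2)
train.to_csv("ecommerce.train", columns=["category_description"], index=False, header=False)
test.to_csv("ecommerce.test", columns=["category_description"], index=False, header=False)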
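Before training, it can be worth checking that every line in the generated files starts with a label token and that no class is missing. The sketch below is one way to do that with the standard library only; `label_counts` is a hypothetical helper name, and the sample lines are made up for illustration.

```python
from collections import Counter

def label_counts(lines):
    """Count examples per label and fail on any line that lacks a leading label."""
    counts = Counter()
    for line_number, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens or not tokens[0].startswith("__label__"):
            raise ValueError(f"line {line_number} does not start with a label: {line!r}")
        counts[tokens[0]] += 1
    return counts

# Made-up lines standing in for the contents of a training file
sample_lines = [
    "__label__books think and grow rich deluxe edition",
    "__label__electronics desktop pc 500 gb sata hdd",
    "__label__books a brief history of time",
]
print(label_counts(sample_lines))
```

To check a real file, pass an open file handle instead of the list, for example `label_counts(open("ecommerce.train", encoding="utf-8"))`.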

🏋️ Training and Evaluation

We use FastText to train the model and evaluate its performance:

import fasttext

model = fasttext.train_supervised(input="ecommerce.train")
model.test("ecommerce.test")

Results:

(10085, 0.9682697074863659, 0.9682697074863659)

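model.test returns a tuple of (number of test examples, precision at 1, recall at 1). Because every example carries exactly one label, the two metrics are identical: the model achieves approximately 96.83% precision and recall on the 10,085 test examples.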
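The result tuple is easier to read once it is unpacked. The sketch below takes the tuple reported above and derives the F1 score and the approximate number of correct predictions; `summarize_test_result` is a hypothetical helper name, not part of the FastText API.

```python
def summarize_test_result(result):
    """Unpack a (samples, precision@1, recall@1) tuple into a readable summary."""
    samples, precision, recall = result
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "samples": samples,
        "precision@1": precision,
        "recall@1": recall,
        "f1@1": f1,
        # With one label per example, precision@1 is the fraction classified correctly
        "correct": round(samples * precision),
    }

# The tuple reported in the results above
summary = summarize_test_result((10085, 0.9682697074863659, 0.9682697074863659))
for name, value in summary.items():
    print(f"{name}: {value}")
```

In the notebook, the same helper could be called directly as `summarize_test_result(model.test("ecommerce.test"))`.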

🔮 Predictions

We can use the trained model to make predictions on new product descriptions. Let's examine some examples:

🖥️ Electronics Prediction

product_description = "wintech assemble desktop pc cpu 500 gb sata hdd 4 gb ram intel c2d processor 3"
prediction = model.predict(product_description)
print(f"Product: {product_description}")
print(f"Predicted Category: {prediction[0][0]}")
print(f"Confidence: {prediction[1][0]:.2%}")

Output:

Product: wintech assemble desktop pc cpu 500 gb sata hdd 4 gb ram intel c2d processor 3
Predicted Category: __label__electronics
Confidence: 98.56%

The model correctly identifies this as an electronics product with high confidence.

🧥 Clothing and Accessories Prediction

product_description = "ockey men's cotton t shirt fabric details 80 cotton 20 polyester super combed cotton rich fabric"
prediction = model.predict(product_description)
print(f"Product: {product_description}")
print(f"Predicted Category: {prediction[0][0]}")
print(f"Confidence: {prediction[1][0]:.2%}")

Output:

Product: ockey men's cotton t shirt fabric details 80 cotton 20 polyester super combed cotton rich fabric
Predicted Category: __label__clothing_accessories
Confidence: 100.00%

The model correctly classifies this as a clothing item with very high confidence.

📚 Books Prediction

product_description = "think and grow rich deluxe edition"
prediction = model.predict(product_description)
print(f"Product: {product_description}")
print(f"Predicted Category: {prediction[0][0]}")
print(f"Confidence: {prediction[1][0]:.2%}")

Output:

Product: think and grow rich deluxe edition
Predicted Category: __label__books
Confidence: 100.00%

The model accurately identifies this as a book with very high confidence.
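The three examples above repeat the same unpacking of the value returned by `model.predict`, which is a pair of (labels, probabilities). The sketch below factors that into a small function that also strips the `__label__` prefix; `format_prediction` is a hypothetical helper name, and the tuple passed to it is a stand-in built from the rounded values printed above rather than a real model output.

```python
def format_prediction(prediction):
    """Convert a (labels, probabilities) pair into (category, probability) tuples."""
    labels, probabilities = prediction
    prefix = "__label__"
    formatted = []
    for label, probability in zip(labels, probabilities):
        # Drop the FastText label prefix if it is present
        category = label[len(prefix):] if label.startswith(prefix) else label
        formatted.append((category, float(probability)))
    return formatted

# Stand-in for the value returned by model.predict(...), using the rounded figure shown above
fake_prediction = (("__label__electronics",), [0.9856])
for category, probability in format_prediction(fake_prediction):
    print(f"{category}: {probability:.2%}")
```

New descriptions should go through the same `preprocess` function used for training, and `model.predict(text, k=2)` returns the two most likely labels instead of one.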

🔍 Word Similarities

We can also find similar words using the trained model:

model.get_nearest_neighbors("painting")

Output:

[(0.9976388216018677, 'vacuum'),
 (0.9968333840370178, 'guard'),
 (0.9968314170837402, 'heating'),
 (0.9966275095939636, 'lid'),
 (0.9962871670722961, 'lamp'),
 ...]

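These neighbors of "painting" are not synonyms. The word vectors were learned only to separate the four categories, so the closest words tend to be other terms that appear to signal the same category, and the similarity scores are all very close to 1.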

model.get_nearest_neighbors("sony")

Output:

[(0.9988397359848022, 'external'),
 (0.998672366142273, 'binoculars'),
 (0.9981507658958435, 'dvd'),
 (0.9975149631500244, 'nikon'),
 (0.9973592162132263, 'glossy'),
 ...]

Likewise, the words closest to "sony" are mostly other terms that appear in electronics listings, such as "dvd" and "nikon", rather than words tied to the brand itself.
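The scores in these lists are cosine similarities between word vectors. The sketch below computes that measure on toy two-dimensional vectors (the vectors are made up, not taken from the model) to show why words whose vectors point in nearly the same direction score close to 1.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)

# Toy vectors, for illustration only
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

With the trained model, the vectors for two words could be fetched with `model.get_word_vector(word)` and compared in the same way.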

🚀 Conclusion

This project shows that FastText classifies e-commerce product descriptions effectively: with default training settings, the model reaches about 96.8% precision and recall on held-out data. Combined with quick predictions, this makes it a practical tool for automating product categorization on e-commerce platforms.

For further improvements, consider:

Experimenting with different preprocessing techniques
Fine-tuning FastText hyperparameters
Comparing against other models, such as deep learning classifiers
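As a starting point for the hyperparameter suggestion, the sketch below builds a small grid of settings using parameter names that `fasttext.train_supervised` accepts (`lr`, `epoch`, `wordNgrams`). The candidate values are arbitrary assumptions rather than recommendations, and `build_grid` is a hypothetical helper name.

```python
from itertools import product

def build_grid(options):
    """Expand a dict of parameter name -> candidate values into a list of settings."""
    names = list(options)
    return [dict(zip(names, values)) for values in product(*(options[n] for n in names))]

# Candidate values are assumptions chosen for illustration
grid = build_grid({
    "lr": [0.1, 0.5],
    "epoch": [5, 25],
    "wordNgrams": [1, 2],
})
for params in grid:
    # Each dict could be passed on as:
    # fasttext.train_supervised(input="ecommerce.train", **params)
    print(params)
```

When comparing settings, it is safer to score them on a separate validation split rather than on `ecommerce.test`, so that the final test figure stays unbiased.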

Happy classifying! 🎉
