Cover Page

🤖 Inside-Medium: The Right Article, at the Right Time

Discover trending, relevant reads instantly with AI-powered article matching!

Python License Code style: black

Features • Installation • Documentation • Usage • Contributing


🌟 Overview

Inside-Medium is an AI-powered content recommendation engine designed to help readers find the most relevant and high-quality Medium articles based on their interests or selected articles. By leveraging Natural Language Processing (NLP) and Topic Modeling (NMF) techniques, the system extracts hidden topics from articles, encodes them into meaningful vectors, and uses cosine similarity to recommend similar content.
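The pipeline described above can be sketched end-to-end with scikit-learn. This is a minimal illustration on a toy corpus, not the project's actual code; the variable names are my own:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Toy corpus standing in for the Medium articles
articles = [
    "Deep learning models for natural language processing",
    "Transfer learning with transformer language models",
    "A beginner's guide to gardening and soil health",
]

# 1. Encode the article text as TF-IDF vectors
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(articles)

# 2. Extract latent topics with Non-negative Matrix Factorization
nmf = NMF(n_components=2, random_state=42)
topic_vectors = nmf.fit_transform(X)

# 3. Score every pair of articles by cosine similarity in topic space
sims = cosine_similarity(topic_vectors)
np.fill_diagonal(sims, -1.0)   # an article should not recommend itself

best_match = sims[0].argmax()  # most similar article to articles[0]
```

Here the two machine-learning articles share latent topic weight, so `best_match` points at the second article rather than the gardening one.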


📚 Dataset: Medium Articles Dataset

📎 Source: Medium Articles Dataset – Kaggle

The Medium Articles Dataset is a curated collection of publicly available articles published on Medium.com. It contains both textual content and engagement metadata, making it ideal for tasks like recommendation systems, NLP, and content analysis.

๐Ÿ“ Dataset Highlights:

  • Total Records: ~8,000 articles

  • Key Columns:

    • title: Title of the article
    • subtitle: Subtitle or secondary heading
    • author: Author of the article
    • date: Publication date
    • claps: Number of claps (engagement metric)
    • reading_time: Estimated reading time (in minutes)
    • publication: Name of the publication (if any)
    • url: Link to the original article
    • article: Full textual content of the article

✅ Why This Dataset?

  • Great for topic modeling, text classification, and recommendation systems
  • Contains real-world engagement signals (claps) to enrich the model
  • Useful for building AI-driven content discovery platforms like Inside-Medium

📌 Dataset Link: https://www.kaggle.com/datasets/dorianlazar/medium-articles-dataset/data
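The columns listed above can be loaded and trimmed with pandas. A hedged sketch (`load_articles` and the file path are illustrative; the project's actual loading logic lives in `src/utils/data_loader.py`):

```python
import pandas as pd

def load_articles(path: str) -> pd.DataFrame:
    """Load the Medium articles CSV and keep the columns the pipeline uses."""
    df = pd.read_csv(path)
    keep = ["title", "subtitle", "author", "date", "claps",
            "reading_time", "publication", "url", "article"]
    # Keep only the columns actually present, in case the dataset
    # version differs slightly from the description above.
    return df[[c for c in keep if c in df.columns]]
```

Usage would be something like `df = load_articles("data/medium_raw_data.csv")`.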


🚀 Features of Inside-Medium

  • ๐Ÿ” Content-Based Article Recommendation Recommends articles similar to a userโ€™s query based on textual content and latent topic features.

  • ๐Ÿ“ˆ Similarity Scoring Calculates cosine similarity between articles to identify the most relevant ones.

  • ๐Ÿ“‘ Interactive Query Support Users can input any article title to retrieve a list of the most similar articles.

  • ๐Ÿงผ Modular, Clean Codebase Structured using classes for vectorization, normalization, and similarity search with full docstrings and logging.

  • ๐Ÿ“ฆ Reproducible Pipeline Complete workflow from raw data to recommendationsโ€”easy to extend or integrate into other systems.

  • ๐Ÿงพ Logging and Error Handling Built-in logging for debugging and tracking progress/errors in each module.

  • ๐Ÿ“‚ Scalable Design Easy to adapt for larger datasets or additional features like user profiling or collaborative filtering.


📰 Published Article

A detailed write-up of the methods behind this recommendation engine:

🔗 Read the article here: Inside Medium's Recommendation Engine: How It Knows What You'll Love


๐Ÿ› ๏ธ Installation

Step - 1: Repository Cloning

# Clone the repository
git clone https://github.com/priyam-hub/Inside-Medium.git

# Navigate into the directory
cd Inside-Medium

Step - 2: Environment Setup and Dependency Installation

# Run env_setup.sh
bash env_setup.sh

# Select 1 to create a Python virtual environment
# Select 2 to create a Conda environment

# Python version - 3.10

# Install the project as a local package
python setup.py

Step - 3: Create a Kaggle API Token

  • Log in to your Kaggle account.
  • Create an API token from Kaggle Account Settings → Create New Token.
  • Place the downloaded kaggle.json (from https://www.kaggle.com/settings) at:
C:\Users\<Your_Username>\.kaggle\kaggle.json (Windows) or ~/.kaggle/kaggle.json (Linux/macOS)

Step - 4: Create a .env file in the root directory with your credentials (or rename ".sample_env" to ".env")

KAGGLE_USERNAME = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
KAGGLE_API_KEY  = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
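As a sketch of how these two values might be read at runtime: the Kaggle client looks for KAGGLE_USERNAME and KAGGLE_KEY in the environment, so a small standard-library parser could map the .env entries onto them. The project itself may use a dedicated package such as python-dotenv instead; `load_env` is illustrative:

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY = "value" lines from a .env file."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip().strip('"')
    return env

# Export the credentials for the Kaggle client, if a .env file exists
if os.path.exists(".env"):
    creds = load_env()
    os.environ.setdefault("KAGGLE_USERNAME", creds.get("KAGGLE_USERNAME", ""))
    os.environ.setdefault("KAGGLE_KEY", creds.get("KAGGLE_API_KEY", ""))
```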

Step - 5: Run the Full Pipeline

# Run the Main Python Script
python main.py

Step - 6: Run the Flask Server (Upcoming)

# Run the Web App using Flask Server
python web/app.py

Note - Once the server is running, open the local URL shown in the terminal in your browser to interact with the Inside-Medium recommendation engine


🧰 Technology Stack

Python – Core programming language used to build the recommendation pipeline, data processing, and backend logic. 🔗 Install Python

Pandas & NumPy – Used for efficient data manipulation, cleaning, and numerical operations. 🔗 Pandas Documentation | NumPy Documentation

Scikit-learn – Used for feature extraction (TF-IDF), dimensionality reduction (NMF), and similarity computation. 🔗 Scikit-learn Documentation

Flask – Lightweight Python web framework used to serve the recommendation engine as an API or simple web app. 🔗 Flask Installation

Logging – Python's built-in logging module used for tracking system operations and debugging. 🔗 Logging Documentation

Kaggle API – Used to automatically fetch and manage the Medium Articles dataset. 🔗 Kaggle API Setup Guide
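A module-level logger of the kind the stack describes might look like the sketch below; the project's actual setup lives in logger/logger.py, and `get_logger` here is illustrative:

```python
import logging

def get_logger(name: str) -> logging.Logger:
    """Return a logger with a console handler and a consistent format."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s | %(name)s | %(levelname)s | %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

log = get_logger("inside_medium")
log.info("Pipeline started")
```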


๐Ÿ“ Project Structure

Inside-Medium/
├── .env                                      # Stores the Kaggle username and API key
├── .gitignore                                # Files ignored by Git
├── env_setup.sh                              # Environment setup and package installation script
├── folder_structure.py                       # Contains the project folder structure
├── LICENCE                                   # MIT License
├── main.py                                   # Full pipeline of the project
├── README.md                                 # Project documentation
├── requirements.txt                          # Python dependencies
├── setup.py                                  # Installs the project as a Python package
├── config/                                   # Configuration files
│   ├── __init__.py
│   └── config.py                             # All configuration variables of the pipeline
├── data/                                     # Data directory
│   ├── images/                               # Medium article images directory
│   ├── medium_normalized_data.csv            # Normalized data of the Medium articles
│   ├── medium_processed_data.csv             # Processed data of the Medium articles
│   └── medium_raw_data.csv                   # Raw data of the Medium articles
├── logger/                                   # Logger setup directory
│   └── logger.py                             # Logger configuration for the project
├── notebooks/                                # Jupyter notebooks for experimentation
│   └── Recommendation_System.ipynb           # Recommendation engine experiments
├── results/                                  # Directory to store project results
│   └── eda_results/                          # Directory to store EDA results
├── src/                                      # Source code
│   ├── data_preprocessor/                    # Data preprocessor directory
│   │   ├── __init__.py
│   │   └── data_preprocessor.py              # Processes the raw data
│   ├── exploratory_data_analysis/            # EDA directory
│   │   ├── __init__.py
│   │   └── exploratory_data_analyzer.py      # Performs EDA
│   ├── normalizer/                           # Text normalizing directory
│   │   ├── __init__.py
│   │   └── nmf_normalizer.py                 # Normalizes the preprocessed data
│   ├── recommendation_engine/                # Recommendation engine directory
│   │   ├── __init__.py
│   │   └── similarity_finder.py              # Performs similarity search
│   ├── vectorizer/                           # Vectorizer directory
│   │   ├── __init__.py
│   │   └── tfidf_vectorizer.py               # Performs TF-IDF vectorization
│   └── utils/                                # Utility functions directory
│       ├── __init__.py
│       ├── data_loader.py                    # Loads and saves data locally
│       ├── download_dataset.py               # Downloads the data from Kaggle
│       └── save_plot.py                      # Saves plots to a specified path
└── web/
    ├── __init__.py
    ├── static/
    │   ├── styles.css                        # Styling of the web page
    │   └── script.js                         # JavaScript file
    ├── templates/
    │   └── index.html                        # Default web page
    └── app.py                                # Runs the Flask server

🔮 Future Work Roadmap

The Inside-Medium project can be extended significantly to offer a more personalized and intelligent content recommendation system. Here's a proposed roadmap structured in three development phases, each with an estimated time frame.


🚀 Phase 1: UI & API Integration (1–2 Weeks)

Objective: Transform the backend logic into a user-accessible application.

  • Build a clean and responsive frontend using HTML/CSS/JS for user interaction.
  • Deploy the article recommender as a Flask API, allowing input of article titles and displaying similar content.
  • Enable users to upload custom datasets (CSV) for analysis and recommendations.
  • Add search bar, loading indicators, and user-friendly error messages.

🧠 Phase 2: Personalization & Topic Modeling (2–3 Weeks)

Objective: Enhance the intelligence of the recommender.

  • Introduce user profiles to track reading history and provide personalized recommendations.
  • Apply LDA or BERTopic for better topic clustering and diversity in suggestions.
  • Integrate claps, reading time, and tags more deeply into the similarity scoring system.
  • Include feedback mechanism to rate recommended articles.

🧠 Phase 3: Embedding Models & LLM Integration (3–4 Weeks)

Objective: Upgrade the recommendation engine with deep learning and language models.

  • Replace TF-IDF + NMF with sentence embeddings using SentenceTransformers or Hugging Face models.
  • Use vector databases (e.g., Qdrant, FAISS) for faster and smarter similarity search.
  • Integrate with LLMs (e.g., OpenAI, LLaMA via LangChain) to enable query-based article retrieval using natural language.
  • Package the app into a Docker container and deploy to the cloud for scalability.
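As a rough sketch of the vector-search pattern Phase 3 describes, scikit-learn's NearestNeighbors stands in below for a vector database such as FAISS or Qdrant, querying toy embedding vectors. The 384-dimension size mirrors common SentenceTransformer models; everything here is illustrative, not project code:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy stand-ins for sentence embeddings: one 384-d vector per article,
# L2-normalized the way embedding models typically output them.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 384))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Build the index and fetch the 6 nearest neighbours of article 0
# (itself plus its 5 closest matches, by cosine distance).
index = NearestNeighbors(n_neighbors=6, metric="cosine").fit(embeddings)
distances, indices = index.kneighbors(embeddings[:1])
```

A real deployment would swap the random matrix for precomputed article embeddings and the sklearn index for a persistent vector store, but the query shape stays the same.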

📜 License

This project is licensed under the MIT License. See the LICENSE file for more details.

Made by Priyam Pal - AI and Data Science Engineer

[↑ Back to Top]
