Discover trending, relevant reads instantly with AI-powered article matching!
Features • Installation • Documentation • Usage • Contributing
Inside-Medium is an AI-powered content recommendation engine designed to help readers find the most relevant and high-quality Medium articles based on their interests or selected articles. By leveraging Natural Language Processing (NLP) and Topic Modeling (NMF) techniques, the system extracts hidden topics from articles, encodes them into meaningful vectors, and uses cosine similarity to recommend similar content.
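The TF-IDF → NMF → cosine-similarity chain described above can be sketched in a few lines of scikit-learn. This is a toy illustration with three invented article texts, not the project's actual code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

articles = [
    "Deep learning for image recognition with convolutional networks",
    "A gentle introduction to convolutional neural networks",
    "How to brew the perfect cup of pour-over coffee",
]

# 1. Encode each article as a TF-IDF term vector
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(articles)            # shape: (articles, terms)

# 2. Extract latent topics with Non-negative Matrix Factorization
nmf = NMF(n_components=2, random_state=42)
topics = nmf.fit_transform(X)                # shape: (articles, topics)

# 3. Rank articles by cosine similarity in topic space
sims = cosine_similarity(topics)             # pairwise similarity matrix
```

Here the two neural-network articles end up close in topic space, while the coffee article lands far from both.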
Source: Medium Articles Dataset (Kaggle)
The Medium Articles Dataset is a curated collection of publicly available articles published on Medium.com. It contains both textual content and engagement metadata, making it ideal for tasks like recommendation systems, NLP, and content analysis.
- Total Records: ~8,000 articles
- Key Columns:
  - `title`: Title of the article
  - `subtitle`: Subtitle or secondary heading
  - `author`: Author of the article
  - `date`: Publication date
  - `claps`: Number of claps (engagement metric)
  - `reading_time`: Estimated reading time (in minutes)
  - `publication`: Name of the publication (if any)
  - `url`: Link to the original article
  - `article`: Full textual content of the article
- Great for topic modeling, text classification, and recommendation systems
- Contains real-world engagement signals (`claps`) to enrich the model
- Useful for building AI-driven content discovery platforms like Inside-Medium
Dataset Link: https://www.kaggle.com/datasets/dorianlazar/medium-articles-dataset/data
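For orientation, the schema above can be mocked up in pandas. The rows below are invented, and `claps_per_min` is just one example of how the engagement metadata could be combined, not a metric the project necessarily uses:

```python
import pandas as pd

# Illustrative rows mirroring the dataset's columns (values are made up)
df = pd.DataFrame({
    "title":        ["Intro to NMF", "Flask in 5 Minutes"],
    "subtitle":     ["Topic modeling basics", "A tiny web app"],
    "author":       ["A. Writer", "B. Blogger"],
    "date":         ["2019-05-01", "2019-06-12"],
    "claps":        [850, 1200],
    "reading_time": [7, 4],
    "publication":  ["Towards Data Science", None],
    "url":          ["https://medium.com/example-1", "https://medium.com/example-2"],
    "article":      ["Full text ...", "Full text ..."],
})

# Engagement per minute of reading: one simple use of the metadata
df["claps_per_min"] = df["claps"] / df["reading_time"]
top = df.sort_values("claps_per_min", ascending=False)["title"].tolist()
```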
- **Content-Based Article Recommendation**: Recommends articles similar to a user's query based on textual content and latent topic features.
- **Similarity Scoring**: Calculates cosine similarity between articles to identify the most relevant ones.
- **Interactive Query Support**: Users can input any article title to retrieve a list of the most similar articles.
- **Modular, Clean Codebase**: Structured using classes for vectorization, normalization, and similarity search, with full docstrings and logging.
- **Reproducible Pipeline**: Complete workflow from raw data to recommendations; easy to extend or integrate into other systems.
- **Logging and Error Handling**: Built-in logging for debugging and tracking progress and errors in each module.
- **Scalable Design**: Easy to adapt for larger datasets or additional features like user profiling or collaborative filtering.
Read the article here: Inside Medium's Recommendation Engine: How It Knows What You'll Love
```bash
# Clone the repository
git clone https://github.com/priyam-hub/Inside-Medium.git

# Navigate into the directory
cd Inside-Medium

# Run env_setup.sh
bash env_setup.sh
# Select 1 to create a Python environment
# Select 2 to create a Conda environment
# Python version: 3.10

# Install the project as a local package
# (running setup.py without a subcommand does nothing)
pip install -e .
```
- Log in to your Kaggle account.
- Create an API token from Kaggle Account Settings → Create New Token.
- Manually place your `kaggle.json` (downloaded from https://www.kaggle.com/settings) in this location:

```
C:\Users\<Your_Username>\.kaggle\kaggle.json
```
Step 4: Create a `.env` file in the root directory to add your credentials (or rename `.sample_env` to `.env`):

```env
KAGGLE_USERNAME = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
KAGGLE_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```
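If you prefer not to add a dependency such as python-dotenv, a minimal stdlib loader for a `.env` file like the one above might look like this (a sketch; the project may load credentials differently, and note that the official Kaggle client reads the environment variables `KAGGLE_USERNAME` and `KAGGLE_KEY`):

```python
import os

def load_env(path: str = ".env") -> None:
    """Export KEY = "value" pairs from a .env-style file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')
```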
```bash
# Run the main Python script
python main.py
```

```bash
# Run the web app on the Flask server
python web/app.py
```
Note: after starting the server, navigate to the local URL shown in the terminal to interact with the Inside-Medium recommendation engine.
- **Python**: Core programming language used to build the recommendation pipeline, data processing, and backend logic. (Install Python)
- **Pandas & NumPy**: Used for efficient data manipulation, cleaning, and numerical operations. (Pandas Documentation | NumPy Documentation)
- **Scikit-learn**: Used for feature extraction (TF-IDF), dimensionality reduction (NMF), and similarity computation. (Scikit-learn Documentation)
- **Flask**: Lightweight Python web framework used to serve the recommendation engine as an API or simple web app. (Flask Installation)
- **Logging**: Python's built-in `logging` module, used for tracking system operations and debugging. (Logging Documentation)
- **Kaggle API**: Used to automatically fetch and manage the Medium Articles dataset. (Kaggle API Setup Guide)
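The project's `logger/logger.py` is not reproduced here, but a typical setup along these lines (an assumption about its shape, not the actual file) would be:

```python
import logging

def get_logger(name: str) -> logging.Logger:
    """Return a named logger with a consistent format, configured once."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

log = get_logger("inside_medium")
log.info("TF-IDF vectorization complete")
```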
```
Inside-Medium/
├── .env                              # Store the Kaggle Username and API Key
├── .gitignore                        # Ignoring files for Git
├── env_setup.sh                      # Package installation configuration
├── folder_structure.py               # Contains the Project Folder Structure
├── LICENCE                           # MIT License
├── main.py                           # Full Pipeline of the Project
├── README.md                         # Project documentation
├── requirements.txt                  # Python dependencies
├── setup.py                          # Create the Project as Python Package
├── config/                           # Configuration files
│   ├── __init__.py
│   └── config.py                     # All Configuration Variables of Pipeline
├── data/                             # Data Directory
│   ├── images/                       # Medium Article Images Directory
│   ├── medium_normalized_data.csv    # Normalized Data of the Medium Articles
│   ├── medium_processed_data.csv     # Processed Data of the Medium Articles
│   └── medium_raw_data.csv           # Raw Data of the Medium Articles
├── logger/                           # Logger Setup Directory
│   └── logger.py                     # Format of the Logger Setup of the Project
├── notebooks/                        # Jupyter notebooks for experimentation
│   └── Recommendation_System.ipynb   # Experimented Recommendation Engine
├── results/                          # Directory to store the results of the project
│   └── eda_results/                  # Directory to store the EDA results
├── src/                              # Source code
│   ├── data_preprocessor/            # Data Preprocessor Directory
│   │   ├── __init__.py
│   │   └── data_preprocessor.py      # Python file to process the raw data
│   ├── exploratory_data_analysis/    # EDA Directory
│   │   ├── __init__.py
│   │   └── exploratory_data_analyzer.py  # Python file to perform EDA
│   ├── normalizer/                   # Text Normalizing Directory
│   │   ├── __init__.py
│   │   └── nmf_normalizer.py         # Python file to normalize the preprocessed data
│   ├── recommendation_engine/        # Recommendation Engine Directory
│   │   ├── __init__.py
│   │   └── similarity_finder.py      # Python file to perform similarity search
│   ├── vectorizer/                   # Vectorizer Directory
│   │   ├── __init__.py
│   │   └── tfidf_vectorizer.py       # Python file to perform vectorization
│   └── utils/                        # Utility Functions Directory
│       ├── __init__.py
│       ├── data_loader.py            # Load and Save Data from Local
│       ├── download_dataset.py       # Download the Data from Kaggle
│       └── save_plot.py              # Save the Plot in Specified Path
└── web/
    ├── __init__.py
    ├── static/
    │   ├── styles.css                # Styling of the Web Page
    │   └── script.js                 # JavaScript File
    ├── templates/
    │   └── index.html                # Default Web Page
    └── app.py                        # To run the Flask server
```
The Inside-Medium project can be extended significantly to offer a more personalized and intelligent content recommendation system. Here is a proposed roadmap, structured in three development phases.
Phase 1 Objective: Transform the backend logic into a user-accessible application.
- Build a clean and responsive frontend using HTML/CSS/JS for user interaction.
- Deploy the article recommender as a Flask API, allowing input of article titles and displaying similar content.
- Enable users to upload custom datasets (CSV) for analysis and recommendations.
- Add search bar, loading indicators, and user-friendly error messages.
Phase 2 Objective: Enhance the intelligence of the recommender.
- Introduce user profiles to track reading history and provide personalized recommendations.
- Apply LDA or BERTopic for better topic clustering and diversity in suggestions.
- Integrate claps, reading time, and tags more deeply into the similarity scoring system.
- Include feedback mechanism to rate recommended articles.
Phase 3 Objective: Upgrade the recommendation engine with deep learning and language models.
- Replace TF-IDF + NMF with sentence embeddings using `SentenceTransformers` or Hugging Face models.
- Use vector databases (e.g., Qdrant, FAISS) for faster and smarter similarity search.
- Integrate with LLMs (e.g., OpenAI, LLaMA via LangChain) to enable query-based article retrieval using natural language.
- Package the app into a Docker container and deploy to the cloud for scalability.
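The embedding-based retrieval step of this phase can be prototyped without any new infrastructure: with unit-normalized embeddings (from SentenceTransformers or any encoder), nearest-neighbour search by cosine similarity reduces to an inner-product search, which FAISS or Qdrant would then accelerate at scale. A rough NumPy stand-in, using random vectors in place of real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 384))                    # 100 fake article embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)    # unit-normalize: cosine == dot

def top_k(query_idx: int, k: int = 5) -> list:
    """Indices of the k most similar articles, excluding the query itself."""
    scores = emb @ emb[query_idx]                    # inner products = cosine sims
    order = np.argsort(-scores)                      # best matches first
    return [int(i) for i in order if i != query_idx][:k]

neighbours = top_k(0)
```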
This project is licensed under the MIT License. See the LICENSE file for more details.
Made by Priyam Pal - AI and Data Science Engineer
[⬆ Back to Top]