Keyword Cannibalization Analyzer

A tool for detecting and analyzing keyword cannibalization in your content. It compares page titles and URLs using several similarity methods (TF-IDF, Sentence Transformers, Levenshtein distance, and OpenAI embeddings) to flag pages that may overlap, and it presents the results as visualizations and downloadable reports.

Features

Multiple Similarity Methods:
- TF-IDF: Term frequency-based similarity
- Sentence Transformers: Deep learning-based semantic similarity
- Levenshtein Distance: Character-based string similarity
- OpenAI Embeddings: Advanced AI-based semantic similarity
Sentence Transformer Models:
- MiniLM (Fast, 384d)
- MPNet (Balanced, 768d)
- Multilingual MiniLM (384d)
- Multilingual MPNet (768d)
Interactive Visualizations:
- Bar charts for title similarity
- Scatter plots for parameter similarity
- Interactive HTML reports with DataTables
File Support:
- CSV files (.csv)
- Excel files (.xlsx, .xls)
Additional Features:
- Persian text preprocessing support
- Configurable similarity thresholds
- Detailed HTML reports
- CSV export functionality
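
As a rough illustration of the character-based method listed above, the sketch below computes a normalized Levenshtein similarity in plain Python and flags title pairs at or above a threshold. It is not the project's implementation: the function names (`levenshtein`, `similarity`, `flag_pairs`) and the default threshold of 80 are assumptions made for this example. The application lists TheFuzz as a dependency, so the scores it reports may differ.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to a 0-100 score (100 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def flag_pairs(titles, threshold=80.0):
    """Return (title_1, title_2, score) for every pair at or above threshold."""
    flagged = []
    for t1, t2 in combinations(titles, 2):
        # Hypothetical preprocessing: compare case-insensitively.
        score = similarity(t1.lower(), t2.lower())
        if score >= threshold:
            flagged.append((t1, t2, round(score, 1)))
    return flagged
```

For example, `flag_pairs(["SEO Guide", "SEO Guides", "Cooking Tips"])` flags only the first two titles, which differ by a single character.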

Installation

Clone the repository:

git clone https://github.com/Danobin/SEO-Cannibalization-Analysis.git
cd SEO-Cannibalization-Analysis

Create a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Usage

Start the FastAPI backend:

uvicorn app:app --reload

In a separate terminal, start the Streamlit frontend:

streamlit run streamlit_app.py

Open your browser and navigate to http://localhost:8501
Upload your data file (CSV or Excel) containing:
- Title column
- Permalink/URL column
Configure analysis parameters:
- Select similarity methods for title and URL comparison
- Adjust similarity thresholds
- Choose Sentence Transformer model if applicable
- Configure OpenAI settings if using OpenAI embeddings
- Enable/disable Persian preprocessing
Click "Analyze" to start the analysis
View results in different formats:
- Table View: Raw data in tabular format
- Visualization: Interactive charts
- HTML Report: Downloadable detailed report

Input File Format

Your input file should be a CSV or Excel file with at least these columns:

Title: The title of your content
Permalink/URL: The URL or permalink of your content

Example CSV format:

Title,Permalink
"Best SEO Practices 2024","/seo/best-practices-2024"
"SEO Guide for Beginners","/seo/guide-beginners"

Output

The analysis provides:

Number of potential keyword cannibalization issues
Detailed results showing:
- Title pairs with similarity scores
- URL/parameter similarity scores
- Visual representations of similarities
Downloadable reports:
- HTML report with interactive visualizations
- CSV file with raw data

API Endpoints

POST /analyze

Analyzes content for keyword cannibalization.

Request:

File: CSV or Excel file
Config: JSON configuration object

Response:

{
    "total_matches": 10,
    "results": [
        {
            "Title_1": "Example Title 1",
            "Title_2": "Example Title 2",
            "Permalink_1": "/example-1",
            "Permalink_2": "/example-2",
            "Title_Similarity": "85%",
            "Param_Similarity": "90%"
        }
    ]
}
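
Because the similarity fields in this response are strings such as "85%", a client may want to convert them to numbers before filtering or sorting. The following is a minimal sketch that assumes the response shape shown above. The helper names (`parse_percent`, `strong_matches`), the cutoff of 80, and the second sample entry are made up for this example.

```python
import json


def parse_percent(value: str) -> float:
    """Convert a string such as "85%" to the float 85.0."""
    return float(value.strip().rstrip("%"))


def strong_matches(payload: str, min_title: float = 80.0):
    """Return results with title similarity >= min_title, highest first."""
    data = json.loads(payload)
    hits = [
        r for r in data["results"]
        if parse_percent(r["Title_Similarity"]) >= min_title
    ]
    return sorted(
        hits,
        key=lambda r: parse_percent(r["Title_Similarity"]),
        reverse=True,
    )


# Sample body: the first entry mirrors the example above,
# the second is hypothetical.
response_body = '''
{
    "total_matches": 2,
    "results": [
        {"Title_1": "Example Title 1", "Title_2": "Example Title 2",
         "Permalink_1": "/example-1", "Permalink_2": "/example-2",
         "Title_Similarity": "85%", "Param_Similarity": "90%"},
        {"Title_1": "Example Title 3", "Title_2": "Example Title 4",
         "Permalink_1": "/example-3", "Permalink_2": "/example-4",
         "Title_Similarity": "62%", "Param_Similarity": "40%"}
    ]
}
'''

for match in strong_matches(response_body):
    print(match["Permalink_1"], match["Permalink_2"], match["Title_Similarity"])
```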

Dependencies

FastAPI
Streamlit
Pandas
NumPy
TheFuzz
scikit-learn
sentence-transformers
plotly
htmlmin
openpyxl

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Sentence Transformers library for semantic similarity
TheFuzz library for string matching
OpenAI for embedding capabilities
FastAPI and Streamlit for the web interface

