🤖 AI File Cleanup - Intelligent Duplicate Detection System

Advanced AI-powered file deduplication system using machine learning embeddings for intelligent similarity detection across text, images, and binary files.

🌟 Features

🤖 AI-Powered Detection

Text Similarity - Uses Xenova/all-MiniLM-L6-v2 for semantic text analysis
Image Similarity - Leverages Xenova/clip-vit-base-patch32 (CLIP) for visual content matching
Exact Matching - SHA256 hash-based duplicate detection
Real-time Processing - Live embedding generation and similarity scoring

🎨 Professional Interface

Modern Web App - Beautiful gradient design with glass-morphism effects
Desktop Application - Cross-platform Electron app with native file access
Responsive Design - Works seamlessly on desktop, tablet, and mobile
Real-time Statistics - Live progress tracking and AI model status

⚡ Performance & Architecture

Monorepo Structure - Organized with pnpm + TurboRepo for efficient development
Microservices Design - Separate AI worker, API service, and frontend applications
TypeScript - Full type safety across all packages
Modern Stack - Next.js, Fastify, Prisma, and transformers.js

Screen recording

Screen-Recording.mp4

🚀 Quick Start

Prerequisites

Node.js 18+
pnpm (recommended) or npm
Git

Installation

# Clone the repository
git clone https://github.com/vibhorjoshi/ai-file-cleaner.git
cd ai-file-cleaner

# Install dependencies
pnpm install

# Build all packages
pnpm run build

# Start development servers
pnpm run dev

🌐 Access the Application

Web Interface: http://localhost:3001
Professional HTML Version: http://localhost:3001/index.html
AI Model Worker: http://127.0.0.1:58748
Desktop App: Run pnpm run dev --filter=@ai-file-cleanup/desktop

📁 Project Structure

ai-file-cleanup/
├── apps/
│   ├── web/                    # Next.js web application
│   │   ├── src/
│   │   └── public/            # HTML/CSS/JS version
│   └── desktop/               # Electron desktop app
├── services/
│   ├── api/                   # Fastify API service
│   └── model-worker/          # AI inference service
├── packages/
│   ├── ui/                    # Shared React components
│   ├── core/                  # Shared TypeScript types
│   ├── db/                    # Prisma database schema
│   └── types/                 # Type definitions
└── infra/
    └── docker/                # Docker configurations

🔧 Development

Build System

# Build all packages
pnpm run build

# Build specific package
pnpm run build --filter=@ai-file-cleanup/web

# Development mode (all services)
pnpm run dev

# Development mode (specific service)
pnpm run dev --filter=@ai-file-cleanup/model-worker

Testing

# Run all tests
pnpm test

# Type checking
pnpm run type-check

# Linting
pnpm run lint

🤖 AI Models

Text Embeddings

Model: Xenova/all-MiniLM-L6-v2
Dimensions: 384
Use Case: Document similarity, text deduplication

Image Embeddings

Model: Xenova/clip-vit-base-patch32
Dimensions: 512
Use Case: Visual similarity, image deduplication
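
The README does not say which metric compares the embeddings. Cosine similarity is a common choice for sentence and CLIP vectors, so the sketch below assumes it. The function name `cosineSimilarity` is illustrative and not taken from the codebase.

```typescript
// Cosine similarity between two embedding vectors of equal length
// (for example 384 values for text or 512 for images).
// Returns a value in [-1, 1]; 1 means the vectors point the same way.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: an all-zero vector is treated as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

Text and image embeddings have different dimensions and come from different models, so scores should only be compared within the same model.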

API Endpoints

// Text similarity
POST /embeddings/text
Content-Type: application/json
{
  "texts": ["document content 1", "document content 2"]
}

// Image similarity  
POST /embeddings/images
Content-Type: multipart/form-data
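
A client can check the text payload before sending it. The sketch below is a hand-written type guard for the request body shown above. The names `TextEmbeddingRequest` and `isTextEmbeddingRequest` are hypothetical, and rejecting an empty `texts` array is an assumption rather than documented server behavior.

```typescript
// Assumed request shape, taken from the example body above.
interface TextEmbeddingRequest {
  texts: string[];
}

// Hypothetical helper: narrows an unknown JSON value to TextEmbeddingRequest.
function isTextEmbeddingRequest(value: unknown): value is TextEmbeddingRequest {
  if (typeof value !== "object" || value === null) return false;
  const texts = (value as { texts?: unknown }).texts;
  return (
    Array.isArray(texts) &&
    texts.length > 0 && // assumption: an empty batch is not useful
    texts.every((t) => typeof t === "string")
  );
}
```

The stack lists Zod for runtime validation, so the service itself may well validate with a Zod schema instead. This dependency-free version only illustrates the shape.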

📊 Performance

Benchmarks

Text Processing: ~100ms per document
Image Processing: ~200ms per image
Similarity Calculation: <10ms for 1000 embeddings
Memory Usage: ~2GB with both models loaded
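
These figures depend on hardware. To get a rough local number for the similarity step, the sketch below times one query scored against a set of random vectors. It assumes cosine similarity and uses random data in place of real embeddings. `timeSimilarityScoring` is a hypothetical name.

```typescript
// Hypothetical micro-benchmark: scores one query vector against `count`
// random vectors of `dims` dimensions and reports the elapsed time.
function timeSimilarityScoring(
  count: number,
  dims: number,
): { elapsedMs: number; best: number } {
  const randomVector = (): number[] =>
    Array.from({ length: dims }, () => Math.random() - 0.5);
  const query = randomVector();
  const corpus = Array.from({ length: count }, randomVector);

  const start = performance.now();
  let best = -Infinity;
  for (const vec of corpus) {
    let dot = 0;
    let normQ = 0;
    let normV = 0;
    for (let i = 0; i < dims; i++) {
      dot += query[i] * vec[i];
      normQ += query[i] * query[i];
      normV += vec[i] * vec[i];
    }
    // Track the best score so the work cannot be skipped as unused.
    best = Math.max(best, dot / Math.sqrt(normQ * normV));
  }
  return { elapsedMs: performance.now() - start, best };
}

const { elapsedMs } = timeSimilarityScoring(1000, 384);
console.log(`Scored 1000 vectors in ${elapsedMs.toFixed(2)} ms`);
```

This measures one-against-many scoring only. Comparing every pair among n files takes n(n-1)/2 comparisons, so an all-pairs scan grows quadratically.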

Supported File Types

Text: .txt, .md, .doc, .docx, .pdf
Images: .jpg, .png, .bmp, .gif, .webp
Archives: .zip, .rar, .7z
All Files: SHA256 hash-based exact matching
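
Exact matching amounts to grouping files by digest. Below is a minimal sketch using Node's built-in `crypto` module. The helper names `sha256Hex` and `findExactDuplicates` are hypothetical, and for brevity the sketch hashes in-memory buffers where a real scanner would more likely stream files from disk.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a file's bytes.
function sha256Hex(content: Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical helper: groups file paths whose contents hash identically.
// Only groups with two or more members (actual duplicates) are returned.
function findExactDuplicates(files: Map<string, Uint8Array>): string[][] {
  const byHash = new Map<string, string[]>();
  files.forEach((content, path) => {
    const digest = sha256Hex(content);
    const group = byHash.get(digest) ?? [];
    group.push(path);
    byHash.set(digest, group);
  });
  return Array.from(byHash.values()).filter((group) => group.length > 1);
}
```

Because identical bytes always produce the same digest, this check works for any file type, including the archive formats listed above.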

🎯 Usage Examples

Web Interface

1. Open http://localhost:3001/index.html
2. Select folder to scan
3. Choose detection options (AI text, AI image, exact match)
4. Click "Start Scan"
5. Review duplicate groups with similarity scores
6. Delete or export results
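
The README does not describe how duplicate groups are formed from similarity scores. One plausible approach, sketched below, links every pair of files whose score meets a threshold and reports the connected clusters (single-linkage grouping via union-find). `ScoredPair`, `groupBySimilarity`, and the threshold value are all assumptions made for illustration.

```typescript
// A scored pair of file indices, e.g. produced by comparing embeddings.
interface ScoredPair {
  a: number;
  b: number;
  similarity: number;
}

// Hypothetical helper: clusters files into duplicate groups by linking every
// pair whose similarity meets the threshold (single-linkage via union-find).
function groupBySimilarity(
  fileCount: number,
  pairs: ScoredPair[],
  threshold: number,
): number[][] {
  const parent = Array.from({ length: fileCount }, (_, i) => i);
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]]; // path halving keeps lookups short
      x = parent[x];
    }
    return x;
  };
  for (const { a, b, similarity } of pairs) {
    if (similarity >= threshold) parent[find(a)] = find(b);
  }
  const groups = new Map<number, number[]>();
  for (let i = 0; i < fileCount; i++) {
    const root = find(i);
    const members = groups.get(root) ?? [];
    members.push(i);
    groups.set(root, members);
  }
  // Files with no close match form singleton groups and are dropped.
  return Array.from(groups.values()).filter((g) => g.length > 1);
}
```

Single-linkage grouping is transitive: if file 0 matches file 1 and file 1 matches file 2, all three land in one group even when 0 and 2 score below the threshold. A stricter scheme would require every pair in a group to clear it.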

API Usage

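// Check AI service health
const health = await fetch('http://127.0.0.1:58748/health');
if (!health.ok) throw new Error(`Model worker unhealthy: ${health.status}`);

// Generate text embeddings
const response = await fetch('http://127.0.0.1:58748/embeddings/text', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ texts: ['sample text'] })
});
// fetch only rejects on network failure, so check the HTTP status explicitly
if (!response.ok) throw new Error(`Embedding request failed: ${response.status}`);
const { embeddings } = await response.json();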

🛠️ Technology Stack

Frontend

Next.js 14 - React framework with app router
TypeScript - Type-safe development
Tailwind CSS - Utility-first styling
Electron - Cross-platform desktop app

Backend

Fastify - High-performance Node.js framework
transformers.js - AI/ML inference in JavaScript
Prisma - Type-safe database ORM
Zod - Runtime type validation

DevOps

TurboRepo - Monorepo build system
pnpm - Fast, efficient package manager
Docker - Containerization support
GitHub Actions - CI/CD pipeline

📋 Roadmap

v1.1 - Enhanced AI

Video similarity detection
Audio fingerprinting
Custom model training
Batch processing optimization

v1.2 - Enterprise Features

Cloud storage integration
Multi-user support
API rate limiting
Advanced reporting

v1.3 - Performance

GPU acceleration
Distributed processing
Caching improvements
Real-time file monitoring

🤝 Contributing

1. Fork the repository
2. Create feature branch (git checkout -b feature/amazing-feature)
3. Commit changes (git commit -m 'Add amazing feature')
4. Push to branch (git push origin feature/amazing-feature)
5. Open Pull Request

Development Guidelines

Follow TypeScript best practices
Write tests for new features
Update documentation
Ensure all packages build successfully

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Xenova - For excellent JavaScript ML model implementations
Hugging Face - For pre-trained transformer models
Vercel - For Next.js and development tools
Electron - For cross-platform desktop capabilities

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: Wiki

Made with ❤️ and 🤖 AI
