An AI-powered file deduplication system that uses machine learning embeddings to detect similar text and image files, and SHA256 hashes to find exact duplicates of any file type, including binaries.
- Text Similarity - Uses Xenova/all-MiniLM-L6-v2 for semantic text analysis
- Image Similarity - Uses Xenova/clip-vit-base-patch32 (CLIP) for visual content matching
- Exact Matching - SHA256 hash-based duplicate detection
- Real-time Processing - Live embedding generation and similarity scoring
- Modern Web App - Beautiful gradient design with glass-morphism effects
- Desktop Application - Cross-platform Electron app with native file access
- Responsive Design - Works seamlessly on desktop, tablet, and mobile
- Real-time Statistics - Live progress tracking and AI model status
- Monorepo Structure - Organized with pnpm + TurboRepo for efficient development
- Microservices Design - Separate AI worker, API service, and frontend applications
- TypeScript - Full type safety across all packages
- Modern Stack - Next.js, Fastify, Prisma, and transformers.js
- Node.js 18+
- pnpm (recommended) or npm
- Git
# Clone the repository
git clone https://github.com/vibhorjoshi/ai-file-cleaner.git
cd ai-file-cleaner
# Install dependencies
pnpm install
# Build all packages
pnpm run build
# Start development servers
pnpm run dev
- Web Interface: http://localhost:3001
- Professional HTML Version: http://localhost:3001/index.html
- AI Model Worker: http://127.0.0.1:58748
- Desktop App: Run pnpm run dev --filter=@ai-file-cleanup/desktop
ai-file-cleanup/
├── apps/
│   ├── web/              # Next.js web application
│   │   ├── src/
│   │   └── public/       # HTML/CSS/JS version
│   └── desktop/          # Electron desktop app
├── services/
│   ├── api/              # Fastify API service
│   └── model-worker/     # AI inference service
├── packages/
│   ├── ui/               # Shared React components
│   ├── core/             # Shared TypeScript types
│   ├── db/               # Prisma database schema
│   └── types/            # Type definitions
└── infra/
    └── docker/           # Docker configurations
# Build all packages
pnpm run build
# Build specific package
pnpm run build --filter=@ai-file-cleanup/web
# Development mode (all services)
pnpm run dev
# Development mode (specific service)
pnpm run dev --filter=@ai-file-cleanup/model-worker
# Run all tests
pnpm test
# Type checking
pnpm run type-check
# Linting
pnpm run lint
- Text Model: Xenova/all-MiniLM-L6-v2
- Dimensions: 384
- Use Case: Document similarity, text deduplication
- Image Model: Xenova/clip-vit-base-patch32
- Dimensions: 512
- Use Case: Visual similarity, image deduplication
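Embeddings like the 384- and 512-dimensional vectors above are usually compared with cosine similarity. This README does not say which metric the project uses, so the following is only a minimal sketch of one common choice. The function name `cosineSimilarity` is hypothetical and not taken from the codebase.

```typescript
// Hypothetical helper, not part of the project's API: cosine similarity
// between two embedding vectors of the same dimension (e.g. 384 or 512).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat an all-zero vector as dissimilar to everything instead of dividing by zero.
  return denom === 0 ? 0 : dot / denom;
}
```

Text and image embeddings have different dimensions and come from different models, so they are only comparable within their own kind. The length check guards against mixing them by accident.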
// Text similarity
POST /embeddings/text
Content-Type: application/json
{
"texts": ["document content 1", "document content 2"]
}
// Image similarity
POST /embeddings/images
Content-Type: multipart/form-data
- Text Processing: ~100ms per document
- Image Processing: ~200ms per image
- Similarity Calculation: <10ms for 1000 embeddings
- Memory Usage: ~2GB with both models loaded
- Text: .txt, .md, .doc, .docx, .pdf
- Images: .jpg, .png, .bmp, .gif, .webp
- Archives: .zip, .rar, .7z
- All Files: SHA256 hash-based exact matching
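The SHA256 exact matching listed above can be sketched with Node's built-in `crypto` module. This is an illustration under assumptions, not the project's implementation: the `FileEntry` shape and both function names are hypothetical, and a real scanner would likely stream large files from disk instead of holding them in memory.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape: a file path plus its raw bytes.
interface FileEntry {
  path: string;
  data: Uint8Array;
}

// Hex-encoded SHA256 digest of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Group files by digest and keep only groups with more than one member.
function findExactDuplicates(files: FileEntry[]): string[][] {
  const byHash = new Map<string, string[]>();
  for (const file of files) {
    const digest = sha256Hex(file.data);
    const paths = byHash.get(digest) ?? [];
    paths.push(file.path);
    byHash.set(digest, paths);
  }
  return Array.from(byHash.values()).filter((paths) => paths.length > 1);
}
```

Because the digest depends only on file contents, this check works for any file type, including the archive formats for which no embedding model is listed.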
- Open http://localhost:3001/index.html
- Select folder to scan
- Choose detection options (AI text, AI image, exact match)
- Click "Start Scan"
- Review duplicate groups with similarity scores
- Delete or export results
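The duplicate groups reviewed in the steps above could be formed by comparing every pair of embeddings and merging files whose similarity clears a threshold. The sketch below assumes cosine similarity and a threshold of 0.9. Both are illustrative assumptions, as are the `EmbeddedFile` shape and the function names. The project's actual grouping logic is not described in this README.

```typescript
// Hypothetical input: one embedding per file (all of the same dimension).
interface EmbeddedFile {
  path: string;
  embedding: number[];
}

// Compact cosine similarity; returns 0 if either vector has zero norm.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na * nb);
  return denom === 0 ? 0 : dot / denom;
}

// Union-find over file indices: any pair at or above the threshold is merged,
// so each group is a connected component of the similarity graph.
function groupSimilarFiles(files: EmbeddedFile[], threshold = 0.9): string[][] {
  const parent = files.map((_, i) => i);
  const find = (i: number): number => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving keeps lookups shallow
      i = parent[i];
    }
    return i;
  };
  for (let i = 0; i < files.length; i++) {
    for (let j = i + 1; j < files.length; j++) {
      if (cosine(files[i].embedding, files[j].embedding) >= threshold) {
        parent[find(j)] = find(i);
      }
    }
  }
  const groups = new Map<number, string[]>();
  files.forEach((file, i) => {
    const root = find(i);
    const members = groups.get(root) ?? [];
    members.push(file.path);
    groups.set(root, members);
  });
  // Files with no sufficiently similar partner are not duplicates; drop them.
  return Array.from(groups.values()).filter((g) => g.length > 1);
}
```

The pairwise loop is quadratic in the number of files, which is fine for a sketch but would need an index or batching for very large folders. Merging by connected components also means two files can share a group through an intermediate file even when their direct similarity is below the threshold.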
// Check AI service health
const health = await fetch('http://127.0.0.1:58748/health');
// Generate text embeddings
const response = await fetch('http://127.0.0.1:58748/embeddings/text', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ texts: ['sample text'] })
});
const { embeddings } = await response.json();
- Next.js 14 - React framework with app router
- TypeScript - Type-safe development
- Tailwind CSS - Utility-first styling
- Electron - Cross-platform desktop app
- Fastify - High-performance Node.js framework
- transformers.js - AI/ML inference in JavaScript
- Prisma - Type-safe database ORM
- Zod - Runtime type validation
- TurboRepo - Monorepo build system
- pnpm - Fast, efficient package manager
- Docker - Containerization support
- GitHub Actions - CI/CD pipeline
- Video similarity detection
- Audio fingerprinting
- Custom model training
- Batch processing optimization
- Cloud storage integration
- Multi-user support
- API rate limiting
- Advanced reporting
- GPU acceleration
- Distributed processing
- Caching improvements
- Real-time file monitoring
- Fork the repository
- Create feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open Pull Request
- Follow TypeScript best practices
- Write tests for new features
- Update documentation
- Ensure all packages build successfully
This project is licensed under the MIT License - see the LICENSE file for details.
- Xenova - For excellent JavaScript ML model implementations
- Hugging Face - For pre-trained transformer models
- Vercel - For Next.js and development tools
- Electron - For cross-platform desktop capabilities
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Made with ❤️ and 🤖 AI