A Multi-Modal Intelligent Search Engine over Decentralized Data
FileNest Framework is a search engine for decentralized data that combines AI-based content understanding with peer-to-peer networking. It aims to provide intelligent, scalable, semantic search across diverse content types, including text documents, research papers, images, and multimedia content.
FileNest combines the ease of intelligent discovery, as in Google Drive Search or YouTube recommendations, with the resilience and freedom of P2P systems. It provides an intelligent search engine that runs across a decentralized, peer-based network, making content discovery as simple and powerful as searching Google without giving up ownership, privacy, or freedom.
- 🧠 Multi-Modal Intelligence: Advanced AI-powered content understanding for text, images, videos, PDFs and other file formats
- 🌐 Decentralized Architecture: Peer-to-peer network that provides fault tolerance and reduces dependence on central servers
- 📚 Academic Specialization: Sophisticated research paper analysis capabilities with citation networks and semantic understanding
- ⚡ High Performance: Targets sub-100ms search responses and 15-20 files/sec indexing throughput (design goals; the metrics table lists the current state)
- 🔒 Privacy-First: No central servers, no data mining, no platform censorship
FileNest operates through a hierarchical distributed tagging system that combines intelligent embedding generation with peer-to-peer routing mechanisms:
- Bootstrap Phase: Generate and distribute Depth 1 Tagging Vectors (D1TVs) across the network
- Local Indexing: Generate embeddings for files and route metadata through the hierarchical network
- Continuous Training: Update tagging vectors as new content is added to maintain accuracy
- Query Phase: Process queries through the distributed network with similarity-based pruning
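The four phases above can be illustrated with a small Go sketch. This is not FileNest's actual implementation. It assumes that a D1TV behaves like a centroid in embedding space, that files are routed to the most similar centroid by cosine similarity, that "continuous training" is a running-mean update, and that pruning means skipping branches whose similarity to the query falls below a threshold. The names `TagVector`, `Route`, `Update`, and `Prune`, the toy vectors, and the threshold `0.5` are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// TagVector is a hypothetical stand-in for a Depth 1 Tagging Vector (D1TV):
// a centroid in embedding space plus the number of files folded into it.
type TagVector struct {
	ID       string
	Centroid []float64
	Count    int
}

// Cosine returns the cosine similarity of a and b.
// It returns 0 for mismatched lengths or zero vectors.
func Cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Route returns the index of the tagging vector most similar to emb
// (the local indexing step), or -1 if tvs is empty.
func Route(tvs []TagVector, emb []float64) int {
	best, bestSim := -1, math.Inf(-1)
	for i, tv := range tvs {
		if s := Cosine(tv.Centroid, emb); s > bestSim {
			best, bestSim = i, s
		}
	}
	return best
}

// Update folds emb into the centroid as a running mean
// (an assumed form of continuous training).
func (tv *TagVector) Update(emb []float64) {
	tv.Count++
	for i := range tv.Centroid {
		tv.Centroid[i] += (emb[i] - tv.Centroid[i]) / float64(tv.Count)
	}
}

// Prune returns the IDs of tagging vectors whose similarity to the query
// is at least minSim; all other branches are skipped during the query phase.
func Prune(tvs []TagVector, query []float64, minSim float64) []string {
	var keep []string
	for _, tv := range tvs {
		if Cosine(tv.Centroid, query) >= minSim {
			keep = append(keep, tv.ID)
		}
	}
	return keep
}

func main() {
	// Bootstrap: two toy tagging vectors in a 2-dimensional embedding space.
	tvs := []TagVector{
		{ID: "papers", Centroid: []float64{1, 0}, Count: 1},
		{ID: "images", Centroid: []float64{0, 1}, Count: 1},
	}

	// Local indexing and continuous training for one new file embedding.
	emb := []float64{0.9, 0.1}
	i := Route(tvs, emb)
	tvs[i].Update(emb)
	fmt.Println("routed to:", tvs[i].ID)

	// Query phase: only sufficiently similar branches are searched.
	fmt.Println("branches kept:", Prune(tvs, []float64{1, 0.2}, 0.5))
}
```

In a real deployment the embeddings would come from the ML components and the tagging vectors would be distributed across peers; the sketch keeps everything in one process to show only the routing and pruning logic.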
- Core Language: Golang for local-first application logic and AI/ML integration
- Network: go-libp2p-kad-dht for discovery and network routing
- Database: SQLite/PostgreSQL for metadata, vector databases (Milvus/FAISS) for embeddings
- Caching: Redis for high-performance caching and session management
- NLP: Hugging Face Transformers (BERT, RoBERTa, T5)
- Embeddings: Sentence-Transformers for semantic vector representations
- Computer Vision: OpenCV, PIL, and CLIP for image-text cross-modal understanding
- Multimodal: Unified embedding system for image, text, and video content
- CLI: Golang Binary Executables with Cobra CLI
- Web Interface: React.js for user interface development
Disclaimer: This is a tentative first draft that may contain inaccuracies. Content will be updated based on ongoing discussions and actual implementation progress.
FileNest/
├── .github/ # GitHub-specific configurations
│ ├── workflows/ # CI/CD pipeline workflows
│ ├── issue_templates/ # Issue templates for bug reports, feature requests, etc.
│ └── pull_request_template.md # Template for PR descriptions
├── ai/ # AI/ML models and scripts
├── backend/ # Backend implementation (Go)
├── frontend/ # Frontend (CLI & React.js code)
├── network/ # P2P networking code
├── shared/ # Shared utilities
├── docs/ # Documentation files
├── tests/ # Test cases
├── scripts/ # Automation and utility scripts
├── examples/ # Example configurations and use cases
├── .gitignore # Ignore unnecessary files
├── LICENSE # MIT License
├── README.md # This file
└── CONTRIBUTING.md # Contribution guidelines
- Go 1.21 or higher
- Python 3.9 or higher
- Node.js 18 or higher (for frontend)
- Git
# Clone the repository
git clone https://github.com/AISocietyIITJ/FileNest.git
cd FileNest
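# Set up backend dependencies
cd backend
# Initialize a module only if backend/ has no go.mod yet (go mod init fails when one already exists)
[ -f go.mod ] || go mod init filenest-backend
go mod tidy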
# Set up AI/ML components
cd ../ai
pip install -r requirements.txt
# Set up frontend
cd ../frontend
npm install
# Run the system (from the repository root)
cd ..
./scripts/start.sh
- 🎓 Academic Collaboration: Labs and researchers can share files and papers directly, searchable across institutions without cloud lock-in
- 📰 Censorship-Resistant Discovery: Useful for journalists, archivists, or individuals seeking sensitive or restricted knowledge
- 🔗 Decentralized Knowledge Commons: Like a smarter, distributed version of torrenting, but semantically searchable
- 🏢 Self-Hosted File Search: Enabling secure internal discovery without exposing to external servers
| Metric | Current State | Target |
|---|---|---|
| Indexing Speed | 1 file/second | 15-20 files/second |
| Search Latency | 800-1200ms | Under 100ms |
| Concurrent Users | Single user | 1000+ users |
| Storage Overhead | 85% | Under 20% |
We welcome contributions from developers, researchers, and anyone interested in decentralized technologies! Please read the Contributing Guidelines in CONTRIBUTING.md to get started.
- AI/ML Research Engineers: Work on semantic embeddings and multimodal content analysis
- Backend & Network Developers: Build core backend logic and P2P protocols
- Frontend Developers: Create CLI tools and web interfaces
This project is licensed under the MIT License - see the LICENSE file for details.
This project is part of Summer RAID 2025 (Realm of Artificial Intelligence and Data) at IIT Jodhpur.
Mentors: Lakshya Jain, Aradhya Mahajan, Laksh Mendpara
FileNest Framework - Making decentralized data as discoverable as centralized systems, but with freedom, privacy, and ownership intact.