A production-grade machine learning system for intelligent rental property search and personalized recommendations using advanced NLP and hybrid recommendation algorithms.
- π― Project Overview
- β¨ Key Features
- ποΈ Architecture
- π Getting Started
- π Usage Instructions
- π οΈ Development
- π’ Deployment
- π ML Models & Architecture
- π Documentation
- π§ Maintenance
- π€ Contributing
- π License
The Rental ML System is a sophisticated machine learning platform designed for real-time rental property search and personalized recommendations. Built with clean architecture principles, it combines advanced NLP-powered search with hybrid recommendation engines to deliver exceptional user experiences.
- π Intelligent Search: NLP-powered semantic search using TensorFlow and Transformers
- π― Personalized Recommendations: Hybrid ML system combining collaborative filtering and content-based approaches
- β‘ Real-time Performance: Optimized for low-latency serving with Redis caching
- ποΈ Production-Ready Architecture: Clean architecture with comprehensive testing and monitoring
- π Scalable Infrastructure: Containerized deployment with Kubernetes support
- π Advanced Analytics: Real-time performance monitoring and business intelligence
Core ML/AI:
- TensorFlow 2.13+ for deep learning models
- Transformers for NLP and semantic search
- Scikit-learn for traditional ML algorithms
- NumPy/Pandas for data processing
Backend & API:
- FastAPI for high-performance REST APIs
- SQLAlchemy for database ORM
- PostgreSQL for primary data storage
- Redis for caching and session management
Infrastructure:
- Docker & Docker Compose for containerization
- Kubernetes for orchestration
- Nginx for reverse proxy and load balancing
- Prometheus for monitoring and metrics
Development:
- Pytest for comprehensive testing
- Black/Flake8 for code formatting and linting
- MyPy for static type checking
- Pre-commit hooks for code quality
- Semantic Search: TensorFlow-based text embedding and ranking
- Multi-criteria Filtering: Price, location, amenities, property type
- Real-time Results: Sub-200ms query response times
- Relevance Ranking: ML-powered result ordering
- Collaborative Filtering: Neural collaborative filtering with TensorFlow
- Content-Based Filtering: Property feature similarity matching
- Cold Start Handling: Effective recommendations for new users
- Explainable AI: Detailed recommendation explanations
- Performance Metrics: Real-time model accuracy and response times
- Business Intelligence: Market trends and user behavior analysis
- System Health: Comprehensive monitoring and alerting
- A/B Testing: Built-in framework for model experimentation
- Web Scraping: Ethical property data collection
- Data Quality: Automated validation and cleaning
- Feature Engineering: Advanced feature extraction and encoding
- Model Training: Automated ML pipeline with evaluation
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β PRESENTATION LAYER β
β βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββββββββββββββ β
β β FastAPI β β Streamlit β β Jupyter β β
β β REST API β β Demo UI β β Notebooks β β
β βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β APPLICATION LAYER β
β βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββββββββββββββ β
β β Use Cases β β DTOs β β API β β
β β - SearchProps β β - SearchDTO β β - search_endpoints.py β β
β β - GetRecommend β β - RecommendDTO β β - recommendation_endpoints β β
β β - TrackInteract β β - UserDTO β β - user_endpoints.py β β
β βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DOMAIN LAYER β
β βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββββββββββββββ β
β β Entities β β Repositories β β Services β β
β β - Property β β - PropertyRepo β β - SearchService β β
β β - User β β - UserRepo β β - RecommendationService β β
β β - SearchQuery β β - ModelRepo β β - UserService β β
β βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β INFRASTRUCTURE LAYER β
β βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββββββββββββββ β
β β Data Access β β ML Models β β Monitoring β β
β β - PostgreSQL β β - Search Ranker β β - Logging β β
β β - Redis Cache β β - Collaborative β β - Metrics β β
β β - Web Scrapers β β - Content-Based β β - Health Checks β β
β β - File Storage β β - Hybrid System β β - Error Tracking β β
β βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Core Services:
- API Gateway: FastAPI-based REST API with automatic documentation
- Search Service: NLP-powered property search and ranking
- Recommendation Service: Hybrid ML recommendation engine
- User Service: User management and preference tracking
Data Layer:
- PostgreSQL: Primary database for structured data
- Redis: High-performance caching and session storage
- Feature Store: ML feature management and serving
ML Infrastructure:
- Model Training: Automated training pipelines with evaluation
- Model Serving: Real-time model inference and prediction
- Model Registry: Version control and model lifecycle management
- Python: 3.9 or higher
- Docker: 20.0+ (for containerized deployment)
- PostgreSQL: 12+ (if running locally)
- Redis: 6+ (if running locally)
git clone https://github.com/rental-ml-system/rental-ml-system.git
cd rental-ml-system
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements/base.txt
# Start PostgreSQL and Redis with Docker
docker-compose up -d postgres redis
# Run database migrations
python scripts/init_database.py
python migrations/run_migrations.py
# Quick start with demo script
./demo-quick-start.sh
# Or manually start Streamlit demo
streamlit run src/presentation/demo/app.py
# Start FastAPI server
uvicorn src.application.api.main:app --reload --port 8000
# API documentation available at: http://localhost:8000/docs
# Start full system with Docker Compose
docker-compose up -d
# Access services:
# - API: http://localhost:8000
# - Demo UI: http://localhost:8501
# - Monitoring: http://localhost:9090
The Streamlit demo provides a comprehensive showcase of system capabilities:
# Launch interactive demo
./demo-quick-start.sh --port 8501
# Or with custom configuration
streamlit run src/presentation/demo/app.py \
--server.port 8502 \
--server.address 0.0.0.0
Demo Features:
- π Property Search: Advanced filtering and search capabilities
- π― Recommendations: Personalized property recommendations
- π€ User Preferences: User profile and preference management
- π Analytics Dashboard: Market insights and performance metrics
- β‘ ML Monitoring: Real-time model performance tracking
curl -X POST "http://localhost:8000/api/v1/search" \
-H "Content-Type: application/json" \
-d '{
"query": "downtown apartment with parking",
"filters": {
"price_min": 2000,
"price_max": 4000,
"bedrooms": 2,
"amenities": ["parking", "gym"]
},
"limit": 10
}'
curl -X POST "http://localhost:8000/api/v1/recommendations" \
-H "Content-Type: application/json" \
-d '{
"user_id": 123,
"num_recommendations": 5,
"include_explanations": true
}'
curl "http://localhost:8000/health"
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=rental_ml
DB_USERNAME=postgres
DB_PASSWORD=your_password
# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=your_redis_password
# ML Configuration
ML_MODEL_PATH=/app/models
ML_BATCH_SIZE=32
ML_CACHE_TTL=3600
# Application Configuration
LOG_LEVEL=INFO
API_HOST=0.0.0.0
API_PORT=8000
# config/settings.py
class FeatureFlags:
ENABLE_RECOMMENDATIONS = True
ENABLE_SCRAPING = True
ENABLE_ML_TRAINING = False # Disable in production
ENABLE_ANALYTICS = True
# Install development dependencies
pip install -r requirements/dev.txt
# Install pre-commit hooks
pre-commit install
# Run development server with hot reload
uvicorn src.application.api.main:app --reload --port 8000
# Format code
black src/ tests/
isort src/ tests/
# Lint code
flake8 src/ tests/
mypy src/
# Run all quality checks
pre-commit run --all-files
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m ml # ML model tests only
# Run performance tests
pytest tests/performance/ -v
rental-ml-system/
βββ src/ # Source code
β βββ application/ # Application layer
β β βββ api/ # REST API endpoints
β β βββ dto/ # Data transfer objects
β β βββ use_cases/ # Business use cases
β βββ domain/ # Domain layer
β β βββ entities/ # Core business entities
β β βββ repositories/ # Repository interfaces
β β βββ services/ # Domain services
β βββ infrastructure/ # Infrastructure layer
β β βββ data/ # Data access and repositories
β β βββ ml/ # ML models and training
β β βββ monitoring/ # Monitoring and logging
β β βββ scrapers/ # Web scraping infrastructure
β βββ presentation/ # Presentation layer
β βββ demo/ # Streamlit demo application
β βββ web/ # Web interface
βββ tests/ # Test suite
β βββ unit/ # Unit tests
β βββ integration/ # Integration tests
β βββ performance/ # Performance tests
βββ deployment/ # Deployment configurations
β βββ docker/ # Docker configurations
β βββ kubernetes/ # Kubernetes manifests
β βββ terraform/ # Infrastructure as code
βββ docs/ # Documentation
βββ examples/ # Usage examples
βββ notebooks/ # Jupyter notebooks
βββ scripts/ # Utility scripts
- Create Domain Entity (if needed):
# src/domain/entities/new_entity.py
from dataclasses import dataclass
from typing import Optional
@dataclass
class NewEntity:
id: Optional[int]
name: str
# Add fields
- Implement Repository Interface:
# src/domain/repositories/new_repository.py
from abc import ABC, abstractmethod
class NewRepository(ABC):
@abstractmethod
async def find_by_id(self, entity_id: int) -> Optional[NewEntity]:
pass
- Create Implementation:
# src/infrastructure/data/repositories/postgres_new_repository.py
class PostgresNewRepository(NewRepository):
async def find_by_id(self, entity_id: int) -> Optional[NewEntity]:
# Implementation
pass
- Add API Endpoint:
# src/application/api/routers/new_router.py
from fastapi import APIRouter
router = APIRouter(prefix="/api/v1/new", tags=["new"])
@router.get("/{entity_id}")
async def get_entity(entity_id: int):
# Implementation
pass
- Write Tests:
# tests/unit/test_domain/test_new_entity.py
def test_new_entity_creation():
entity = NewEntity(id=1, name="test")
assert entity.name == "test"
# Start development environment
docker-compose -f docker-compose.dev.yml up -d
# View logs
docker-compose logs -f app
# Build and start production environment
docker-compose up -d --build
# Scale services
docker-compose up -d --scale app=3 --scale worker=2
# Start minikube
minikube start
# Apply manifests
kubectl apply -f k8s/
# Check deployment status
kubectl get pods -n rental-ml
# Access services
minikube service rental-ml-app -n rental-ml
# Apply production configuration
kubectl apply -f deployment/kubernetes/
# Monitor deployment
kubectl rollout status deployment/rental-ml-app -n rental-ml
# View logs
kubectl logs -f deployment/rental-ml-app -n rental-ml
# Install with Helm
helm install rental-ml k8s/helm/rental-ml/ \
--namespace rental-ml \
--create-namespace \
--values k8s/helm/rental-ml/values-prod.yaml
# Upgrade deployment
helm upgrade rental-ml k8s/helm/rental-ml/ \
--values k8s/helm/rental-ml/values-prod.yaml
# Production deployment requires these variables
DB_PASSWORD=your_secure_db_password
REDIS_PASSWORD=your_secure_redis_password
SECRET_KEY=your_secret_key_here
JWT_SECRET_KEY=your_jwt_secret_here
# Monitoring
SENTRY_DSN=https://your-sentry-dsn
PROMETHEUS_ENABLED=true
# Scaling
DB_POOL_SIZE=20
REDIS_MAX_CONNECTIONS=50
ML_BATCH_SIZE=64
# Features
ENABLE_SCRAPING=true
ENABLE_ML_TRAINING=false
The NLP-powered search system uses TensorFlow and Transformers:
from transformers import AutoTokenizer, TFAutoModel
import tensorflow as tf
class NLPSearchRanker:
def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.encoder = TFAutoModel.from_pretrained(model_name)
self.ranking_model = self._build_ranking_model()
def rank_properties(self, query: str, properties: List[Dict]) -> List[Tuple[Dict, float]]:
# Generate embeddings and rank properties
pass
Features:
- Semantic text understanding using pre-trained transformers
- Deep neural ranking with TensorFlow
- Real-time inference optimized for low latency
- Custom training on rental property domain data
Combines collaborative filtering and content-based approaches:
class HybridRecommendationSystem:
def __init__(self, cf_weight=0.6, cb_weight=0.4):
self.cf_weight = cf_weight
self.cb_weight = cb_weight
self.cf_model = CollaborativeFilteringModel()
self.cb_model = ContentBasedModel()
def recommend(self, user_id: int, num_recommendations: int = 10) -> List[Recommendation]:
# Generate hybrid recommendations
pass
Components:
- Collaborative Filtering: Neural collaborative filtering with embeddings
- Content-Based: Property feature similarity matching
- Hybrid Fusion: Weighted combination with dynamic adjustment
- Cold Start: Effective handling of new users and properties
# Train models locally
python src/infrastructure/ml/training/ml_trainer.py \
--model-type hybrid \
--epochs 100 \
--batch-size 256
# Evaluate models
python src/infrastructure/ml/training/model_evaluator.py \
--model-path models/hybrid_model.h5 \
--test-data data/test_interactions.csv
Training Features:
- Automated data preprocessing and feature engineering
- Hyperparameter optimization with Optuna
- Cross-validation and performance evaluation
- Model versioning and experiment tracking
Search Metrics:
- NDCG@10: Normalized Discounted Cumulative Gain at 10
- MAP@10: Mean Average Precision at 10
- Response Time: 95th percentile < 200ms
Recommendation Metrics:
- Precision@K: Relevant items in top-K recommendations
- Recall@K: Coverage of relevant items
- Diversity: Intra-list diversity score
- Coverage: Catalog coverage percentage
- Interactive API Docs: http://localhost:8000/docs (Swagger UI)
- ReDoc: http://localhost:8000/redoc
- OpenAPI Spec: http://localhost:8000/openapi.json
Code Examples:
Jupyter Notebooks:
# Start Jupyter server
jupyter lab notebooks/
# Available notebooks:
# - 01_data_exploration.ipynb
# - 02_nlp_search_development.ipynb
# - 03_recommendation_model_training.ipynb
# - 04_system_evaluation.ipynb
# - 05_deployment_analysis.ipynb
Installation Issues:
# Python version compatibility
python --version # Should be 3.9+
# Clear pip cache
pip cache purge
# Reinstall dependencies
pip install -r requirements/base.txt --force-reinstall
Database Connection:
# Test database connection
python scripts/test_database_connection.py
# Reset database
python scripts/init_database.py --reset
Model Loading Errors:
# Check model files
ls -la models/
# Retrain models
python src/infrastructure/ml/training/ml_trainer.py --retrain
Performance Issues:
# Monitor system resources
docker stats
# Check Redis connection
redis-cli ping
# Monitor API performance
curl http://localhost:8000/health
# Access Prometheus dashboard
http://localhost:9090
# Key metrics to monitor:
# - ml_model_prediction_latency
# - api_request_duration_seconds
# - database_query_duration_seconds
# - cache_hit_ratio
# View application logs
docker-compose logs -f app
# View specific service logs
kubectl logs -f deployment/rental-ml-app -n rental-ml
# Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
# API health check
curl http://localhost:8000/health
# Database health
curl http://localhost:8000/health/database
# ML models health
curl http://localhost:8000/health/models
# Automated backup script
python scripts/backup_database.py \
--output-dir backups/ \
--compress
# Restore from backup
python scripts/restore_database.py \
--backup-file backups/rental_ml_2024_01_15.sql.gz
# Backup trained models
tar -czf models_backup_$(date +%Y%m%d).tar.gz models/
# Cloud backup (AWS S3)
aws s3 sync models/ s3://your-bucket/models/
-- Key database indexes
CREATE INDEX CONCURRENTLY idx_properties_location ON properties(city, neighborhood);
CREATE INDEX CONCURRENTLY idx_properties_price ON properties(price);
CREATE INDEX CONCURRENTLY idx_user_interactions_user_id ON user_interactions(user_id);
# Monitor Redis performance
redis-cli --latency-history
# Optimize Redis configuration
# - maxmemory-policy: allkeys-lru
# - save: disabled for cache-only usage
# - tcp-keepalive: 60
# Model quantization for faster inference
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model('models/hybrid_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
-
Database Security
- Use strong passwords (min 16 characters)
- Enable SSL/TLS connections
- Restrict database access by IP
- Regular security updates
-
API Security
- Enable HTTPS only
- Implement rate limiting
- Use JWT for authentication
- Validate all inputs
-
Container Security
- Use non-root containers
- Scan images for vulnerabilities
- Keep base images updated
- Use secrets management
-
Infrastructure Security
- Network policies in Kubernetes
- Regular security patches
- Monitor for intrusions
- Backup encryption
# Update dependencies for security patches
pip install --upgrade -r requirements/base.txt
# Scan for vulnerabilities
pip-audit
# Update Docker base images
docker-compose build --no-cache
- Fork and Clone
git clone https://github.com/your-username/rental-ml-system.git
cd rental-ml-system
- Create Feature Branch
git checkout -b feature/your-feature-name
- Make Changes
# Install development dependencies
pip install -r requirements/dev.txt
# Make your changes
# Add tests for new functionality
# Update documentation
- Test Changes
# Run full test suite
pytest
# Check code quality
pre-commit run --all-files
# Test in Docker environment
docker-compose -f docker-compose.dev.yml up -d
- Submit Pull Request
git add .
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name
# Create pull request on GitHub
Python Style:
- Follow PEP 8 guidelines
- Use type hints for all function signatures
- Maximum line length: 88 characters
- Use descriptive variable and function names
Testing Requirements:
- Minimum 80% code coverage
- Unit tests for all new functions
- Integration tests for API endpoints
- Performance tests for ML models
Documentation:
- Docstrings for all public functions
- Update README for new features
- Include usage examples
- Update API documentation
Bug Reports:
## Bug Description
Brief description of the issue
## Steps to Reproduce
1. Step one
2. Step two
3. Step three
## Expected Behavior
What should happen
## Actual Behavior
What actually happens
## Environment
- Python version:
- OS:
- Docker version:
Feature Requests:
## Feature Description
Description of the requested feature
## Use Case
Why this feature would be useful
## Proposed Implementation
How you think it could be implemented
## Alternatives
Other ways to achieve the same goal
This project is licensed under the MIT License - see the LICENSE file for details.
- TensorFlow Team: For the excellent deep learning framework
- FastAPI Team: For the high-performance web framework
- Streamlit Team: For the intuitive demo application framework
- Open Source Community: For the many libraries that make this project possible
# Start development environment
./demo-quick-start.sh
# Run full system with Docker
docker-compose up -d
# Deploy to Kubernetes
kubectl apply -f k8s/
# Run tests
pytest --cov=src
# Start API server
uvicorn src.application.api.main:app --reload
# Launch demo
streamlit run src/presentation/demo/app.py
# Train ML models
python src/infrastructure/ml/training/ml_trainer.py
# Check system health
curl http://localhost:8000/health
Ready to build the future of rental property search? Get started with the demo and explore the power of AI-driven property matching! π β¨