🏠 Homeus - Georgian Real Estate Scraper

A production-ready real estate scraper for Georgian property websites (SS.ge and MyHome.ge) with Google Sheets integration and intelligent monitoring.

✨ Features

  • Multi-site scraping: SS.ge and MyHome.ge support
  • Smart duplicate detection: Hash-based property comparison
  • Google Sheets integration: Automatic export of new properties
  • Precise data extraction: Square meters, rooms, prices with currency detection
  • Intelligent scheduling: Configurable scraping intervals
  • Production ready: Docker support, logging, error handling
  • Respectful scraping: Rate limiting and proper headers
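
As an illustration of hash-based duplicate detection, here is a minimal sketch. It assumes a listing is identified by a handful of its fields; the field selection and the helper names `property_fingerprint` and `filter_new` are hypothetical and may not match what the project actually does.

```python
import hashlib
import json


def property_fingerprint(prop: dict) -> str:
    """Return a stable SHA-256 hex digest of the fields that identify a listing."""
    # Hypothetical choice of identifying fields; the real scraper may use others.
    keys = ("title", "price", "currency", "location", "size", "rooms")
    payload = {k: prop.get(k) for k in keys}
    # sort_keys gives the same serialized bytes for the same data every time.
    encoded = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def filter_new(properties: list[dict], seen: set[str]) -> list[dict]:
    """Return only properties whose fingerprint is not in `seen`, updating `seen`."""
    fresh = []
    for prop in properties:
        digest = property_fingerprint(prop)
        if digest not in seen:
            seen.add(digest)
            fresh.append(prop)
    return fresh
```

Hashing a fixed subset of fields means a listing whose price changes is treated as new, which may or may not be the desired behavior.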

🚀 Quick Start

1. Installation

Option 1: Automated Setup (Recommended)

git clone https://github.com/nmbrthirteen/homeus.git
cd homeus

# Run setup script
python setup.py

Option 2: Manual Setup

git clone https://github.com/nmbrthirteen/homeus.git
cd homeus

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Configuration

# Copy example config
cp config/config.example.yaml config/config.yaml

# Edit configuration
nano config/config.yaml

3. Run

# Single scraping cycle
python src/main.py --once

# Continuous monitoring (every 5 minutes)
python src/main.py

📋 Configuration

The config/config.yaml file controls all aspects of the scraper:

Basic Settings

scraping:
  interval_minutes: 5 # How often to scrape
  max_pages: 10 # Maximum pages per search
  delay_between_requests: 2 # Seconds between requests
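
To show how `max_pages` and `delay_between_requests` could interact, here is a rough sketch of a paginated crawl loop. The function `crawl_pages` and its `fetch_page` callback are assumptions made for illustration, not the project's actual code.

```python
import time
from typing import Callable, Iterator


def crawl_pages(
    fetch_page: Callable[[int], list],
    max_pages: int,
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[list]:
    """Yield one batch of listings per page, pausing between requests."""
    for page in range(1, max_pages + 1):
        listings = fetch_page(page)
        if not listings:  # an empty page is treated as the end of results
            break
        yield listings
        if page < max_pages:
            sleep(delay)  # wait before requesting the next page


# Usage with the values from the config above (fetch_page is a placeholder):
# for batch in crawl_pages(fetch_page, max_pages=10, delay=2):
#     ...
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.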

Website Configuration

websites:
  ss:
    enabled: true
    search_urls:
      - url: "https://home.ss.ge/ka/udzravi-qoneba/iyideba-bina?price_to=90000"
        name: "Apartments Under 90k USD"

Google Sheets Integration

google_sheets:
  enabled: true
  sheet_id: "your-google-sheet-id"
  service_account_file: "config/google_credentials.json"

🔧 Google Sheets Setup

  1. Create a Google Sheet and note the Sheet ID from the URL
  2. Enable Google Sheets API in Google Cloud Console
  3. Create Service Account and download credentials JSON
  4. Share your sheet with the service account email
  5. Place credentials in config/google_credentials.json

Detailed setup guide: GOOGLE_SHEETS_SETUP.md
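
Step 4 needs the service account's email address, which a Google service account key file stores in its `client_email` field. A minimal sketch for reading it, assuming the key file has been saved at the path used in the config; the helper name `service_account_email` is made up for illustration.

```python
import json
from pathlib import Path


def service_account_email(credentials_path: str) -> str:
    """Return the `client_email` from a Google service account key file."""
    data = json.loads(Path(credentials_path).read_text(encoding="utf-8"))
    if data.get("type") != "service_account":
        raise ValueError(f"{credentials_path} is not a service account key file")
    return data["client_email"]


# Usage (path taken from the config example above):
# print(service_account_email("config/google_credentials.json"))
```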

🐳 Docker Deployment

# Build and run
docker-compose up -d

# View logs
docker-compose logs -f homeus

# Stop
docker-compose down

📊 Data Structure

Properties are extracted with the following fields:

| Field | Description | Example |
|---|---|---|
| property_id | Unique identifier | ss_12345678 |
| title | Property title | იყიდება 3 ოთახიანი ბინა ("3-room apartment for sale") |
| price | Price in specified currency | 75000 |
| currency | Currency code | USD |
| location | Property location | დიდი დიღომი (Didi Dighomi) |
| size | Size in square meters | 61.5 |
| rooms | Number of rooms | 3 |
| property_type | Type of property | apartment |
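
The fields above map naturally onto a small record type. The following is a sketch only: the class name `Property`, the type choices, and the `to_row` helper are assumptions, and the project's actual model in src/models/ may look different.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Property:
    property_id: str
    title: str
    price: Optional[float]   # None when no price is listed
    currency: Optional[str]  # e.g. "USD" or "GEL"
    location: str
    size: Optional[float]    # square meters
    rooms: Optional[int]
    property_type: str

    def to_row(self) -> list:
        """Flatten to a list in field order, e.g. for a spreadsheet row."""
        return list(asdict(self).values())


example = Property(
    property_id="ss_12345678",
    title="იყიდება 3 ოთახიანი ბინა",
    price=75000,
    currency="USD",
    location="დიდი დიღომი",
    size=61.5,
    rooms=3,
    property_type="apartment",
)
```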

🔍 Supported Websites

SS.ge

  • ✅ Property listings
  • ✅ Detailed property pages
  • ✅ Price extraction (USD/GEL)
  • ✅ Size extraction (m²)
  • ✅ Room count
  • ✅ Images
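
Price and size extraction could look roughly like the sketch below. The input formats (`75,000 $`, `61.5 m²`), the currency markers, and the function names `parse_price` and `parse_size` are all assumptions; the real pages may present these values differently.

```python
import re
from typing import Optional, Tuple

# Assumed currency markers; the real pages may use different symbols or labels.
CURRENCY_MARKERS = {"$": "USD", "USD": "USD", "₾": "GEL", "GEL": "GEL"}


def parse_price(text: str) -> Tuple[Optional[float], Optional[str]]:
    """Extract (amount, currency code) from text such as '75,000 $'."""
    currency = next((code for mark, code in CURRENCY_MARKERS.items() if mark in text), None)
    match = re.search(r"\d[\d\s,]*(?:\.\d+)?", text)
    if match is None:
        return None, currency
    # Drop thousands separators (spaces or commas) before converting.
    digits = re.sub(r"[\s,]", "", match.group())
    return float(digits), currency


def parse_size(text: str) -> Optional[float]:
    """Extract square meters from text such as '61.5 m²'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(?:m²|m2|მ²)", text)
    return float(match.group(1)) if match else None
```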

MyHome.ge

  • ✅ Property listings
  • ✅ Basic property data
  • ⚠️ Limited detail extraction (encoding issues)

📈 Monitoring

The scraper logs to data/logs/homeus.log and stores scraped properties in a SQLite database at data/homeus.db:

# View logs
tail -f data/logs/homeus.log

# Database statistics
sqlite3 data/homeus.db "SELECT COUNT(*) as total_properties FROM properties;"
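
The same count can be read from Python with the standard-library `sqlite3` module. This sketch assumes only the `properties` table named in the query above; the helper name `count_properties` is hypothetical.

```python
import sqlite3


def count_properties(db_path: str) -> int:
    """Return the number of rows in the `properties` table."""
    conn = sqlite3.connect(db_path)
    try:
        (total,) = conn.execute("SELECT COUNT(*) FROM properties").fetchone()
        return total
    finally:
        conn.close()


# Usage (path taken from the command above):
# print(count_properties("data/homeus.db"))
```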

🛠️ Development

Project Structure

homeus/
├── src/
│   ├── models/          # Data models
│   ├── scraper/         # Website scrapers
│   ├── storage/         # Database & Sheets
│   ├── utils/           # Utilities
│   └── main.py          # Entry point
├── config/
│   ├── config.example.yaml
│   └── google_credentials.json
├── data/
│   ├── homeus.db        # SQLite database
│   └── logs/            # Log files
└── docker-compose.yml

Adding New Websites

  1. Create scraper class inheriting from BaseScraper
  2. Implement scrape_listings() and scrape_property_details()
  3. Add website configuration to config.yaml
  4. Register scraper in main.py
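
The steps above might be sketched as follows. The `BaseScraper` shown here is a hypothetical stand-in so the example runs on its own; the project's real base class, its method signatures, and the way main.py registers scrapers may all differ.

```python
from abc import ABC, abstractmethod


class BaseScraper(ABC):
    """Hypothetical stand-in for the project's BaseScraper; the real one may differ."""

    @abstractmethod
    def scrape_listings(self, search_url: str) -> list:
        """Return listing URLs found on a search results page."""

    @abstractmethod
    def scrape_property_details(self, property_url: str) -> dict:
        """Return the extracted fields for one property page."""


class ExampleSiteScraper(BaseScraper):
    """Illustrative scraper for a made-up site; returns canned data, no network."""

    def scrape_listings(self, search_url: str) -> list:
        return [f"{search_url.rstrip('/')}/listing/1"]

    def scrape_property_details(self, property_url: str) -> dict:
        return {"property_id": "example_1", "url": property_url, "rooms": 3}


# Assumed registry shape; main.py may register scrapers differently.
SCRAPERS = {"example": ExampleSiteScraper}
```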

Running Tests

# Test individual scrapers
python -c "from src.scraper.ss_scraper import SSScraper; print('SS scraper imported successfully')"

# Test database
python -c "from src.storage.database import Database; db = Database('test.db'); print('Database working')"

🚨 Rate Limiting & Ethics

This scraper is designed to be respectful:

  • 2-second delay between requests by default (delay_between_requests in config.yaml)
  • Proper User-Agent headers
  • Error handling to avoid overwhelming servers
  • Configurable limits on pages and requests
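
One common way to keep error handling from hammering a struggling server is to retry with exponential backoff. The sketch below illustrates that general technique; it is not taken from the project, and the name `fetch_with_backoff` and its parameters are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, waiting base_delay * 2**attempt after each failure."""
    for attempt in range(retries):
        try:
            return fetch()
        except OSError:  # urllib and socket errors derive from OSError
            if attempt == retries - 1:
                raise  # out of attempts: let the caller log and move on
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be at least 1")
```

With the defaults, a request that keeps failing is attempted three times with waits of 2 and 4 seconds in between, then the error is re-raised.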

Please use responsibly and respect the websites' terms of service.

📝 License

MIT License - see LICENSE file for details.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📞 Support

🎯 Roadmap

  • Additional Georgian real estate websites
  • Price change notifications
  • Web dashboard
  • Property image downloading
  • Advanced filtering options
  • Telegram bot integration
  • Email notifications

Made with ❤️ for the Georgian real estate market
