Skip to content

dimagi/connect-gis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Building Density and Clustering API

This is a Flask-based web application that provides APIs for fetching building data, performing clustering (K-means and grid-based), and generating reports based on geospatial data. The application integrates with Google Earth Engine (GEE) and a PostgreSQL database with PostGIS extension to process building data within specified polygons or around pin locations. It supports clustering buildings using balanced K-means or grid-based clustering and generates CSV reports summarizing ward-level visit data.

Features

  • Building Data Retrieval: Fetches building data within a polygon from either Google Earth Engine or a PostgreSQL database, with optional filters for minimum area and confidence.
  • Clustering:
    • K-means Clustering: Performs balanced K-means clustering based on a specified number of clusters or buildings per cluster.
    • Grid-based Clustering: Generates a grid over a polygon, assigns buildings to grid cells, and clusters grids to balance building counts.
  • Reporting: Generates CSV reports summarizing ward-level visit data, with options to include building-to-visit distance metrics.
  • Geospatial Support: Uses PostGIS for spatial queries and GeoPandas for handling geospatial data.
  • CORS Support: Allows cross-origin requests for frontend integration.

Prerequisites

  • Python: Version 3.8 or higher.
  • PostgreSQL: With PostGIS extension enabled for spatial queries.
  • Google Earth Engine Account: For accessing building data via GEE.
  • Dependencies: Listed in requirements.txt.

Installation

  1. Clone the Repository

    git clone https://github.com/Thushar12E45/dimagi-map-project.git
    cd dimagi-map-project
  2. Set Up a Virtual Environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install Dependencies

    pip install -r requirements.txt
  4. Set Up Environment Variables Create a .env file in the project root with the following variables:

    GEE_CREDS=<your-google-earth-engine-credentials-json>
    GEE_PROJECT_NAME=<your-google-earth-engine-project-name>
    DB_USER=<your-postgres-username>
    DB_PASSWORD=<your-postgres-password>
    DB_HOST=<your-postgres-host>
    DB_PORT=<your-postgres-port>
    DB_NAME=<your-postgres-database-name>
    HOST_URL=<your-host-url>  # Optional, defaults to https://connectgis.dimagi.com
    
    • GEE_CREDS: JSON string of Google Earth Engine service account credentials.
    • Database credentials for PostgreSQL connection.
  5. Database Setup

    • Ensure the PostgreSQL database is running and has the PostGIS extension enabled.
    • The buildings table should contain building data with spatial geometry.
  6. Run the Application

    python app.py

    The app runs on http://0.0.0.0:5000 in debug mode by default.

Google Earth Engine (GEE) Setup Guide

Prerequisites

Setup Steps

1. Create a Google Cloud Project

  1. Go to Google Cloud Console
  2. Click "Create Project"
  3. Enter project name and details
  4. Click "Create"

2. Register Project for Earth Engine

  1. After project creation, register it for:
    • Commercial use (if applicable), or
    • Non-commercial use (for research/academic purposes)
  2. Wait for approval (typically 1-2 business days)

3. Enable Earth Engine API

  1. Search / Navigate to: APIs & ServicesLibrary
  2. Search for "Google Earth Engine API"
  3. Click "Enable"

4. Local Development Setup

# Install Earth Engine Python API
pip install earthengine-api

# Authenticate (will open browser)
earthengine authenticate

4. Configuration

Add your GEE project name to the .env file

GEE_PROJECT_NAME=<your-google-earth-engine-project-name>

Important Notes

  • Local Development: Authentication via earthengine authenticate is sufficient for local use (no credentials needed in .env)
  • Production / Other Environments: You must add the appropriate GEE credentials to your .env file:
GEE_CREDS=<your-service-account-credentials-json>

Building Data Insertion Guide

Overview

The building data insertion system downloads building footprint data from Overture Maps and loads it into a PostgreSQL database for use by the clustering application. This is a one-time setup process required before using the main application.

What it Does

  • Downloads building footprint data for an entire country from Overture Maps
  • Processes the data in manageable tiles to handle large datasets
  • Calculates building metrics (area, centroids, confidence scores)
  • Stores data in PostGIS-enabled PostgreSQL database
  • Creates spatial indexes for optimal query performance
  • Supports parallel processing for faster data loading

Configuration for Different Countries

1. Update Bounding Box Coordinates

Edit SQL_SCRIPTS/db_insertion_buildings.py to set the target country's bounding box:

# Example for Kenya
left, bottom = 33.9098, -4.6796  # SW corner
right, top = 41.9058, 5.5059     # NE corner

2. Update Configuration Variables

# Change base filename to match your country
base_filename = "kenya_buildings"

Running the Data Insertion

1. Single Instance (Slower)

cd SQL_SCRIPTS
python db_insertion_buildings.py --instance-id 0 --total-instances 1

2. Parallel Processing (Recommended)

Run multiple instances simultaneously for faster processing:

# Example with 4 parallel instances
python db_insertion_buildings.py --instance-id 0 --total-instances 4 &
python db_insertion_buildings.py --instance-id 1 --total-instances 4 &
python db_insertion_buildings.py --instance-id 2 --total-instances 4 &
python db_insertion_buildings.py --instance-id 3 --total-instances 4

Recovery from Failures

The script supports resuming from interruptions:

  1. Failed tiles are logged in failed_tiles.csv
  2. Restart the script with same parameters - it will skip completed tiles
  3. For persistent failures, check the error messages in the failed tiles log

API Endpoints

1. Home (/)

  • Method: GET
  • Description: Renders the index.html template with the configured HOST_URL.
  • Response: HTML page.

2. Get Building Density (/get_building_density)

  • Method: POST
  • Description: Fetches building data within a polygon or around a pin and performs clustering.
  • Request Body:
    {
      "clusteringType": "kMeans|balancedKMeans|bottomUp",
      "noOfClusters": <int>,  // Number of clusters (default: 3)
      "noOfBuildings": <int>,  // Target buildings per cluster (default: 250)
      "buildingsAreaInMeters": <float>,  // Minimum building area (default: 0)
      "buildingsConfidence": <int>,  // Minimum confidence (0-100, default: 0)
      "thresholdVal": <int>,  // Tolerance percentage (default: 10)
      "fetchClusters": <boolean>,  // Whether to perform clustering (default: false)
      "dbType": "GEE|DB",  // Data source (Google Earth Engine or Database)
      "polygon": [[lng, lat], ...],  // Polygon coordinates (for kMeans/balancedKMeans)
      "pin": [lng, lat]  // Pin coordinates (for bottomUp)
    }
  • Response: JSON with building count, GeoJSON features, and optional cluster data.
  • Example Response:
    {
      "building_count": 100,
      "buildings": {"type": "FeatureCollection", "features": [...]},
      "clusters": [{"coordinates": [lng, lat], "cluster": <int>, "numOfBuildings": <int>}, ...]
    }

3. Get Building Density V2 (/get_building_density_v2)

  • Method: POST
  • Description: Fetches buildings from the database, generates a grid, assigns buildings to grid cells, and performs grid-based clustering.
  • Request Body:
    {
      "polygon": [[lng, lat], ...],  // Polygon coordinates
      "noOfClusters": <int>,  // Number of clusters (default: 3)
      "thresholdVal": <int>,  // Tolerance percentage (default: 10)
      "gridLength": <int>,  // Grid size in meters (default: 50)
      "buildingsAreaInMeters": <float>,  // Minimum building area (default: 0)
      "buildingsConfidence": <int>  // Minimum confidence (0-100, default: 0)
    }
  • Response: JSON with building count, GeoJSON features, grid GeoJSON, and cluster data.
  • Example Response:
    {
      "building_count": 100,
      "buildings": {"type": "FeatureCollection", "features": [...]},
      "grids": {"type": "FeatureCollection", "features": [...]},
      "clusters": [{"coordinates": [lng, lat], "cluster": <int>, "grid_index": <int>}, ...]
    }

4. Generate Report (/generate_report)

  • Method: POST
  • Description: Generates a CSV report summarizing ward-level visit data.
  • Request Body:
    {
      "data": [{"latitude": <float>, "longitude": <float>, "flw_id": <int>}, ...],
      "fetchVisitToBuildingsVal": <boolean>  // Include building-to-visit distance metrics (default: true)
    }
  • Response: CSV file (Ward_summary_report.csv) with ward visit summary data.
  • Example CSV Headers (with fetchVisitToBuildingsVal=true):
    state_name,lga_name,ward_name,population,total.visits,total.buildings,num.phc.serve.ward,median.visit.to.phc,max.visit.to.phc,median.building.to.phc,max.buildings.to.phc,unique.flws,coverage,percent.building.100.plus.to.visit,percent.building.200.plus.to.visit,percent.building.500.plus.to.visit,percent.building.10000.plus.to.visit
    

Project Structure

├── app.py              # Main Flask application
├── .env                # Environment variables (not tracked)
├── requirements.txt    # Python dependencies
├── templates/
│   └── index.html      # Frontend template

Notes

  • Google Earth Engine: Ensure valid GEE credentials are provided in .env for the /get_building_density endpoint with dbType=GEE.
  • Performance: For large polygons, reduce the number of buildings or grid cells to avoid GEE’s 5000-element limit.
  • Security: Sanitize inputs to prevent SQL injection (handled via parameterized queries in the code).
  • CORS: Configured to allow all origins (*). Adjust in production for security.

Troubleshooting

  • GEE Initialization Error: Verify GEE_CREDS in .env and ensure the credentials file is correctly formatted.
  • Database Connection Error: Check PostgreSQL credentials and ensure the database is accessible.
  • No Buildings Found: Ensure the polygon or pin coordinates are valid and contain buildings in the database or GEE dataset.
  • Clustering Issues: Adjust thresholdVal or reduce noOfClusters/noOfBuildings if clustering fails due to insufficient data.

Deployment

The Connect GIS app is hosted on AWS by running a dockerized version of the app on the EC2 instance. The database is hosted as an AWS RDS service.

Deploying changes

In order to deploy new change you need access to the production server on AWS (the details and credentials is in 1Password - search for "ConnectGIS").

Deploying new changes requires the following steps:

# Navigate to folder
cd projects/dimagi-map-project/

# Pull latest changes
git pull

# Build the docker image
docker build -t map-clustering .

# Stop the existing running container
docker stop map-clustering

# Remove the named instance
docker container remove map-clustering

# Run the new image with the port binding
docker run -d -p 5010:5000 --name map-clustering map-clustering

Updating environment variables

The production environment file also lives in 1Password, but you'll also need to update the .env on the server (~/projects/connect-gis/.env).

About

A simple tool for visualizing cases and generating reports

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •