This is a Flask-based web application that provides APIs for fetching building data, performing clustering (balanced K-means and grid-based), and generating CSV reports summarizing ward-level visit data. The application integrates with Google Earth Engine (GEE) and a PostgreSQL database with the PostGIS extension to process building data within specified polygons or around pin locations.
- Building Data Retrieval: Fetches building data within a polygon from either Google Earth Engine or a PostgreSQL database, with optional filters for minimum area and confidence.
- Clustering:
- K-means Clustering: Performs balanced K-means clustering based on a specified number of clusters or buildings per cluster.
- Grid-based Clustering: Generates a grid over a polygon, assigns buildings to grid cells, and clusters grids to balance building counts.
- Reporting: Generates CSV reports summarizing ward-level visit data, with options to include building-to-visit distance metrics.
- Geospatial Support: Uses PostGIS for spatial queries and GeoPandas for handling geospatial data.
- CORS Support: Allows cross-origin requests for frontend integration.
- Python: Version 3.8 or higher.
- PostgreSQL: With PostGIS extension enabled for spatial queries.
- Google Earth Engine Account: For accessing building data via GEE.
- Dependencies: Listed in `requirements.txt`.
- Clone the repository:

  ```shell
  git clone https://github.com/Thushar12E45/dimagi-map-project.git
  cd dimagi-map-project
  ```
- Set up a virtual environment:

  ```shell
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Set up environment variables by creating a `.env` file in the project root with the following variables:

  ```shell
  GEE_CREDS=<your-google-earth-engine-credentials-json>
  GEE_PROJECT_NAME=<your-google-earth-engine-project-name>
  DB_USER=<your-postgres-username>
  DB_PASSWORD=<your-postgres-password>
  DB_HOST=<your-postgres-host>
  DB_PORT=<your-postgres-port>
  DB_NAME=<your-postgres-database-name>
  HOST_URL=<your-host-url>  # Optional, defaults to https://connectgis.dimagi.com
  ```

  - `GEE_CREDS`: JSON string of Google Earth Engine service account credentials.
  - `DB_*`: Credentials for the PostgreSQL connection.
- Database setup:
  - Ensure the PostgreSQL database is running and has the PostGIS extension enabled.
  - The `buildings` table should contain building data with spatial geometry.
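For reference, the `buildings` table is assumed to look roughly like the following. The column names here are guesses based on the metrics the loader computes (area, centroids, confidence); check `SQL_SCRIPTS/db_insertion_buildings.py` for the authoritative schema:

```sql
-- Assumed shape of the buildings table (illustrative only).
CREATE TABLE IF NOT EXISTS buildings (
    id             BIGSERIAL PRIMARY KEY,
    geometry       GEOMETRY(Polygon, 4326),  -- building footprint
    area_in_meters DOUBLE PRECISION,         -- precomputed footprint area
    confidence     DOUBLE PRECISION          -- source confidence score
);

-- Spatial index for polygon containment queries
CREATE INDEX IF NOT EXISTS buildings_geometry_idx
    ON buildings USING GIST (geometry);
```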
- Run the application:

  ```shell
  python app.py
  ```

  The app runs on `http://0.0.0.0:5000` in debug mode by default.
- Google Cloud account
- Earth Engine access (sign up at earthengine.google.com)
- Go to Google Cloud Console
- Click "Create Project"
- Enter project name and details
- Click "Create"
- After project creation, register it for:
- Commercial use (if applicable), or
- Non-commercial use (for research/academic purposes)
- Wait for approval (typically 1-2 business days)
- Search / Navigate to: APIs & Services → Library
- Search for "Google Earth Engine API"
- Click "Enable"
```shell
# Install the Earth Engine Python API
pip install earthengine-api

# Authenticate (opens a browser window)
earthengine authenticate
```
Add your GEE project name to the `.env` file:

```shell
GEE_PROJECT_NAME=<your-google-earth-engine-project-name>
```
- Local Development: Authentication via `earthengine authenticate` is sufficient for local use (no credentials needed in `.env`).
- Production / Other Environments: You must add the appropriate GEE service account credentials to your `.env` file:

  ```shell
  GEE_CREDS=<your-service-account-credentials-json>
  ```
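As an illustration of how these two modes can be wired up, here is a sketch of Earth Engine initialization driven by the environment variables above. The helper names and control flow are assumptions, not the app's actual code; the `ee.ServiceAccountCredentials` and `ee.Initialize` calls come from the earthengine-api library:

```python
import json
import os

def service_account_email(creds_json: str) -> str:
    """Extract the service-account email from a credentials JSON string."""
    return json.loads(creds_json)["client_email"]

def init_earth_engine():
    """Initialize Earth Engine from GEE_CREDS / GEE_PROJECT_NAME.

    Sketch only: with GEE_CREDS set (production), use service-account
    credentials; otherwise fall back to the cached login created by
    `earthengine authenticate` (local development).
    """
    import ee  # earthengine-api

    project = os.environ.get("GEE_PROJECT_NAME")
    creds_json = os.environ.get("GEE_CREDS")
    if creds_json:
        credentials = ee.ServiceAccountCredentials(
            service_account_email(creds_json), key_data=creds_json
        )
        ee.Initialize(credentials, project=project)
    else:
        ee.Initialize(project=project)
```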
The building data insertion system downloads building footprint data from Overture Maps and loads it into a PostgreSQL database for use by the clustering application. This is a one-time setup process required before using the main application.
- Downloads building footprint data for an entire country from Overture Maps
- Processes the data in manageable tiles to handle large datasets
- Calculates building metrics (area, centroids, confidence scores)
- Stores data in PostGIS-enabled PostgreSQL database
- Creates spatial indexes for optimal query performance
- Supports parallel processing for faster data loading
Edit `SQL_SCRIPTS/db_insertion_buildings.py` to set the target country's bounding box:

```python
# Example for Kenya
left, bottom = 33.9098, -4.6796  # SW corner
right, top = 41.9058, 5.5059     # NE corner

# Change the base filename to match your country
base_filename = "kenya_buildings"
```
```shell
cd SQL_SCRIPTS
python db_insertion_buildings.py --instance-id 0 --total-instances 1
```
Run multiple instances simultaneously for faster processing:

```shell
# Example with 4 parallel instances
python db_insertion_buildings.py --instance-id 0 --total-instances 4 &
python db_insertion_buildings.py --instance-id 1 --total-instances 4 &
python db_insertion_buildings.py --instance-id 2 --total-instances 4 &
python db_insertion_buildings.py --instance-id 3 --total-instances 4
```
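The `--instance-id`/`--total-instances` flags imply that each instance processes a disjoint subset of tiles. As an illustration only (not necessarily the script's actual logic), a bounding box can be split into tiles and assigned round-robin like this:

```python
# Illustrative sketch of tiling a bounding box and dividing tiles among
# parallel instances. Function names and the round-robin scheme are
# assumptions, not the actual code in db_insertion_buildings.py.
def make_tiles(left, bottom, right, top, step):
    """Split a bounding box into (at most) step x step degree tiles."""
    tiles = []
    y = bottom
    while y < top:
        x = left
        while x < right:
            tiles.append((x, y, min(x + step, right), min(y + step, top)))
            x += step
        y += step
    return tiles

def tiles_for_instance(tiles, instance_id, total_instances):
    """Round-robin assignment: each instance gets a disjoint subset."""
    return [t for i, t in enumerate(tiles) if i % total_instances == instance_id]

# Kenya bounding box from the configuration example above
tiles = make_tiles(33.9098, -4.6796, 41.9058, 5.5059, step=1.0)
mine = tiles_for_instance(tiles, instance_id=0, total_instances=4)
```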
The script supports resuming from interruptions:
- Failed tiles are logged in `failed_tiles.csv`
- Restart the script with the same parameters; it will skip completed tiles
- For persistent failures, check the error messages in the failed tiles log
- Method: GET
- Description: Renders the `index.html` template with the configured `HOST_URL`.
- Response: HTML page.
- Endpoint: `/get_building_density`
- Method: POST
- Description: Fetches building data within a polygon or around a pin and performs clustering.
- Request Body:
  ```
  {
    "clusteringType": "kMeans|balancedKMeans|bottomUp",
    "noOfClusters": <int>,            // Number of clusters (default: 3)
    "noOfBuildings": <int>,           // Target buildings per cluster (default: 250)
    "buildingsAreaInMeters": <float>, // Minimum building area (default: 0)
    "buildingsConfidence": <int>,     // Minimum confidence (0-100, default: 0)
    "thresholdVal": <int>,            // Tolerance percentage (default: 10)
    "fetchClusters": <boolean>,       // Whether to perform clustering (default: false)
    "dbType": "GEE|DB",               // Data source (Google Earth Engine or Database)
    "polygon": [[lng, lat], ...],     // Polygon coordinates (for kMeans/balancedKMeans)
    "pin": [lng, lat]                 // Pin coordinates (for bottomUp)
  }
  ```
- Response: JSON with building count, GeoJSON features, and optional cluster data.
- Example Response:
  ```
  {
    "building_count": 100,
    "buildings": {"type": "FeatureCollection", "features": [...]},
    "clusters": [{"coordinates": [lng, lat], "cluster": <int>, "numOfBuildings": <int>}, ...]
  }
  ```
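For example, a minimal Python client for `/get_building_density` might look like this (the host and port assume the default local run; the polygon coordinates and parameter values are illustrative):

```python
import json
from urllib import request

# Build a POST request following the request-body schema above.
payload = {
    "clusteringType": "balancedKMeans",
    "noOfClusters": 3,
    "thresholdVal": 10,
    "fetchClusters": True,
    "dbType": "DB",
    "polygon": [
        [36.80, -1.28], [36.82, -1.28], [36.82, -1.26],
        [36.80, -1.26], [36.80, -1.28],  # ring closed by repeating the first point
    ],
}
req = request.Request(
    "http://localhost:5000/get_building_density",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the app running locally:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["building_count"])
```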
- Method: POST
- Description: Fetches buildings from the database, generates a grid, assigns buildings to grid cells, and performs grid-based clustering.
- Request Body:
  ```
  {
    "polygon": [[lng, lat], ...],     // Polygon coordinates
    "noOfClusters": <int>,            // Number of clusters (default: 3)
    "thresholdVal": <int>,            // Tolerance percentage (default: 10)
    "gridLength": <int>,              // Grid size in meters (default: 50)
    "buildingsAreaInMeters": <float>, // Minimum building area (default: 0)
    "buildingsConfidence": <int>      // Minimum confidence (0-100, default: 0)
  }
  ```
- Response: JSON with building count, GeoJSON features, grid GeoJSON, and cluster data.
- Example Response:
  ```
  {
    "building_count": 100,
    "buildings": {"type": "FeatureCollection", "features": [...]},
    "grids": {"type": "FeatureCollection", "features": [...]},
    "clusters": [{"coordinates": [lng, lat], "cluster": <int>, "grid_index": <int>}, ...]
  }
  ```
- Method: POST
- Description: Generates a CSV report summarizing ward-level visit data.
- Request Body:
  ```
  {
    "data": [{"latitude": <float>, "longitude": <float>, "flw_id": <int>}, ...],
    "fetchVisitToBuildingsVal": <boolean> // Include building-to-visit distance metrics (default: true)
  }
  ```
- Response: CSV file (`Ward_summary_report.csv`) with ward visit summary data.
- Example CSV headers (with `fetchVisitToBuildingsVal=true`):

  ```
  state_name,lga_name,ward_name,population,total.visits,total.buildings,num.phc.serve.ward,median.visit.to.phc,max.visit.to.phc,median.building.to.phc,max.buildings.to.phc,unique.flws,coverage,percent.building.100.plus.to.visit,percent.building.200.plus.to.visit,percent.building.500.plus.to.visit,percent.building.10000.plus.to.visit
  ```
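As a sketch of a client for this endpoint, the request body can be built from a CSV of visit rows like so. The helper name and sample data are illustrative assumptions, and the route path is not shown above, so check `app.py` for the actual URL to POST to:

```python
import csv
import io

# Build the report request body from CSV rows of visits.
# Field names follow the request-body schema above.
def visits_payload(csv_text, include_distances=True):
    rows = csv.DictReader(io.StringIO(csv_text))
    data = [
        {
            "latitude": float(r["latitude"]),
            "longitude": float(r["longitude"]),
            "flw_id": int(r["flw_id"]),
        }
        for r in rows
    ]
    return {"data": data, "fetchVisitToBuildingsVal": include_distances}

sample = "latitude,longitude,flw_id\n9.0579,7.4951,1\n9.0601,7.4898,2\n"
payload = visits_payload(sample)  # POST this as JSON to the report endpoint
```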
```
├── app.py              # Main Flask application
├── .env                # Environment variables (not tracked)
├── requirements.txt    # Python dependencies
├── SQL_SCRIPTS/
│   └── db_insertion_buildings.py  # One-time building data loader
├── templates/
│   └── index.html      # Frontend template
```
- Google Earth Engine: Ensure valid GEE credentials are provided in `.env` for the `/get_building_density` endpoint with `dbType=GEE`.
- Performance: For large polygons, reduce the number of buildings or grid cells to avoid GEE's 5000-element limit.
- Security: Sanitize inputs to prevent SQL injection (handled via parameterized queries in the code).
- CORS: Configured to allow all origins (`*`). Adjust in production for security.
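To illustrate the parameterized-query pattern the security note refers to: the polygon arrives as coordinates, is serialized to WKT, and is bound as a query parameter rather than interpolated into the SQL string. The table and column names below are assumptions for illustration, not the app's actual schema:

```python
# Sketch of a parameterized PostGIS query (psycopg2-style %s placeholders).
def polygon_wkt(coords):
    """[[lng, lat], ...] -> WKT POLYGON string, closing the ring if needed."""
    if coords[0] != coords[-1]:
        coords = coords + [coords[0]]
    ring = ", ".join(f"{lng} {lat}" for lng, lat in coords)
    return f"POLYGON(({ring}))"

QUERY = """
    SELECT id, ST_AsGeoJSON(geometry) AS geojson
    FROM buildings
    WHERE ST_Within(geometry, ST_GeomFromText(%s, 4326))
      AND area_in_meters >= %s
"""

# With a live connection:
# cursor.execute(QUERY, (polygon_wkt(polygon), min_area))
```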
- GEE Initialization Error: Verify `GEE_CREDS` in `.env` and ensure the credentials JSON is correctly formatted.
- Database Connection Error: Check the PostgreSQL credentials and ensure the database is accessible.
- No Buildings Found: Ensure the polygon or pin coordinates are valid and contain buildings in the database or GEE dataset.
- Clustering Issues: Adjust `thresholdVal` or reduce `noOfClusters`/`noOfBuildings` if clustering fails due to insufficient data.
The Connect GIS app is hosted on AWS by running a dockerized version of the app on an EC2 instance. The database is hosted as an AWS RDS service.

To deploy new changes you need access to the production server on AWS (the details and credentials are in 1Password; search for "ConnectGIS").
Deploying new changes requires the following steps:
```shell
# Navigate to the project folder
cd projects/dimagi-map-project/

# Pull the latest changes
git pull

# Build the Docker image
docker build -t map-clustering .

# Stop the existing running container
docker stop map-clustering

# Remove the named container
docker container remove map-clustering

# Run the new image with the port binding
docker run -d -p 5010:5000 --name map-clustering map-clustering
```
The production environment file also lives in 1Password, but you will also need to update the `.env` on the server (`~/projects/connect-gis/.env`).