GeoAnalystBench: A GeoAI benchmark for assessing large language models for spatial analysis workflow and code generation
Recent advances in Geospatial Artificial Intelligence (GeoAI) have been driven by generative AI and foundation models. While powerful geoprocessing tools are widely available in Geographic Information Systems (GIS), automating these workflows using AI-driven Python scripting remains a challenge, especially for non-expert users.
This project explores the capabilities of Large Language Models (LLMs) such as ChatGPT, Claude, Gemini, Llama, and DeepSeek in automating GIS workflows. We introduce a benchmark of 50 geoprocessing tasks to evaluate these models' ability to generate Python functions from natural language instructions.
Our findings reveal that proprietary LLMs achieve higher success rates (>90%) and produce workflows more aligned with human-designed implementations than open-source models. The results suggest that integrating proprietary LLMs with ArcPy is a more effective approach for specialized GIS workflows.
By providing benchmarks and insights, this study contributes to the development of optimized prompting strategies, future GIS automation tools, and hybrid GeoAI workflows that combine LLMs with human expertise.
- Benchmark for GIS Automation: Evaluation of LLMs on 50 geoprocessing tasks.
- LLM Performance Comparison: Validity and similarity analysis of generated workflows.
- Open-source Versus Proprietary Models: Comparison of performance and reliability.
This research developed 50 Python-based geoprocessing tasks derived from GIS platforms, software, online tutorials, and academic literature. Each task comprises 3 to 10 subtasks, because even the simplest task involves loading data, applying at least one spatial analysis tool, and saving the final outputs. The list of tasks and their sources is included in the Tasks section below.
The dataset includes the following information:
Key Column | Description |
---|---|
ID | Unique identifier for each task |
Open or Closed Source | Whether the task uses an open-source or closed-source (proprietary) library |
Task | Brief description of the task |
Instruction | Natural language instruction for completing the task |
Domain Knowledge | Domain-specific knowledge related to the task |
Dataset Description | Data name, format, description, and key columns |
Human Designed Workflow | Numbered list of steps in the human-designed workflow |
Task Length | Number of steps in the human-designed workflow |
Code | Human-designed code for the task and dataset |
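For example, a task record can be loaded and inspected with pandas. This is a hypothetical sketch: the file name `GeoAnalystBench.csv` is an assumption, so substitute the file shipped in the repository; the column names follow the table above.

```python
import pandas as pd

# Hypothetical sketch: the benchmark file name is an assumption.
df = pd.read_csv("GeoAnalystBench.csv")

# Inspect a single task record by its ID (column names from the table above).
task = df.loc[df["ID"] == 1].iloc[0]
print(task["Task"])
print(task["Instruction"])
print(task["Human Designed Workflow"])
```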
The dataset is available to download at GeoAnalystBench.
The data used in this research is available to download from Google Drive.
There are 50 tasks in the dataset; this section lists every task and its source. For more details, please refer to GeoAnalystBench.
Note that some tasks share the same name but have different IDs. This typically happens when the tasks differ slightly, or when one task is a subset of a larger task.
ID | Task Name | Source |
---|---|---|
1 | Find heat islands and at-risk populations in Madison, Wisconsin | Analyze urban heat using kriging |
2 | Find future bus stop locations in Hamilton | Assess access to public transit |
3 | Assess burn scars and wildfire impact in Montana using satellite imagery | Assess burn scars with satellite imagery |
4 | Identify groundwater vulnerable areas that need protection | Identify groundwater vulnerable areas |
5 | Visualize data on children with elevated blood lead levels while protecting privacy | De-identify health data for visualization and sharing |
6 | Use animal GPS tracks to model home range and movement over time | Model animal home range |
7 | Analyze the impacts of land subsidence on flooding | Model how land subsidence affects flooding |
8 | Find gaps in Toronto fire station service coverage | Get started with Python in ArcGIS Pro |
9 | Find the deforestation rate for Rondônia | Predict deforestation in the Amazon rain forest |
10 | Analyze the impact of proposed roads on the local environment | Predict deforestation in the Amazon rain forest |
11 | Create charts in Python to explore coral and sponge distribution around Catalina Island | Chart coral and sponge distribution |
12 | Find optimal corridors to connect dwindling mountain lion populations | Build a model to connect mountain lion habitat |
13 | Understand the relationship between ocean temperature and salinity at various depths in the South Atlantic Ocean | SciTools Iris |
14 | Detect persistent periods of high temperature over the past 240 years | SciTools Iris |
15 | Understand the geographical distribution of Total Electron Content (TEC) in the ionosphere | SciTools Iris |
16 | Analyze climate change trends in North America using spatiotemporal data | SciTools Iris |
17 | Analyze the geographical distribution of fatal car crashes in New York City during 2016 | Pointplot of NYC fatal and injurious traffic collisions |
18 | Analyze street tree species data in San Francisco | Quadtree of San Francisco street trees |
19 | Model spatial patterns of water quality | Model water quality |
20 | Predict the likelihood of tin-tungsten deposits in Tasmania | Geospatial ML Challenges: A prospectivity analysis example |
21 | Find optimal corridors to connect dwindling mountain lion populations (2) | Build a model to connect mountain lion habitat
22 | Find optimal corridors to connect dwindling mountain lion populations (3) | Build a model to connect mountain lion habitat
23 | Assess Open Space to Lower Flood Insurance Cost | Assess open space to lower flood insurance cost |
24 | Provide a de-identified point-level dataset that includes all the variables of interest for each child, as well as their general location | De-identify health data for visualization and sharing |
25 | Create risk maps for transmission, susceptibility, and resource scarcity. Then create a map of risk profiles to help pinpoint targeted intervention areas | Analyze COVID-19 risk using ArcGIS Pro |
26 | Use drainage conditions and water depth to calculate groundwater vulnerable areas | Identify groundwater vulnerable areas |
27 | Identify undeveloped areas from groundwater risk zones | Identify groundwater vulnerable areas |
28 | Estimate the origin-destination (OD) flows between regions based on the socioeconomic attributes of regions and the mobility data | ScienceDirect - OD Flow Estimation |
29 | Calculate Travel Time for a Tsunami | Calculate travel time for a tsunami |
30 | Designate bike routes for commuting professionals | Designate bike routes |
31 | Detect aggregation scales of geographical flows | Geographical Flow Aggregation |
32 | Find optimal corridors to connect dwindling mountain lion populations | Build a model to connect mountain lion habitat |
33 | Analyze the impacts of land subsidence on flooding | Model how land subsidence affects flooding |
34 | Estimate the accessibility of roads to rural areas in Japan | Estimate access to infrastructure |
35 | Calculate landslide potential for communities affected by wildfires | Landslide Potential Calculation |
36 | Compute the change in vegetation before and after a hailstorm with the SAVI index | Assess hail damage in cornfields with satellite imagery |
37 | Analyze human sentiments of heat exposure using social media data | National-level Analysis using Twitter Data |
38 | Calculate travel time from one location to others in a neighborhood | Intro to OSM Network Data |
39 | Train a Geographically Weighted Regression model to predict Georgia's Bachelor's degree rate | Geographically Weighted Regression Demo |
40 | Calculate and visualize changes in malaria prevalence | Visualizing Shrinking Malaria Rates |
41 | Improve campsite data quality using a relationship class | Improve campsite data |
42 | Investigate spatial patterns for Airbnb prices in Berlin | Determine dangerous roads for drivers |
43 | Use animal GPS tracks to model home range to understand where they are and how they move over time | Model animal home range |
44 | Find gap for Toronto fire station service coverage | Get started with Python in ArcGIS Pro |
45 | Find optimal corridors to connect dwindling mountain lion populations | Build a model to connect mountain lion habitat |
46 | Identify hot spots for peak crashes | Determine the most dangerous roads for drivers |
47 | Calculate impervious surface area | Calculate impervious surfaces |
48 | Determine how location impacts interest rates | Impact of Location on Interest Rates |
49 | Mapping the Impact of Housing Shortage on Oil Workers | Homeless in the Badlands |
50 | Predict seagrass habitats | Predict seagrass habitats with machine learning |
The first case study is about elk movements. Understanding elk movement patterns is critical for wildlife conservation and management in animal ecology. The task is to identify elk home ranges in Southwestern Alberta in 2009 from GPS-tracking locations, so that researchers can analyze space use and movement clusters for elk populations. Understanding the home range of an elk population is essential for ensuring the sustainability and stability of the wildlife.
• berling_neighbourhoods.geojson: GeoJSON file of neighbourhood multipolygons in Berlin; properties include "neighbourhood" and "neighbourhood_group".
• berlin-listings.csv: CSV file of Berlin Airbnb listings, including the latitude and longitude of each Airbnb.
Workflow Prompts
As a geospatial data scientist, you will generate a workflow for a proposed task.
[Task]: Use animal GPS tracks to model home range to understand where they are and how they move over time.
[Instruction]: Your task is to analyze and visualize elk movements using the provided dataset. The goal is to estimate home ranges and assess habitat preferences using spatial analysis techniques, including Minimum Bounding Geometry (Convex Hull), Kernel Density Estimation, and Density-Based Clustering (DBSCAN). The analysis will generate spatial outputs stored in "dataset/elk_home_range.gdb" and "dataset/".
[Domain Knowledge]: "Home range" can be defined as the area within which an animal normally lives and finds what it needs for survival; essentially, it is the area an animal travels for its normal daily activities. "Minimum Bounding Geometry" creates a feature class containing polygons that represent a specified minimum bounding geometry enclosing each input feature or each group of input features. A "convex hull" is the smallest convex polygon that can enclose a group of objects, such as a group of points. "Kernel Density Mapping" calculates and visualizes feature density in a given area. "DBSCAN" (Density-Based Spatial Clustering of Applications with Noise) clusters points based on a density criterion.
[Dataset Description]: dataset/Elk_in_Southwestern_Alberta_2009.geojson: GeoJSON file storing points of elk movements in Southwestern Alberta, 2009.
Columns of dataset/Elk_in_Southwestern_Alberta_2009.geojson: 'OBJECTID', 'timestamp', 'long', 'lat', 'comments', 'external_t', 'dop', 'fix_type_r', 'satellite_', 'height', 'crc_status', 'outlier_ma', 'sensor_typ', 'individual', 'tag_ident', 'ind_ident', 'study_name', 'date', 'time', 'timestamp_Converted', 'summer_indicator', 'geometry'
[Key Notes]: 1. Use automatic reasoning and clearly explain each step (Chain-of-Thought approach).
2. Use the NetworkX package for visualization.
3. Use 'dot' for the graph visualization layout.
4. Subtasks must proceed in order, because the outputs of each subtask are inputs to the next.
5. Limit your output to code, no extra information.
6. Only code for the workflow, no implementation.
[Expected Sample Output Begin]
"""
tasks = [Task1, Task2, Task3]
G = nx.DiGraph()
for i in range(len(tasks) - 1):
G.add_edge(tasks[i], tasks[i + 1])
pos = nx.drawing.nx_pydot.graphviz_layout(G, prog="dot")
plt.figure(figsize=(15, 8))
nx.draw(G, pos, with_labels=True, node_size=3000, node_color='lightblue', font_size=10, font_weight='bold', arrowsize=20)
plt.title("Workflow for Analyzing Urban Heat Using Kriging Interpolation", fontsize=14)
plt.show()
"""
[Expected Sample Output End]
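For orientation (this is not part of the prompt), a minimal open-source sketch of the elk home-range analysis itself is shown below, using GeoPandas and scikit-learn rather than ArcPy. The projected CRS (EPSG:3400, NAD83 / Alberta 10-TM) and the DBSCAN parameters are illustrative assumptions.

```python
import geopandas as gpd
from sklearn.cluster import DBSCAN

# Load the GPS fixes (file and columns per the dataset description above).
elk = gpd.read_file("dataset/Elk_in_Southwestern_Alberta_2009.geojson")

# Convex-hull home range over all points.
home_range = elk.geometry.unary_union.convex_hull

# DBSCAN movement clusters on projected coordinates (metres).
# EPSG:3400 and the eps/min_samples values are assumed, not reference values.
elk_m = elk.to_crs(epsg=3400)
coords = [(geom.x, geom.y) for geom in elk_m.geometry]
elk_m["cluster"] = DBSCAN(eps=500, min_samples=10).fit_predict(coords)
elk_m.to_file("dataset/elk_clusters.geojson", driver="GeoJSON")
```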
Code Generation Prompts
As a geospatial data scientist, generate a Python file to solve the proposed task.
[Task]: Use animal GPS tracks to model home range to understand where they are and how they move over time.
[Instruction]: Your task is to analyze and visualize elk movements using the provided dataset. The goal is to estimate home ranges and assess habitat preferences using spatial analysis techniques, including Minimum Bounding Geometry (Convex Hull), Kernel Density Estimation, and Density-Based Clustering (DBSCAN). The analysis will generate spatial outputs stored in "dataset/elk_home_range.gdb" and "dataset/".
[Domain Knowledge]: "Home range" can be defined as the area within which an animal normally lives and finds what it needs for survival. Basically, the home range is the area that an animal travels for its normal daily activities.
"Minimum Bounding Geometry" creates a feature class containing polygons which represent a specified minimum bounding geometry enclosing each input feature or each group of input features.
"Convex hull" is the smallest convex polygon that can enclose a group of objects, such as a group of points.
"Kernel Density Mapping" calculates and visualizes features's density in a given area. "DBSCAN", Density-Based Spatial Clustering of Applications with Noise that cluster the points based on density criterion.
[Dataset Description]: dataset/Elk_in_Southwestern_Alberta_2009.geojson: GeoJSON file storing points of elk movements in Southwestern Alberta, 2009.
Columns of dataset/Elk_in_Southwestern_Alberta_2009.geojson: 'OBJECTID', 'timestamp', 'long', 'lat', 'comments', 'external_t', 'dop', 'fix_type_r', 'satellite_', 'height', 'crc_status', 'outlier_ma', 'sensor_typ', 'individual', 'tag_ident', 'ind_ident', 'study_name', 'date', 'time', 'timestamp_Converted', 'summer_indicator', 'geometry'
[Key Notes]: 1. Use automatic reasoning and clearly explain each subtask before performing it (ReAct approach).
2. Use the latest Python packages for code generation.
3. Put all code under a main function; no helper functions.
4. Limit your output to code, no extra information.
5. Use the latest ArcPy functions only.
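Under these prompts, a generated solution might look like the following. This is a minimal ArcPy sketch, not the benchmark's reference code; the clustering parameters and several tool arguments are illustrative assumptions.

```python
import arcpy

def main():
    # Minimal ArcPy sketch of the elk home-range task; parameter values
    # (cluster size, search distance) are assumptions, not reference values.
    arcpy.env.overwriteOutput = True
    gdb = arcpy.management.CreateFileGDB("dataset", "elk_home_range.gdb")[0]

    # 1. Load the GPS points from GeoJSON into a feature class.
    elk = arcpy.conversion.JSONToFeatures(
        "dataset/Elk_in_Southwestern_Alberta_2009.geojson",
        f"{gdb}/elk_points", "POINT")

    # 2. Convex-hull home range via Minimum Bounding Geometry.
    arcpy.management.MinimumBoundingGeometry(
        elk, f"{gdb}/home_range_hull", "CONVEX_HULL", "ALL")

    # 3. Kernel density surface of elk locations (needs Spatial Analyst).
    arcpy.CheckOutExtension("Spatial")
    density = arcpy.sa.KernelDensity(elk, "NONE")
    density.save(f"{gdb}/elk_density")

    # 4. DBSCAN clustering of the movement points.
    arcpy.stats.DensityBasedClustering(
        elk, f"{gdb}/elk_clusters", "DBSCAN", 10, "500 Meters")

if __name__ == "__main__":
    main()
```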
The second case study is about spatial hot spot analysis of car accidents. Brevard County, Florida, has one of the deadliest interstate highways in the United States. This case study aims to identify spatially distributed hot spots along the road network. The dataset includes the road network, crash locations from 2010 to 2015, and a network spatial weights matrix. Understanding hot spots for car accidents is essential for the local transportation department to set policy and respond quickly to future accidents.
• roads.shp: The road network of Brevard County.
• crashes.shp: The locations of crashes in Brevard County, Florida between 2010 and 2015.
• nwswm360ft.swm: Spatial weights matrix file created using the Generate Network Spatial Weights tool and a street network built from Brevard County road polylines.
Workflow Prompts
As a geospatial data scientist, you will generate a workflow for a proposed task.
[Task]: Identify hot spots for peak crashes
[Instruction]: Your task is to identify hot spots for peak crashes in Brevard County, Florida, 2010-2015. First, select all crashes that occurred during the peak time window and create a copy of the selected crash data. Then snap the crash points to the road network and spatially join them with the roads. Calculate the crash rate from the joined data and use hot spot analysis to produce a crash hot spot map as the result.
[Domain Knowledge]: Weekday traffic between 3 pm and 5 pm is considered peak. For the snapping process, the recommended buffer around roads is 0.25 miles. Because hot spot analysis looks for high crash rates that cluster close together, accurate distance measurements based on the road network are essential.
[Dataset Description]: dataset/crashes.shp: The locations of crashes in Brevard County, Florida between 2010 and 2015.
dataset/roads.shp: The road network of Brevard County.
dataset/nwswm360ft.swm: Spatial weights matrix file created using the Generate Network Spatial Weights tool and a street network built from Brevard County road polylines.
[Key Notes]: 1. Use automatic reasoning and clearly explain each step (Chain-of-Thought approach).
2. Use the NetworkX package for visualization.
3. Use 'dot' for the graph visualization layout.
4. Subtasks must proceed in order, because the outputs of each subtask are inputs to the next.
5. Limit your output to code, no extra information.
6. Only code for the workflow, no implementation.
[Expected Sample Output Begin]
"""
tasks = [Task1, Task2, Task3]
G = nx.DiGraph()
for i in range(len(tasks) - 1):
G.add_edge(tasks[i], tasks[i + 1])
pos = nx.drawing.nx_pydot.graphviz_layout(G, prog="dot")
plt.figure(figsize=(15, 8))
nx.draw(G, pos, with_labels=True, node_size=3000, node_color='lightblue', font_size=10, font_weight='bold', arrowsize=20)
plt.title("Workflow for Analyzing Urban Heat Using Kriging Interpolation", fontsize=14)
plt.show()
"""
[Expected Sample Output End]
Code Generation Prompts
As a geospatial data scientist, generate a Python file to solve the proposed task.
[Task]: Identify hot spots for peak crashes
[Instruction]: Your task is to identify hot spots for peak crashes in Brevard County, Florida, 2010-2015. First, select all crashes that occurred during the peak time window and create a copy of the selected crash data. Then snap the crash points to the road network and spatially join them with the roads. Calculate the crash rate from the joined data and use hot spot analysis to produce a crash hot spot map as the result.
[Domain Knowledge]: Weekday traffic between 3 pm and 5 pm is considered peak. For the snapping process, the recommended buffer around roads is 0.25 miles. Because hot spot analysis looks for high crash rates that cluster close together, accurate distance measurements based on the road network are essential.
[Dataset Description]: dataset/crashes.shp: The locations of crashes in Brevard County, Florida between 2010 and 2015.
dataset/roads.shp: The road network of Brevard County.
dataset/nwswm360ft.swm: Spatial weights matrix file created using the Generate Network Spatial Weights tool and a street network built from Brevard County road polylines.
[Key Notes]: 1. Use automatic reasoning and clearly explain each subtask before performing it (ReAct approach).
2. Use the latest Python packages for code generation.
3. Put all code under a main function; no helper functions.
4. Limit your output to code, no extra information.
5. Use the latest ArcPy functions only.
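Under these prompts, a generated solution might look like the following minimal ArcPy sketch. The crash-table field names (e.g. HOUR, WEEKDAY) and the crash-rate expression are assumptions; the benchmark's reference code may differ.

```python
import arcpy

def main():
    # Minimal ArcPy sketch of the peak-crash hot spot task; field names
    # such as HOUR and WEEKDAY are assumed, not taken from the benchmark.
    arcpy.env.overwriteOutput = True

    # 1. Select weekday 3-5 pm crashes into a copy.
    arcpy.analysis.Select(
        "dataset/crashes.shp", "dataset/peak_crashes.shp",
        "HOUR >= 15 AND HOUR < 17 AND WEEKDAY = 1")

    # 2. Snap the crash points to road edges within 0.25 miles.
    arcpy.edit.Snap("dataset/peak_crashes.shp",
                    [["dataset/roads.shp", "EDGE", "0.25 Miles"]])

    # 3. Spatially join crashes onto road segments (adds a Join_Count field).
    arcpy.analysis.SpatialJoin(
        "dataset/roads.shp", "dataset/peak_crashes.shp",
        "dataset/roads_crashes.shp", "JOIN_ONE_TO_ONE")

    # 4. Crash rate per segment: joined crash count over segment length.
    arcpy.management.CalculateField(
        "dataset/roads_crashes.shp", "CrashRate",
        "!Join_Count! / !shape.length!", "PYTHON3", field_type="DOUBLE")

    # 5. Hot spot analysis (Getis-Ord Gi*) using the network spatial weights.
    arcpy.stats.HotSpots(
        "dataset/roads_crashes.shp", "CrashRate",
        "dataset/crash_hotspots.shp",
        "GET_SPATIAL_WEIGHTS_FROM_FILE", "EUCLIDEAN_DISTANCE", "NONE",
        "", "", "dataset/nwswm360ft.swm")

if __name__ == "__main__":
    main()
```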