GeoAnalystBench: A GeoAI benchmark for assessing large language models for spatial analysis workflow and code generation
Recent advances in Geospatial Artificial Intelligence (GeoAI) have been driven by generative AI and foundation models. While powerful geoprocessing tools are widely available in Geographic Information Systems (GIS), automating these workflows using AI-driven Python scripting remains a challenge, especially for non-expert users.
This project explores the capabilities of Large Language Models (LLMs) such as ChatGPT, Claude, Gemini, Llama, and DeepSeek in automating GIS workflows. We introduce a benchmark of 50 geoprocessing tasks to evaluate these models' ability to generate Python functions from natural language instructions.
Our findings reveal that proprietary LLMs achieve higher success rates (>90%) and produce workflows more aligned with human-designed implementations than open-source models. The results suggest that integrating proprietary LLMs with ArcPy is a more effective approach for specialized GIS workflows.
By providing benchmarks and insights, this study contributes to the development of optimized prompting strategies, future GIS automation tools, and hybrid GeoAI workflows that combine LLMs with human expertise.
- Benchmark for GIS Automation: Evaluation of LLMs on 50 geoprocessing tasks.
- LLM Performance Comparison: Validity and similarity analysis of generated workflows.
- Open-source Versus Proprietary Models: Comparison of performance and reliability.
This research developed 50 Python-based geoprocessing tasks derived from GIS platforms, software, online tutorials, and academic literature. Each task comprises 3 to 10 subtasks, because even the simplest task involves loading data, applying at least one spatial analysis tool, and saving the final outputs. The list of tasks and their sources is included in the Tasks section below.
The dataset includes the following information:
Key Column | Description |
---|---|
ID | Unique identifier for each task |
Open or Closed Source | Whether the task uses an open-source or closed-source (proprietary) library |
Task | Brief description of the task |
Instruction | Natural language instruction for completing the task |
Domain Knowledge | Domain-specific knowledge related to the task |
Dataset Description | Data name, format, description, and key columns |
Human Designed Workflow | Numbered list of steps in the human-designed workflow |
Task Length | Number of steps in the human-designed workflow |
Code | Human-designed code for the task and dataset |
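For example, a task record can be loaded and inspected with pandas. This is a hypothetical sketch: the file name `GeoAnalystBench.csv` is an assumption, so substitute the file shipped in the repository; the column names follow the table above.

```python
import pandas as pd

# Hypothetical sketch: the benchmark file name is an assumption.
df = pd.read_csv("GeoAnalystBench.csv")

# Inspect a single task record by its ID (column names from the table above).
task = df.loc[df["ID"] == 1].iloc[0]
print(task["Task"])
print(task["Instruction"])
print(task["Human Designed Workflow"])
```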
The dataset is available to download at GeoAnalystBench.
The data used in this research is available to download from Google Drive.
There are 50 tasks in the dataset; this section lists every task and its source. For more details, please refer to GeoAnalystBench.
Note that some tasks share the same name but have different IDs. This typically happens when the tasks differ slightly, or when one task is a subset of a larger task.
ID | Task Name | Source |
---|---|---|
1 | Find heat islands and at-risk populations in Madison, Wisconsin | Analyze urban heat using kriging |
2 | Find future bus stop locations in Hamilton | Assess access to public transit |
3 | Assess burn scars and wildfire impact in Montana using satellite imagery | Assess burn scars with satellite imagery |
4 | Identify groundwater vulnerable areas that need protection | Identify groundwater vulnerable areas |
5 | Visualize data on children with elevated blood lead levels while protecting privacy | De-identify health data for visualization and sharing |
6 | Use animal GPS tracks to model home range and movement over time | Model animal home range |
7 | Analyze the impacts of land subsidence on flooding | Model how land subsidence affects flooding |
8 | Find gaps in Toronto fire station service coverage | Get started with Python in ArcGIS Pro |
9 | Find the deforestation rate for Rondônia | Predict deforestation in the Amazon rain forest |
10 | Analyze the impact of proposed roads on the local environment | Predict deforestation in the Amazon rain forest |
11 | Create charts in Python to explore coral and sponge distribution around Catalina Island | Chart coral and sponge distribution |
12 | Find optimal corridors to connect dwindling mountain lion populations | Build a model to connect mountain lion habitat |
13 | Understand the relationship between ocean temperature and salinity at various depths in the South Atlantic Ocean | SciTools Iris |
14 | Detect persistent periods of high temperature over the past 240 years | SciTools Iris |
15 | Understand the geographical distribution of Total Electron Content (TEC) in the ionosphere | SciTools Iris |
16 | Analyze climate change trends in North America using spatiotemporal data | SciTools Iris |
17 | Analyze the geographical distribution of fatal car crashes in New York City during 2016 | Pointplot of NYC fatal and injurious traffic collisions |
18 | Analyze street tree species data in San Francisco | Quadtree of San Francisco street trees |
19 | Model spatial patterns of water quality | Model water quality |
20 | Predict the likelihood of tin-tungsten deposits in Tasmania | Geospatial ML Challenges: A prospectivity analysis example |
21 | Find optimal corridors to connect dwindling mountain lion populations (2) | Build a model to connect mountain lion habitat
22 | Find optimal corridors to connect dwindling mountain lion populations (3) | Build a model to connect mountain lion habitat
23 | Assess Open Space to Lower Flood Insurance Cost | Assess open space to lower flood insurance cost |
24 | Provide a de-identified point-level dataset that includes all the variables of interest for each child, as well as their general location | De-identify health data for visualization and sharing |
25 | Create risk maps for transmission, susceptibility, and resource scarcity. Then create a map of risk profiles to help pinpoint targeted intervention areas | Analyze COVID-19 risk using ArcGIS Pro |
26 | Use drainage conditions and water depth to calculate groundwater vulnerable areas | Identify groundwater vulnerable areas |
27 | Identify undeveloped areas from groundwater risk zones | Identify groundwater vulnerable areas |
28 | Estimate the origin-destination (OD) flows between regions based on the socioeconomic attributes of regions and the mobility data | ScienceDirect - OD Flow Estimation |
29 | Calculate Travel Time for a Tsunami | Calculate travel time for a tsunami |
30 | Designate bike routes for commuting professionals | Designate bike routes |
31 | Detect aggregation scales of geographical flows | Geographical Flow Aggregation |
32 | Find optimal corridors to connect dwindling mountain lion populations | Build a model to connect mountain lion habitat |
33 | Analyze the impacts of land subsidence on flooding | Model how land subsidence affects flooding |
34 | Estimate the accessibility of roads to rural areas in Japan | Estimate access to infrastructure |
35 | Calculate landslide potential for communities affected by wildfires | Landslide Potential Calculation |
36 | Compute the change in vegetation before and after a hailstorm with the SAVI index | Assess hail damage in cornfields with satellite imagery |
37 | Analyze human sentiments of heat exposure using social media data | National-level Analysis using Twitter Data |
38 | Calculate travel time from one location to others in a neighborhood | Intro to OSM Network Data |
39 | Train a Geographically Weighted Regression model to predict Georgia's Bachelor's degree rate | Geographically Weighted Regression Demo |
40 | Calculate and visualize changes in malaria prevalence | Visualizing Shrinking Malaria Rates |
41 | Improve campsite data quality using a relationship class | Improve campsite data |
42 | Investigate spatial patterns for Airbnb prices in Berlin | Determine dangerous roads for drivers |
43 | Use animal GPS tracks to model home range to understand where they are and how they move over time | Model animal home range |
44 | Find gap for Toronto fire station service coverage | Get started with Python in ArcGIS Pro |
45 | Find optimal corridors to connect dwindling mountain lion populations | Build a model to connect mountain lion habitat |
46 | Identify hot spots for peak crashes | Determine the most dangerous roads for drivers |
47 | Calculate impervious surface area | Calculate impervious surfaces |
48 | Determine how location impacts interest rates | Impact of Location on Interest Rates |
49 | Mapping the Impact of Housing Shortage on Oil Workers | Homeless in the Badlands |
50 | Predict seagrass habitats | Predict seagrass habitats with machine learning |
The first case study is about elk movements. Understanding elk movement patterns is critical for wildlife conservation and management in animal ecology. The task is to identify elk home ranges in Southwestern Alberta in 2009 from GPS-tracking locations, so that researchers can analyze space use and movement clusters for elk populations. Understanding the home range of an elk population is essential for ensuring the sustainability and stability of the wildlife.
• berling_neighbourhoods.geojson: GeoJSON file of neighbourhood multipolygons in Berlin; properties include "neighbourhood" and "neighbourhood_group".
• berlin-listings.csv: CSV file of Berlin Airbnb listings, including the latitude and longitude of each Airbnb.
Workflow Prompts
As a geospatial data scientist, you will generate a workflow for a proposed task.
[Task]: Use animal GPS tracks to model home range to understand where they are and how they move over time.
[Instruction]: Your task is to analyze and visualize elk movements using the provided dataset. The goal is to estimate home ranges and assess habitat preferences using spatial analysis techniques, including Minimum Bounding Geometry (Convex Hull), Kernel Density Estimation, and Density-Based Clustering (DBSCAN). The analysis will generate spatial outputs stored in "dataset/elk_home_range.gdb" and "dataset/".
[Domain Knowledge]: "Home range" can be defined as the area within which an animal normally lives and finds what it needs for survival; essentially, it is the area an animal travels for its normal daily activities. "Minimum Bounding Geometry" creates a feature class containing polygons that represent a specified minimum bounding geometry enclosing each input feature or each group of input features. A "convex hull" is the smallest convex polygon that can enclose a group of objects, such as a group of points. "Kernel Density Mapping" calculates and visualizes feature density in a given area. "DBSCAN" (Density-Based Spatial Clustering of Applications with Noise) clusters points based on a density criterion.
[Dataset Description]: dataset/Elk_in_Southwestern_Alberta_2009.geojson: GeoJSON file storing points of elk movements in Southwestern Alberta, 2009.
Columns of dataset/Elk_in_Southwestern_Alberta_2009.geojson: 'OBJECTID', 'timestamp', 'long', 'lat', 'comments', 'external_t', 'dop', 'fix_type_r', 'satellite_', 'height', 'crc_status', 'outlier_ma', 'sensor_typ', 'individual', 'tag_ident', 'ind_ident', 'study_name', 'date', 'time', 'timestamp_Converted', 'summer_indicator', 'geometry'
[Key Notes]: 1. Use automatic reasoning and clearly explain each step (Chain-of-Thought approach).
2. Use the NetworkX package for visualization.
3. Use 'dot' for the graph visualization layout.
4. Subtasks must proceed in order, because the outputs of each subtask are inputs to the next.
5. Limit your output to code, no extra information.
6. Only code for the workflow, no implementation.
[Expected Sample Output Begin]
"""
tasks = [Task1, Task2, Task3]
G = nx.DiGraph()
for i in range(len(tasks) - 1):
G.add_edge(tasks[i], tasks[i + 1])
pos = nx.drawing.nx_pydot.graphviz_layout(G, prog="dot")
plt.figure(figsize=(15, 8))
nx.draw(G, pos, with_labels=True, node_size=3000, node_color='lightblue', font_size=10, font_weight='bold', arrowsize=20)
plt.title("Workflow for Analyzing Urban Heat Using Kriging Interpolation", fontsize=14)
plt.show()
"""
[Expected Sample Output End]
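For orientation (this is not part of the prompt), a minimal open-source sketch of the elk home-range analysis itself is shown below, using GeoPandas and scikit-learn rather than ArcPy. The projected CRS (EPSG:3400, NAD83 / Alberta 10-TM) and the DBSCAN parameters are illustrative assumptions.

```python
import geopandas as gpd
from sklearn.cluster import DBSCAN

# Load the GPS fixes (file and columns per the dataset description above).
elk = gpd.read_file("dataset/Elk_in_Southwestern_Alberta_2009.geojson")

# Convex-hull home range over all points.
home_range = elk.geometry.unary_union.convex_hull

# DBSCAN movement clusters on projected coordinates (metres).
# EPSG:3400 and the eps/min_samples values are assumed, not reference values.
elk_m = elk.to_crs(epsg=3400)
coords = [(geom.x, geom.y) for geom in elk_m.geometry]
elk_m["cluster"] = DBSCAN(eps=500, min_samples=10).fit_predict(coords)
elk_m.to_file("dataset/elk_clusters.geojson", driver="GeoJSON")
```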
Code Generation Prompts
As a geospatial data scientist, generate a Python file to solve the proposed task.
[Task]: Use animal GPS tracks to model home range to understand where they are and how they move over time.
[Instruction]: Your task is to analyze and visualize elk movements using the provided dataset. The goal is to estimate home ranges and assess habitat preferences using spatial analysis techniques, including Minimum Bounding Geometry (Convex Hull), Kernel Density Estimation, and Density-Based Clustering (DBSCAN). The analysis will generate spatial outputs stored in "dataset/elk_home_range.gdb" and "dataset/".
[Domain Knowledge]: "Home range" can be defined as the area within which an animal normally lives and finds what it needs for survival. Basically, the home range is the area that an animal travels for its normal daily activities.
"Minimum Bounding Geometry" creates a feature class containing polygons which represent a specified minimum bounding geometry enclosing each input feature or each group of input features.
"Convex hull" is the smallest convex polygon that can enclose a group of objects, such as a group of points.
"Kernel Density Mapping" calculates and visualizes features's density in a given area. "DBSCAN", Density-Based Spatial Clustering of Applications with Noise that cluster the points based on density criterion.
[Dataset Description]: dataset/Elk_in_Southwestern_Alberta_2009.geojson: GeoJSON file storing points of elk movements in Southwestern Alberta, 2009.
Columns of dataset/Elk_in_Southwestern_Alberta_2009.geojson: 'OBJECTID', 'timestamp', 'long', 'lat', 'comments', 'external_t', 'dop', 'fix_type_r', 'satellite_', 'height', 'crc_status', 'outlier_ma', 'sensor_typ', 'individual', 'tag_ident', 'ind_ident', 'study_name', 'date', 'time', 'timestamp_Converted', 'summer_indicator', 'geometry'
[Key Notes]: 1. Use automatic reasoning and clearly explain each subtask before performing it (ReAct approach).
2. Use the latest Python packages for code generation.
3. Put all code under a main function; no helper functions.
4. Limit your output to code, no extra information.
5. Use the latest ArcPy functions only.
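Under these prompts, a generated solution might look like the following. This is a minimal ArcPy sketch, not the benchmark's reference code; the clustering parameters and several tool arguments are illustrative assumptions.

```python
import arcpy

def main():
    # Minimal ArcPy sketch of the elk home-range task; parameter values
    # (cluster size, search distance) are assumptions, not reference values.
    arcpy.env.overwriteOutput = True
    gdb = arcpy.management.CreateFileGDB("dataset", "elk_home_range.gdb")[0]

    # 1. Load the GPS points from GeoJSON into a feature class.
    elk = arcpy.conversion.JSONToFeatures(
        "dataset/Elk_in_Southwestern_Alberta_2009.geojson",
        f"{gdb}/elk_points", "POINT")

    # 2. Convex-hull home range via Minimum Bounding Geometry.
    arcpy.management.MinimumBoundingGeometry(
        elk, f"{gdb}/home_range_hull", "CONVEX_HULL", "ALL")

    # 3. Kernel density surface of elk locations (needs Spatial Analyst).
    arcpy.CheckOutExtension("Spatial")
    density = arcpy.sa.KernelDensity(elk, "NONE")
    density.save(f"{gdb}/elk_density")

    # 4. DBSCAN clustering of the movement points.
    arcpy.stats.DensityBasedClustering(
        elk, f"{gdb}/elk_clusters", "DBSCAN", 10, "500 Meters")

if __name__ == "__main__":
    main()
```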
The second case study is about spatial hot spot analysis of car accidents. Brevard County, Florida, has one of the deadliest interstate highways in the United States. This case study aims to identify spatially distributed hot spots along the road network. The dataset includes the road network, crash locations from 2010 to 2015, and a network spatial weights matrix. Understanding hot spots for car accidents is essential for the local transportation department to set policy and respond quickly to future accidents.
• roads.shp: The road network of Brevard County.
• crashes.shp: The locations of crashes in Brevard County, Florida between 2010 and 2015.
• nwswm360ft.swm: Spatial weights matrix file created using the Generate Network Spatial Weights tool and a street network built from Brevard County road polylines.
Workflow Prompts
As a geospatial data scientist, you will generate a workflow for a proposed task.
[Task]: Identify hot spots for peak crashes
[Instruction]: Your task is to identify hot spots for peak crashes in Brevard County, Florida, 2010-2015. First, select all crashes that occurred during the peak time window and create a copy of the selected crash data. Then snap the crash points to the road network and spatially join them with the roads. Calculate the crash rate from the joined data and use hot spot analysis to produce a crash hot spot map as the result.
[Domain Knowledge]: Weekday traffic between 3 pm and 5 pm is considered peak. For the snapping process, the recommended buffer around roads is 0.25 miles. Because hot spot analysis looks for high crash rates that cluster close together, accurate distance measurements based on the road network are essential.
[Dataset Description]: dataset/crashes.shp: The locations of crashes in Brevard County, Florida between 2010 and 2015.
dataset/roads.shp: The road network of Brevard County.
dataset/nwswm360ft.swm: Spatial weights matrix file created using the Generate Network Spatial Weights tool and a street network built from Brevard County road polylines.
[Key Notes]: 1. Use automatic reasoning and clearly explain each step (Chain-of-Thought approach).
2. Use the NetworkX package for visualization.
3. Use 'dot' for the graph visualization layout.
4. Subtasks must proceed in order, because the outputs of each subtask are inputs to the next.
5. Limit your output to code, no extra information.
6. Only code for the workflow, no implementation.
[Expected Sample Output Begin]
"""
tasks = [Task1, Task2, Task3]
G = nx.DiGraph()
for i in range(len(tasks) - 1):
G.add_edge(tasks[i], tasks[i + 1])
pos = nx.drawing.nx_pydot.graphviz_layout(G, prog="dot")
plt.figure(figsize=(15, 8))
nx.draw(G, pos, with_labels=True, node_size=3000, node_color='lightblue', font_size=10, font_weight='bold', arrowsize=20)
plt.title("Workflow for Analyzing Urban Heat Using Kriging Interpolation", fontsize=14)
plt.show()
"""
[Expected Sample Output End]
Code Generation Prompts
As a geospatial data scientist, generate a Python file to solve the proposed task.
[Task]: Identify hot spots for peak crashes
[Instruction]: Your task is to identify hot spots for peak crashes in Brevard County, Florida, 2010-2015. First, select all crashes that occurred during the peak time window and create a copy of the selected crash data. Then snap the crash points to the road network and spatially join them with the roads. Calculate the crash rate from the joined data and use hot spot analysis to produce a crash hot spot map as the result.
[Domain Knowledge]: Weekday traffic between 3 pm and 5 pm is considered peak. For the snapping process, the recommended buffer around roads is 0.25 miles. Because hot spot analysis looks for high crash rates that cluster close together, accurate distance measurements based on the road network are essential.
[Dataset Description]: dataset/crashes.shp: The locations of crashes in Brevard County, Florida between 2010 and 2015.
dataset/roads.shp: The road network of Brevard County.
dataset/nwswm360ft.swm: Spatial weights matrix file created using the Generate Network Spatial Weights tool and a street network built from Brevard County road polylines.
[Key Notes]: 1. Use automatic reasoning and clearly explain each subtask before performing it (ReAct approach).
2. Use the latest Python packages for code generation.
3. Put all code under a main function; no helper functions.
4. Limit your output to code, no extra information.
5. Use the latest ArcPy functions only.
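Under these prompts, a generated solution might look like the following minimal ArcPy sketch. The crash-table field names (e.g. HOUR, WEEKDAY) and the crash-rate expression are assumptions; the benchmark's reference code may differ.

```python
import arcpy

def main():
    # Minimal ArcPy sketch of the peak-crash hot spot task; field names
    # such as HOUR and WEEKDAY are assumed, not taken from the benchmark.
    arcpy.env.overwriteOutput = True

    # 1. Select weekday 3-5 pm crashes into a copy.
    arcpy.analysis.Select(
        "dataset/crashes.shp", "dataset/peak_crashes.shp",
        "HOUR >= 15 AND HOUR < 17 AND WEEKDAY = 1")

    # 2. Snap the crash points to road edges within 0.25 miles.
    arcpy.edit.Snap("dataset/peak_crashes.shp",
                    [["dataset/roads.shp", "EDGE", "0.25 Miles"]])

    # 3. Spatially join crashes onto road segments (adds a Join_Count field).
    arcpy.analysis.SpatialJoin(
        "dataset/roads.shp", "dataset/peak_crashes.shp",
        "dataset/roads_crashes.shp", "JOIN_ONE_TO_ONE")

    # 4. Crash rate per segment: joined crash count over segment length.
    arcpy.management.CalculateField(
        "dataset/roads_crashes.shp", "CrashRate",
        "!Join_Count! / !shape.length!", "PYTHON3", field_type="DOUBLE")

    # 5. Hot spot analysis (Getis-Ord Gi*) using the network spatial weights.
    arcpy.stats.HotSpots(
        "dataset/roads_crashes.shp", "CrashRate",
        "dataset/crash_hotspots.shp",
        "GET_SPATIAL_WEIGHTS_FROM_FILE", "EUCLIDEAN_DISTANCE", "NONE",
        "", "", "dataset/nwswm360ft.swm")

if __name__ == "__main__":
    main()
```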