GitHub - Arnaslie/operation-floresta

Ancient Footprints, Modern Pathways to Sustainability

AI-Guided Discovery of Amazonian Dark Earths and Earthworks in the Upper Xingu

The Upper Xingu region of Brazil, long inhabited by the Kuikuro and other Xinguano groups, holds one of the most intricate pre-Columbian landscapes in Amazonia—marked by ringed settlements, canals, engineered soils (ADE), and forest management systems. While prior studies have illuminated the spatial logic of these "garden cities," much of the region remains unmapped and underexplored.

This project introduces a novel AI-assisted pipeline to identify potential new archaeological sites in the Upper Xingu, focusing on the prediction of Amazonian Dark Earth (ADE) zones and earthworks. By combining:

Synthetic test sites generation
Remote sensing
Machine learning
GPT-4.1–based generative reasoning,

we build a replicable framework that links Indigenous settlement logic to environmental signatures—enhancing archaeological discovery while honoring cultural context.

Methods Overview

Synthetic Test Generation:

We use GPT to select locations based on regional terrain, settlement logic, and environmental patterns drawn from its knowledge base.

Remote Sensing

We use publicly available environmental data collected via satellite or airborne platforms to extract relevant features:

Sentinel-2 imagery for vegetation indices (NDVI, BSI, NDBI)
LiDAR-derived DEM for elevation and slope
WorldClim v2 for climate variables (temperature, precipitation, seasonality)
ISRIC SoilGrids for soil composition (clay %, pH, carbon content)

Training Data

2,073 labeled sites from Walker et al. (2022) including ADE, earthwork and others.

Feature Matrix (32 Total Variables)

Soil: clay %, organic carbon, pH, bulk density, cation exchange capacity
Hydrology: distance to major/minor rivers, slope-based drainage index
Climate: 19 bioclimatic variables from 1970–2000
Topography: LiDAR-derived elevation & slope
Vegetation: NDVI, NDBI, BSI (from Sentinel-2)
Accessibility: distance to roads, remoteness index

Novelty: GPT-Generated Test Site Coordinates ✨

We used GPT-4.1 to generate 50 hypothetical site coordinates in the Upper Xingu, simulating culturally and environmentally plausible locations not included in the training data. This step served two purposes:

Synthetic Test Generation: GPT selected locations based on regional terrain, settlement logic, and environmental patterns drawn from its knowledge base.
Post-prediction Reasoning: For top-ranked predictions, GPT performed qualitative reasoning to interpret spatial plausibility based on cultural patterns and terrain resemblance to known sites.

This dual application of GPT represents a novel method in archaeological discovery—combining structured prediction with semantic interpretation.

🔁 Reproducing Our Workflow

This project is organized into three modular notebooks, each corresponding to a key stage of our AI-guided archaeological discovery pipeline:

🧱 Step 1: Data Collection & Feature Engineering

Notebook: UpperXingu_data.ipynb

Retrieves environmental and geospatial features for both training and GPT-simulated test sites.
Outputs a .csv with 32 features for ∼2,000 labeled sites and 50 synthetic test sites.

🌲 Step 2: Machine Learning Classification

Notebook: Footprint_fullanalysis.ipynb

Trains a Random Forest classifier on labeled ADE/earthwork/other sites.
Evaluates feature importance and predicts site classes for the test set.
Produces per-site class probabilities and ranked outputs.

✨ Step 3: GPT-Guided Interpretation & Site Selection

Notebook: openai_reasoning.ipynb

Uses GPT-4.1 to interpret the classifier output and prioritize candidates.
Performs generative reasoning to select the two most promising ADE sites.
Outputs a final summary table with coordinates and justification.

📍 Final Predicted Sites for Field Validation

Site	Latitude	Longitude	ADE Probability
1	-11.4869	-55.3030	0.458
2	-11.3475	-55.2940	0.444

These candidate sites are situated on elevated terra firme ridges within 25 km of a major river, in ecologically stable regions—aligning with known ADE settlement patterns.

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
data		data
figures		figures
misc		misc
model		model
.gitignore		.gitignore
.python-version		.python-version
Ancient Footprints, Modern Pathways to Sustainability.pdf		Ancient Footprints, Modern Pathways to Sustainability.pdf
Footprint_fullanalysis.ipynb		Footprint_fullanalysis.ipynb
LICENSE		LICENSE
README.md		README.md
UpperXingu_data.ipynb		UpperXingu_data.ipynb
openai_reasoning.ipynb		openai_reasoning.ipynb
project_flowchart.png		project_flowchart.png
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Ancient Footprints, Modern Pathways to Sustainability

Methods Overview

Synthetic Test Generation:

Remote Sensing

Training Data

Feature Matrix (32 Total Variables)

Novelty: GPT-Generated Test Site Coordinates ✨

🔁 Reproducing Our Workflow

🧱 Step 1: Data Collection & Feature Engineering

🌲 Step 2: Machine Learning Classification

✨ Step 3: GPT-Guided Interpretation & Site Selection

📍 Final Predicted Sites for Field Validation

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

Arnaslie/operation-floresta

Folders and files

Latest commit

History

Repository files navigation

Ancient Footprints, Modern Pathways to Sustainability

Methods Overview

Synthetic Test Generation:

Remote Sensing

Training Data

Feature Matrix (32 Total Variables)

Novelty: GPT-Generated Test Site Coordinates ✨

🔁 Reproducing Our Workflow

🧱 Step 1: Data Collection & Feature Engineering

🌲 Step 2: Machine Learning Classification

✨ Step 3: GPT-Guided Interpretation & Site Selection

📍 Final Predicted Sites for Field Validation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages