HarborX is a high-performance data engine for Web3 that enhances data visibility and composability through open distribution.
You simply subscribe to the datasets you need, and HarborX will automatically catch up and stay in sync.
Datasets can be saved into your preferred database and queried with SQL or other familiar tools.
By standardizing all data into a unified intermediate format, any data stream can be published as a dataset — including social graphs, ENS records, token prices, transaction history, and more.
No more writing indexers, cleaning data, or juggling slow, complex API queries — everything happens locally.
👉 Live Demo · Architecture Guide
- Subscribe & Go: Add a dataset with a single command (just like npm). HarborX fetches and continuously catches up — no need to build collectors, indexers, or multi-API queries.
- Query Freely: Datasets are stored locally and can be queried directly using your favorite databases and SQL engines.
- One-Click Publish: Turn your results into a new dataset and let the entire ecosystem benefit from your creativity.
- Plug Any Upstream: Connect to diverse sources (Rollup Blobs, indexers, off-chain data, APIs, files, etc.) and compose them as needed.
- Decentralized by Design: Persist datasets into decentralized storage so anyone can rebuild them — ensuring reliability even if the original source goes offline.
Requirements: Python 3.9+
python -m pip install -U pip
pip install -e .[cli]
harborx add --base https://play.harborx.tech/data/ --subdir local -m --port 8080
You’ll see output similar to:
harborx add --base https://play.harborx.tech/data/ --subdir local -m --port 8080
[2025-08-13 23:46:03] 🔗 Using Lake base: https://play.harborx.tech/data
[2025-08-13 23:46:03] ⬇️ Fetching manifest.json from https://play.harborx.tech/data
[2025-08-13 23:46:04] 🧭 Rewriting relative paths → absolute URLs
[2025-08-13 23:46:04] 📄 No local manifest found, adopting remote manifest as local
[2025-08-13 23:46:04] 🧱 Materializing → HarborX\apps\web\data\local\objects
[2025-08-13 23:46:06] ↓ https://play.harborx.tech/data/1456cfb63a334a39a06df3ee120daafb-0.arrow
24cfd00e87ded65b.arrow done in 7.3s
[2025-08-13 23:46:17] ↓ https://play.harborx.tech/data/2bb64d90458245e990c29acd605dadce-0.arrow
db3c961ccec074e1.arrow done in 21.7s
[2025-08-13 23:52:05] ↓ https://play.harborx.tech/data/chain_id=167001/date=20310/topic=state_diff/2bb64d90458245e990c29acd605dadce-0.parquet
b2089242e6daa02b.parquet done in 1.7s
[2025-08-13 23:52:12] ↓ https://play.harborx.tech/data/chain_id=167001/date=20310/topic=state_diff/7721bc175b1a4194a04f2779536f0d15-0.parquet
4a1aea6b788713e2.parquet done in 1.5s
[2025-08-13 23:52:14] 🔍 Validating a few entries
[2025-08-13 23:52:14] ✅ local OK: objects/24cfd00e87ded65b.arrow
[2025-08-13 23:52:14] ✅ local OK: objects/db3c961ccec074e1.arrow
[2025-08-13 23:52:14] ✅ local OK: objects/360ce304038b70f4.parquet
[2025-08-13 23:52:14] ✅ local OK: objects/d7e92bfb8c72c3ac.parquet
[2025-08-13 23:52:14] 📊 Merge summary:
• arrow: 2 items
• parquet: 2 items
• files: 0 items
• segments:0 items
(Δ added) arrow:+2 parquet:+2 files:+0 segments:+0
Probed OK:4 FAIL:0
[2025-08-13 23:52:14] 🚀 Starting harborx static server at http://127.0.0.1:8080
[2025-08-13 23:52:14] Serving directory: HarborX\apps\web (manifest at HarborX\apps\web\data\local\manifest.json)
[serve] http://127.0.0.1:8080 (root=HarborX\apps\web)
This is a PoC example; in production HarborX will automatically catch up and stay in sync.
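The "Merge summary" step in the log above can be sketched as a per-category union of the local and remote manifests, counting how many entries are new. This is only an illustration: the manifest shape used here (category name mapped to a list of object paths) and the function names `merge_manifests` and `format_summary` are assumptions, not HarborX's actual schema or API. Only the category names are taken from the log.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest shape: category name -> list of object paths.
# The category names mirror those printed in the merge summary.
CATEGORIES = ("arrow", "parquet", "files", "segments")

Manifest = Dict[str, List[str]]


def merge_manifests(local: Manifest, remote: Manifest) -> Tuple[Manifest, Dict[str, int]]:
    """Return the merged manifest and, per category, how many entries were added."""
    merged: Manifest = {}
    added: Dict[str, int] = {}
    for cat in CATEGORIES:
        existing = list(local.get(cat, []))
        seen = set(existing)
        new_items = []
        for item in remote.get(cat, []):
            if item not in seen:  # keep local entries, append only unseen ones
                seen.add(item)
                new_items.append(item)
        merged[cat] = existing + new_items
        added[cat] = len(new_items)
    return merged, added


def format_summary(merged: Manifest, added: Dict[str, int]) -> str:
    """Render a short text summary of totals and newly added entries."""
    lines = [f"{cat}: {len(merged[cat])} items" for cat in CATEGORIES]
    lines.append("(added) " + " ".join(f"{cat}:+{added[cat]}" for cat in CATEGORIES))
    return "\n".join(lines)


if __name__ == "__main__":
    remote = {"arrow": ["a.arrow", "b.arrow"], "parquet": ["c.parquet", "d.parquet"]}
    merged, added = merge_manifests({}, remote)
    print(format_summary(merged, added))
```

Because the merge is a union, running it again against the same remote manifest adds nothing, which is the property a repeated catch-up would rely on.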
Place your dataset (including manifest.json) under apps/web/data/ and run:
harborx serve --dir apps/web --port 8080
# Open http://127.0.0.1:8080
harborx blobscan --limit 3 --out apps/web/data --parquet
harborx manifest --root apps/web --data data --include-parquet
harborx serve --dir apps/web --port 8080
On the page, click Rebuild state to load the files, then try queries like:
SELECT COUNT(*) AS rows FROM state;
SELECT value, COUNT(*) AS c
FROM state
GROUP BY value
ORDER BY c DESC
LIMIT 10;
-- Last-write-wins (LWW), simplest form: with no PARTITION BY,
-- rn = 1 selects only the single most recent write in the table
WITH ranked AS (
  SELECT value, timestamp, blob_index, position,
         ROW_NUMBER() OVER (ORDER BY timestamp DESC, blob_index DESC, position DESC) rn
  FROM state
)
SELECT value, timestamp
FROM ranked
WHERE rn = 1
LIMIT 50;
The frontend demo runs entirely in the browser with DuckDB-WASM, reading Arrow/Parquet files over HTTP.
No server-side database is required.
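The last-write-wins query above ranks the whole table, so it yields one row. A per-key variant adds PARTITION BY so that each key keeps its latest write. The sketch below shows that pattern, under stated assumptions: the `slot` column is a hypothetical key (the real `state` schema may name it differently or not have one), and Python's built-in `sqlite3` stands in for DuckDB-WASM so the example runs without extra dependencies.

```python
import sqlite3

# Per-key last-write-wins: rank writes within each slot, newest first.
# `slot` is a hypothetical key column; the other columns follow the queries above.
LWW_SQL = """
WITH ranked AS (
  SELECT slot, value, timestamp,
         ROW_NUMBER() OVER (
           PARTITION BY slot
           ORDER BY timestamp DESC, blob_index DESC, position DESC
         ) AS rn
  FROM state
)
SELECT slot, value, timestamp
FROM ranked
WHERE rn = 1
ORDER BY slot
"""


def latest_per_slot(rows):
    """Load rows into an in-memory table and return the latest write per slot."""
    conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
    conn.execute(
        "CREATE TABLE state ("
        "slot TEXT, value TEXT, timestamp INTEGER, blob_index INTEGER, position INTEGER)"
    )
    conn.executemany("INSERT INTO state VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(LWW_SQL).fetchall()
    conn.close()
    return result


rows = [
    ("a", "1", 100, 0, 0),
    ("a", "2", 100, 0, 1),  # same timestamp and blob: the later position wins
    ("b", "9", 90, 1, 0),
    ("b", "8", 95, 0, 0),   # later timestamp wins regardless of blob_index
]
print(latest_per_slot(rows))
```

The ORDER BY inside the window breaks ties on `blob_index` and then `position`, matching the ordering used in the simple query.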
All commands are provided via the single entrypoint harborx.
Fetch blobs + write Arrow/Parquet + generate manifest
harborx blobscan --limit 3 --out apps/web/data --parquet
harborx manifest --root apps/web --data data --include-parquet
Serve the static web demo
harborx serve --dir apps/web --port 8080
- Location: apps/web/
- Data folder: apps/web/data/
- Manifest: apps/web/data/manifest.json
The app resolves file paths based on the manifest.
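As a rough illustration of manifest-based path resolution, the sketch below turns relative manifest entries into absolute URLs against a base and leaves entries that are already absolute untouched. The README does not show the app's actual resolution logic or the manifest schema, so the function names and the dict-of-lists manifest shape are assumptions made for this example.

```python
from typing import Dict, List
from urllib.parse import urljoin, urlparse


def resolve_entry(base_url: str, entry: str) -> str:
    """Resolve one manifest entry against the base URL."""
    if urlparse(entry).scheme in ("http", "https"):
        return entry  # already absolute
    # A trailing slash makes urljoin treat the base as a directory.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return urljoin(base, entry)


def resolve_manifest(base_url: str, manifest: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Resolve every entry in a (hypothetical) category -> paths manifest."""
    return {
        category: [resolve_entry(base_url, entry) for entry in entries]
        for category, entries in manifest.items()
    }


if __name__ == "__main__":
    manifest = {
        "arrow": ["objects/24cfd00e87ded65b.arrow"],
        "parquet": ["https://play.harborx.tech/data/already-absolute.parquet"],
    }
    print(resolve_manifest("https://play.harborx.tech/data", manifest))
```

Normalizing the trailing slash matters: without it, `urljoin` would replace the last path segment of the base (`data`) instead of appending to it.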
Apache-2.0. See LICENSE.