ETH-100/HarborX

HarborX is a high-performance data engine for Web3 that enhances data visibility and composability through open distribution.

You simply subscribe to the datasets you need, and HarborX will automatically catch up and stay in sync.

Datasets can be saved into your preferred database and queried with SQL or other familiar tools.

By standardizing all data into a unified intermediate format, any data stream can be published as a dataset — including social graphs, ENS records, token prices, transaction history, and more.

No more writing indexers, cleaning data, or juggling slow, complex API queries — everything happens locally.

👉 Live Demo · Architecture Guide

Core Capabilities

  • Subscribe & Go: Add a dataset with a single command (just like npm). HarborX fetches and continuously catches up — no need to build collectors, indexers, or multi-API queries.
  • Query Freely: Datasets are stored locally and can be queried directly using your favorite databases and SQL engines.
  • One-Click Publish: Turn your results into a new dataset and let the entire ecosystem benefit from your creativity.
  • Plug Any Upstream: Connect to diverse sources (Rollup Blobs, indexers, off-chain data, APIs, files, etc.) and compose them as needed.
  • Decentralized by Design: Persist datasets into decentralized storage so anyone can rebuild them — ensuring reliability even if the original source goes offline.

Quickstart (PoC)

Requirements: Python 3.9+

python -m pip install -U pip
pip install -e ".[cli]"

Add a Dataset

harborx add --base https://play.harborx.tech/data/ --subdir local -m --port 8080

You’ll see output similar to:

harborx add --base https://play.harborx.tech/data/ --subdir local -m --port 8080
[2025-08-13 23:46:03] 🔗 Using Lake base: https://play.harborx.tech/data
[2025-08-13 23:46:03] ⬇️  Fetching manifest.json from https://play.harborx.tech/data
[2025-08-13 23:46:04] 🧭 Rewriting relative paths → absolute URLs
[2025-08-13 23:46:04] 📄 No local manifest found, adopting remote manifest as local
[2025-08-13 23:46:04] 🧱 Materializing → HarborX\apps\web\data\local\objects
[2025-08-13 23:46:06]   ↓ https://play.harborx.tech/data/1456cfb63a334a39a06df3ee120daafb-0.arrow
    24cfd00e87ded65b.arrow done in 7.3s
[2025-08-13 23:46:17]   ↓ https://play.harborx.tech/data/2bb64d90458245e990c29acd605dadce-0.arrow
    db3c961ccec074e1.arrow done in 21.7s
[2025-08-13 23:52:05]   ↓ https://play.harborx.tech/data/chain_id=167001/date=20310/topic=state_diff/2bb64d90458245e990c29acd605dadce-0.parquet
    b2089242e6daa02b.parquet done in 1.7s
[2025-08-13 23:52:12]   ↓ https://play.harborx.tech/data/chain_id=167001/date=20310/topic=state_diff/7721bc175b1a4194a04f2779536f0d15-0.parquet
    4a1aea6b788713e2.parquet done in 1.5s
[2025-08-13 23:52:14] 🔍 Validating a few entries
[2025-08-13 23:52:14]   ✅ local OK: objects/24cfd00e87ded65b.arrow
[2025-08-13 23:52:14]   ✅ local OK: objects/db3c961ccec074e1.arrow
[2025-08-13 23:52:14]   ✅ local OK: objects/360ce304038b70f4.parquet
[2025-08-13 23:52:14]   ✅ local OK: objects/d7e92bfb8c72c3ac.parquet
[2025-08-13 23:52:14] 📊 Merge summary:
  • arrow:   2 items
  • parquet: 2 items
  • files:   0 items
  • segments:0 items
  (Δ added) arrow:+2 parquet:+2 files:+0 segments:+0
  Probed OK:4 FAIL:0
[2025-08-13 23:52:14] 🚀 Starting harborx static server at http://127.0.0.1:8080
[2025-08-13 23:52:14]     Serving directory: HarborX\apps\web (manifest at HarborX\apps\web\data\local\manifest.json)
[serve] http://127.0.0.1:8080 (root=HarborX\apps\web)

This is a PoC example; in production HarborX will automatically catch up and stay in sync.
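
Some of the object URLs in the log above carry key=value path segments (chain_id, date, topic). The sketch below shows one way to read those segments back out of a URL. It is an illustration only, not HarborX's own code: the helper names are hypothetical, and reading date=20310 as days since 1970-01-01 is an assumption (it would correspond to 2025-08-10, a few days before the log timestamps).

```python
from datetime import date, timedelta
from urllib.parse import urlparse


def parse_partitions(url: str) -> dict:
    """Extract key=value directory segments from a partitioned object URL."""
    segments = urlparse(url).path.split("/")
    # Skip the final segment (the file name); keep only key=value directories.
    return dict(seg.split("=", 1) for seg in segments[:-1] if "=" in seg)


def days_to_date(days: str) -> date:
    """Interpret a partition value as days since 1970-01-01 (an assumption)."""
    return date(1970, 1, 1) + timedelta(days=int(days))


url = (
    "https://play.harborx.tech/data/"
    "chain_id=167001/date=20310/topic=state_diff/"
    "2bb64d90458245e990c29acd605dadce-0.parquet"
)
parts = parse_partitions(url)
print(parts)                        # partition keys and values as strings
print(days_to_date(parts["date"]))  # calendar date under the stated assumption
```

URLs without key=value segments, such as the .arrow objects in the log, simply yield an empty dict.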


Use Existing Static Data

Place your dataset (including manifest.json) under apps/web/data/ and run:

harborx serve --dir apps/web --port 8080
# Open http://127.0.0.1:8080
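
Before serving, it can help to confirm that the data folder has the expected layout. The following is a minimal pre-flight sketch using only the standard library. The function name check_data_dir is hypothetical and not part of the harborx CLI, and it checks only that manifest.json exists and that at least one Arrow or Parquet file is present. It does not validate the manifest contents.

```python
from pathlib import Path


def check_data_dir(data_dir: str) -> list:
    """Return a list of problems found; an empty list means the layout looks fine."""
    root = Path(data_dir)
    if not root.is_dir():
        return [f"{data_dir} is not a directory"]
    problems = []
    if not (root / "manifest.json").is_file():
        problems.append("manifest.json is missing")
    # Look for Arrow/Parquet files anywhere below the data directory.
    data_files = [p for p in root.rglob("*") if p.suffix in {".arrow", ".parquet"}]
    if not data_files:
        problems.append("no .arrow or .parquet files found")
    return problems


for problem in check_data_dir("apps/web/data"):
    print("warning:", problem)
```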

Fetch a Small Real Dataset

harborx blobscan --limit 3 --out apps/web/data --parquet
harborx manifest --root apps/web --data data --include-parquet
harborx serve --dir apps/web --port 8080

On the page, click Rebuild state to load the files, then try queries like:

SELECT COUNT(*) AS rows FROM state;

SELECT value, COUNT(*) AS c
FROM state
GROUP BY value
ORDER BY c DESC
LIMIT 10;

-- Last-write-wins (LWW), simple example.
-- With no PARTITION BY, rn = 1 selects the single most recent write overall.
WITH ranked AS (
  SELECT value, timestamp, blob_index, position,
         ROW_NUMBER() OVER (ORDER BY timestamp DESC, blob_index DESC, position DESC) AS rn
  FROM state
)
SELECT value, timestamp
FROM ranked
WHERE rn = 1;

The frontend demo runs entirely in the browser with DuckDB-WASM, reading Arrow/Parquet files over HTTP.

No server-side database is required.
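
The ordering behind the last-write-wins query can also be sketched in plain Python, which makes the tie-breaking explicit: rows are compared by timestamp, then blob_index, then position. The key field below is hypothetical (the state table in the demo may not have such a column). It is included only to show what a per-key variant, the equivalent of adding PARTITION BY, would look like.

```python
from typing import NamedTuple


class Write(NamedTuple):
    key: str          # hypothetical column, not used by the demo query
    value: str
    timestamp: int
    blob_index: int
    position: int


def order_key(w: Write) -> tuple:
    # Same precedence as the SQL ORDER BY: timestamp, blob_index, position.
    return (w.timestamp, w.blob_index, w.position)


def latest(writes):
    """Global last write: the row that would receive rn = 1."""
    return max(writes, key=order_key)


def latest_per_key(writes):
    """Per-key LWW: keep the newest write for each key."""
    winners = {}
    for w in writes:
        current = winners.get(w.key)
        if current is None or order_key(w) > order_key(current):
            winners[w.key] = w
    return winners


writes = [
    Write("a", "1", 100, 0, 0),
    Write("a", "2", 100, 0, 1),   # same timestamp and blob, later position
    Write("b", "9", 90, 2, 5),
]
print(latest(writes).value)                                      # 2
print({k: w.value for k, w in latest_per_key(writes).items()})  # {'a': '2', 'b': '9'}
```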


CLI Examples

All commands are provided via the single entrypoint harborx.

Fetch blobs + write Arrow/Parquet + generate manifest

harborx blobscan --limit 3 --out apps/web/data --parquet
harborx manifest --root apps/web --data data --include-parquet

Serve the static web demo

harborx serve --dir apps/web --port 8080

Frontend Demo

  • Location: apps/web/
  • Data folder: apps/web/data/
  • Manifest: apps/web/data/manifest.json

The app resolves file paths based on the manifest.
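
As a rough illustration of manifest-based path resolution, the sketch below joins relative entry paths onto a base URL and leaves absolute URLs untouched. The manifest shape used here (an "entries" list with a "path" field) is an assumption made for the example, not the actual HarborX manifest schema, and resolve_entries is a hypothetical helper.

```python
import json
from urllib.parse import urljoin


def resolve_entries(manifest: dict, base_url: str) -> list:
    """Return an absolute URL for every entry path in the manifest."""
    # A trailing slash makes urljoin treat the base as a directory.
    base = base_url.rstrip("/") + "/"
    # urljoin returns already-absolute URLs unchanged.
    return [urljoin(base, entry["path"]) for entry in manifest["entries"]]


# Hypothetical manifest shape, for illustration only.
manifest = json.loads("""
{
  "entries": [
    {"path": "objects/24cfd00e87ded65b.arrow"},
    {"path": "https://play.harborx.tech/data/1456cfb63a334a39a06df3ee120daafb-0.arrow"}
  ]
}
""")
for url in resolve_entries(manifest, "http://127.0.0.1:8080/data/local"):
    print(url)
```

Resolving against the manifest's own location means the same manifest can point at local objects and remote objects side by side.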


License

Apache-2.0. See LICENSE.
