Interactive visual analytics for exploring etymology, phonology, descendants, and translations using Wiktionary data.
WiktionaryViz consists of a React + Vite frontend and a FastAPI backend that serves data from a large JSONL dump via an mmap-backed index for fast lookups.
- Live demo: https://vialab.github.io/WiktionaryViz/
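To illustrate the idea of an mmap-backed index over a JSONL file, here is a minimal sketch in Python. It is not the project's actual implementation: the function names are hypothetical, and it assumes each line is a JSON object with word and lang_code fields. The index stores only byte offsets, so a lookup parses a single line instead of scanning the whole dump.

```python
import json
import mmap
import os
import tempfile


def build_offset_index(path):
    """Map (word, lang_code) -> (byte offset, line length) for each JSONL line."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            if line.strip():
                entry = json.loads(line)
                key = (entry.get("word"), entry.get("lang_code"))
                # Keep the first occurrence only; a fuller index could keep all of them.
                index.setdefault(key, (offset, len(line)))
            offset += len(line)
    return index


def lookup(mm, index, word, lang_code):
    """Return the parsed entry for a key, or None, reading only that line's bytes."""
    hit = index.get((word, lang_code))
    if hit is None:
        return None
    start, length = hit
    return json.loads(mm[start:start + length])


# Tiny stand-in for the real dump (hypothetical records).
rows = [
    {"word": "tea", "lang_code": "en", "pos": "noun"},
    {"word": "thé", "lang_code": "fr", "pos": "noun"},
]
with tempfile.NamedTemporaryFile("wb", suffix=".jsonl", delete=False) as tmp:
    for row in rows:
        tmp.write((json.dumps(row, ensure_ascii=False) + "\n").encode("utf-8"))

index = build_offset_index(tmp.name)
with open(tmp.name, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    tea = lookup(mm, index, "tea", "en")
    the = lookup(mm, index, "thé", "fr")
    missing = lookup(mm, index, "tea", "xx")
os.remove(tmp.name)
```

Offsets are counted in bytes rather than characters, which is why the file is opened in binary mode; non-ASCII words such as thé would otherwise shift every later offset.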
- Node.js 18+ and npm (or pnpm)
- For backend via Docker: Docker + Docker Compose
- For backend without Docker: Python 3.9+, pip
- Data: the backend will auto-download the Wiktionary JSONL (20GB+ uncompressed) on first run by default.
- Backend (Docker, recommended for dev):
npm run backend:up
- Frontend (local dev):
npm run dev
- Optional Cloudflare dev tunnel (HTTPS for your backend):
# Start the dev-only tunnel service (compose profile: dev)
docker compose --profile dev up -d tunnel
docker compose logs -f tunnel # copy the trycloudflare.com URL
- Deploy frontend (GitHub Pages) pointing to your backend:
API=https://your-backend.example.com npm run deploy:api
- Install dependencies (requires Node.js):
npm install
- Start frontend:
npm run dev
- Start backend (Python FastAPI) without Docker (requires Python and pip):
# one-time in another shell
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# then, from repo root
npm run backend
- To run both concurrently:
npm run dev:full
- The frontend reads API_BACKEND at build/dev time. Create a .env in the repo root (based on .env.example) and set the backend URL:
API_BACKEND=http://localhost:8000
- For GitHub Pages, set API_BACKEND at build time to your public backend URL (e.g., https://api.example.com). Ensure your backend has CORS enabled for your GitHub Pages origin.
- Convenience scripts also accept API and forward it to API_BACKEND:
# Build only
API=https://api.example.com npm run build:api
# Deploy to GitHub Pages
API=https://api.example.com npm run deploy:api
- To run the backend via Docker Compose from the repo root:
docker compose up --build
By default, the container manages its own data in /app/data and will auto-download the dataset if missing (see env vars below). Data persists across container recreation via a named volume declared in docker-compose.yml.
To remove persisted data, delete the wiktionary-data Docker volume.
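The "download only if missing" behaviour can be pictured with the following Python sketch. It is an illustration under assumptions, not the backend's real code: should_download and unpack_dataset are hypothetical helpers, and the network fetch itself is left out so the example stays self-contained. It shows the two decisions involved: whether a download is needed at all, and whether the fetched file must be decompressed.

```python
import gzip
import os
import shutil
import tempfile


def should_download(data_path, skip_download):
    """Download only when the file is missing and SKIP_DOWNLOAD is not "1"."""
    return skip_download != "1" and not os.path.exists(data_path)


def unpack_dataset(downloaded_path, data_path):
    """Copy a plain .jsonl as-is, or stream-decompress a .gz, into data_path."""
    opener = gzip.open if downloaded_path.endswith(".gz") else open
    with opener(downloaded_path, "rb") as src, open(data_path, "wb") as dst:
        # copyfileobj streams in chunks, so a multi-gigabyte file never sits in memory.
        shutil.copyfileobj(src, dst)


# Demo with a tiny local archive standing in for the real download.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "dump.jsonl.gz")
target = os.path.join(workdir, "wiktionary_data.jsonl")
with gzip.open(archive, "wb") as gz:
    gz.write(b'{"word": "tea", "lang_code": "en"}\n')

needed_before = should_download(target, skip_download="0")
unpack_dataset(archive, target)
needed_after = should_download(target, skip_download="0")
with open(target, "rb") as f:
    restored = f.read()
shutil.rmtree(workdir)
```

Because the check is based on the file's presence, a named volume that already holds the dataset makes later container starts skip the download.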
- Start/stop backend (Docker Compose):
npm run backend:up
npm run backend:down
- Quick Cloudflare tunnel for dev (ephemeral URL):
docker compose --profile dev up -d tunnel
docker compose logs -f tunnel # copy the trycloudflare.com URL
- Build/deploy frontend with a specific backend API URL:
# Build only
API=https://your-backend.example.com npm run build:api
# Deploy to GitHub Pages with API injected
API=https://your-backend.example.com npm run deploy:api
.
├── src/ # React + Vite frontend (D3, Leaflet, Tailwind)
├── backend/ # FastAPI backend
│ ├── api_routes/ # API route modules
│ ├── data/ # wiktionary_data.jsonl + generated indices/stats
│ ├── services/ # PanPhon alignment, IO helpers
│ └── requirements.txt
├── docker-compose.yml # Backend + dev-only Cloudflare tunnel
├── vite.config.ts # Frontend build config (base /WiktionaryViz/)
└── package.json # Scripts for dev/deploy
- Backend Docker build: .github/workflows/backend-docker.yml
  - Builds the backend image on pushes to main when backend files change.
  - To push to a registry, add secrets and enable login in the workflow (Docker Hub or GHCR).
Environment variables used across the project:
- Frontend
  - API_BACKEND: Base URL for backend API (e.g., https://api.example.com). Used at build time.
- Backend
  - ALLOWED_ORIGINS: Comma-separated list of allowed CORS origins (e.g., https://vialab.github.io).
  - OPENAI_API_KEY (optional): Enables AI IPA estimation when needed.
  - PORT (optional): Backend port (defaults to 8000).
  - WIKTIONARY_DATA_URL (optional): URL to download the dataset on first run. Supports .gz or plain .jsonl. Defaults to the Kaikki.org .gz dump.
  - SKIP_DOWNLOAD (optional): Set to 1 to skip auto-download. Default 0.
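As a rough sketch of how the comma-separated ALLOWED_ORIGINS value and the PORT default might be read, consider the Python snippet below. BackendSettings and load_settings are hypothetical names, and the backend's real parsing may differ. The resulting list is the kind of value one could pass as allow_origins to FastAPI's CORSMiddleware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendSettings:
    allowed_origins: tuple
    port: int


def load_settings(env):
    """Parse settings from a mapping such as os.environ."""
    raw = env.get("ALLOWED_ORIGINS", "")
    # Split on commas and drop blanks so "a, b," yields exactly two origins.
    origins = tuple(o.strip() for o in raw.split(",") if o.strip())
    return BackendSettings(
        allowed_origins=origins,
        port=int(env.get("PORT", "8000")),  # 8000 is the documented default
    )


settings = load_settings({
    "ALLOWED_ORIGINS": "https://vialab.github.io, http://localhost:5173,",
})
defaults = load_settings({})
```

Taking a plain mapping instead of reading os.environ directly keeps the function easy to exercise with different values; http://localhost:5173 is only an example origin.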
- FastAPI app runs on http://localhost:8000 by default.
- Interactive API docs: http://localhost:8000/docs
- Common endpoints:
GET /word-data?word=tea&lang_code=en
GET /available-languages?word=tea
GET /random-interesting-word
GET /ancestry-chain?word=tea&lang_code=en
GET /phonetic-drift-detailed?ipa1=/tiː/&ipa2=/te/
GET /descendant-tree?word=tea&lang_code=en
GET /descendant-tree-from-root?word=proto-form&lang_code=la
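When calling these endpoints from a script, query values such as IPA strings contain slashes and non-ASCII characters that need percent-encoding. The following Python sketch builds request URLs with the standard library; endpoint_url is a hypothetical helper, and BASE_URL assumes the default local backend address.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumed default local backend


def endpoint_url(path, **params):
    """Join the base URL, an endpoint path, and percent-encoded query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


word_url = endpoint_url("/word-data", word="tea", lang_code="en")
random_url = endpoint_url("/random-interesting-word")
# "\u02d0" is the IPA length mark, written as an escape to avoid confusion with a colon.
drift_url = endpoint_url("/phonetic-drift-detailed", ipa1="/ti\u02d0/", ipa2="/te/")
```

With the backend running, any of these URLs could then be fetched, for example with urllib.request.urlopen(word_url).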
- Requests hitting GitHub Pages path (e.g., /random-interesting-word) instead of backend:
  - Rebuild/deploy frontend with API_BACKEND set to your backend URL.
- Mixed content blocked (HTTPS page calling HTTP backend):
  - Use HTTPS backend (Cloudflare tunnel or your own TLS).
- CORS errors:
  - Set ALLOWED_ORIGINS=https://vialab.github.io and restart backend.
- No data or 404 for entries:
  - If running locally without Docker, ensure backend/data/wiktionary_data.jsonl exists. In Docker, allow time for download and index build on first run.
Issues and pull requests are welcome. Please:
- Keep changes focused and documented.
- Run npm run lint for frontend changes; follow existing code style.
- When touching the backend, update docs for any new endpoints.
MIT © Visualization for Information Analysis Lab. See LICENSE.