Two ways to host:
- Put `dynamic/index.html` on any static host (GitHub Pages, Netlify, Vercel, Cloudflare Pages, Hugging Face Spaces).
  - The URL works immediately; the page fetches rows live from the Hugging Face datasets-server.
  - Your crawler must execute JavaScript to see the `full_text` content.
- Run `python build.py --pages 1,2` locally (or let GitHub Actions do it).
  - It writes `docs/index.html` with all `full_text` baked in (no JavaScript needed).
  - In the GitHub repo settings: Pages → Build from branch → main → /docs.
  - Add `robots.txt` and `sitemap.xml` (edit the domain placeholders).
- The included workflow builds `docs/index.html` on every push. You can set a repo variable `PAGES` (e.g. `0,1,2,3`).
- Each HF viewer page corresponds to 100 rows; `--pages 1,2` fetches rows 100–299.
- Dataset: `slava-medvedev/zelensky-speeches` (license: CC BY 4.0).
- If you need other datasets, edit `DATASET`, `CONFIG`, and `SPLIT` in `build.py`.
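
The page-to-row arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the actual contents of `build.py`: the helper names (`parse_pages`, `page_to_row_range`, `rows_url`) are hypothetical, and the `CONFIG` and `SPLIT` values are assumptions that should be checked against the dataset. The URL shape follows the public datasets-server `/rows` endpoint, which takes `dataset`, `config`, `split`, `offset`, and `length` query parameters.

```python
from urllib.parse import urlencode

# Only DATASET is taken from this README; CONFIG and SPLIT are assumed values.
DATASET = "slava-medvedev/zelensky-speeches"
CONFIG = "default"   # assumption: verify against the dataset
SPLIT = "train"      # assumption: verify against the dataset
ROWS_PER_PAGE = 100  # each HF viewer page corresponds to 100 rows

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"


def parse_pages(spec: str) -> list[int]:
    """Turn a string such as "1,2" into [1, 2] (hypothetical helper)."""
    return [int(part) for part in spec.split(",") if part.strip()]


def page_to_row_range(page: int) -> tuple[int, int]:
    """Return the inclusive (first, last) row indices for a zero-based page."""
    first = page * ROWS_PER_PAGE
    return first, first + ROWS_PER_PAGE - 1


def rows_url(page: int) -> str:
    """Build the datasets-server /rows URL that covers one viewer page."""
    first, _ = page_to_row_range(page)
    query = urlencode({
        "dataset": DATASET,
        "config": CONFIG,
        "split": SPLIT,
        "offset": first,
        "length": ROWS_PER_PAGE,
    })
    return f"{ROWS_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Pages 1 and 2 together cover rows 100 through 299.
    for page in parse_pages("1,2"):
        print(page, page_to_row_range(page), rows_url(page))
```

Pages are zero-based here, which is what makes `--pages 1,2` line up with rows 100–299; page `0` would cover rows 0–99.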