Are you looking for a lightweight, extensible way to convert from HTML to any document format?
Convert any HTML into production‑ready documents — DOCX today, PDF today, XLSX tomorrow.
html‑to‑document parses HTML into an intermediate, format‑agnostic tree and then feeds that tree to adapters (e.g. DOCX, PDF).
Write HTML → get Word, PDFs, spreadsheets, and more — all with one unified TypeScript API.
Below is a high-level overview of the conversion pipeline. The library processes the HTML input through optional middleware steps, parses it into a structured intermediate representation, and then delegates to an adapter to generate the desired output format.
The stages are:
- Input: Raw HTML input as a string.
- Middleware: One or more middleware functions can inspect or transform the HTML string before parsing (e.g., sanitization, custom tags).
- Parser: Converts the (possibly modified) HTML string into an array of DocumentElement objects, representing a structured AST.
- Adapter: Takes the parsed DocumentElement[] and renders it into the target format (e.g., DOCX, PDF, Markdown) via a registered adapter.
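
The four stages above can be sketched in a few lines of standalone TypeScript. This is only an illustration of the data flow, not the library's implementation: `SketchElement`, `stripScripts`, `parseSketch`, `textAdapter`, and `convertSketch` are hypothetical names invented for this example, the toy parser recognises only flat `<h1>` and `<p>` elements, and everything is synchronous for brevity (the real `convert` call is awaited).

```typescript
// Hypothetical stand-ins for illustration; NOT the library's real types.
type SketchElement = { type: 'heading' | 'paragraph'; text: string };
type Middleware = (html: string) => string;
type Adapter = (elements: SketchElement[]) => string;

// Middleware stage: drop <script> blocks before parsing.
const stripScripts: Middleware = (html) =>
  html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');

// Parser stage (toy): recognises only flat <h1> and <p> elements.
function parseSketch(html: string): SketchElement[] {
  const elements: SketchElement[] = [];
  const pattern = /<(h1|p)>([\s\S]*?)<\/\1>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    elements.push({
      type: match[1].toLowerCase() === 'h1' ? 'heading' : 'paragraph',
      text: match[2].trim(),
    });
  }
  return elements;
}

// Adapter stage (toy): renders the element array as plain text.
const textAdapter: Adapter = (elements) =>
  elements
    .map((el) => (el.type === 'heading' ? el.text.toUpperCase() : el.text))
    .join('\n');

// Input -> middleware -> parser -> adapter, selected by format name.
function convertSketch(
  html: string,
  middleware: Middleware[],
  adapters: Record<string, Adapter>,
  format: string,
): string {
  const adapter = adapters[format];
  if (!adapter) {
    throw new Error(`No adapter registered for "${format}"`);
  }
  const prepared = middleware.reduce((acc, fn) => fn(acc), html);
  return adapter(parseSketch(prepared));
}

const output = convertSketch(
  '<h1>Hello</h1><script>alert(1)</script><p>World</p>',
  [stripScripts],
  { text: textAdapter },
  'text',
);
console.log(output);
```

Because the parser output is independent of the output format, a new format in this sketch only needs one more entry in the `adapters` map; the middleware and parser stages stay untouched.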
| Feature | Description |
|---|---|
| Format‑agnostic core | Converts HTML into a reusable DocumentElement[] structure |
| DOCX adapter (built‑in) | Powered by docx with rich style support |
| Pluggable adapters | Create and add your own adapter for PDF, XLSX, Markdown, etc. |
| Style mapping engine | Define your own CSS mappings for the adapters and set per‑format defaults |
| Custom tag handlers | Override or extend how any HTML tag is parsed |
| Page sections & headers | Use <section class="page">, <section class="page-break">, <header> and <footer> to control pages in DOCX |
| Middleware pipeline | Transform or sanitise HTML before parsing |
npm install html-to-document

import { init, DocxAdapter } from 'html-to-document';
import fs from 'fs';
const converter = init({
adapters: {
register: [
{ format: 'docx', adapter: DocxAdapter },
],
},
});
const html = '<h1>Hello World</h1>';
const buffer = await converter.convert(html, 'docx'); // ↩️ Buffer in Node / Blob in browser
fs.writeFileSync('output.docx', buffer);

You can provide adapter-specific configuration to register custom element converters when initializing. For example, with DocxAdapter:
const converter = init({
adapters: {
register: [
{
format: 'docx',
adapter: DocxAdapter,
config: {
blockConverters: [new MyBlockConverter()],
inlineConverters: [new MyInlineConverter()],
fallthroughConverters: [new MyFallthroughConverter()],
},
},
],
},
});

📖 For more on writing custom element converters, see the Custom Converters guide: https://html-to-document.vercel.app/docs/api/converters
Headers & Footers
When converting to DOCX, you can include <header> and <footer> elements in your HTML. These become page headers and footers in the output document. See the html-to-document-adapter-docx package for complete usage details.
import { init } from 'html-to-document';
// DOCX adapter is included. For PDF support:
// npm i html-to-document-adapter-pdf
// Docs: https://www.npmjs.com/package/html-to-document-adapter-pdf
import { DocxAdapter } from 'html-to-document-adapter-docx';
const converter = init({
adapters: {
register: [
{
format: 'docx',
adapter: DocxAdapter,
// Optional adapter-specific config:
// config: {
// blockConverters: [...],
// inlineConverters: [...],
// fallthroughConverters: [...],
// },
},
],
},
});

Tip: you can bundle multiple adapters:
register: [
  { format: 'docx', adapter: DocxAdapter },
  { format: 'pdf', adapter: PdfAdapter },
]
// To install PDF support, run:
// npm i html-to-document-adapter-pdf
// See docs: https://www.npmjs.com/package/html-to-document-adapter-pdf
The rest of the API stays the same—convert(html, 'docx'), convert(html, 'pdf'), etc.
Need just the parsed structure?
const elements = await converter.parse('<p>Some HTML</p>');
console.log(elements); // => DocumentElement[]

| Resource | Link |
|---|---|
| Full Docs | https://html-to-document.vercel.app/ |
| Live Demo (TinyMCE) | https://html-to-document-demo.vercel.app |
- Style mappings: fine‑tune CSS → DOCX/PDF with StyleMapper
- Tag handlers: intercept <custom-tag> → your own DocumentElement
- Custom adapters: implement IDocumentConverter to target new formats
To create a new adapter from scratch in your own project:
- Install the core types:
  npm install html-to-document-core
  This package contains the necessary interfaces and type definitions, such as DocumentElement, StyleMapper, and IDocumentConverter.
- Implement your adapter based on the documentation here: Custom Converters Guide
See the Extensibility Guide.
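
To make the idea of a custom adapter concrete, here is a rough sketch of one that turns a parsed element array into Markdown. It deliberately does not import html-to-document-core: `SketchElement` and `SketchConverter` are simplified, assumed stand-ins for `DocumentElement` and `IDocumentConverter`, and the field names (`type`, `text`, `level`, `content`) are guesses made for illustration. Check the package's real type definitions before adapting it. The sketch also resolves to a string, whereas the built-in adapters produce a Buffer or Blob.

```typescript
// Assumed, simplified shapes; the real definitions live in html-to-document-core.
interface SketchElement {
  type: 'heading' | 'paragraph' | 'text';
  text?: string;
  level?: number; // heading depth, 1-6 (hypothetical field)
  content?: SketchElement[]; // nested children (hypothetical field)
}

interface SketchConverter {
  convert(elements: SketchElement[]): Promise<string>;
}

// Concatenate an element's own text with the text of all its descendants.
function inlineText(el: SketchElement): string {
  const own = el.text ?? '';
  const children = (el.content ?? []).map(inlineText).join('');
  return own + children;
}

// Render one top-level element as a Markdown block.
function renderBlock(el: SketchElement): string {
  const text = inlineText(el);
  if (el.type === 'heading') {
    // Clamp to the heading levels Markdown supports.
    const level = Math.min(Math.max(el.level ?? 1, 1), 6);
    return `${'#'.repeat(level)} ${text}`;
  }
  return text;
}

// Pure rendering function: blocks are separated by a blank line.
function renderMarkdown(elements: SketchElement[]): string {
  return elements.map(renderBlock).join('\n\n');
}

// Adapter class wrapping the pure function behind an async convert().
class MarkdownSketchAdapter implements SketchConverter {
  convert(elements: SketchElement[]): Promise<string> {
    return Promise.resolve(renderMarkdown(elements));
  }
}

const sample: SketchElement[] = [
  { type: 'heading', level: 2, text: 'Title' },
  {
    type: 'paragraph',
    content: [
      { type: 'text', text: 'Hello ' },
      { type: 'text', text: 'world' },
    ],
  },
];

const markdown = renderMarkdown(sample);
console.log(markdown);

// Async usage mirrors how an adapter would be awaited.
new MarkdownSketchAdapter().convert(sample).then((result) => {
  console.log(result === markdown);
});
```

Keeping the rendering logic in a pure function and wrapping it in a thin class makes the adapter easy to unit test without touching the conversion pipeline.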
Contributions are welcome!
Please read CONTRIBUTING.md and follow the Code of Conduct.
All notable changes are documented in CHANGELOG.md.
ISC — a permissive, MIT‑style license that allows free use, modification, and distribution without requiring permission.
