Html2Markdown

Convert HTML to clean, readable Markdown. Designed for content extraction, this library handles common HTML patterns while filtering out non-content elements such as navigation and scripts.

Installation

Add html2markdown to your list of dependencies in mix.exs:

def deps do
  [
    {:html2markdown, "~> 0.3.0"}
  ]
end

Quick Start

# Basic conversion
Html2Markdown.convert("<h1>Hello World</h1><p>Welcome to <strong>Elixir</strong>!</p>")
# => "\n# Hello World\n\n\n\nWelcome to **Elixir**!\n"

# With custom options
Html2Markdown.convert(html, %{
  navigation_classes: ["nav", "menu", "custom-nav"],
  normalize_whitespace: true
})

Features

Smart Content Extraction: Automatically removes navigation, ads, and other non-content elements
HTML5 Support: Handles modern semantic elements like <details>, <mark>, <time>
Table Conversion: Converts HTML tables to clean Markdown tables
Entity Handling: Properly decodes HTML entities (&amp;amp;, &amp;lt;, &amp;nbsp;, etc.)
Configurable: Customize filtering and processing behavior

Configuration Options

Html2Markdown.convert(html, %{
  # CSS classes that identify navigation elements to remove
  navigation_classes: ["footer", "menu", "nav", "sidebar", "aside"],
  
  # HTML tags to filter out during conversion
  non_content_tags: ["script", "style", "form", "nav", ...],
  
  # Markdown flavor (currently :basic, future: :gfm, :commonmark)
  markdown_flavor: :basic,
  
  # Normalize whitespace (collapses multiple spaces, trims)
  normalize_whitespace: true
})
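If the same options are used in many places, one way to avoid repeating them is a thin wrapper that merges per-call overrides onto shared defaults. The sketch below is illustrative only: `MyApp.Markdown`, `build_opts/1`, and the default values are hypothetical names chosen for this example. The only library call it relies on is `Html2Markdown.convert/2` with the option keys documented above.

```elixir
defmodule MyApp.Markdown do
  # Hypothetical wrapper module (not part of Html2Markdown).
  # Only Html2Markdown.convert/2 and the option keys come from this README.
  @default_opts %{
    navigation_classes: ["nav", "menu", "sidebar"],
    normalize_whitespace: true
  }

  # Merge per-call overrides on top of the shared defaults.
  def build_opts(overrides \\ %{}) do
    Map.merge(@default_opts, overrides)
  end

  # Convert HTML using the shared defaults plus any overrides.
  def convert(html, overrides \\ %{}) do
    Html2Markdown.convert(html, build_opts(overrides))
  end
end
```

With this in place, `MyApp.Markdown.convert(html, %{normalize_whitespace: false})` would keep the shared navigation classes while changing only the whitespace setting.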

Common Use Cases

Web Scraping

Extract readable content from web pages:

# Req.get!/1 returns the response directly (and raises on failure)
%{body: html} = Req.get!(url)
markdown = Html2Markdown.convert(html)
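For scraping many pages, it may be preferable to handle failed requests without raising. The following is a minimal sketch under that assumption: `MyApp.Scraper`, `fetch_markdown/1`, and `handle_response/1` are hypothetical names, and the error tuples are one possible convention rather than anything the library prescribes. It uses only `Req.get/1` and `Html2Markdown.convert/1`.

```elixir
defmodule MyApp.Scraper do
  # Hypothetical helper module (not part of Html2Markdown or Req).

  # Fetch a URL and convert the HTML body to Markdown.
  def fetch_markdown(url) do
    url
    |> Req.get()
    |> handle_response()
  end

  # Successful response with a string body: convert it.
  def handle_response({:ok, %{status: 200, body: html}}) when is_binary(html) do
    {:ok, Html2Markdown.convert(html)}
  end

  # Any other response: a non-200 status, or a body that is not a string
  # (for example, JSON that Req has already decoded).
  def handle_response({:ok, %{status: status}}) do
    {:error, {:unexpected_response, status}}
  end

  # Transport-level failure (DNS, timeout, and so on).
  def handle_response({:error, reason}) do
    {:error, reason}
  end
end
```

Keeping the response handling in a separate function makes it possible to exercise the error paths without making a network request.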

Content Migration

Convert existing HTML content to Markdown:

# Convert blog posts from HTML to Markdown
html_content
|> Html2Markdown.convert(%{normalize_whitespace: true})
|> save_as_markdown()
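The `save_as_markdown/1` step above is left to the application. As one possible shape for a file-based migration, the sketch below converts every `.html` file in a directory and writes a `.md` file beside it. `MyApp.Migration`, `markdown_path/1`, `migrate_dir/1`, and the flat directory layout are all assumptions made for illustration.

```elixir
defmodule MyApp.Migration do
  # Hypothetical module: the directory layout and function names are
  # illustrative and not part of Html2Markdown.

  # Map an HTML file path to its Markdown counterpart,
  # e.g. "posts/hello.html" becomes "posts/hello.md".
  def markdown_path(html_path) do
    Path.rootname(html_path, ".html") <> ".md"
  end

  # Convert every .html file directly under `dir`, writing a .md file
  # next to each one. Returns the list of paths written.
  def migrate_dir(dir) do
    dir
    |> Path.join("*.html")
    |> Path.wildcard()
    |> Enum.map(fn html_path ->
      markdown =
        html_path
        |> File.read!()
        |> Html2Markdown.convert(%{normalize_whitespace: true})

      target = markdown_path(html_path)
      File.write!(target, markdown)
      target
    end)
  end
end
```

Because `File.read!/1` and `File.write!/2` raise on failure, a single unreadable file stops the run; a production migration would probably want to collect errors per file instead.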

Email Processing

Clean up HTML emails for plain text storage:

email_html
|> Html2Markdown.convert(%{
  non_content_tags: ["style", "script", "meta"],
  navigation_classes: ["unsubscribe", "footer"]
})

Supported Elements

Headings: <h1> through <h6>
Text: Paragraphs, emphasis (<em>, <i>), strong (<strong>, <b>)
Lists: Ordered and unordered lists with nesting
Links: <a> tags with proper URL handling
Images: <img> and <picture> elements
Code: Both inline <code> and block <pre> elements
Tables: Full table support with headers
Quotes: <blockquote> and <q> elements
HTML5: <details>, <summary>, <mark>, <abbr>, <cite>, <time>, <video>

Documentation

Full documentation is available at https://hexdocs.pm/html2markdown.

Development

This project includes comprehensive testing and quality assurance tools:

Running Tests

# Run all tests
mix test

# Run tests with coverage
mix coveralls.html

Code Quality

# Run all quality checks (formatting, security, linting)
mix quality

# Individual checks
mix format --check-formatted  # Code formatting
mix credo --only warning      # Code linting
mix sobelow --config          # Security analysis

CI/CD

This project uses GitHub Actions for continuous integration with:

Multi-version testing (Elixir 1.15-1.17, OTP 25-27)
Code quality enforcement
Security scanning
Test coverage reporting

License

MIT License - see LICENSE file for details.
