is swc_html_parser general-purpose? #7695

untitaker · 2023-07-22T14:12:53Z

I saw that swc-html-parser is run against the html5lib-tests test suite.

This is very interesting to me because I find the current state of html5 parsers in Rust unsatisfying. html5ever lags behind the spec and lol-html is too hard to use. I wrote html5gum, which is just a tokenizer/lexer.

What were the considerations when building your own html5 parser? Was there a set of requirements not covered by other parsers? What would be required to make it a general-purpose library, apart from docs?

I see that you have a tree builder implementation, which I would love to incorporate into html5gum. Skimming through the code, I think html5gum's tokenizer might be faster than swc's (e.g. it has fast paths for data->data transitions), but I haven't tested it.
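To illustrate what a data->data fast path can look like, here is a minimal sketch. It is not taken from html5gum or swc; the function name `scan_data_run` is hypothetical, and it ignores input preprocessing such as newline normalization. The idea is that while the tokenizer is in the data state, it can skip ahead to the next character that could change state (`<`, `&`, or NUL) and emit the whole run as one piece of text instead of stepping through the state machine one character at a time.

```rust
/// Hypothetical helper, for illustration only.
/// Splits `input` into the leading run of plain text and the remainder,
/// where the remainder starts at the first character that the data state
/// treats specially ('<', '&', or NUL), or is empty if there is none.
fn scan_data_run(input: &str) -> (&str, &str) {
    let end = input
        .find(|c: char| matches!(c, '<' | '&' | '\0'))
        .unwrap_or(input.len());
    input.split_at(end)
}

fn main() {
    let (text, rest) = scan_data_run("hello world<b>bold</b>");
    // `text` can be emitted as a single character run;
    // `rest` is handed back to the regular state machine.
    println!("text = {:?}, rest = {:?}", text, rest);
}
```

Whether this is actually faster depends on the input and on how the rest of the tokenizer is structured, so it would need benchmarking.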

kdy1 · 2023-07-22T19:05:11Z

Maintainer

Yes, it's general-purpose. @alexander-akait will know more, but the goals, in order of priority, are:

1. correctness (spec-compliant)
2. no lag behind the spec
3. general purpose
4. easy API

Correctness:

SWC is a compiler, not a runtime. I think the first focus of a compiler should be correctness.

No lag behind:

SWC is a compiler for the future. SWC started as a transpiler for future ES syntax, and I think keeping up with the spec is just as important for CSS and HTML.

General purpose:

All AST definitions of SWC are pure, i.e. there are no project-specific fields and no project-specific optimizations such as transforming the AST in the parser.

Easy API:

I know that an overly easy API comes at some cost, but I didn't want something like CowRcStr. Rust is hard enough, so I focused on an easy API (for Rust) from the start, i.e. no lifetimes on the AST definitions.
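As a rough illustration of that trade-off, here is a sketch with made-up types. `BorrowedText`, `OwnedText`, and the `parse_*` functions are hypothetical and are not the actual swc_html_ast definitions; `String` is only a stand-in for whatever owned string type a real AST would use. A node that borrows from the source carries a lifetime everywhere it goes, while an owned node does not, at the cost of copying the text.

```rust
/// Borrowed style (hypothetical): the node points into the source buffer,
/// so the type and every signature that mentions it carry a lifetime.
#[derive(Debug, PartialEq)]
struct BorrowedText<'src> {
    data: &'src str,
}

/// Owned style (hypothetical): the node owns its text, so there is no
/// lifetime parameter, but building it copies the text.
#[derive(Debug, Clone, PartialEq)]
struct OwnedText {
    data: String,
}

fn parse_borrowed(src: &str) -> BorrowedText<'_> {
    BorrowedText { data: src.trim() }
}

fn parse_owned(src: &str) -> OwnedText {
    OwnedText { data: src.trim().to_string() }
}

/// The owned node can outlive the source it was parsed from.
/// The same function returning `BorrowedText` would not compile,
/// because `src` is dropped at the end of the function.
fn load_owned() -> OwnedText {
    let src = String::from("  hello  ");
    parse_owned(&src)
}

fn main() {
    let owned = load_owned();
    println!("owned: {}", owned.data);

    let src = String::from("  hi ");
    let borrowed = parse_borrowed(&src); // must not outlive `src`
    println!("borrowed: {}", borrowed.data);
}
```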
