The tokenization performed by tree-sitter can be slow for large datasets: up to 50% of the run time, or even more, is spent on this step.
Multiple improvements are possible:
- Try out alternative parsers, ideally ones that can also run in a browser context. Candidates:
  - https://github.com/microsoft/vscode-textmate - the code highlighter used by VS Code, requires WASM (but no native deps)
  - https://github.com/highlightjs/highlight.js - a full JavaScript highlighter
  - https://github.com/PrismJS/prism or https://github.com/tannerlinsley/reprism - also pure JS, with support for a lot of languages, but no longer maintained
  - https://github.com/syntax-tree/unist - a standardized protocol for syntax trees?
- Parallelize tokenization using workers
- Benchmark/profile tree-sitter: we might just be using it wrong.
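
For the benchmarking point, a rough sketch of how the share of time per stage could be measured before trying any of the alternatives. Everything here is an assumption for illustration: `StageProfiler`, `fakeTokenize` and the stage names are hypothetical, and `fakeTokenize` is only a whitespace splitter standing in for the real tree-sitter call.

```typescript
// Hypothetical helper: accumulates wall-clock time per named stage
// and reports each stage's share of the total measured time.
class StageProfiler {
  private readonly timings: Map<string, number> = new Map();

  // Run `fn` and add its duration (in ms) to the running total for `stage`.
  time<T>(stage: string, fn: () => T): T {
    const start = performance.now();
    try {
      return fn();
    } finally {
      const elapsed = performance.now() - start;
      this.timings.set(stage, (this.timings.get(stage) ?? 0) + elapsed);
    }
  }

  // Fraction (0..1) of the total measured time spent in each stage.
  shares(): Map<string, number> {
    let total = 0;
    this.timings.forEach((ms) => {
      total += ms;
    });
    const result = new Map<string, number>();
    this.timings.forEach((ms, stage) => {
      result.set(stage, total > 0 ? ms / total : 0);
    });
    return result;
  }
}

// Stand-in for the real tokenizer (assumption): splits on whitespace only.
function fakeTokenize(source: string): string[] {
  return source.split(/\s+/).filter((t) => t.length > 0);
}

const profiler = new StageProfiler();
const files = ["let a = 1;", "function f() { return a; }"];

// Time tokenization per file, then a dummy "match" stage over the tokens.
const tokens = files.map((f) => profiler.time("tokenize", () => fakeTokenize(f)));
const tokenCount = profiler.time("match", () =>
  tokens.reduce((n, t) => n + t.length, 0)
);

profiler.shares().forEach((share, stage) => {
  console.log(`${stage}: ${(share * 100).toFixed(1)}%`);
});
```

Swapping `fakeTokenize` for the actual tokenizer call and running this on a large dataset would show whether the 50% figure holds, and gives a baseline to compare other parsers or a worker-based setup against.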