Description
Check List
- I have already read README.
- I have already searched existing issues.
- I have already searched existing pull requests.
Feature Request
The current encode_url function in hexo-util is based on an implementation from six years ago that uses both url.parse and new URL(), so an input can be parsed twice. This redundant parsing is a significant performance bottleneck. In a benchmark on a content-heavy site (8x hexo-many-posts) with the hexo-theme-reimu theme, the total build time was 36 seconds, and encode_url alone accounted for 5 seconds of it.
After reading the implementation of url.parse in Node.js (https://github.com/nodejs/node/blob/main/lib/url.js), I believe we can replace the conditional check if (parse(str).protocol) with simpler logic.
url.parse extracts the protocol roughly as follows:
- Asserts that the input must be a string
- Ignores leading and trailing whitespace characters
- Replaces backslashes before the query symbol with forward slashes
- Uses the regex /^[a-z0-9.+-]+:/i to extract the protocol
During the parsing process, three types of errors may be thrown: ERR_INVALID_ARG_TYPE (non-string input), ERR_INVALID_URL (invalid hostname), and ERR_INVALID_ARG_VALUE (invalid port). The first error can be thrown proactively by our implementation. The second error will also be thrown by new URL(). As for the third error, I have not been able to reproduce it successfully, and I suspect it might be a product of defensive programming.
I suppose we could implement the original protocol parsing logic as follows:
```typescript
// Same pattern url.parse uses to detect a leading "scheme:".
const PROTOCOL_RE = /^[a-z0-9.+-]+:/i;

const hasProtocolLikeNode = (str: unknown): boolean => {
  // Mirror url.parse, which rejects non-string input.
  if (typeof str !== 'string') throw new TypeError('url must be a string');
  // Ignore leading and trailing whitespace before matching.
  return PROTOCOL_RE.test(str.trim());
};
```

The implementation above passes all existing unit tests and should, in theory, behave like the current implementation. Most importantly, it delivers a massive performance improvement, reducing the time spent in encode_url from 5 seconds to around 300 ms.
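To make the behavior of the proposed check concrete, here is a small sketch that runs it over a few inputs. The sample strings and expected values are my own illustrations, not taken from the hexo-util test suite, and the function is repeated only so the snippet runs on its own.

```typescript
// Protocol check as proposed in this issue: trim, then test for a leading "scheme:".
const PROTOCOL_RE = /^[a-z0-9.+-]+:/i;

const hasProtocolLikeNode = (str: unknown): boolean => {
  if (typeof str !== 'string') throw new TypeError('url must be a string');
  return PROTOCOL_RE.test(str.trim());
};

// Illustrative inputs (assumed for this sketch) paired with the expected result.
const samples: Array<[string, boolean]> = [
  ['https://example.com/a b', true],
  ['  MAILTO:user@example.com  ', true], // case-insensitive, whitespace trimmed
  ['data:text/plain,hello', true],
  ['/archives/2024/', false],
  ['posts/hello world.html', false],
  ['//example.com/path', false], // protocol-relative, no scheme
  ['#top', false],
  ['', false],
  ['C:/Users/me', true], // a drive letter also matches the scheme pattern
];

const results = samples.map(([input, expected]) => ({
  input,
  expected,
  actual: hasProtocolLikeNode(input),
}));

for (const r of results) {
  const status = r.actual === r.expected ? 'ok' : 'MISMATCH';
  console.log(JSON.stringify(r.input), r.actual, status);
}
```

Only inputs that return true would go on to new URL(); everything else would skip URL parsing entirely, which is presumably where the savings come from. The drive-letter case is worth a look when comparing edge cases against the current behavior.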
Do you think this modification is worthwhile?
Additional context
No response