Skip to content

Improve build performance #83

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 8, 2024

Conversation

JLHwung
Copy link
Contributor

@JLHwung JLHwung commented Nov 6, 2024

This PR contains commits from #82. I will rebase once that PR gets merged.

This PR fixed a performance regression introduced in #80 where we generated the bidi_mirroring_glyph data when enumerating the glyph mappings, although the final data is correct, this is apparently not desired.

We also removed the cluster building approach because at this point, it is more efficient to run the V8 optimized code again and again for several seconds than forking multiple V8 processes simutaneously optimizing code which does not run for a very long time.

Last we used the native JSON.stringify when generating the gzipped data. Because the data is gzipped and not human-readable anyway, there is not very much benefit running jsesc before gzip.

Unsurprisingly, the last commit generates lots of output diffs. However, they are equivalent for end users, so I didn't attach the output diff in this PR.

Runtime measured in my local environment (Apple M1 Max 10c)

# Current main:
$ time npm run build
npm run build  15.51s user 9.99s system 350% cpu 7.267 total

# This PR:
$ time npm run build
npm run build  9.01s user 10.30s system 297% cpu 6.497 total
The time spent on each task when processing Unicode 16.0.0 in this PR (Click to unfold)
Generating data for Unicode v16.0.0…
Parsing Unicode v16.0.0 `Bidi_Class`
bidi_class: 58.685ms
Parsing Unicode v16.0.0 `Script`…
scripts: 50.533ms
Parsing Unicode v16.0.0 `Script_Extensions`…
script extensions: 48.772ms
Parsing Unicode v16.0.0 properties…
properties: 13.801ms
Parsing Unicode v16.0.0 derived core properties…
derived properties: 36.039ms
Parsing Unicode v16.0.0 derived general category…
derived general category: 28.396ms
Parsing Unicode v16.0.0 derived normalization properties…
derived normalization properties: 51.644ms
Parsing Unicode v16.0.0 derived binary properties…
derived binary properties: 0.641ms
Parsing Unicode v16.0.0 composition exclusions…
compostion exclusions: 0.48ms
Parsing Unicode v16.0.0 `Case_Folding`…
case folding: 4.197ms
Parsing Unicode v16.0.0 `Block`…
block: 87.287ms
Parsing Unicode v16.0.0 `Bidi_Mirroring_Glyph`
bidi mirroring glyph: 1.353ms
Parsing Unicode v16.0.0 bidi brackets…
bidi brackets: 1.573ms
Parsing Unicode v16.0.0 `Line_Break`…
line break: 21.974ms
Parsing Unicode v16.0.0 `Word_Break`…
word break: 9.982ms
Parsing Unicode v16.0.0 binary emoji properties…
binary emoji properties: 6.312ms
Parsing Unicode v16.0.0 emoji sequence properties…
emoji sequence properties: 17.331ms
Parsing Unicode v16.0.0 `Names`…
names: 112.647ms
Parsing Unicode v16.0.0 Aliases…
aliases: 1.282ms
Parsing Unicode v16.0.0 simple case mappings…
simple case mappings: 23.97ms
Parsing Unicode v16.0.0 `Special_Casing`…
special casing: 6.342ms

for (const resource of resources) {
const version = resource.version;
console.log('[%s] Unicode v%s', getTime(), version);
console.groupCollapsed();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a bonus, this refactor also fixes another build hazard: Previously when there is an error thrown from the building worker, the cluster main does not exit early, nor does it exit with non-zero code. Now that all building steps are within one process, any thrown errors should invalidate the build step.

Copy link
Contributor Author

@JLHwung JLHwung Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to revert the remove cluster commit because I overlooked the total time reported by the zsh time. There is still improvement from multiple processes. I will see if I can fix the build hazard directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the mentioned build hazard in c6f9130.

See https://github.com/node-unicode/node-unicode-data/actions/runs/11724848834/job/32659747285 for the error log example.

Copy link
Collaborator

@mathiasbynens mathiasbynens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to merge after rebase. Thanks a lot for your continued work on this!

In node-unicode#80 we made a performance regression that the bidiMirroringGlyphMap is generated in the property iteration. This is also fixed.
@JLHwung JLHwung force-pushed the improve-build-performance branch from c6c50a9 to 89544f1 Compare November 7, 2024 13:08
@JLHwung JLHwung requested a review from mathiasbynens November 7, 2024 13:26
@mathiasbynens mathiasbynens merged commit fd3bb32 into node-unicode:main Nov 8, 2024
1 check passed
@mathiasbynens
Copy link
Collaborator

Rebuilding all packages after merging this, I noticed that Bidi_Mirroring_Glyph no longer gets mentioned in the generated READMEs, e.g. node-unicode/unicode-16.0.0@97cb5e3#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5. Is this intended?

@JLHwung
Copy link
Contributor Author

JLHwung commented Nov 8, 2024

Rebuilding all packages after merging this, I noticed that Bidi_Mirroring_Glyph no longer gets mentioned in the generated READMEs, e.g. node-unicode/unicode-16.0.0@97cb5e3#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5. Is this intended?

No, this should be a bug. I will open a new PR.

@JLHwung JLHwung deleted the improve-build-performance branch November 8, 2024 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants