Performance Issue with Nokogiri #3142

Open
@nirvdrum

Description

We have some tests in the test suite for an internal Rails app that take an inordinate amount of time in TruffleRuby. On my developer machine a single test takes ~50s. I don't have numbers from CI yet (still working on that), but the CI machines are far less capable so I wouldn't be surprised if they take at least a minute. It's quite noticeable.

Running the tests in isolation with the CPU profiler shows that most of the time is spent in Nokogiri. I've tried to narrow it down to a simpler subset. Unfortunately, my simplest examples didn't have the same performance issues, so I've left in a few dependencies:

  • bootstrap-email
    • Styles HTML emails with the Bootstrap UI toolkit
  • premailer
    • Inlines CSS rules into an HTML email body to improve client compatibility (many email clients won't load in linked stylesheets)
  • ActionMailer
    • The Rails component responsible for rendering emails. It's possible to construct an email body without this, but I found performance was worse when ActionMailer was used

I've pulled together a repo with a representative example. It's not exactly what we use in the app; in particular, the email bodies are different. But I think it encompasses the key details, can be open sourced for public discussion, and can be used for test cases. The performance in this repo isn't quite as bad as what I'm seeing with the app test suite, and I need to investigate more to see why that is. While the reproduction isn't an exact reflection of reality, I'm hopeful it shows enough to get us going. Processing a document shouldn't take 1s.

A confounding issue is that the bootstrap-email gem compiles the Bootstrap CSS files with sassc at start-up, which takes several seconds. In CI, we end up compiling on every run because temporary files are not kept across test runs. Locally, the cache will be populated and reused unless you explicitly remove it (rm -rf tmp/cache). Keep the cache if you're profiling the Nokogiri usage; remove it first if you want a run that's more representative of what CI does, since CI starts from a clean state.
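For anyone scripting the two scenarios, here is a minimal sketch of resetting the cache from Ruby before a run. The `reset_cache` helper name is hypothetical and not part of the repo; the only thing taken from the description above is the `tmp/cache` path.

```ruby
require "fileutils"

# Hypothetical helper (not from the repo): removes the cache directory so
# the next run starts from the clean state that CI sees.
# Returns true if a cache directory was present before removal.
def reset_cache(cache_dir = File.join("tmp", "cache"))
  existed = Dir.exist?(cache_dir)
  FileUtils.rm_rf(cache_dir) # same effect as `rm -rf tmp/cache`
  existed
end
```

Calling `reset_cache` before a measurement approximates the CI case; skipping it keeps the warm cache, which is the better setup when profiling Nokogiri itself.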

With TruffleRuby 23.0.0 we swapped the underlying VM for one that's more performant in various ways. That creates an interesting situation: the release build handles some performance matters for us that dev builds do not, but we can't build TruffleRuby with that VM ourselves because it's not open source. So, I'm providing numbers here from both:

TruffleRuby 23.0.0 + Oracle GraalVM

================================================================================
Email: 0
Took: 4.367758917003812s
================================================================================

================================================================================
Email: 1
Took: 0.8536989160056692s
================================================================================

================================================================================
Email: 2
Took: 0.9239389580034185s
================================================================================

================================================================================
Email: 3
Took: 0.49181870800384786s
================================================================================

TruffleRuby 23.1.0-dev (0d3058b) + GraalVM CE

================================================================================
Email: 0
Took: 10.682570666001993s
================================================================================

================================================================================
Email: 1
Took: 4.075295624999853s
================================================================================

================================================================================
Email: 2
Took: 2.1472915840058704s
================================================================================

================================================================================
Email: 3
Took: 2.8145212919989717s
================================================================================
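For reference, output in the shape shown above could come from a harness along these lines. This is a sketch rather than the code in the repo: `time_block`, `report`, and `benchmark_emails` are hypothetical names, the separator width is assumed, and the block passed in stands in for whatever renders and inlines a single email.

```ruby
# Assumed separator width; adjust to match the real output if it differs.
SEPARATOR = "=" * 80

# Returns the wall-clock seconds taken by the block, using a monotonic
# clock so system clock adjustments don't skew the measurement.
def time_block
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Prints one result in the same layout as the numbers above.
def report(index, seconds, out: $stdout)
  out.puts SEPARATOR
  out.puts "Email: #{index}"
  out.puts "Took: #{seconds}s"
  out.puts SEPARATOR
  out.puts
end

# Runs the given block once per email index and returns the timings.
# The block is a placeholder for the real email-processing work.
def benchmark_emails(count, out: $stdout)
  count.times.map do |i|
    elapsed = time_block { yield i }
    report(i, elapsed, out: out)
    elapsed
  end
end
```

A call such as `benchmark_emails(4) { |i| process_email(i) }` would print four blocks, where `process_email` is again a placeholder. Timing each email separately matters here because the first iteration is likely to include one-time costs, such as the sassc compilation mentioned earlier, which would otherwise be averaged into the later ones.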
