Skip to content

General suggestions #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
toddstrader opened this issue May 19, 2025 · 1 comment
Open

General suggestions #2

toddstrader opened this issue May 19, 2025 · 1 comment

Comments

@toddstrader
Copy link
Member

This is great, thanks for adding. Some quick suggestions off the top of my head:

A forced retry option would be nice. It seems that --retry will only re-run failing steps, but when I had to make Verilator changes which fixed C++ compilation, verilation hadn't failed so I had to rm -rf work in order to get things to run again.

Running all the cases is a little annoying because the shell will expand * into non-useful things. It appears to do the thing I want when I escape * so maybe just a note in --help about this would be good.

Parallelization of the different steps would be amazing. I'm sure this would be a bit tricky because the verilated Makefiles already -j, but it would be super nice to be able to chew through these different tests on all the cores I have.

It's a little unfortunate that the config is split between this repo and rtlmeter.yml back in the Verilator repo. For instance, some cases are commented out in that YAML, so I'm guessing I should expect them to pass but then it's less easy to locally run all the "good" cases. But I'm guessing just doing cases: "*" would be less idea in terms of CI throughput. So I don't exactly know what to suggest other than mentioning that it's nice to be able to simply run what Verilator's CI is doing myself.

@gezalore
Copy link
Member

Thanks for the comments, just sharing some thoughts in exchange to provide context:

It seems that --retry will only re-run failing steps,

This is correct. --retry is more intended to help with bringing up new cases which might have syntax errors, than for tool changes. I agree a --clean might be useful, but I'm not sure it's worth the internal complexity compared to rm -rf work; ./rtlmeter run ..., plus there are some usage patterns where you want to reuse the old artefacts even if the tool changed (gathering more samples for a previous version), so I don't mind having the "now do the whole thing again" a bit more explicit.

the shell will expand * into non-useful things. It appears to do the thing I want when I escape * so maybe just a note in --help about this would be good.

FWIW there is a note about this in the docs: https://verilator.github.io/rtlmeter/running.html#specifying-cases

It's a little unfortunate that the config is split between this repo and rtlmeter.yml back in the Verilator repo. For instance, some cases are commented out in that YAML, so I'm guessing I should expect them to pass but then it's less easy to locally run all the "good" cases. But I'm guessing just doing cases: "*" would be less idea in terms of CI throughput. So I don't exactly know what to suggest other than mentioning that it's nice to be able to simply run what Verilator's CI is doing myself.

Agree it would be nice to be able to run just the same as the Verilator CI. I intended RTLMeter to be more generally useful, and not be just a Verilator specific thing, so I believe storing the subset of the cases to run in the Verailtor CI belongs in the Verilator repo.

There are a couple issues as to why not just run everything in the Verilator CI:

  1. The github runners have 16G RAM, some cases need more
  2. The verilated C++ of some configurations of XiangShan won't build with GCC due to hitting some algorithmic bottleneck, but works fine with Clang. (Hence the difference in what is run in the Verilator CI with GCC vs Clang.) We will need to fix this in Verilator at some point (it's a tricky code splitting problem) and report to GCC.
  3. Some cases take a very long time to run, which is not ideal to run nightly

Short summary:

  • Everything should work [*] with clang, but will take a long time to run
  • Everything except the commented out XiangShan cases should work with GCC
  • One nit: I have not actually tried the OpenTitan:32x32 cases as they need more than 128G memory, which is the largest I currently have access to. Help trying these would be appreciated, but I would expect those case to take a weekend to run.

As to easily running the same: The docs link above explains how you can use a case list file with ./rtlmeter run --cases @filename, that's probably the easiest. I'm happy for other suggestions though. Unfortunately we can't store the case list file in the Verilator repo, as the workflow setup needs it in the YAML to define the parallel job matrix.

You probably know this, but you can also trigger the RTLMeter workflow manually on your fork/branch if you want to do a check same as the CI.

Parallelization of the different steps would be amazing.

I understand how this can be helpful for "testing", however the intention with RTLMeter is to enable collecting high quality performance [latency] data, for which we definitely don't want to run verilation or execution in parallel, so this was not a priority, and I'm somewhat concerned it would be abused when doing actual benchmarking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants