
[Bazel] Dynamically generate the Emscripten cache #1402


Closed

Contributor

@allsey87 allsey87 commented Jun 12, 2024

This is a draft PR for sharing my progress on this feature. The idea is to use embuilder to generate the Emscripten cache dynamically by specifying upfront which libraries you want and in which configuration (e.g., wasm64, LTO, PIC).

For now I would really appreciate any input on these remaining problems:

  1. Output names
    Some libraries generate .a files and some generate .o files. I suspect that only the C runtime libraries emit .o files, so I could tell Bazel that if the name of the library begins with crt then its output is .o?

  2. Overlaying the cache
    I added my output to link_files but this seems to be ignored by the Emscripten linker. When I compile a program, Emscripten considers the cache to be external/emscripten_bin_linux/emscripten/cache/, while the cache assets from my genrule are actually stored at execroot/_main/bazel-out/k8-opt-exec-ST-13d3ddad9198/bin/external/emscripten_bin_linux/emscripten/cache/. I am not sure what the best way forward is here.

  3. Portability
    I haven't found a way to get rid of the dirname and realpath calls for the moment. It seems almost impossible to provide a rule in Bazel with a path, which makes it very difficult to set the values of BINARYEN_ROOT etc. for embuilder. The best solution I can think of for now is to leave the Bash-isms in place and use cmd_bash and cmd_bat (provided by genrule) to set the environment variables.
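
A minimal sketch of the naming heuristic proposed in item 1, written in Python syntax so that it could be ported to a Starlark helper. The function name `cache_output_name` is hypothetical, and the rule that only `crt`-prefixed libraries produce `.o` files is the unverified guess from the description above, not confirmed embuilder behaviour.

```python
def cache_output_name(library):
    """Guess the file name that embuilder emits for `library`.

    Assumption (unverified): only C runtime startup objects, whose names
    start with "crt", are emitted as .o files. Everything else is assumed
    to be a static archive (.a).
    """
    if library.startswith("crt"):
        return library + ".o"
    return library + ".a"


print(cache_output_name("crtbegin"))  # crtbegin.o
print(cache_output_name("libprintf_long_double-debug"))  # libprintf_long_double-debug.a
```

If the guess turns out to be wrong for some library, an explicit name-to-extension mapping would be a safer fallback than a prefix test.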

Related issues:

@allsey87
Contributor Author

@walkingeyerobot let me know if you think this is going in a reasonable direction or if you would do something differently

export EM_BINARYEN_ROOT=$$(realpath $$(dirname $$(dirname $(location :emscripten/embuilder.py))))
export EM_LLVM_ROOT=$$EM_BINARYEN_ROOT/bin
export EM_EMSCRIPTEN_ROOT=$$EM_BINARYEN_ROOT/emscripten
export EM_CACHE=$(RULEDIR)/emscripten/cache
Collaborator


These values are normally part of the config file. See https://github.com/emscripten-core/emsdk/blob/main/bazel/emscripten_toolchain/emscripten_config

I think the only one you want to override here is EM_CACHE. The others should all come from that config file.

Contributor Author

@allsey87 allsey87 Jun 13, 2024


The problem there (I think; I will test this) is that emscripten_config is part of @emsdk//, while this code is inside @emscripten_bin_XXX//, so this might create a circular dependency.

> I think the only one you want to override here is EM_CACHE. The others should all come from that config file.

Also, the values of EMSCRIPTEN_ROOT, BINARYEN_ROOT, and LLVM_ROOT in emscripten_config are based on environment variables which are set in env.sh/env.bat, which in turn depend on environment variables set in toolchain.bzl.
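
For reference, the path arithmetic in the four exports under review can be sketched in Python. This only mirrors the shell snippet quoted in this thread; the function name `embuilder_env` and the example paths are hypothetical. It shows that three of the values depend solely on where embuilder.py lives, while EM_CACHE is the only one that names an output location.

```python
import os


def embuilder_env(embuilder_path, cache_dir):
    """Derive the EM_* variables the same way the genrule's exports do."""
    # Equivalent of realpath(dirname(dirname(path))): go two levels up from
    # <root>/emscripten/embuilder.py to reach <root>, then make it absolute.
    binaryen_root = os.path.realpath(
        os.path.dirname(os.path.dirname(embuilder_path))
    )
    return {
        "EM_BINARYEN_ROOT": binaryen_root,
        "EM_LLVM_ROOT": os.path.join(binaryen_root, "bin"),
        "EM_EMSCRIPTEN_ROOT": os.path.join(binaryen_root, "emscripten"),
        # The only value that points at the rule's output directory.
        "EM_CACHE": cache_dir,
    }


env = embuilder_env(
    "external/emscripten_bin_linux/emscripten/embuilder.py",  # hypothetical input path
    "bazel-out/example/emscripten/cache",  # hypothetical stand-in for $(RULEDIR)/emscripten/cache
)
for name, value in env.items():
    print(name + "=" + value)
```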

@sbc100
Collaborator

sbc100 commented Jun 12, 2024

Is there some shared, writable location that we can use as the one true location for the emscripten cache? Somewhere that is both readable and writable when the build rules run?

@allsey87
Contributor Author

allsey87 commented Jun 13, 2024

> Somewhere that is both readable and writable when the build rules run?

You can always set the following in .bazelrc

build --action_env=EM_CACHE=/tmp/emscripten_cache --action_env=EM_FROZEN_CACHE=0

This seems to work, well, sort of... The cache is built fine but it is breaking my builds for the moment. When using:

  • configure_make, I am getting errors about duplicate symbols __vfprintf_internal, vfiprintf, vfprintf, __small_vfprintf.
  • cc_binary, I am getting an error about llvm-ar not being found.
emcc: error: Cannot find /home/developer/.cache/bazel/_bazel_developer/25e07d78077dfe1eca932359d50e41ef/sandbox/processwrapper-sandbox/1/execroot/_main/external/emscripten_bin_linux/bin/llvm-ar

I thought the latter issue with cc_binary was unrelated to what I added in .bazelrc, but after running bazel clean --expunge and using upstream emsdk (without the changes in this PR) and building with the above line off/on, it really seems like it is the setting of those environment variables that is causing the issue...

@allsey87
Contributor Author

allsey87 commented Jun 13, 2024

I thought I had a really nice alternative solution to what I have done so far. The idea would have been to move cache generation from inside emscripten_bin_XXX to emsdk after the toolchain has been set up and then during builds override EM_CACHE on a per target basis.

For example, you set up a secondary cache in your WORKSPACE with something like:

emsdk_emscripten_cache(
    name = "pic_cache",
    mode = ["--pic"],
    libraries = ["crtbegin", "libprintf_long_double-debug"],
)

and then for each target that needs to use this cache, you could set EM_CACHE=//:pic_cache via env. The problem: the meaning of env seems to vary a lot in Bazel. For foreign cc rules like configure_make, this works since the entries in env are set during build. However, for the inbuilt cc_binary, env appears to be only set at runtime.
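
As a rough sketch of what such a rule might run under the hood, the attributes above could be forwarded to embuilder more or less verbatim: the `mode` entries as flags, then the `build` operation, then the library names as targets. The function name `embuilder_command`, the use of `python3` as the launcher, and the embuilder.py path are assumptions for illustration; `emsdk_emscripten_cache` itself is only the proposal from this comment.

```python
def embuilder_command(embuilder, mode, libraries):
    """Assemble an embuilder invocation: flags, then `build`, then targets."""
    return ["python3", embuilder] + list(mode) + ["build"] + list(libraries)


cmd = embuilder_command(
    "emscripten/embuilder.py",  # hypothetical path to the script
    ["--pic"],
    ["crtbegin", "libprintf_long_double-debug"],
)
print(" ".join(cmd))
# python3 emscripten/embuilder.py --pic build crtbegin libprintf_long_double-debug
```

The command would still need EM_CACHE pointed at the rule's output directory so that the built libraries land where Bazel expects them.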

In addition to setting EM_CACHE globally (as per the previous comment), I am also experimenting with the idea of using select in the toolchain to swap out emscripten_config for an alternative, generated configuration file that sets CACHE to the secondary cache iff it is declared. At the moment, I am stuck on an issue with the paths which I asked about in the Bazel user group.

@allsey87 allsey87 closed this by deleting the head repository Jun 14, 2024
@allsey87
Contributor Author

Closed in favour of #1405
