Finer grained precompile native code cache (part 1) #58592
base: master
Conversation
This is fantastic! I have long been wanting to explore more fine-grained compilation caching using approaches like […]. My big-picture questions would be: […]
Another concern is the many small files, since they often perform very badly, so something along the lines of […].
It's a little different from the GPUCompiler approach, since it makes no attempt […].

The idea with this PR is to do the minimum thing that will give a correct […].

llvm-cas possibly being merged soon is the reason I have so far avoided pulling […].

So far, there is no invalidation, pending a switch to a real KV store.
This confusingly sounds like something we already have. It might be good to make it clearer in the title and top comment how this differs from the current situation.
It sounds like that isn't an issue for this approach, but it could be possible to have a cache keyed by CodeInstances that is only used during precompilation, where uniqueness is guaranteed. Yes, precompilation is already generating cached code, but we can have multiple layers of caching, and this could be used to speed up precompilation.
Reverted two commits that were an attempt at getting better cache hit rates with the simplest possible change, in favour of doing a link-time rename step.
src/codegen.cpp
Outdated
```cpp
static StringMap<uint64_t> fresh_name_map;
static std::mutex fresh_name_lock;

static void freshen_name(std::string &name)
{
    uint64_t n;
    {
        std::lock_guard guard{fresh_name_lock};
        n = fresh_name_map[name]++;
    }
    raw_string_ostream(name) << "_" << n;
}
```
I am worried that this map could grow big. What issue are you trying to solve here? Just that the order of generation causes different names?
Yes, this was an attempt at getting reasonable hit rates with the minimum amount of work; it turned out not to be worth it. The new plan is to abandon globally unique names at this stage and resolve things after code generation, but before linking.
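For reference, the freshening scheme above can be reproduced as a self-contained sketch, with `std::map` and `std::ostringstream` standing in for LLVM's `StringMap` and `raw_string_ostream`:

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <sstream>
#include <string>

// Self-contained version of the (now reverted) freshening scheme.
// Each distinct base name gets its own counter, so repeated requests
// for the same base name yield name_0, name_1, ...
static std::map<std::string, uint64_t> fresh_name_map;
static std::mutex fresh_name_lock;

static void freshen_name(std::string &name)
{
    uint64_t n;
    {
        std::lock_guard<std::mutex> guard{fresh_name_lock};
        n = fresh_name_map[name]++;
    }
    std::ostringstream os;
    os << name << "_" << n;
    name = os.str();
}
```

Because the map keeps one entry per distinct base name for the lifetime of the process, it grows with the number of unique symbols generated, which is the growth concern raised above.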
Overview
This pull request adds a new mode for precompiling sysimages and pkgimages that caches the results of compiling each LLVM module, reducing the time spent emitting native code for CodeInstances that generate identical LLVM IR. For now, it works only when using the ahead-of-time compiler, but the approach is also valid for JITed code.
Usage

Set `JULIA_NATIVE_CACHE=<dir>` to look for and store compiled objects in `<dir>`. When `JULIA_IMAGE_TIMINGS=1` is also set, the cache hit rate will be printed.

Internals
Normally, `jl_emit_native` emits every CodeInstance into a separate LLVM module before combining everything into a single module. When the fine-grained cache is enabled, the modules are serialized to bitcode separately. The cache key for each module is computed from the hash of the serialized bitcode and the LLVM version, and the compiled object file is the value.
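The keying scheme could be sketched as follows; `std::hash` is a stand-in for whatever stable content hash the real cache uses, and the helper name is illustrative:

```cpp
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>

// Illustrative sketch of the cache key: a hash of the serialized bitcode
// combined with the LLVM version, so objects produced by one LLVM build
// are never served to another. std::hash is a stand-in for a real stable
// content hash; the function name is hypothetical.
static std::string native_cache_key(const std::string &bitcode,
                                    const std::string &llvm_version)
{
    std::string buf = bitcode;
    buf.push_back('\x1f');  // separator so ("a","bc") and ("ab","c") differ
    buf += llvm_version;
    std::ostringstream os;
    os << std::hex << std::hash<std::string>{}(buf);
    return os.str();
}
```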
The `partitionModule` pass is not run, since multi-threaded compilation is done by having `JULIA_IMAGE_THREADS` worker threads take from the queue of serialized modules.

When the fine-grained cache is used, we generate a single `jl_image_shard_t` table for the entire image. The `gvar_offsets` are resolved by the linker.

Currently, the cache uses LLVM's `FileCache`, a thread-safe key-value store that uses one file per key and write-and-rename to implement atomic updates. It's a convenient choice for development because the contents can easily be `objdump`ed, but the long-term plan is to switch to a more appropriate database, be it LLVMCAS when it is merged, or SQLite.

Current limitations
- Only `.o` outputs are cached. The fine-grained cache cannot be used if `--output-bc`, `--output-unopt-bc`, or `--output-asm` is specified.
- The cache is not used by the other codegen entry points (`jl_get_llvm_module_impl`, `jl_get_llvm_function_impl`, or `jl_emit_native_impl` with `llvmmod` set).
set).Plan