-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[lld][MachO]Multi-threaded i/o. Twice as fast linking a large project. #147134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-lld-macho @llvm/pr-subscribers-lld Author: John Holdsworth (johnno1962) ChangesThis PR adds a new option to lld 26.01, 25.84, 26.15, 26.03, 27.10, 25.90, 25.86, 25.81, 25.80, 25.87 With the proposed code change, and using the 21.13, 20.35, 20.01, 20.01, 20.30, 20.39, 19.97, 20.23, 20.17, 20.23 The secret sauce is in the new function Full diff: https://github.com/llvm/llvm-project/pull/147134.diff 3 Files Affected:
diff --git a/lld/MachO/Config.h b/lld/MachO/Config.h
index a01e60efbe761..92c6eb85f4123 100644
--- a/lld/MachO/Config.h
+++ b/lld/MachO/Config.h
@@ -186,6 +186,7 @@ struct Configuration {
bool interposable = false;
bool errorForArchMismatch = false;
bool ignoreAutoLink = false;
+ int readThreads = 0;
// ld64 allows invalid auto link options as long as the link succeeds. LLD
// does not, but there are cases in the wild where the invalid linker options
// exist. This allows users to ignore the specific invalid options in the case
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index 9eb391c4ee1b9..a244f2781c22c 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -47,6 +47,7 @@
#include "llvm/Support/TarWriter.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/Process.h"
#include "llvm/TargetParser/Host.h"
#include "llvm/TextAPI/Architecture.h"
#include "llvm/TextAPI/PackedVersion.h"
@@ -282,11 +283,11 @@ static void saveThinArchiveToRepro(ArchiveFile const *file) {
": Archive::children failed: " + toString(std::move(e)));
}
-static InputFile *addFile(StringRef path, LoadType loadType,
- bool isLazy = false, bool isExplicit = true,
- bool isBundleLoader = false,
- bool isForceHidden = false) {
- std::optional<MemoryBufferRef> buffer = readFile(path);
+static InputFile *deferredAddFile(std::optional<MemoryBufferRef> buffer,
+ StringRef path, LoadType loadType,
+ bool isLazy = false, bool isExplicit = true,
+ bool isBundleLoader = false,
+ bool isForceHidden = false) {
if (!buffer)
return nullptr;
MemoryBufferRef mbref = *buffer;
@@ -441,6 +442,14 @@ static InputFile *addFile(StringRef path, LoadType loadType,
return newFile;
}
+static InputFile *addFile(StringRef path, LoadType loadType,
+ bool isLazy = false, bool isExplicit = true,
+ bool isBundleLoader = false,
+ bool isForceHidden = false) {
+ return deferredAddFile(readFile(path), path, loadType, isLazy,
+ isExplicit, isBundleLoader, isForceHidden);
+}
+
static std::vector<StringRef> missingAutolinkWarnings;
static void addLibrary(StringRef name, bool isNeeded, bool isWeak,
bool isReexport, bool isHidden, bool isExplicit,
@@ -564,13 +573,21 @@ void macho::resolveLCLinkerOptions() {
}
}
-static void addFileList(StringRef path, bool isLazy) {
+typedef struct { StringRef path; std::optional<MemoryBufferRef> buffer; } DeferredFile;
+
+static void addFileList(StringRef path, bool isLazy,
+ std::vector<DeferredFile> &deferredFiles, int readThreads) {
std::optional<MemoryBufferRef> buffer = readFile(path);
if (!buffer)
return;
MemoryBufferRef mbref = *buffer;
for (StringRef path : args::getLines(mbref))
- addFile(rerootPath(path), LoadType::CommandLine, isLazy);
+ if (readThreads) {
+ StringRef rrpath = rerootPath(path);
+ deferredFiles.push_back({rrpath, readFile(rrpath)});
+ }
+ else
+ addFile(rerootPath(path), LoadType::CommandLine, isLazy);
}
// We expect sub-library names of the form "libfoo", which will match a dylib
@@ -1215,13 +1232,61 @@ static void handleSymbolPatterns(InputArgList &args,
parseSymbolPatternsFile(arg, symbolPatterns);
}
-static void createFiles(const InputArgList &args) {
+// Most input files have been mapped but not yet paged in.
+// This code forces the page-ins on multiple threads so
+// the process is not stalled waiting on disk buffer i/o.
+void multiThreadedPageIn(std::vector<DeferredFile> &deferred, int nthreads) {
+ typedef struct {
+ std::vector<DeferredFile> &deferred;
+ size_t counter, total, pageSize;
+ pthread_mutex_t mutex;
+ } PageInState;
+ PageInState state = {deferred, 0, 0,
+ llvm::sys::Process::getPageSizeEstimate(), pthread_mutex_t()};
+ pthread_mutex_init(&state.mutex, NULL);
+
+ pthread_t running[200];
+ int maxthreads = sizeof running / sizeof running[0];
+ if (nthreads > maxthreads)
+ nthreads = maxthreads;
+ for (int t=0; t<nthreads; t++)
+ pthread_create(&running[t], nullptr, [](void* ptr) -> void*{
+ PageInState &state = *(PageInState *)ptr;
+ static int total = 0;
+ while (true) {
+ pthread_mutex_lock(&state.mutex);
+ if (state.counter >= state.deferred.size()) {
+ pthread_mutex_unlock(&state.mutex);
+ return nullptr;
+ }
+ DeferredFile &add = state.deferred[state.counter];
+ state.counter += 1;
+ pthread_mutex_unlock(&state.mutex);
+
+ int t = 0; // Reference each page to load it into memory.
+ for (const char *start = add.buffer->getBuffer().data(),
+ *page = start; page<start+add.buffer->getBuffer().size();
+ page += state.pageSize)
+ t += *page;
+ state.total += t; // Avoids whole section being optimised out.
+ }
+ }, &state);
+
+ for (int t=0; t<nthreads; t++)
+ pthread_join(running[t], nullptr);
+
+ pthread_mutex_destroy(&state.mutex);
+}
+
+void createFiles(const InputArgList &args, int readThreads) {
TimeTraceScope timeScope("Load input files");
// This loop should be reserved for options whose exact ordering matters.
// Other options should be handled via filtered() and/or getLastArg().
bool isLazy = false;
// If we've processed an opening --start-lib, without a matching --end-lib
bool inLib = false;
+ std::vector<DeferredFile> deferredFiles;
+
for (const Arg *arg : args) {
const Option &opt = arg->getOption();
warnIfDeprecatedOption(opt);
@@ -1229,6 +1294,11 @@ static void createFiles(const InputArgList &args) {
switch (opt.getID()) {
case OPT_INPUT:
+ if (readThreads) {
+ StringRef rrpath = rerootPath(arg->getValue());
+ deferredFiles.push_back({rrpath,readFile(rrpath)});
+ break;
+ }
addFile(rerootPath(arg->getValue()), LoadType::CommandLine, isLazy);
break;
case OPT_needed_library:
@@ -1249,7 +1319,7 @@ static void createFiles(const InputArgList &args) {
dylibFile->forceWeakImport = true;
break;
case OPT_filelist:
- addFileList(arg->getValue(), isLazy);
+ addFileList(arg->getValue(), isLazy, deferredFiles, readThreads);
break;
case OPT_force_load:
addFile(rerootPath(arg->getValue()), LoadType::CommandLineForce);
@@ -1295,6 +1365,12 @@ static void createFiles(const InputArgList &args) {
break;
}
}
+
+ if (readThreads) {
+ multiThreadedPageIn(deferredFiles, readThreads);
+ for (auto &add : deferredFiles)
+ deferredAddFile(add.buffer, add.path, LoadType::CommandLine, isLazy);
+ }
}
static void gatherInputSections() {
@@ -1687,6 +1763,14 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
}
}
+ if (auto *arg = args.getLastArg(OPT_read_threads)) {
+ StringRef v(arg->getValue());
+ unsigned threads = 0;
+ if (!llvm::to_integer(v, threads, 0) || threads < 0)
+ error(arg->getSpelling() + ": expected a positive integer, but got '" +
+ arg->getValue() + "'");
+ config->readThreads = threads;
+ }
if (auto *arg = args.getLastArg(OPT_threads_eq)) {
StringRef v(arg->getValue());
unsigned threads = 0;
@@ -2107,7 +2191,7 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
TimeTraceScope timeScope("ExecuteLinker");
initLLVM(); // must be run before any call to addFile()
- createFiles(args);
+ createFiles(args, config->readThreads);
// Now that all dylibs have been loaded, search for those that should be
// re-exported.
diff --git a/lld/MachO/Options.td b/lld/MachO/Options.td
index 4f0602f59812b..3dc98fccc1b7b 100644
--- a/lld/MachO/Options.td
+++ b/lld/MachO/Options.td
@@ -396,6 +396,9 @@ def dead_strip : Flag<["-"], "dead_strip">,
def interposable : Flag<["-"], "interposable">,
HelpText<"Indirects access to all exported symbols in an image">,
Group<grp_opts>;
+def read_threads : Joined<["--"], "read-threads=">,
+ HelpText<"Number of threads to use paging in files.">,
+ Group<grp_lld>;
def order_file : Separate<["-"], "order_file">,
MetaVarName<"<file>">,
HelpText<"Layout functions and data according to specification in <file>">,
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
b66eb42
to
fd5647a
Compare
9acbaea
to
47bad1d
Compare
47bad1d
to
3d11a33
Compare
a324caa
to
6936449
Compare
fdc4c38
to
767b7b1
Compare
767b7b1
to
0fabbd1
Compare
0fabbd1
to
02fb145
Compare
1a6ec75
to
e267526
Compare
f6d2bbf
to
7c850e9
Compare
7c850e9
to
a8eeead
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The improvements are interesting. I have some comments, but I like that the default behaviour is not modified unless opt-in. I have not look at some pieces of the logic, which are a little bit difficult to figure out when parallelism is introduced (the if (running)
to wait for the previous files being loaded...)
@@ -396,6 +396,9 @@ def dead_strip : Flag<["-"], "dead_strip">, | |||
def interposable : Flag<["-"], "interposable">, | |||
HelpText<"Indirects access to all exported symbols in an image">, | |||
Group<grp_opts>; | |||
def read_threads : Joined<["--"], "read-threads=">, | |||
HelpText<"Number of threads to use paging in files.">, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add "Default is not to use threads".
I am not a native speaker, but the wording might be hard to understand. "Number of threads to use while paging in files." might be clearer, but I am not sure if "paging in files" makes it obvious what it is going to happen, but I don't have a better wording.
Also, not sure why not use the value that people already provides in --threads=
with this option being a boolean "read input files in parallel yes/no", or like ThinLTO does and have a --load-file-jobs=
which defaults to --threads=
.
|
||
pthread_mutex_destroy(&state->mutex); | ||
if (getenv("LLD_MULTI_THREAD_PAGE")) | ||
printf("multiThreadedPageIn %ld/%ld\n", totalBytes, state->deferred.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use lld::log()
or lld::message()
?
(I would had also say to use llvm::dbgs()
, but it seems that LLD for Mach-O doesn't seem to use it at all).
typedef struct { | ||
StringRef path; | ||
bool isLazy; | ||
std::optional<MemoryBufferRef> buffer; | ||
const char *start; | ||
size_t size; | ||
} DeferredFile; | ||
typedef std::vector<DeferredFile> DeferredFiles; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this file is C++, the typedef
before the struct
should not be necessary. The typedef
for DeferredFiles
can be maybe a using
instead.
typedef struct { | ||
DeferredFiles deferred; | ||
size_t counter, total, pageSize; | ||
pthread_mutex_t mutex; | ||
} PageInState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. It will also allow mutex
to be provided an initializer and deferred
to be a reference and avoid the copying of the vector.
pthread_create( | ||
&running, nullptr, | ||
[](void *ptr) -> void * { | ||
PageInState *state = (PageInState *)ptr; | ||
multiThreadedPageInBackground(state); | ||
pthread_mutex_unlock(&busy); | ||
delete state; | ||
return nullptr; | ||
}, | ||
state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering why not use llvm::parallel
and I think I don't see any other reason that this small thread to create the background threads. I am not sure how to replicate this with llvm::parallel
easy, but I think the background thread can be a parallelFor
with little effort, which will also allow the code to be available for _WIN32
.
Another reason might be that it will integrate better with the pieces of LLD which already use llvm::parallel
.
This PR adds a new option to lld
--read-threads=20
that defers all disk I/o then performs it multithreaded so the process is never stalled waiting for the I/o of the page-in of mapped input files. This results in a saving of elapsed time. For a large link (iterating on Chromium) these are the baseline linkage times saving a single file and rebuilding (seconds inside Xcode):26.01, 25.84, 26.15, 26.03, 27.10, 25.90, 25.86, 25.81, 25.80, 25.87
With the proposed code change, and using the
--read-threads=20
option, the linking times reduce to the following:21.13, 20.35, 20.01, 20.01, 20.30, 20.39, 19.97, 20.23, 20.17, 20.23
The secret sauce is in the new function
multiThreadedPageIn()
in Driver.cpp. Without the option lld behaves as before.Edit: with subsequent commits I've taken this novel i/o approach to its full potential. Latest linking times are now:
13.2, 11.9, 12.12, 12.01, 11.99, 13.11, 11.93, 11.95, 12.18, 11.97
Chrome is still linking and running so it doesn't look like anything is broken. Despite being multi-threaded all memory access is readonly and the original code paths are not changed. All that is happening is the system is being asked to proactively page in files rather than waiting for processing to page fault which would otherwise stall the process.