Address the most important (decided by me) of Teresa's DTLTO review comments #2

Open
wants to merge 22 commits into base: DTLTO_llvm_only
Commits
03e98d4
[DTLTO][LLVM] Store the Target Triple for ThinLTO modules.
bd1976bris Feb 19, 2025
19ef1d1
[DTLTO][LLVM] Make the ThinLTO backend wait() method virtual
bd1976bris Feb 19, 2025
bc7f32e
[DTLTO][LLVM] Generalize the emit files infrastructure
bd1976bris Feb 19, 2025
7d0c1c8
[DTLTO][LLVM] Add a setup() member to the ThinLTO backends
bd1976bris Feb 19, 2025
d55d8c0
[DTLTO][LLVM] Derive the InProcess backend from a base class
bd1976bris Feb 19, 2025
e3c016f
[DTLTO][LLVM] Implement integrated distribution for ThinLTO (DTLTO).
bd1976bris Feb 17, 2025
0490b3b
[DTLTO][LLVM] Translate some LTO configuration state into clang options.
bd1976bris Feb 19, 2025
2c9710f
[DTLTO][LLVM] Allow LTO to take an AddBuffer function and use in DTLTO
bd1976bris Feb 19, 2025
0c46c0c
[DTLTO][LLVM][Doc] Add DTLTO documentation
bd1976bris Feb 19, 2025
33dbf55
[DTLTO][LLVM] clang format LTO.h to prevent automated checks errors
bd1976bris Feb 19, 2025
9b5162d
Improve the test distributors
bd1976bris Feb 20, 2025
2be4b0c
Address minor test nits
bd1976bris Feb 20, 2025
8923484
Support more than one LTO partition.
bd1976bris Feb 25, 2025
0bfafdd
UI improvements to the current `remote-opt-tool`
bd1976bris Feb 27, 2025
2f9d381
Update python module docstrings to match the script names
bd1976bris Feb 28, 2025
a828b70
Sync with https://github.com/llvm/llvm-project/pull/127749
bd1976bris Mar 3, 2025
a6e8106
> What is UID in this context? Should this be PID?
bd1976bris Mar 5, 2025
1eebf91
> Maybe NOIMPORTFILES (they are imports files not index files)?
bd1976bris Mar 5, 2025
b46bfab
> Is this description correct?
bd1976bris Mar 5, 2025
a98fe79
> This isn't really tracing the full "thin backend", should a differe…
bd1976bris Mar 5, 2025
88e6151
> What is IndexPath?
bd1976bris Mar 5, 2025
e877208
> Is the use of llvm-lto and opt here just to do testing before the c…
bd1976bris Mar 5, 2025
224 changes: 224 additions & 0 deletions llvm/docs/DTLTO.rst
@@ -0,0 +1,224 @@
===================
DTLTO
===================
.. contents::
   :local:
   :depth: 2

.. toctree::
   :maxdepth: 1

Distributed ThinLTO (DTLTO)
===========================

Distributed ThinLTO (DTLTO) facilitates the distribution of backend ThinLTO
compilations via external distribution systems such as Incredibuild.

The existing method of distributing ThinLTO compilations via separate thin-link,
backend compilation, and link steps often requires significant changes to the
user's build process, as it depends on a build system that can handle the
dynamic dependencies specified by the index files, such as Bazel.

DTLTO eliminates this need by managing distribution internally within the LLD
linker during the traditional link step. This allows DTLTO to be used with any
build process that supports in-process ThinLTO.

Limitations
-----------

The current implementation of DTLTO has the following limitations:

- The ThinLTO cache is not supported.
- Only ELF and COFF platforms are supported.
- Archives with bitcode members are not supported.
- Only a very limited set of LTO configurations is currently supported; for
  example, support for basic block sections is not yet available.

Overview of Operation
---------------------

For each ThinLTO backend compilation job, LLD:

1. Generates the required summary index shard.
2. Records a list of input and output files.
3. Constructs a Clang command line to perform the ThinLTO backend compilation.

This information is supplied, via a JSON file, to a distributor program that
executes the backend compilations using a distribution system. Upon completion,
LLD integrates the compiled native object files into the link process.

The design keeps the details of distribution systems out of the LLVM source
code.

Distributors
------------

Distributors are programs responsible for:

1. Consuming the JSON backend-compilation job description file.
2. Translating job descriptions into requests for the distribution system.
3. Blocking execution until all backend compilations are complete.

Distributors must return a non-zero exit code on failure. They can be
implemented as binaries or in scripting languages, such as Python. An example
script demonstrating basic local execution is available with the LLVM source
code.
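The three responsibilities above can be sketched in Python. The following is a
hypothetical minimal distributor that simply runs every job locally and
serially; a real distributor would instead submit each job to its distribution
system. The JSON field names match the schema described later in this document:

.. code-block:: python

   import json
   import subprocess
   import sys

   def expand(template, job):
       """Expand the shared command-line template for one job.

       List elements are placeholders: the first entry names a per-job array
       field, and the remaining entries are concatenated, with integers
       replaced by the value at that index in the named array.
       """
       cmd = []
       for elem in template:
           if isinstance(elem, list):
               values = job[elem[0]]
               cmd.append("".join(values[p] if isinstance(p, int) else p
                                  for p in elem[1:]))
           else:
               cmd.append(elem)
       return cmd

   def main(job_file):
       # 1. Consume the JSON backend-compilation job description file.
       with open(job_file) as f:
           desc = json.load(f)
       template = desc["common"]["args"]
       # 2. Translate each job into a request -- here, just a local process.
       # 3. Block until all backend compilations are complete.
       for job in desc["jobs"]:
           if subprocess.run(expand(template, job)).returncode != 0:
               return 1  # A non-zero exit code signals failure to the linker.
       return 0

   if __name__ == "__main__":
       sys.exit(main(sys.argv[1]))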

How Distributors Are Invoked
----------------------------

Clang and LLD provide options to specify a distributor program for managing
backend compilations. Options for the distributor, and for the backend
compilations themselves, can also be specified; such options are transparently
forwarded.

The backend compilations are currently performed by invoking Clang. For further
details, refer to:

- Clang documentation: https://clang.llvm.org/docs/ThinLTO.html
- LLD documentation: https://lld.llvm.org/DTLTO.html

When invoked with a distributor, LLD generates a JSON file describing the
backend compilation jobs and executes the distributor, passing it this file.
The JSON file provides the following information to the distributor:

- The **command line** to execute the backend compilations.

  - DTLTO constructs a Clang command line by translating some of the LTO
    configuration state into Clang options and forwarding options specified
    by the user.

- The **link output path**.

  - A string identifying the output to which this LTO invocation will
    contribute. Distributors can use this to label build jobs for
    informational purposes.

- The list of **imports** required for each job.

  - The per-job list of bitcode files from which importing will occur. This is
    the same information that is emitted into import files for ThinLTO.

- The **input files** required for each job.

  - The per-job set of files required for backend compilation, such as bitcode
    files, summary index files, and profile data.

- The **output files** generated by each job.

  - The per-job files generated by the backend compilations, such as compiled
    object files and toolchain metrics.

Temporary Files
---------------

During its operation, DTLTO generates temporary files. To aid the user in
identifying them, temporary files are created in the same directory as the
linker's output file, and their filenames include the stem of the bitcode
module or of the output file to which the LTO invocation contributes:

- **JSON Job Description File**:

  - Format: `<Link output stem>.<PID>.dist-file.json`
  - Example: `dtlto.77380.dist-file.json` (for output file `dtlto.elf`).

- **Object Files From Backend Compilations**:

  - Format: `<Module ID stem>.<Task>.<PID>.native.o`
  - Example: `my.1.77380.native.o` (for bitcode module `my.o`).

- **Summary Index Shard Files**:

  - Format: `<Module ID stem>.<Task>.<PID>.native.o.thinlto.bc`
  - Example: `my.1.77380.native.o.thinlto.bc` (for bitcode module `my.o`).

Temporary files are removed, by default, after the backend compilations complete.

JSON Schema
-----------

Below is an example of a JSON job file for backend compilation of the module
`dtlto.o`:

.. code-block:: json

   {
     "common": {
       "linker_output": "dtlto.elf",
       "linker_version": "LLD 20.0.0",
       "args": [
         "/usr/local/clang",
         "-O3", "-fprofile-sample-use=my.profdata",
         "-o", ["primary_output", 0],
         "-c", "-x", "ir", ["primary_input", 0],
         ["summary_index", "-fthinlto-index=", 0],
         "--target=x86_64-sie-ps5"
       ]
     },
     "jobs": [
       {
         "primary_input": ["dtlto.o"],
         "summary_index": ["dtlto.1.51232.native.o.thinlto.bc"],
         "primary_output": ["dtlto.1.51232.native.o"],
         "imports": [],
         "additional_inputs": ["my.profdata"]
       }
     ]
   }

Key Features of the Schema
~~~~~~~~~~~~~~~~~~~~~~~~~~

- **Input/Output Paths**: Paths are stored in per-file-type array fields. This
allows files to be adjusted, if required, to meet the constraints of the
underlying distribution system. For example, a system may only be able to read
and write remote files to `C:\\sandbox`. The remote paths used can be adjusted
by the distributor for such constraints. Once outputs are back on the local
system, the distributor can rename them as required.


- **Command-Line Template**: Command-line options are stored in a common
template to avoid duplication for each job. The template consists of an array
of strings and arrays. The arrays are placeholders which reference per-job
paths. This allows the remote compiler and its arguments to be changed without
updating the distributors.
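As an illustration of the path adjustment described under **Input/Output
Paths** (a hypothetical sketch, not part of any shipped distributor), a
distributor targeting a system that can only access `C:\sandbox` might remap
the per-file-type arrays before submitting each job, remembering the mapping
so that outputs can be renamed once they are back on the local system:

.. code-block:: python

   import ntpath
   import posixpath

   SANDBOX = "C:\\sandbox"  # Hypothetical remote-visible directory.

   PATH_FIELDS = ("primary_input", "summary_index", "primary_output",
                  "imports", "additional_inputs")

   def remap_job(job):
       """Return (remote_job, mapping) with every path moved under SANDBOX.

       `mapping` records remote -> local names so that, once outputs are
       copied back, the distributor can rename them as required."""
       remote, mapping = dict(job), {}
       for field in PATH_FIELDS:
           remapped = []
           for local in job.get(field, []):
               r = ntpath.join(SANDBOX, posixpath.basename(local))
               mapping[r] = local
               remapped.append(r)
           remote[field] = remapped
       return remote, mapping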

Command-Line Expansion Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To create the backend compilation commands, the command-line template is
expanded for each job. Placeholders are expanded as follows: the first array
element names the per-job array field to look in; the remaining elements are
converted to strings and concatenated, with integers converted by indexing
into the named array.
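This expansion rule can be written in a few lines of Python (an illustration
only, not part of LLD); `template` and `job` below are abbreviated from the
JSON example earlier in this document:

.. code-block:: python

   def expand_template(template, job):
       """Expand one job's command line from the shared template."""
       cmd = []
       for elem in template:
           if isinstance(elem, list):
               values = job[elem[0]]  # First entry names the per-job field.
               # Remaining entries are concatenated; integers index `values`.
               cmd.append("".join(values[p] if isinstance(p, int) else p
                                  for p in elem[1:]))
           else:
               cmd.append(elem)  # Plain strings are copied through unchanged.
       return cmd

   job = {
       "primary_input": ["dtlto.o"],
       "summary_index": ["dtlto.1.51232.native.o.thinlto.bc"],
       "primary_output": ["dtlto.1.51232.native.o"],
   }
   template = ["-o", ["primary_output", 0],
               ["summary_index", "-fthinlto-index=", 0]]
   print(expand_template(template, job))
   # -> ['-o', 'dtlto.1.51232.native.o',
   #     '-fthinlto-index=dtlto.1.51232.native.o.thinlto.bc']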

The example above generates the following backend compilation command for
`dtlto.o`:

.. code-block:: console

   /usr/local/clang -O3 -fprofile-sample-use=my.profdata \
     -o dtlto.1.51232.native.o -c -x ir dtlto.o \
     -fthinlto-index=dtlto.1.51232.native.o.thinlto.bc --target=x86_64-sie-ps5

This expansion scheme allows the remote compiler to be changed without updating
the distributors. For example, if the "args" field in the above example were
replaced with:

.. code-block:: json

   "args": [
     "custom-compiler",
     "-opt-level=2",
     "-profile-instrument-use-path=my.profdata",
     "-output", ["primary_output", 0],
     "-input", ["primary_input", 0],
     "-thinlto-index", ["summary_index", 0],
     "-triple", "x86_64-sie-ps5"
   ]

Then distributors can expand the command line without needing to be updated:

.. code-block:: console

   custom-compiler -opt-level=2 -profile-instrument-use-path=my.profdata \
     -output dtlto.1.51232.native.o -input dtlto.o \
     -thinlto-index dtlto.1.51232.native.o.thinlto.bc -triple x86_64-sie-ps5

Constraints
-----------

- Matching versions of Clang and LLD should be used.
- The distributor used must support the JSON schema generated by the version of
LLD in use.
6 changes: 6 additions & 0 deletions llvm/docs/UserGuides.rst
@@ -32,6 +32,7 @@ intermediate LLVM representation.
DebuggingJITedCode
DirectXUsage
Docker
DTLTO
FatLTO
ExtendingLLVM
GitHub
@@ -164,6 +165,11 @@ Optimizations
This document describes the interface between LLVM intermodular optimizer
and the linker and its design

:doc:`DTLTO`
This document describes the DTLTO implementation, which allows for
distributing ThinLTO backend compilations without requiring support from
the build system.

:doc:`GoldPlugin`
How to build your programs with link-time optimization on Linux.

63 changes: 50 additions & 13 deletions llvm/include/llvm/LTO/LTO.h
@@ -199,6 +199,8 @@ class InputFile {

using IndexWriteCallback = std::function<void(const std::string &)>;

using ImportsFilesContainer = llvm::SmallVector<std::string>;

/// This class defines the interface to the ThinLTO backend.
class ThinBackendProc {
protected:
@@ -223,13 +225,15 @@
BackendThreadPool(ThinLTOParallelism) {}

virtual ~ThinBackendProc() = default;
virtual void setup(unsigned MaxTasks, unsigned ReservedTasks) {}
virtual Error start(
unsigned Task, BitcodeModule BM,
const FunctionImporter::ImportMapTy &ImportList,
const FunctionImporter::ExportSetTy &ExportList,
const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
MapVector<StringRef, BitcodeModule> &ModuleMap) = 0;
Error wait() {
MapVector<StringRef, BitcodeModule> &ModuleMap,
DenseMap<StringRef, std::string> &ModuleTriples) = 0;
virtual Error wait() {
BackendThreadPool.wait();
if (Err)
return std::move(*Err);
@@ -240,8 +244,15 @@

// Write sharded indices and (optionally) imports to disk
Error emitFiles(const FunctionImporter::ImportMapTy &ImportList,
llvm::StringRef ModulePath,
const std::string &NewModulePath) const;
StringRef ModulePath, const std::string &NewModulePath) const;

// Write sharded indices to SummaryPath, (optionally) imports to disk, and
// (optionally) record imports in ImportsFiles.
Error emitFiles(const FunctionImporter::ImportMapTy &ImportList,
StringRef ModulePath, StringRef SummaryPath,
const std::string &NewModulePath,
std::optional<std::reference_wrapper<ImportsFilesContainer>>
ImportsFiles) const;
};

/// This callable defines the behavior of a ThinLTO backend after the thin-link
@@ -253,7 +264,7 @@
using ThinBackendFunction = std::function<std::unique_ptr<ThinBackendProc>(
const Config &C, ModuleSummaryIndex &CombinedIndex,
const DenseMap<StringRef, GVSummaryMapTy> &ModuleToDefinedGVSummaries,
AddStreamFn AddStream, FileCache Cache)>;
AddStreamFn AddStream, AddBufferFn AddBuffer, FileCache Cache)>;

/// This type defines the behavior following the thin-link phase during ThinLTO.
/// It encapsulates a backend function and a strategy for thread pool
@@ -268,10 +279,10 @@
std::unique_ptr<ThinBackendProc> operator()(
const Config &Conf, ModuleSummaryIndex &CombinedIndex,
const DenseMap<StringRef, GVSummaryMapTy> &ModuleToDefinedGVSummaries,
AddStreamFn AddStream, FileCache Cache) {
AddStreamFn AddStream, AddBufferFn AddBuffer, FileCache Cache) {
assert(isValid() && "Invalid backend function");
return Func(Conf, CombinedIndex, ModuleToDefinedGVSummaries,
std::move(AddStream), std::move(Cache));
std::move(AddStream), std::move(AddBuffer), std::move(Cache));
}
ThreadPoolStrategy getParallelism() const { return Parallelism; }
bool isValid() const { return static_cast<bool>(Func); }
@@ -294,6 +305,22 @@ ThinBackend createInProcessThinBackend(ThreadPoolStrategy Parallelism,
bool ShouldEmitIndexFiles = false,
bool ShouldEmitImportsFiles = false);

/// This ThinBackend generates the index shards and then runs the individual
/// backend jobs via an external process. It takes the same parameters as the
/// InProcessThinBackend; however, these parameters only control the behavior
/// when generating the index files for the modules. Additionally:
/// LinkerOutputFile is a string that should identify this LTO invocation in
/// the context of a wider build. It's used for naming to aid the user in
/// identifying activity related to a specific LTO invocation.
/// Distributor specifies the path to a process to invoke to manage the backend
/// jobs execution.
/// SaveTemps is a debugging tool that prevents temporary files created by this
/// backend from being cleaned up.
ThinBackend createOutOfProcessThinBackend(
ThreadPoolStrategy Parallelism, IndexWriteCallback OnWrite,
bool ShouldEmitIndexFiles, bool ShouldEmitImportsFiles,
StringRef LinkerOutputFile, StringRef Distributor, bool SaveTemps);

/// This ThinBackend writes individual module indexes to files, instead of
/// running the individual backend jobs. This backend is for distributed builds
/// where separate processes will invoke the real backends.
@@ -369,15 +396,22 @@ class LTO {
/// full description of tasks see LTOBackend.h.
unsigned getMaxTasks() const;

/// Runs the LTO pipeline. This function calls the supplied AddStream
/// function to add native object files to the link.
/// Runs the LTO pipeline. This function calls the supplied AddStream or
/// AddBuffer function to add native object files to the link depending on
/// whether the files are streamed into memory or written to disk by the
/// backend.
///
/// The Cache parameter is optional. If supplied, it will be used to cache
/// native object files and add them to the link.
///
/// The client will receive at most one callback (via either AddStream or
/// Currently, the AddBuffer parameter is only required for DTLTO. It is
/// optional to minimise the impact on existing LTO users, who do not yet
/// use DTLTO.
///
/// The client will receive at most one callback (via AddStream, AddBuffer or
/// Cache) for each task identifier.
Error run(AddStreamFn AddStream, FileCache Cache = {});
Error run(AddStreamFn AddStream, FileCache Cache = {},
AddBufferFn AddBuffer = nullptr);

/// Static method that returns a list of libcall symbols that can be generated
/// by LTO but might not be visible from bitcode symbol table.
@@ -426,6 +460,7 @@
// The bitcode modules to compile, if specified by the LTO Config.
std::optional<ModuleMapType> ModulesToCompile;
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
DenseMap<StringRef, std::string> ModuleTriples;
} ThinLTO;

// The global resolution for a particular (mangled) symbol name. This is in
@@ -517,10 +552,12 @@ class LTO {
bool LivenessFromIndex);

Error addThinLTO(BitcodeModule BM, ArrayRef<InputFile::Symbol> Syms,
const SymbolResolution *&ResI, const SymbolResolution *ResE);
const SymbolResolution *&ResI, const SymbolResolution *ResE,
StringRef Triple);

Error runRegularLTO(AddStreamFn AddStream);
Error runThinLTO(AddStreamFn AddStream, FileCache Cache,
Error runThinLTO(AddStreamFn AddStream, AddBufferFn AddBuffer,
FileCache Cache,
const DenseSet<GlobalValue::GUID> &GUIDPreservedSymbols);

Error checkPartiallySplit();
3 changes: 2 additions & 1 deletion llvm/include/llvm/Support/Caching.h
@@ -84,7 +84,8 @@ struct FileCache {
std::string CacheDirectoryPath;
};

/// This type defines the callback to add a pre-existing file (e.g. in a cache).
/// This type defines the callback to add a pre-existing file (e.g. in a cache
/// or created by a backend compilation run as a separate process).
///
/// Buffer callbacks must be thread safe.
using AddBufferFn = std::function<void(unsigned Task, const Twine &ModuleName,