Address the most important (decided by me) of Teresa's DTLTO review comments #2

Open
wants to merge 22 commits into base: DTLTO_llvm_only
Commits
03e98d4
[DTLTO][LLVM] Store the Target Triple for ThinLTO modules.
bd1976bris Feb 19, 2025
19ef1d1
[DTLTO][LLVM] Make the ThinLTO backend wait() method virtual
bd1976bris Feb 19, 2025
bc7f32e
[DTLTO][LLVM] Generalize the emit files infrastructure
bd1976bris Feb 19, 2025
7d0c1c8
[DTLTO][LLVM] Add a setup() member to the ThinLTO backends
bd1976bris Feb 19, 2025
d55d8c0
[DTLTO][LLVM] Derive the InProcess backend from a base class
bd1976bris Feb 19, 2025
e3c016f
[DTLTO][LLVM] Implement integrated distribution for ThinLTO (DTLTO).
bd1976bris Feb 17, 2025
0490b3b
[DTLTO][LLVM] Translate some LTO configuration state into clang options.
bd1976bris Feb 19, 2025
2c9710f
[DTLTO][LLVM] Allow LTO to take an AddBuffer function and use in DTLTO
bd1976bris Feb 19, 2025
0c46c0c
[DTLTO][LLVM][Doc] Add DTLTO documentation
bd1976bris Feb 19, 2025
33dbf55
[DTLTO][LLVM] clang format LTO.h to prevent automated checks errors
bd1976bris Feb 19, 2025
9b5162d
Improve the test distributors
bd1976bris Feb 20, 2025
2be4b0c
Address minor test nits
bd1976bris Feb 20, 2025
8923484
Support more than one LTO partition.
bd1976bris Feb 25, 2025
0bfafdd
UI improvements to the current `remote-opt-tool`
bd1976bris Feb 27, 2025
2f9d381
Update python module docstrings to match the script names
bd1976bris Feb 28, 2025
a828b70
Sync with https://github.com/llvm/llvm-project/pull/127749
bd1976bris Mar 3, 2025
a6e8106
> What is UID in this context? Should this be PID?
bd1976bris Mar 5, 2025
1eebf91
> Maybe NOIMPORTFILES (they are imports files not index files)?
bd1976bris Mar 5, 2025
b46bfab
> Is this description correct?
bd1976bris Mar 5, 2025
a98fe79
> This isn't really tracing the full "thin backend", should a differe…
bd1976bris Mar 5, 2025
88e6151
> What is IndexPath?
bd1976bris Mar 5, 2025
e877208
> Is the use of llvm-lto and opt here just to do testing before the c…
bd1976bris Mar 5, 2025
224 changes: 224 additions & 0 deletions llvm/docs/DTLTO.rst
@@ -0,0 +1,224 @@
===================
DTLTO
===================
.. contents::
   :local:
   :depth: 2

.. toctree::
   :maxdepth: 1

Distributed ThinLTO (DTLTO)
===========================

Distributed ThinLTO (DTLTO) facilitates the distribution of backend ThinLTO
compilations via external distribution systems such as Incredibuild.

The existing method of distributing ThinLTO compilations via separate thin-link,
backend compilation, and link steps often requires significant changes to the
user's build process, as it depends on a build system that can handle the
dynamic dependencies specified by the index files, such as Bazel.

DTLTO eliminates this need by managing distribution internally within the LLD
linker during the traditional link step. This allows DTLTO to be used with any
build process that supports in-process ThinLTO.

Limitations
-----------

The current implementation of DTLTO has the following limitations:

- The ThinLTO cache is not supported.
- Only ELF and COFF platforms are supported.
- Archives with bitcode members are not supported.
- Only a very limited set of LTO configurations is currently supported; for
  example, support for basic block sections is not yet available.

Overview of Operation
---------------------

For each ThinLTO backend compilation job, LLD:

1. Generates the required summary index shard.
2. Records a list of input and output files.
3. Constructs a Clang command line to perform the ThinLTO backend compilation.

This information is supplied, via a JSON file, to a distributor program that
executes the backend compilations using a distribution system. Upon completion,
LLD integrates the compiled native object files into the link process.

The design keeps the details of distribution systems out of the LLVM source
code.

Distributors
------------

Distributors are programs responsible for:

1. Consuming the JSON backend-compilation job description file.
2. Translating job descriptions into requests for the distribution system.
3. Blocking execution until all backend compilations are complete.

Distributors must return a non-zero exit code on failure. They can be
implemented as binaries or in scripting languages, such as Python. An example
script demonstrating basic local execution is available with the LLVM source
code.
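The three responsibilities above can be sketched in Python. The following is a
hypothetical minimal distributor that simply runs every job locally and
serially; a real distributor would instead submit each job to its distribution
system. The JSON field names match the schema described later in this document:

.. code-block:: python

   import json
   import subprocess
   import sys

   def expand(template, job):
       """Expand the shared command-line template for one job.

       List elements are placeholders: the first entry names a per-job array
       field, and the remaining entries are concatenated, with integers
       replaced by the value at that index in the named array.
       """
       cmd = []
       for elem in template:
           if isinstance(elem, list):
               values = job[elem[0]]
               cmd.append("".join(values[p] if isinstance(p, int) else p
                                  for p in elem[1:]))
           else:
               cmd.append(elem)
       return cmd

   def main(job_file):
       # 1. Consume the JSON backend-compilation job description file.
       with open(job_file) as f:
           desc = json.load(f)
       template = desc["common"]["args"]
       # 2. Translate each job into a request -- here, just a local process.
       # 3. Block until all backend compilations are complete.
       for job in desc["jobs"]:
           if subprocess.run(expand(template, job)).returncode != 0:
               return 1  # A non-zero exit code signals failure to the linker.
       return 0

   if __name__ == "__main__":
       sys.exit(main(sys.argv[1]))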

How Distributors Are Invoked
----------------------------

Clang and LLD provide options to specify a distributor program for managing
backend compilations. Options for the distributor, and for the backend
compilations themselves, can also be specified; such options are transparently
forwarded.

The backend compilations are currently performed by invoking Clang. For further
details, refer to:

- Clang documentation: https://clang.llvm.org/docs/ThinLTO.html
- LLD documentation: https://lld.llvm.org/DTLTO.html

When invoked with a distributor, LLD generates a JSON file describing the
backend compilation jobs and executes the distributor, passing it this file.
The JSON file provides the following information to the distributor:

- The **command line** to execute the backend compilations.

  - DTLTO constructs a Clang command line by translating some of the LTO
    configuration state into Clang options and forwarding options specified
    by the user.

- The **link output path**.

  - A string identifying the output to which this LTO invocation will
    contribute. Distributors can use this to label build jobs for
    informational purposes.

- The list of **imports** required for each job.

  - The per-job list of bitcode files from which importing will occur. This is
    the same information that is emitted into import files for ThinLTO.

- The **input files** required for each job.

  - The per-job set of files required for backend compilation, such as bitcode
    files, summary index files, and profile data.

- The **output files** generated by each job.

  - The per-job files generated by the backend compilations, such as compiled
    object files and toolchain metrics.

Temporary Files
---------------

During its operation, DTLTO generates temporary files. To aid the user in
identifying them, temporary files are created in the same directory as the
linker's output file, and their filenames include the stem of the bitcode
module or of the output file to which the LTO invocation contributes:

- **JSON Job Description File**:

  - Format: `<Link output stem>.<PID>.dist-file.json`
  - Example: `dtlto.77380.dist-file.json` (for output file `dtlto.elf`).

- **Object Files From Backend Compilations**:

  - Format: `<Module ID stem>.<Task>.<PID>.native.o`
  - Example: `my.1.77380.native.o` (for bitcode module `my.o`).

- **Summary Index Shard Files**:

  - Format: `<Module ID stem>.<Task>.<PID>.native.o.thinlto.bc`
  - Example: `my.1.77380.native.o.thinlto.bc` (for bitcode module `my.o`).

Temporary files are removed, by default, after the backend compilations complete.

JSON Schema
-----------

Below is an example of a JSON job file for backend compilation of the module
`dtlto.o`:

.. code-block:: json

   {
     "common": {
       "linker_output": "dtlto.elf",
       "linker_version": "LLD 20.0.0",
       "args": [
         "/usr/local/clang",
         "-O3", "-fprofile-sample-use=my.profdata",
         "-o", ["primary_output", 0],
         "-c", "-x", "ir", ["primary_input", 0],
         ["summary_index", "-fthinlto-index=", 0],
         "--target=x86_64-sie-ps5"
       ]
     },
     "jobs": [
       {
         "primary_input": ["dtlto.o"],
         "summary_index": ["dtlto.1.51232.native.o.thinlto.bc"],
         "primary_output": ["dtlto.1.51232.native.o"],
         "imports": [],
         "additional_inputs": ["my.profdata"]
       }
     ]
   }

Key Features of the Schema
~~~~~~~~~~~~~~~~~~~~~~~~~~

- **Input/Output Paths**: Paths are stored in per-file-type array fields. This
allows files to be adjusted, if required, to meet the constraints of the
underlying distribution system. For example, a system may only be able to read
and write remote files to `C:\\sandbox`. The remote paths used can be adjusted
by the distributor for such constraints. Once outputs are back on the local
system, the distributor can rename them as required.


- **Command-Line Template**: Command-line options are stored in a common
template to avoid duplication for each job. The template consists of an array
of strings and arrays. The arrays are placeholders which reference per-job
paths. This allows the remote compiler and its arguments to be changed without
updating the distributors.
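As an illustration of the path adjustment described under **Input/Output
Paths** (a hypothetical sketch, not part of any shipped distributor), a
distributor targeting a system that can only access `C:\sandbox` might remap
the per-file-type arrays before submitting each job, remembering the mapping
so that outputs can be renamed once they are back on the local system:

.. code-block:: python

   import ntpath
   import posixpath

   SANDBOX = "C:\\sandbox"  # Hypothetical remote-visible directory.

   PATH_FIELDS = ("primary_input", "summary_index", "primary_output",
                  "imports", "additional_inputs")

   def remap_job(job):
       """Return (remote_job, mapping) with every path moved under SANDBOX.

       `mapping` records remote -> local names so that, once outputs are
       copied back, the distributor can rename them as required."""
       remote, mapping = dict(job), {}
       for field in PATH_FIELDS:
           remapped = []
           for local in job.get(field, []):
               r = ntpath.join(SANDBOX, posixpath.basename(local))
               mapping[r] = local
               remapped.append(r)
           remote[field] = remapped
       return remote, mapping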

Command-Line Expansion Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To create the backend compilation commands, the command-line template is
expanded for each job. Placeholders are expanded as follows: the first array
element names the per-job array field to look in; the remaining elements are
converted to strings and concatenated, with integers converted by indexing
into the named array.
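This expansion rule can be written in a few lines of Python (an illustration
only, not part of LLD); `template` and `job` below are abbreviated from the
JSON example earlier in this document:

.. code-block:: python

   def expand_template(template, job):
       """Expand one job's command line from the shared template."""
       cmd = []
       for elem in template:
           if isinstance(elem, list):
               values = job[elem[0]]  # First entry names the per-job field.
               # Remaining entries are concatenated; integers index `values`.
               cmd.append("".join(values[p] if isinstance(p, int) else p
                                  for p in elem[1:]))
           else:
               cmd.append(elem)  # Plain strings are copied through unchanged.
       return cmd

   job = {
       "primary_input": ["dtlto.o"],
       "summary_index": ["dtlto.1.51232.native.o.thinlto.bc"],
       "primary_output": ["dtlto.1.51232.native.o"],
   }
   template = ["-o", ["primary_output", 0],
               ["summary_index", "-fthinlto-index=", 0]]
   print(expand_template(template, job))
   # -> ['-o', 'dtlto.1.51232.native.o',
   #     '-fthinlto-index=dtlto.1.51232.native.o.thinlto.bc']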

The example above generates the following backend compilation command for
`dtlto.o`:

.. code-block:: console

   /usr/local/clang -O3 -fprofile-sample-use=my.profdata \
     -o dtlto.1.51232.native.o -c -x ir dtlto.o \
     -fthinlto-index=dtlto.1.51232.native.o.thinlto.bc --target=x86_64-sie-ps5

This expansion scheme allows the remote compiler to be changed without updating
the distributors. For example, if the "args" field in the above example were
replaced with:

.. code-block:: json

   "args": [
     "custom-compiler",
     "-opt-level=2",
     "-profile-instrument-use-path=my.profdata",
     "-output", ["primary_output", 0],
     "-input", ["primary_input", 0],
     "-thinlto-index", ["summary_index", 0],
     "-triple", "x86_64-sie-ps5"
   ]

Then distributors can expand the command line without needing to be updated:

.. code-block:: console

   custom-compiler -opt-level=2 -profile-instrument-use-path=my.profdata \
     -output dtlto.1.51232.native.o -input dtlto.o \
     -thinlto-index dtlto.1.51232.native.o.thinlto.bc -triple x86_64-sie-ps5

Constraints
-----------

- Matching versions of Clang and LLD should be used.
- The distributor used must support the JSON schema generated by the version of
LLD in use.
6 changes: 6 additions & 0 deletions llvm/docs/UserGuides.rst
@@ -32,6 +32,7 @@ intermediate LLVM representation.
DebuggingJITedCode
DirectXUsage
Docker
DTLTO
FatLTO
ExtendingLLVM
GitHub
@@ -164,6 +165,11 @@ Optimizations
This document describes the interface between LLVM intermodular optimizer
and the linker and its design

:doc:`DTLTO`
This document describes the DTLTO implementation, which allows for
distributing ThinLTO backend compilations without requiring support from
the build system.

:doc:`GoldPlugin`
How to build your programs with link-time optimization on Linux.

63 changes: 50 additions & 13 deletions llvm/include/llvm/LTO/LTO.h
@@ -199,6 +199,8 @@ class InputFile {

using IndexWriteCallback = std::function<void(const std::string &)>;

using ImportsFilesContainer = llvm::SmallVector<std::string>;

/// This class defines the interface to the ThinLTO backend.
class ThinBackendProc {
protected:
@@ -223,13 +225,15 @@
BackendThreadPool(ThinLTOParallelism) {}

virtual ~ThinBackendProc() = default;
virtual void setup(unsigned MaxTasks, unsigned ReservedTasks) {}
virtual Error start(
unsigned Task, BitcodeModule BM,
const FunctionImporter::ImportMapTy &ImportList,
const FunctionImporter::ExportSetTy &ExportList,
const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
MapVector<StringRef, BitcodeModule> &ModuleMap) = 0;
Error wait() {
MapVector<StringRef, BitcodeModule> &ModuleMap,
DenseMap<StringRef, std::string> &ModuleTriples) = 0;
virtual Error wait() {
BackendThreadPool.wait();
if (Err)
return std::move(*Err);
@@ -240,8 +244,15 @@

// Write sharded indices and (optionally) imports to disk
Error emitFiles(const FunctionImporter::ImportMapTy &ImportList,
llvm::StringRef ModulePath,
const std::string &NewModulePath) const;
StringRef ModulePath, const std::string &NewModulePath) const;

// Write sharded indices to SummaryPath, (optionally) imports to disk, and
// (optionally) record imports in ImportsFiles.
Error emitFiles(const FunctionImporter::ImportMapTy &ImportList,
StringRef ModulePath, StringRef SummaryPath,
const std::string &NewModulePath,
std::optional<std::reference_wrapper<ImportsFilesContainer>>
ImportsFiles) const;
};

/// This callable defines the behavior of a ThinLTO backend after the thin-link
@@ -253,7 +264,7 @@
using ThinBackendFunction = std::function<std::unique_ptr<ThinBackendProc>(
const Config &C, ModuleSummaryIndex &CombinedIndex,
const DenseMap<StringRef, GVSummaryMapTy> &ModuleToDefinedGVSummaries,
AddStreamFn AddStream, FileCache Cache)>;
AddStreamFn AddStream, AddBufferFn AddBuffer, FileCache Cache)>;

/// This type defines the behavior following the thin-link phase during ThinLTO.
/// It encapsulates a backend function and a strategy for thread pool
@@ -268,10 +279,10 @@
std::unique_ptr<ThinBackendProc> operator()(
const Config &Conf, ModuleSummaryIndex &CombinedIndex,
const DenseMap<StringRef, GVSummaryMapTy> &ModuleToDefinedGVSummaries,
AddStreamFn AddStream, FileCache Cache) {
AddStreamFn AddStream, AddBufferFn AddBuffer, FileCache Cache) {
assert(isValid() && "Invalid backend function");
return Func(Conf, CombinedIndex, ModuleToDefinedGVSummaries,
std::move(AddStream), std::move(Cache));
std::move(AddStream), std::move(AddBuffer), std::move(Cache));
}
ThreadPoolStrategy getParallelism() const { return Parallelism; }
bool isValid() const { return static_cast<bool>(Func); }
@@ -294,6 +305,22 @@ ThinBackend createInProcessThinBackend(ThreadPoolStrategy Parallelism,
bool ShouldEmitIndexFiles = false,
bool ShouldEmitImportsFiles = false);

/// This ThinBackend generates the index shards and then runs the individual
/// backend jobs via an external process. It takes the same parameters as the
/// InProcessThinBackend; however, these parameters only control the behavior
/// when generating the index files for the modules. Additionally:
/// LinkerOutputFile is a string that should identify this LTO invocation in
/// the context of a wider build. It's used for naming to aid the user in
/// identifying activity related to a specific LTO invocation.
/// Distributor specifies the path to a process to invoke to manage the backend
/// jobs execution.
/// SaveTemps is a debugging tool that prevents temporary files created by this
/// backend from being cleaned up.
ThinBackend createOutOfProcessThinBackend(
ThreadPoolStrategy Parallelism, IndexWriteCallback OnWrite,
bool ShouldEmitIndexFiles, bool ShouldEmitImportsFiles,
StringRef LinkerOutputFile, StringRef Distributor, bool SaveTemps);

/// This ThinBackend writes individual module indexes to files, instead of
/// running the individual backend jobs. This backend is for distributed builds
/// where separate processes will invoke the real backends.
@@ -369,15 +396,22 @@ class LTO {
/// full description of tasks see LTOBackend.h.
unsigned getMaxTasks() const;

/// Runs the LTO pipeline. This function calls the supplied AddStream
/// function to add native object files to the link.
/// Runs the LTO pipeline. This function calls the supplied AddStream or
/// AddBuffer function to add native object files to the link depending on
/// whether the files are streamed into memory or written to disk by the
/// backend.
///
/// The Cache parameter is optional. If supplied, it will be used to cache
/// native object files and add them to the link.
///
/// The client will receive at most one callback (via either AddStream or
/// Currently, the AddBuffer parameter is only required for DTLTO. It is
/// optional to minimise the impact on existing LTO users, who do not yet
/// use DTLTO.
///
/// The client will receive at most one callback (via AddStream, AddBuffer or
/// Cache) for each task identifier.
Error run(AddStreamFn AddStream, FileCache Cache = {});
Error run(AddStreamFn AddStream, FileCache Cache = {},
AddBufferFn AddBuffer = nullptr);

/// Static method that returns a list of libcall symbols that can be generated
/// by LTO but might not be visible from bitcode symbol table.
@@ -426,6 +460,7 @@
// The bitcode modules to compile, if specified by the LTO Config.
std::optional<ModuleMapType> ModulesToCompile;
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
DenseMap<StringRef, std::string> ModuleTriples;
} ThinLTO;

// The global resolution for a particular (mangled) symbol name. This is in
@@ -517,10 +552,12 @@ class LTO {
bool LivenessFromIndex);

Error addThinLTO(BitcodeModule BM, ArrayRef<InputFile::Symbol> Syms,
const SymbolResolution *&ResI, const SymbolResolution *ResE);
const SymbolResolution *&ResI, const SymbolResolution *ResE,
StringRef Triple);

Error runRegularLTO(AddStreamFn AddStream);
Error runThinLTO(AddStreamFn AddStream, FileCache Cache,
Error runThinLTO(AddStreamFn AddStream, AddBufferFn AddBuffer,
FileCache Cache,
const DenseSet<GlobalValue::GUID> &GUIDPreservedSymbols);

Error checkPartiallySplit();
3 changes: 2 additions & 1 deletion llvm/include/llvm/Support/Caching.h
@@ -84,7 +84,8 @@ struct FileCache {
std::string CacheDirectoryPath;
};

/// This type defines the callback to add a pre-existing file (e.g. in a cache).
/// This type defines the callback to add a pre-existing file (e.g. in a cache
/// or created by a backend compilation run as a separate process).
///
/// Buffer callbacks must be thread safe.
using AddBufferFn = std::function<void(unsigned Task, const Twine &ModuleName,