Skip to content

Commit 31ed910

Browse files
feat: Add support for cross-repo code navigation (#338)
1 parent 0a105e8 commit 31ed910

32 files changed

+682
-85
lines changed

README.md

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,8 @@
11
# scip-clang: SCIP indexer for C and C++ ![(Status: Beta)](https://img.shields.io/badge/status-beta-yellow?style=flat)
22

3-
Status: scip-clang currently supports single-repository precise code navigation
4-
for C and C++.
3+
scip-clang is a precise code indexer based on Clang 16,
4+
which supports cross-repository code navigation for C and C++
5+
in Sourcegraph.
56

67
Here are some code navigation examples in [llvm/llvm-project](https://sourcegraph.com/github.com/llvm/llvm-project):
78
- [Find references for #include](https://sourcegraph.com/github.com/llvm/llvm-project@97a03eb2eb5acf269db6253fe540626b52950f97/-/blob/llvm/include/llvm/ADT/SmallSet.h?L1:1-1:81#tab=references)
@@ -24,7 +25,8 @@ You can also test out [code navigation in Chromium](https://sourcegraph.com/gith
2425
- [Generating a compilation database](#generating-a-compilation-database)
2526
- [Building code](#building-code)
2627
- [Initial scip-clang testing](#initial-scip-clang-testing)
27-
- [Running scip-clang on the entire codebase](#running-scip-clang-on-the-entire-codebase)
28+
- [Running scip-clang on a single repo](#running-scip-clang-on-the-entire-codebase)
29+
- [Setting up cross-repo code navigation](#setting-up-cross-repo-code-navigation)
2830
- [Troubleshooting](#troubleshooting)
2931
- [Reporting issues](#reporting-issues)
3032
- [Documentation](#documentation)
@@ -138,7 +140,7 @@ If there are any other errors,
138140
such as standard library or platform headers not being found,
139141
please [report an issue](#reporting-issues).
140142

141-
### Running scip-clang on the entire codebase
143+
### Running scip-clang on a single repo
142144

143145
```bash
144146
scip-clang --compdb-path=build/compile_commands.json
@@ -149,6 +151,10 @@ since scip-clang is still able to index code in the presence of
149151
compiler errors, and any errors in headers will get repeated
150152
for each translation unit in which the header is included.
151153

154+
### Setting up cross-repo code navigation
155+
156+
See the [cross-repository setup docs](/docs/CrossRepo.md).
157+
152158
## Troubleshooting
153159

154160
See the [Troubleshooting docs](/docs/Troubleshooting.md).

docs/CrossRepo.md

Lines changed: 107 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,107 @@
1+
# Cross-repository code navigation
2+
3+
scip-clang supports cross-repository configuration via explicit configuration.
4+
Before following the steps here, we recommend that you first
5+
[set up single-repository code navigation](/README.md#usage)
6+
and check that it works as expected.
7+
8+
Specifically, each project (i.e. different repo in Sourcegraph)
9+
of interest needs to be indexed with
10+
an extra JSON file describing package information.
11+
12+
```bash
13+
# Run from package0's root directory
14+
scip-clang --package-map-path=package-map.json <other flags>...
15+
```
16+
17+
The package map JSON file contains a list of objects in the following format:
18+
19+
```json
20+
[
21+
{"path": ".", "package": "package0@v0"},
22+
{"path": "path/to/package1_root", "package": "package1@v1"},
23+
...
24+
]
25+
```
26+
27+
1. The `path` key may be an absolute path, or a path relative to the current directory
28+
(which must be the project root).
29+
2. The `package` key consists of a `name` followed by an `@` separator and a `version`.
30+
- The name and version must only contain characters belonging to `[a-zA-Z0-9_\-.]`.
31+
- The version should be chosen based on release information.
32+
For example, if you use git tags to mark releases in repos,
33+
and repositories only depend on tagged releases (instead of arbitrary commits),
34+
then the git tag can be used as the version.
35+
The important thing is that the version needs to be consistent when
36+
different projects are indexed, and it should not be reused over time.
37+
38+
The reason for this is that cross-repo code navigation works
39+
by treating the concatenation of (1) the package name,
40+
(2) the package version and (3) the qualified symbol name
41+
(e.g `std::vector`) as the unique symbol ID across a Sourcegraph instance.
42+
43+
Files under the directories `path/to/package1_root`
44+
will be treated as belonging to `package1`'s `v1` version.
45+
46+
If one package root is a prefix of another, package information
47+
is assigned based on the longest match.
48+
49+
For cross-repository navigation to work,
50+
`package1` must also be indexed with the same version information:
51+
52+
```bash
53+
# This doesn't contain information about package0,
54+
# as package0 depends on package1, but not vice-versa.
55+
$ cat other-package-map.json
56+
[
57+
{"path": ".", "package": "package1@v1"},
58+
...
59+
]
60+
$ scip-clang --package-map-path=other-package-map.json <other flags>...
61+
```
62+
63+
Once both these indexing operations are performed and the indexes
64+
are uploaded to a Sourcegraph instance, cross-repository navigation
65+
should work across `package0` and `package1`.
66+
67+
## Limitations
68+
69+
At the moment, the amount of indexing work required
70+
scales quadratically with the depth of the dependency graph.
71+
Specifically, if package PC depends on PB which depends on PA,
72+
then PA will need to be indexed thrice (once by itself, once when PB is indexed,
73+
once when PC is indexed), PB will be indexed twice (once by itself,
74+
once when PC is indexed), and PC will be indexed once.
75+
76+
The reason for this is that the indexer needs to identify
77+
package information for which declaration is defined in which package,
78+
so that it can correctly support code navigation for forward declarations.
79+
However, strictly speaking, any package can forward-declare any entity.
80+
81+
For example, if PA defines a function f, and a header in PC
82+
forward-declares `pa::f`, then when C is indexed, the indexer
83+
needs to somehow know that the definition of `f` lives in some
84+
file in PA. There are broadly 3 possible ways to do this:
85+
86+
1. Always index all TUs. This is the current strategy.
87+
This is the build equivalent of building everything from source.
88+
2. Provide a way to reuse the indexes from dependencies,
89+
and use those to resolve forward declarations.
90+
This is the build equivalent of using archives/shared libraries/TBDs.
91+
3. Only index the TUs for the "current" package. If the
92+
definition for a forward declaration is not found,
93+
no reference information is emitted (slightly worse UX, but faster).
94+
This is not supported directly in scip-clang,
95+
but can be achieved by removing entries for
96+
out-of-project files from the compilation database.
97+
4. Rely on some heuristics and/or user-supplied hint.
98+
For example, if there was a way to provide a hint that
99+
unresolved forward declarations in namespace `pa` must map
100+
to declarations in package PA, then indexer could
101+
trust that information and correctly emit a reference
102+
for `pa::f` at the forward declaration without having
103+
to re-index TUs in PA.
104+
105+
We're [looking for feedback](https://github.com/sourcegraph/scip-clang/issues/358)
106+
on which approach would work best for your use case,
107+
before implementing a solution for this.

docs/IndexingProjects.md

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,45 @@
11
# Indexing test projects
22

3+
## scip-clang
4+
5+
For the sake of this example, I'll only describe how
6+
to index scip-clang with cross-repo code navigation support
7+
for [Abseil](https://github.com/abseil/abseil-cpp/).
8+
9+
First, index Abseil.
10+
11+
```
12+
git checkout 4ffaea74
13+
cmake -B build -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="-g0" -DABSL_BUILD_TESTING=ON -DABSL_USE_GOOGLETEST_HEAD=ON
14+
scip-clang --compdb-path=build/compile_commands.json --package-map=abseil-package-map.json
15+
```
16+
17+
The `abseil-package-map.json` file used here is:
18+
19+
```json
20+
[ {"path": ".", "package": "abseil-cpp@4ffaea74"} ]
21+
```
22+
23+
Next, index scip-clang itself. We subset out the compilation
24+
database first to not index TUs in other packages.
25+
26+
```bash
27+
jq '[.[] | select(.file | (contains("indexer/") or startswith("test/") or contains("com_google_absl")))]' compile_commands.json > min.json
28+
./bazel-bin/indexer/scip-clang --compdb-path=min.json --package-map-path=package-map.json
29+
```
30+
31+
Here, the `package-map.json` is as follows:
32+
33+
```json
34+
[
35+
{"path": ".", "package": "scip-clang@v0.1.3"},
36+
{"path": "./bazel-scip-clang/external/com_google_absl", "package": "abseil-cpp@4ffaea74"}
37+
]
38+
```
39+
40+
The exact versions need to be determined based on current tag
41+
or hashes provided to Bazel.
42+
343
## LLVM
444

545
Tested environments: Ubuntu 18.04, Ubuntu 22.04, macOS 13.

indexer/AstConsumer.cc

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -192,7 +192,8 @@ void IndexerAstConsumer::HandleTranslationUnit(clang::ASTContext &astContext) {
192192
}
193193

194194
FileMetadataMap fileMetadataMap{this->options.projectRootPath,
195-
this->options.buildRootPath};
195+
this->options.buildRootPath,
196+
this->options.packageMap};
196197
FileIdsToBeIndexedSet toBeIndexed{};
197198
this->computeFileIdsToBeIndexed(astContext, emitIndexDetails,
198199
clangIdLookupMap, fileMetadataMap,

indexer/AstConsumer.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ class CompilerInstance;
2121

2222
namespace scip_clang {
2323
class ClangIdLookupMap;
24+
class PackageMap;
2425
class FileMetadataMap;
2526
class IndexerPreprocessorWrapper;
2627
class MacroIndexer;
@@ -41,6 +42,7 @@ struct IndexerAstConsumerOptions {
4142
RootPath buildRootPath;
4243
WorkerCallback getEmitIndexDetails;
4344
bool deterministic;
45+
PackageMap &packageMap;
4446
};
4547

4648
struct TuIndexingOutput {

indexer/CliOptions.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ struct CliOptions {
3030
std::string temporaryOutputDir;
3131
std::string indexOutputPath;
3232
std::string statsFilePath;
33+
std::string packageMapPath;
3334
bool showCompilerDiagnostics;
3435

3536
size_t ipcSizeHintBytes;

indexer/Driver.cc

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -206,6 +206,7 @@ struct DriverOptions {
206206
AbsolutePath compdbPath;
207207
AbsolutePath indexOutputPath;
208208
AbsolutePath statsFilePath;
209+
AbsolutePath packageMapPath;
209210
bool showCompilerDiagnostics;
210211
DriverIpcOptions ipcOptions;
211212
size_t numWorkers;
@@ -224,7 +225,7 @@ struct DriverOptions {
224225
explicit DriverOptions(std::string driverId, const CliOptions &cliOpts)
225226
: workerExecutablePath(),
226227
projectRootPath(AbsolutePath("/"), RootKind::Project), compdbPath(),
227-
indexOutputPath(), statsFilePath(),
228+
indexOutputPath(), statsFilePath(), packageMapPath(),
228229
showCompilerDiagnostics(cliOpts.showCompilerDiagnostics),
229230
ipcOptions{cliOpts.ipcSizeHintBytes, cliOpts.receiveTimeout},
230231
numWorkers(cliOpts.numWorkers), deterministic(cliOpts.deterministic),
@@ -279,6 +280,7 @@ struct DriverOptions {
279280
setAbsolutePath(cliOpts.indexOutputPath, this->indexOutputPath);
280281
setAbsolutePath(cliOpts.compdbPath, this->compdbPath);
281282
setAbsolutePath(cliOpts.statsFilePath, this->statsFilePath);
283+
setAbsolutePath(cliOpts.packageMapPath, this->packageMapPath);
282284

283285
auto makeDirs = [](const StdPath &path, const char *name) {
284286
std::error_code error;
@@ -322,6 +324,10 @@ struct DriverOptions {
322324
if (!this->statsFilePath.asStringRef().empty()) {
323325
args.push_back("--measure-statistics");
324326
}
327+
if (!this->packageMapPath.asStringRef().empty()) {
328+
args.push_back(
329+
fmt::format("--package-map-path={}", packageMapPath.asStringRef()));
330+
}
325331
if (this->noStacktrace) {
326332
args.push_back("--no-stack-trace");
327333
}

indexer/FileMetadata.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -58,6 +58,7 @@ struct PackageId {
5858
struct PackageMetadata {
5959
PackageId id;
6060
AbsolutePathRef rootPath;
61+
bool isMainPackage;
6162
};
6263

6364
/// Represents important metadata related to a file.

indexer/FileSystem.cc

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
#include <string>
2+
3+
#include "llvm/Support/raw_ostream.h"
4+
5+
#include "indexer/Enforce.h"
6+
#include "indexer/FileSystem.h"
7+
8+
namespace scip_clang {
9+
10+
std::string readFileToString(const StdPath &path, std::error_code &ec) {
11+
llvm::raw_fd_stream stream(path.c_str(), ec);
12+
if (ec) {
13+
return "";
14+
}
15+
std::string out;
16+
char buf[16384] = {0};
17+
while (true) {
18+
auto nread = stream.read(buf, sizeof(buf));
19+
if (nread == 0) {
20+
break;
21+
}
22+
if (nread == -1) {
23+
ec = stream.error();
24+
return "";
25+
}
26+
ENFORCE(nread >= 1);
27+
out.append(std::string_view(buf, nread));
28+
}
29+
return out;
30+
}
31+
32+
} // namespace scip_clang

indexer/IdPathMappings.cc

Lines changed: 21 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -85,7 +85,8 @@ bool FileMetadataMap::insert(clang::FileID fileId, AbsolutePathRef absPathRef) {
8585
ENFORCE(!absPathRef.asStringView().empty(),
8686
"inserting file with empty absolute path");
8787

88-
std::optional<PackageMetadata> optPackageMetadata = std::nullopt;
88+
auto optPackageMetadata = this->packageMap.lookup(absPathRef);
89+
8990
auto insertRelPath = [&](RootRelativePathRef relPathRef,
9091
bool isInProject) -> bool {
9192
ENFORCE(!relPathRef.asStringView().empty(),
@@ -98,12 +99,24 @@ bool FileMetadataMap::insert(clang::FileID fileId, AbsolutePathRef absPathRef) {
9899
return this->map.insert({{fileId}, std::move(metadata)}).second;
99100
};
100101

101-
// In practice, CMake ends up passing paths to project files as well
102-
// as files inside the build root. Normally, files inside the build root
103-
// are generated ones, but to be safe, check if the corresponding file
104-
// exists in the project. Since the build root itself is typically inside
105-
// the project root, check the build root first.
106-
if (auto buildRootRelPath = this->buildRootPath.tryMakeRelative(absPathRef)) {
102+
if (optPackageMetadata.has_value()) {
103+
if (auto optStrView =
104+
optPackageMetadata->rootPath.makeRelative(absPathRef)) {
105+
return insertRelPath(RootRelativePathRef(*optStrView, RootKind::External),
106+
/*isInProject*/ optPackageMetadata->isMainPackage);
107+
} else {
108+
spdlog::warn("package info map determined '{}' as root for path '{}', "
109+
"but prefix check failed",
110+
optPackageMetadata->rootPath.asStringView(),
111+
absPathRef.asStringView());
112+
}
113+
// In practice, CMake ends up passing paths to project files as well
114+
// as files inside the build root. Normally, files inside the build root
115+
// are generated ones, but to be safe, check if the corresponding file
116+
// exists in the project. Since the build root itself is typically inside
117+
// the project root, check the build root first.
118+
} else if (auto buildRootRelPath =
119+
this->buildRootPath.tryMakeRelative(absPathRef)) {
107120
auto originalFileSourcePath =
108121
this->projectRootPath.makeAbsoluteAllowKindMismatch(
109122
buildRootRelPath.value());
@@ -123,18 +136,7 @@ bool FileMetadataMap::insert(clang::FileID fileId, AbsolutePathRef absPathRef) {
123136
this->projectRootPath.tryMakeRelative(absPathRef)) {
124137
return insertRelPath(optProjectRootRelPath.value(), /*isInProject*/ true);
125138
}
126-
if (optPackageMetadata.has_value()) {
127-
if (auto optStrView =
128-
optPackageMetadata->rootPath.makeRelative(absPathRef)) {
129-
return insertRelPath(RootRelativePathRef(*optStrView, RootKind::External),
130-
/*isInProject*/ false);
131-
} else {
132-
spdlog::warn("package info map determined '{}' as root for path '{}', "
133-
"but prefix check failed",
134-
optPackageMetadata->rootPath.asStringView(),
135-
absPathRef.asStringView());
136-
}
137-
}
139+
138140
auto optFileName = absPathRef.fileName();
139141
ENFORCE(optFileName.has_value(),
140142
"Clang returned file path {} without a file name",

0 commit comments

Comments
 (0)