Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Sketch of SCIP-based GraphQL API #61217

Closed
wants to merge 20 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
e50bb3b
WIP: New SCIP-based GraphQL API draft
varungandhi-src Mar 18, 2024
e5b9e38
Make repository and path filters objects instead of primitives
varungandhi-src Apr 22, 2024
df4fb88
Make symbol name and provenance comparison objects instead of primitives
varungandhi-src Apr 22, 2024
203e505
Move usages API to top-level
varungandhi-src Apr 22, 2024
1949594
Address feedback abotu optionality, add comments, introduce doc API, …
varungandhi-src Apr 23, 2024
eebbd29
Add more comments, consistent naming, add spaces
varungandhi-src Apr 25, 2024
072bbc0
Add filter for provenance
varungandhi-src Apr 25, 2024
8e947b7
Rename symbolName -> symbol in Occurrence
varungandhi-src Apr 25, 2024
4032028
Replace DEPRECATED: comments with @deprecated
varungandhi-src Apr 25, 2024
2146fed
Tweak comments/replace symbolName->symbol
varungandhi-src Apr 25, 2024
570798a
Return multiple CodeGraphData due to multiple tool possibility
varungandhi-src May 2, 2024
9269229
Mention toolInfo and build identifiers.
varungandhi-src May 2, 2024
7c42739
Mention file navigation and use common-ish syntax for extension ideas
varungandhi-src May 2, 2024
e90ab20
Fix naming convention + use pagination for symbols & occurrences
varungandhi-src May 2, 2024
5f0c62a
Use Connection for SCIP Document data
varungandhi-src May 2, 2024
dbd4068
Rename Provenance & remove WithProvenance interface
varungandhi-src May 2, 2024
e0af435
Add comment for symbols vs scipDocument.data.symbols
varungandhi-src May 2, 2024
653a197
Add filter equivalence and design comment
varungandhi-src May 2, 2024
e20c0a5
Rename field & make blob: GitBlob optional
varungandhi-src May 2, 2024
38fc951
Move comment to different field
varungandhi-src May 2, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
103 changes: 101 additions & 2 deletions cmd/frontend/graphqlbackend/codeintel.codenav.graphql
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,21 @@ type GitBlobLSIFData implements TreeEntryLSIFData {
"""
character: Int!

"""
When specified, indicates that this request should be paginated and
to fetch results starting at this cursor.
A future request can be made for more results by passing in the
'LocationConnection.pageInfo.endCursor' that is returned.
"""
after: String

"""
When specified, indicates that this request should be paginated and
the first N results (relative to the cursor) should be returned. i.e.
how many results to return per page.
"""
first: Int

"""
When specified, it filters references by filename.
"""
Expand All @@ -118,12 +133,12 @@ type GitBlobLSIFData implements TreeEntryLSIFData {
"""
The line on which the symbol occurs (zero-based, inclusive).
"""
line: Int!
line: Int

"""
The character (not byte) of the start line on which the symbol occurs (zero-based, inclusive).
"""
character: Int!
character: Int
Comment on lines -121 to +281
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Revert these changes, we don't care about this API anymore


"""
When specified, indicates that this request should be paginated and
Expand Down Expand Up @@ -231,6 +246,19 @@ type GitBlobLSIFData implements TreeEntryLSIFData {
character: Int!
): Hover

"""
Either one of 'symbol' or 'range' must be provided, but this isn't enforced at the GraphQL
layer due to the lack of support for input unions.
https://github.com/graphql/graphql-wg/blob/main/rfcs/InputUnion.md
"""
usages(
symbol: LookupSCIPSymbol,
range: LookupRange,
filter: UsagesFilter,
first: Int,
after: String,
): UsageConnection!
Copy link
Contributor Author

@varungandhi-src varungandhi-src Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anton's Q: Why not split this into RangeUsageConnection and SymbolUsageConnection? This will cause more complexity in the outputs because of input dependence.

Varun's A: The output needs to have some complexity anyways because of needing to support search-based/syntactic/precise and some information will not be present depending on the data source. (Anton's idea: In GraphQL, you'd typically use union types for results so that caller can check the type and cast.) Also, depending on whether the symbol is file-local, repo-local or cross-repo, we do actually need different kinds of range information.


"""
Code diagnostics provided through LSIF.
"""
Expand All @@ -247,6 +275,77 @@ type GitBlobLSIFData implements TreeEntryLSIFData {
snapshot(indexID: ID!): [SnapshotData!]
}

"""
In the future, we may want to extend this to allow passing in a suffix or a "symbol pattern".
"""
input LookupSCIPSymbol {
name: String!
provenance: Provenance!
}

type SCIPSymbol {
name: String!
provenance: Provenance!
}

input LookupRange {
start: LookupPosition!
end: LookupPosition!
}

input LookupPosition {
line: Int!
character: Int!
}

input UsagesFilter {
and: [UsagesFilter!]
or: [UsagesFilter!]
not: UsagesFilter
# TODO: Can we be flexible here and allow patterns while still maintaining fast queries.
repository: String
path: String
kind: SymbolUsageKind
# TODO: Provide a way of controlling fallback behavior for indexes?
# Not super sure how that would work, given that for different repositories,
# the hashes will all be different?
}

enum SymbolUsageKind {
Definition,
Reference,
Implementation,
Super,
}

type UsageConnection {
nodes: [Usage!]!
pageInfo: PageInfo!
}

type Usage {
symbol: SCIPSymbol!
"""
NOTE: Do not pass
TODO: For this blob, it is generally more useful to get +N/-N lines
rather than having to pass a range internally for the contents since
the offsets from the start will not be known a-priori.
"""
blob: GitBlob!
"""
Instead of blob { content }, allows accessing a sub-span of the content
using relative coordinates from the range of this usage. If linesBefore
or linesAfter is negative or exceeds the number of available lines,
the value is interpreted as until the start/end of the file.
"""
surroundingBlobContent(surroundingLines: SurroundingLines = {linesBefore: 0, linesAfter: 0}): String!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 meaning "no value given" is a very go-ey thing, I think you don't need a default value here, since surroundingLines: SurroundingLines already indicates nullable and the backend can then decide how to represent that in the given programming language

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, 0 doesn't mean "no value given", it literally means 0 because that makes sense as a default -- only the single line from the blob will be returned in the default case. I wrote the 0 explicitly here as it serves as documentation for an API client for what the behavior will be if they omit this parameter. Another potential default would be INT32_MAX (i.e. get the full blob contents), but we're deliberately picking 0 here, not INT32_MAX.

}

input SurroundingLines {
linesBefore: Int
linesAfter: Int
}

"""
The SCIP snapshot decoration for a single SCIP Occurrence.
"""
Expand Down
40 changes: 39 additions & 1 deletion cmd/frontend/graphqlbackend/schema.graphql
Original file line number Diff line number Diff line change
Expand Up @@ -6172,17 +6172,55 @@ type HighlightedFile {
"""
html: String!
"""
Base64 encoded JSON payload of LSIF Typed with syntax highlighting data.
DEPRECATED: Base64 encoded JSON payload of SCIP with syntax highlighting data.
"""
lsif: String!
"""
"""
document: AssociatedSCIPDocument!
"""
A list of the desired line ranges. Each list is a list of lines, where each element is an HTML
table row '<tr>...</tr>' string. This is useful if you only need to display specific subsets of
the file.
"""
lineRanges(ranges: [HighlightLineRange!]!): [[String!]!]!
}

enum Provenance {
Precise,
Syntactic,
SearchBased,
}

interface WithProvenance {
"""
Coarse-grained information about the data source.
"""
provenance: Provenance!
"""
Opaque fine-grained information describing the data source.

Provided only for debugging.

This field should be ignored when checking structural equality.
"""
dataSource: String!
}

type AssociatedSCIPDocument implements WithProvenance {
"""
Base64 encoded SCIP payload.

After decoding, the resulting byte buffer can be parsed with
pre-generated bindings in the sourcegraph/scip repository or
manually generating bindings using
https://github.com/sourcegraph/scip/blob/main/scip.proto
"""
data: String!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder: would it be more natural to use proto's canonical JSON encoding for a GraphQL API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the downside is then there is temptation to not use the SCIP bindings and just use the deserialized JSON object directly. Also it doesn't look like the bindings support JSON serialization. So nevermind, ignore me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We don't want to update our GraphQL API for every change to SCIP, because most of our indexers may not adopt the new features right away. Some features also add a significant amount of complexity (e.g. this one being proposed which exposes full type structures Add support for SymbolInformation.signature scip#231), and I don't think we should expose that in our GraphQL API. Exposing the raw SCIP allows a customer to upload updated indexes using the new SCIP feature and implement custom functionality.
  2. Protobuf allows a client to parse the data in a streaming manner, which can allow for optimizations like interning-while-parsing (e.g. for symbol names), as well as just skipping needless input, without allocating a lot of memory.

Some of the important Document data is already exposed by the symbols and occurrences fields in the API today. So 'light' clients can make use of it directly. 'Heavy' clients can take on a bit more complexity by parsing the Protobuf for full access to all the details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By shoving this into a string field, you lose the ability to do a subselect, which kind of defeats GraphQL. That said, if we really need a proto representation here, I don't think that is blocking.

However, from an APIs perspective, it feels a little cleaner to return JSON.

If having to support streaming decoding here, I think I have another concern:
Is this gonna be a huge string sent over? Base64 encoding adds another bit of overhead to the size of the proto, and you need to unmarshal the whole JSON GQL response to then grab this string to then streaming decode it.
Our backend has to build up the whole response in memory, and the clients load it into memory entirely. Is that a concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing here is that we want clients to use the SCIP proto-bindings for interacting with this field

provenance: Provenance!
dataSource: String!
}

"""
A file match.
"""
Expand Down