Commit 151d3af

Update Victoria's RFC from 2018
1 parent b6fffdf commit 151d3af

text/0000-rustdoc-json.md

Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
- Feature Name: `rustdoc_json`
- Start Date: 2020-06-26
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

This RFC describes the design of a JSON output for the tool `rustdoc`, so that other tools can
rely on its data collection and refinement while providing a different front-end.

# Motivation
[motivation]: #motivation

The current HTML output of `rustdoc` is often lauded as a key selling point of Rust. Using this
ubiquitous tool, you can easily find nearly anything you need to know about a crate. However,
despite its versatility, the use of this specific output has its drawbacks:

- Viewing this output requires a web browser, with (for some features of the output) a JavaScript
  interpreter.
- The HTML output of `rustdoc` is explicitly not stabilized, to allow `rustdoc` developers the
  option to tweak the display of information, add new information, etc. However, this also means
  that converting this HTML into a different output is infeasible.
- As the HTML is the only available output of `rustdoc`, its integration into centralized,
  multi-language documentation browsers is difficult.

In addition, `rustdoc` had JSON output in the past, but it failed to keep up with the changing
language and [was taken out][remove-json] in 2016. With `rustdoc` in a more stable position, it's
possible to re-introduce this feature and ensure its stability. This [was brought up in 2018][2018-discussion]
with a positive response, and there are [several][2019-interest] [recent][rustdoc-infopages] discussions
indicating that it would be a nice feature to have.

In [the draft RFC from 2018][previous-rfc] there was some discussion of utilizing `save-analysis` to
provide this information, but with [RLS being replaced by rust-analyzer][RA-RLS] it's possible that
the feature will eventually be removed from the compiler. In addition, `save-analysis` output is at
least as unstable as the current HTML output of `rustdoc`, so a separate feature is preferable.

[remove-json]: https://github.com/rust-lang/rust/pull/32773
[2018-discussion]: https://internals.rust-lang.org/t/design-discussion-json-output-for-rustdoc/8271/6
[2019-interest]: https://github.com/rust-lang/rust/issues/44136#issuecomment-467144974
[rustdoc-infopages]: https://internals.rust-lang.org/t/current-state-of-rustdoc-and-cargo/11721
[previous-rfc]: https://github.com/QuietMisdreavus/rfcs/blob/rustdoc-json/text/0000-rustdoc-json.md#unresolved-questions
[RA-RLS]: https://github.com/rust-lang/rfcs/pull/2912

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

(*Upon successful implementation/stabilization, this documentation should live in The Rustdoc
Book.*)

In addition to generating the regular HTML, `rustdoc` can create JSON files based on your crate.
These can be used by other tools to take information about your crate and convert it into other
output formats, insert it into centralized documentation systems, create language bindings, etc.

To get this output, pass the `--output-format json` flag to `rustdoc`:

```console
$ rustdoc lib.rs --output-format json
```

This will output a JSON file in the current directory (by default). For example, say you have the
following crate:

```rust
//! Here are some crate-level docs!

/// Here are some docs for `some_fn`!
pub fn some_fn() {}

/// Here are some docs for `SomeStruct`!
pub struct SomeStruct;
```

After running the above command, you should get a `lib.json` file like the following (indented for
clarity):

```json
{
  TODO
}
```

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

(*Upon successful implementation/stabilization, this documentation should live in The Rustdoc
Book.*)

When you request JSON output from `rustdoc`, you're getting a version of the Rust abstract syntax
tree (AST), so you could see anything that you could export from a valid Rust crate. The following
types can appear in the output:

TODO

You also get a collection of mappings between items, such as from a trait to all the types that
implement it and vice versa. The structure of those mappings is as follows:

TODO

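
Since the structure of these mappings is still marked TODO, the following is only a sketch of what a two-way trait/type mapping could look like on the consumer side. `ItemId`, `ImplIndex`, and all method names are hypothetical and are not part of this RFC or of `rustdoc`; plain strings stand in for whatever identifier the final format uses.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical identifier for an item (a type or a trait).
/// A plain string is an assumption made to keep the sketch readable.
type ItemId = String;

/// Hypothetical two-way index between traits and the types implementing them.
#[derive(Debug, Default)]
struct ImplIndex {
    /// Trait -> every type that implements it.
    implementors: BTreeMap<ItemId, BTreeSet<ItemId>>,
    /// Type -> every trait it implements.
    traits_of: BTreeMap<ItemId, BTreeSet<ItemId>>,
}

impl ImplIndex {
    /// Record `impl trait_id for type_id` in both directions.
    fn add_impl(&mut self, trait_id: &str, type_id: &str) {
        self.implementors
            .entry(trait_id.to_string())
            .or_default()
            .insert(type_id.to_string());
        self.traits_of
            .entry(type_id.to_string())
            .or_default()
            .insert(trait_id.to_string());
    }

    /// All types implementing `trait_id`, in sorted order (empty if unknown).
    fn implementors_of(&self, trait_id: &str) -> Vec<&str> {
        self.implementors
            .get(trait_id)
            .map(|set| set.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }

    /// All traits implemented by `type_id`, in sorted order (empty if unknown).
    fn traits_implemented_by(&self, type_id: &str) -> Vec<&str> {
        self.traits_of
            .get(type_id)
            .map(|set| set.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

Keeping both directions in one structure means a consumer can answer "who implements this trait?" and "what does this type implement?" without rescanning every impl, at the cost of storing each relationship twice.
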

(*This documentation is deliberately left incomplete; filling it out will happen during the design process.*)

(*Complete documentation information is deferred to final design and implementation work.*)

# Drawbacks
[drawbacks]: #drawbacks

- If we support JSON output for `rustdoc`, we need to decide how closely it should mirror the
  internal structures used in `rustdoc` and in the compiler. Depending on how much we want to
  stabilize, we could accidentally stabilize the internal structures of `rustdoc`.

- Even if we don't accidentally stabilize `rustdoc`'s internals, adding JSON output adds *another*
  thing that must be kept up to date with language changes, and another thing for compiler
  contributors to potentially break with their changes. Because the HTML output is only meant for
  display, it requires less vigilant updating when new language features are added.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

- **Status quo.** Keep the HTML the way it is, and make users who want a machine-readable version of
  a crate parse it themselves. In the absence of an accepted JSON output, the `--output-format` flag
  in `rustdoc` remains deprecated and unused.
- **Alternate data format (XML, Bincode, CapnProto, etc.).** JSON was selected because parsers for it
  are ubiquitous, but selecting a different data format may provide benefits for file size,
  compressibility, speed of conversion, etc. If the implementation leans on serde then this may be a
  non-issue, as it would be trivial to switch serialization formats.
- **Alternate data structure.** Massage the data so that it echoes something closer to user
  perception, rather than the internal `clean` AST that it is currently modeled after. Such a
  refinement can be provided in a future RFC, as a potential alternate data format to output, if
  necessary.

# Prior art
[prior-art]: #prior-art

A handful of other languages and systems have documentation tools that output an intermediate
representation separate from the human-readable outputs:

- [PureScript] uses an intermediate JSON representation when publishing package information to their
  [Pursuit] directory. It's primarily used to generate documentation, but can also be used to
  generate `etags` files.
- [Doxygen] has an option to generate an XML file with the code's information.
- [Haskell]'s documentation tool, [Haddock], can generate an intermediate representation used by the
  type search engine [Hoogle] to integrate documentation of several packages.
- [Kythe] is a "(mostly) language-agnostic" system for integrating documentation across several
  languages. It features its own schema that code information can be translated into, which services
  can use to aggregate information about projects that span multiple languages.
- [GObject Introspection] has an intermediate XML representation called GIR that's used to create
  language bindings for GObject-based C libraries. While (at the time of this writing) it's not
  used to create documentation, it is a stated goal to use this information to document
  these libraries.

[PureScript]: http://www.purescript.org/
[Pursuit]: https://pursuit.purescript.org/
[Doxygen]: https://www.doxygen.nl/
[Haskell]: https://www.haskell.org/
[Haddock]: https://www.haskell.org/haddock/
[Hoogle]: https://www.haskell.org/hoogle/
[Kythe]: http://kythe.io/
[GObject Introspection]: https://gi.readthedocs.io/en/latest/

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- What is the stabilization story? As language features are added, this representation will need to
  be extended to accommodate them. As this will change the structure of the data, what does that mean
  for its consumers?
- How will intra-doc links be handled? Supporting `struct.SomeStruct.html` style links is largely
  infeasible since it would tie alternative front-ends to `rustdoc`'s file/folder format. With the
  nightly intra-rustdoc link syntax, it's debatable whether we should resolve those to HTML links or
  leave that up to whatever consumes the JSON.
- How do we represent types, and allow people to properly collect type information from places like
  struct fields, function signatures, etc.? `rustdoc`'s own `clean::Type` enum is large and recursive
  and represents a lot of primitives, in addition to ultimately deferring the lookup to a `DefId`.
- The `id` field is basically a copy of `DefId` from inside the compiler; is there a better way to
  represent it? How necessary is it to have?
- Where should we store impls?
  - In `rustdoc`, trait impls are pooled in the crate root (or placed in the module they're declared
    in), but before rendering, the information is copied into two maps: one mapping traits to their
    implementors, and one mapping types to all their impls (inherent or trait).
  - The HIR copies all trait impls into a map connecting traits to their implementors, though
    they're also available in the location they're defined if you iterate over the HIR.
  - However, while trait impls are unburdened by scope rules for visibility, *inherent* impls are.
    Currently, if `--document-private-items` is passed, the methods defined in an impl are all
    pooled into a struct, and any `pub(restricted)` scopes link to their respective modules.
    However, private methods are just shown as private, without any information connecting them to
    where they're allowed.
  - This leads to wanting to pool impls on their type (and copying them into their trait for trait
    impls), and leaving the visibility fix for a later PR.

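
To make the type-representation question above more concrete, here is a minimal sketch of a recursive type description that a consumer could walk. It is not `rustdoc`'s `clean::Type`: the `Type` enum, its variants, the `ItemId` wrapper (a stand-in for `DefId`), and both helper functions are hypothetical and heavily simplified.

```rust
/// Hypothetical stand-in for the compiler's `DefId`; the real type is richer.
#[derive(Debug, Clone, PartialEq)]
struct ItemId(u32);

/// Hypothetical, heavily simplified type representation.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    /// A primitive such as `u8` or `str`.
    Primitive(String),
    /// A named item, resolved through its id, with generic arguments.
    Resolved { name: String, id: ItemId, args: Vec<Type> },
    /// `&T` or `&mut T`.
    Reference { mutable: bool, inner: Box<Type> },
    /// `(A, B, ...)`; one-element tuples are not special-cased in this sketch.
    Tuple(Vec<Type>),
}

/// Render a type back into Rust-like syntax.
fn render(ty: &Type) -> String {
    match ty {
        Type::Primitive(name) => name.clone(),
        Type::Resolved { name, args, .. } => {
            if args.is_empty() {
                name.clone()
            } else {
                let args: Vec<String> = args.iter().map(render).collect();
                format!("{}<{}>", name, args.join(", "))
            }
        }
        Type::Reference { mutable, inner } => {
            let prefix = if *mutable { "&mut " } else { "&" };
            format!("{}{}", prefix, render(inner))
        }
        Type::Tuple(items) => {
            let items: Vec<String> = items.iter().map(render).collect();
            format!("({})", items.join(", "))
        }
    }
}

/// Collect the ids of every resolved item mentioned anywhere inside `ty`,
/// in the order they are encountered (outermost first).
fn referenced_ids(ty: &Type, out: &mut Vec<ItemId>) {
    match ty {
        Type::Primitive(_) => {}
        Type::Resolved { id, args, .. } => {
            out.push(id.clone());
            for arg in args {
                referenced_ids(arg, out);
            }
        }
        Type::Reference { inner, .. } => referenced_ids(inner, out),
        Type::Tuple(items) => {
            for item in items {
                referenced_ids(item, out);
            }
        }
    }
}
```

Even this reduced form shows the trade-off the question raises: every consumer has to recurse through the structure, and anything beyond primitives ends in an id that must be looked up elsewhere in the output.
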
# Future possibilities
[future-possibilities]: #future-possibilities

- Since `rustdoc` has to be refactored to support both the HTML and JSON backends, future work to
  add other output formats such as pure Markdown should be relatively simple after this.
- Once the JSON output is added, a Rust library that lives outside the compiler and parses the
  output back into useful structs would make this representation easier to use.
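
As a rough illustration of why a backend split would make further formats cheap, the sketch below puts rendering behind a single trait. None of this is `rustdoc`'s actual design: `DocItem`, `Renderer`, `MarkdownRenderer`, `JsonRenderer`, and `escape_json` are hypothetical names, and the JSON shape shown is invented for the example rather than the format this RFC will define.

```rust
/// Hypothetical, minimal documented item shared by every backend.
struct DocItem {
    name: String,
    kind: String,
    docs: String,
}

/// Hypothetical backend interface: one implementation per output format.
trait Renderer {
    /// Render a single item.
    fn render_item(&self, item: &DocItem) -> String;

    /// Render a whole crate; by default, items are joined by blank lines.
    fn render_crate(&self, items: &[DocItem]) -> String {
        items
            .iter()
            .map(|item| self.render_item(item))
            .collect::<Vec<_>>()
            .join("\n\n")
    }
}

struct MarkdownRenderer;

impl Renderer for MarkdownRenderer {
    fn render_item(&self, item: &DocItem) -> String {
        format!("## {} `{}`\n\n{}", item.kind, item.name, item.docs)
    }
}

struct JsonRenderer;

impl Renderer for JsonRenderer {
    fn render_item(&self, item: &DocItem) -> String {
        format!(
            "{{\"name\":\"{}\",\"kind\":\"{}\",\"docs\":\"{}\"}}",
            escape_json(&item.name),
            escape_json(&item.kind),
            escape_json(&item.docs)
        )
    }

    /// JSON needs an enclosing array, so the default joining is overridden.
    fn render_crate(&self, items: &[DocItem]) -> String {
        let rendered: Vec<String> = items.iter().map(|item| self.render_item(item)).collect();
        format!("[{}]", rendered.join(","))
    }
}

/// Escape a string for use inside a JSON string literal.
fn escape_json(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

With this shape, adding a format means adding one `impl Renderer`, while the code that collects `DocItem`s stays untouched.
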
