|
| 1 | +- Feature Name: `rustdoc_json` |
| 2 | +- Start Date: 2020-06-26 |
| 3 | +- RFC PR: (leave this empty) |
| 4 | +- Rust Issue: (leave this empty) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +This RFC describes the design of a JSON output for the tool `rustdoc`, to allow tools to |
| 10 | +lean on its data collection and refinement but provide a different front-end. |
| 11 | + |
| 12 | +# Motivation |
| 13 | +[motivation]: #motivation |
| 14 | + |
| 15 | +The current HTML output of `rustdoc` is often lauded as a key selling point of Rust. Using this |
| 16 | +ubiquitous tool, you can easily find nearly anything you need to know about a crate. However, |
| 17 | +despite its versatility, the use of this specific output has its drawbacks: |
| 18 | + |
| 19 | +- Viewing this output requires a web browser, with (for some features of the output) a JavaScript |
| 20 | + interpreter. |
| 21 | +- The HTML output of `rustdoc` is explicitly not stabilized, to allow `rustdoc` developers the |
| 22 | + option to tweak the display of information, add new information, etc. However, this also means |
| 23 | + that converting this HTML into a different output is infeasible. |
| 24 | +- As the HTML is the only available output of `rustdoc`, its integration into centralized, |
| 25 | + multi-language, documentation browsers is difficult. |
| 26 | + |
| 27 | +In addition, `rustdoc` had JSON output in the past, but it failed to keep up with the changing |
| 28 | +language and [was taken out][remove-json] in 2016. With `rustdoc` in a more stable position, it's |
| 29 | +possible to re-introduce this feature and ensure its stability. This [was brought up in 2018][2018-discussion] |
| 30 | +with a positive response and there are [several][2019-interest] [recent][rustdoc-infopages] discussions |
| 31 | +indicating that it would be a nice feature to have. |
| 32 | + |
| 33 | +In [the draft RFC from 2018][previous-rfc] there was some discussion of utilizing `save-analysis` to |
| 34 | +provide this information, but with [RLS being replaced by rust-analyzer][RA-RLS] it's possible that |
| 35 | +the feature will eventually be removed from the compiler. In addition, `save-analysis` output is at |
| 36 | +least as unstable as the current HTML output of `rustdoc`, so a separate feature is preferable. |
| 37 | + |
| 38 | +[remove-json]: https://github.com/rust-lang/rust/pull/32773 |
| 39 | +[2018-discussion]: https://internals.rust-lang.org/t/design-discussion-json-output-for-rustdoc/8271/6 |
| 40 | +[2019-interest]: https://github.com/rust-lang/rust/issues/44136#issuecomment-467144974 |
| 41 | +[rustdoc-infopages]: https://internals.rust-lang.org/t/current-state-of-rustdoc-and-cargo/11721 |
| 42 | +[previous-rfc]: https://github.com/QuietMisdreavus/rfcs/blob/rustdoc-json/text/0000-rustdoc-json.md#unresolved-questions |
| 43 | +[RA-RLS]: https://github.com/rust-lang/rfcs/pull/2912 |
| 44 | + |
| 45 | +# Guide-level explanation |
| 46 | +[guide-level-explanation]: #guide-level-explanation |
| 47 | + |
| 48 | +(*Upon successful implementation/stabilization, this documentation should live in The Rustdoc |
| 49 | +Book.*) |
| 50 | + |
| 51 | +In addition to generating the regular HTML, `rustdoc` can create JSON files based on your crate. |
| 52 | +These can be used by other tools to take information about your crate and convert it into other |
| 53 | +output formats, insert it into centralized documentation systems, create language bindings, etc. |
| 54 | + |
| 55 | +To get this output, pass the `--output-format json` flag to `rustdoc`: |
| 56 | + |
| 57 | +```console |
| 58 | +$ rustdoc lib.rs --output-format json |
| 59 | +``` |
| 60 | + |
| 61 | +This will output a JSON file in the current directory (by default). For example, say you have the |
| 62 | +following crate: |
| 63 | + |
| 64 | +```rust |
| 65 | +//! Here are some crate-level docs! |
| 66 | + |
| 67 | +/// Here are some docs for `some_fn`! |
| 68 | +pub fn some_fn() {} |
| 69 | + |
| 70 | +/// Here are some docs for `SomeStruct`! |
| 71 | +pub struct SomeStruct; |
| 72 | +``` |
| 73 | + |
| 74 | +After running the above command, you should get a `lib.json` file like the following (indented for |
| 75 | +clarity): |
| 76 | + |
| 77 | +```json |
| 78 | +{ |
| 79 | +TODO |
| 80 | +} |
| 81 | +``` |
| 82 | + |
| 83 | +# Reference-level explanation |
| 84 | +[reference-level-explanation]: #reference-level-explanation |
| 85 | + |
| 86 | +(*Upon successful implementation/stabilization, this documentation should live in The Rustdoc |
| 87 | +Book.*) |
| 88 | + |
| 89 | +When you request JSON output from `rustdoc`, you're getting a version of the Rust abstract syntax |
| 90 | +tree (AST), so you could see anything that you could export from a valid Rust crate. The following |
| 91 | +types can appear in the output: |
| 92 | + |
| 93 | +TODO |
| 94 | + |
| 95 | +You also get a collection of mappings between items such as all the types that implement a certain |
| 96 | +trait and vice versa. The structure of those mappings is as follows: |
| 97 | + |
| 98 | +TODO |
| 99 | + |
| 100 | +(*This documentation is deliberately left incomplete; filling it out will happen during the design process.*) |
| 101 | + |
| 102 | +(*Complete documentation information is deferred to final design and implementation work.*) |
| 103 | + |
| 104 | +# Drawbacks |
| 105 | +[drawbacks]: #drawbacks |
| 106 | + |
| 107 | +- If we support JSON output for `rustdoc`, we must decide how closely it should mirror the internal |
| 108 | + structures used in `rustdoc` and in the compiler. Depending on how much we want to stabilize, we |
| 109 | + could accidentally stabilize the internal structures of `rustdoc`. |
| 110 | + |
| 111 | +- Even if we don't accidentally stabilize `rustdoc`'s internals, adding JSON output adds *another* |
| 112 | + thing that must be kept up to date with language changes, and another thing for compiler |
| 113 | + contributors to potentially break with their changes. Because the HTML output is only meant for |
| 114 | + display, it requires less vigilant updating when new language features are added. |
| 115 | + |
| 116 | +# Rationale and alternatives |
| 117 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 118 | + |
| 119 | +- **Status quo.** Keep the HTML the way it is, and make users who want a machine-readable version of |
| 120 | + a crate parse it themselves. In the absence of an accepted JSON output, the `--output-format` flag in `rustdoc` |
| 121 | + remains deprecated and unused. |
| 122 | +- **Alternate data format (XML, Bincode, CapnProto, etc).** JSON was selected for its ubiquity in |
| 123 | + available parsers, but selecting a different data format may provide benefits for file size, |
| 124 | + compressibility, speed of conversion, etc. If the implementation leans on serde then this may be a |
| 125 | + non-issue as it would be trivial to switch serialization formats. |
| 126 | +- **Alternate data structure.** Massage the data so that it echoes something closer to user |
| 127 | + perception, rather than the internal `clean` AST that it is currently modeled after. Such a |
| 128 | + refinement can be provided in a future RFC, as a potential alternate data format to output, if |
| 129 | + necessary. |
| 130 | + |
| 131 | +# Prior art |
| 132 | +[prior-art]: #prior-art |
| 133 | + |
| 134 | +A handful of other languages and systems have documentation tools that output an intermediate |
| 135 | +representation separate from the human-readable outputs: |
| 136 | + |
| 137 | +- [PureScript] uses an intermediate JSON representation when publishing package information to their |
| 138 | + [Pursuit] directory. It's primarily used to generate documentation, but can also be used to |
| 139 | + generate `etags` files. |
| 140 | +- [Doxygen] has an option to generate an XML file with the code's information. |
| 141 | +- [Haskell]'s documentation tool, [Haddock], can generate an intermediate representation used by the |
| 142 | + type search engine [Hoogle] to integrate documentation of several packages. |
| 143 | +- [Kythe] is a "(mostly) language-agnostic" system for integrating documentation across several |
| 144 | + languages. It features its own schema that code information can be translated into, that services |
| 145 | + can use to aggregate information about projects that span multiple languages. |
| 146 | +- [GObject Introspection] has an intermediate XML representation called GIR that's used to create |
| 147 | + language bindings for GObject-based C libraries. While (at the time of this writing) it's not |
| 148 | + currently used to create documentation, it is a stated goal to use this information to document |
| 149 | + these libraries. |
| 150 | + |
| 151 | +[PureScript]: http://www.purescript.org/ |
| 152 | +[Pursuit]: https://pursuit.purescript.org/ |
| 153 | +[Doxygen]: https://www.doxygen.nl/ |
| 154 | +[Haskell]: https://www.haskell.org/ |
| 155 | +[Haddock]: https://www.haskell.org/haddock/ |
| 156 | +[Hoogle]: https://www.haskell.org/hoogle/ |
| 157 | +[Kythe]: http://kythe.io/ |
| 158 | +[GObject Introspection]: https://gi.readthedocs.io/en/latest/ |
| 159 | + |
| 160 | +# Unresolved questions |
| 161 | +[unresolved-questions]: #unresolved-questions |
| 162 | + |
| 163 | +- What is the stabilization story? As language features are added, this representation will need to |
| 164 | + be extended to accommodate them. As this will change the structure of the data, what does that mean |
| 165 | + for its consumers? |
| 166 | +- How will intra-doc links be handled? Supporting `struct.SomeStruct.html` style links is pretty |
| 167 | + infeasible since it would tie alternative front-ends to `rustdoc`'s file/folder format. With the |
| 168 | + nightly intra-rustdoc link syntax it's debatable whether we should resolve those to HTML links or |
| 169 | + leave that up to whatever consumes the JSON. |
| 170 | +- How do we represent types, and allow people to properly collect type information from places like |
| 171 | + struct fields, function signatures, etc? `rustdoc`'s own `clean::Type` enum is large and recursive |
| 172 | + and represents a lot of primitives, in addition to ultimately deferring the lookup to a DefId. |
| 173 | +- The `id` field is basically a copy of DefId from inside the compiler; is there a better way to |
| 174 | + represent it? How necessary is it to have? |
| 175 | +- Where should we store impls? |
| 176 | + - In `rustdoc`, trait impls are pooled in the crate root (or placed in the module they're declared |
| 177 | + in), but before rendering, the information is copied into two maps: one mapping traits to their |
| 178 | + implementors, and one mapping types to all their impls (inherent or trait). |
| 179 | + - The HIR copies all trait impls into a map connecting traits to their implementors, though |
| 180 | + they're also available in the location they're defined if you iterate over the HIR. |
| 181 | + - However, while trait impls are unburdened by scope rules for visibility, *inherent* impls are. |
| 182 | + Currently, if `--document-private-items` is passed, the methods defined in an impl are all |
| 183 | + pooled into a struct, and any `pub(restricted)` scopes link to their respective modules. |
| 184 | + However, private methods are just shown as private, without any information connecting them to |
| 185 | + where they're allowed. |
| 186 | + - This leads to wanting to pool impls on their type (and copying them into their trait for trait |
| 187 | + impls), and leaving the visibility fix for a later PR. |
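
To make the "where should we store impls?" question more concrete, here is a minimal sketch of the two-map layout described in the list above: one map pooling every impl on its type, and one map from each trait to its implementors. The `Impl` struct, its fields, and `index_impls` are hypothetical names invented for this illustration; they are not `rustdoc`'s actual data structures, and visibility is deliberately ignored.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified stand-in for an `impl` block.
/// (Not rustdoc's real representation; names are assumptions.)
#[derive(Debug, Clone, PartialEq)]
pub struct Impl {
    /// The type the impl is written for, e.g. "SomeStruct".
    pub for_type: String,
    /// `Some(trait_name)` for a trait impl, `None` for an inherent impl.
    pub trait_name: Option<String>,
}

/// Builds the two lookup maps from a flat list of impls:
/// - type name  -> all of its impls (inherent or trait)
/// - trait name -> names of the types that implement it
pub fn index_impls(
    impls: &[Impl],
) -> (BTreeMap<String, Vec<Impl>>, BTreeMap<String, Vec<String>>) {
    let mut by_type: BTreeMap<String, Vec<Impl>> = BTreeMap::new();
    let mut implementors: BTreeMap<String, Vec<String>> = BTreeMap::new();

    for imp in impls {
        // Every impl is pooled on the type it belongs to.
        by_type
            .entry(imp.for_type.clone())
            .or_default()
            .push(imp.clone());

        // Trait impls are additionally recorded under their trait.
        if let Some(trait_name) = &imp.trait_name {
            implementors
                .entry(trait_name.clone())
                .or_default()
                .push(imp.for_type.clone());
        }
    }

    (by_type, implementors)
}
```

Calling `index_impls` on a list containing an inherent impl and a `Clone` impl for the same type would yield two entries under that type and one implementor under `Clone`. A real design would still need to decide how private and `pub(restricted)` methods are represented, which this sketch does not attempt.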
| 188 | + |
| 189 | + |
| 190 | +# Future possibilities |
| 191 | +[future-possibilities]: #future-possibilities |
| 192 | + |
| 193 | +- Since refactoring has to be done to support both the HTML and JSON backends to `rustdoc`, future |
| 194 | + work to add other output formats such as pure markdown should be relatively simple after this. |
| 195 | +- Once the JSON output is added, a Rust library for parsing it back into useful structs that lives |
| 196 | + outside the compiler would be helpful to allow people to easily use this representation. |