Glean schemas are dynamically handled at runtime, which is great, but as of today I'm not sure it's practical to write schemas that live outside of the existing source code repository without some further work. Or maybe it can work, but based on my cursory analysis it's not immediately obvious what I'd need.
As some background, I'd like to write my own custom schemas, mostly for fun, but also so I can experiment with some ideas like adding code coverage information into Glean ("is this line executed or not?") as its own set of facts. So, what I need to do is:
- Write the schema (in my repo; see the sketch after this list)
- Create facts (somehow) using a program
- Write the facts to the database
- Read the facts and display them in some way.
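To make step 1 concrete, here's a rough sketch of what such a schema might look like, going off the Angle syntax in the Glean docs. The `coverage` namespace and the `LineExecuted` predicate are my invention, and a real schema would probably want to reference `src.File` rather than a raw string:

```
schema coverage.1 {

# Hypothetical predicate: was this line of this file executed?
predicate LineExecuted :
  {
    file : string,
    line : nat,
    executed : bool,
  }

}
```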
But the real problems are steps 2, 3, and 4:
- How do I create valid facts for my schema?
- How do I write the facts to a write server?
- How do I query them after the fact?
I think these are all solvable in a way that is still maintainable for y'all while being usable for outsiders, but it needs a bit of thought.
Problem: Generating usable code to create facts
Facts are (very simplified) just JSON blobs, but the structure of these facts is determined a priori by the schema. Therefore, I need some way to take my schema and then create facts that are compatible with it.
Today this is handled with code generation via `gen-schema`, from what I can tell. The way it works is like this:
- Given the schema, it will generate source code in various languages, which you then import into your program. The API from the generated module is created from the structure of the schema, so your program must generate correct facts "by design"
- You then write a program that does... something... and imports this generated code and uses it to write facts based on something it did.
The problem is that all of the existing code generators are designed to generate code that works inside Meta, to some degree. For example, the Python code generator seems to depend on `pythrift3`, while the OCaml generator depends on "supercaml"(?), and the Haskell code generator relies on the Glean codebase itself (`import Glean.Types`). This in effect makes them unusable for anyone else.
In short, the problem is that I need some way to get structural information about the shape of the Angle schema, so I can then shovel in facts of the appropriate shape. I think code generation is the right tool for this, but the existing setup isn't usable for anyone else.
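For instance (and this is purely hypothetical, not an existing feature), even a flat JSON dump of a schema's structure would be enough for tooling in any language to build on:

```json
{
  "schema": "coverage.1",
  "predicates": [
    {
      "name": "coverage.LineExecuted",
      "version": 1,
      "key": {
        "record": {
          "file": "string",
          "line": "nat",
          "executed": "bool"
        }
      }
    }
  ]
}
```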
(Minor) Problem: Writing facts
This problem is basically a lot simpler to describe: how do I write facts to the database? The basic options I see are:
- Use the `glean-client` Haskell package (somehow, given that it isn't distributed on Hackage) and write a Haskell program to do your bidding.
- Use the `glean write` CLI command to write facts by writing JSON to the filesystem and shelling out to a subprocess to do it.
- That's all.
The problem is that, in general, I'd like to write facts in some language-agnostic way. JSON completely sucks, I admit, but it is widely available. However, at the very minimum, I think shelling out to the CLI and using `glean write` is an acceptable workaround; it handles translating the JSON crap on the client side, so the server doesn't need to do that itself. So, I'd say this is a relatively small problem for now.
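For reference, the JSON shape that `glean write` accepts groups facts by (versioned) predicate name, so with the hypothetical coverage schema sketched earlier, a batch would look something like the following, which you'd then feed to the CLI with something like `glean write --db mydb/0 facts.json`:

```json
[
  {
    "predicate": "coverage.LineExecuted.1",
    "facts": [
      { "key": { "file": "src/Main.hs", "line": 42, "executed": true } },
      { "key": { "file": "src/Main.hs", "line": 43, "executed": false } }
    ]
  }
]
```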
Problem: Reading facts
This is a similar problem, really a combination of the prior two issues: how do I read facts from a read-only server? Normally you need to:
- Construct an angle query
- Submit it to the server, somehow
- Read the resulting facts
Today, to do this, you can:
- Use `glean-client` and write a Haskell program to do it.
- That's it?
I realize the docs address this to some extent (only the Haskell client exists), but my interest is in reading facts in a relatively general, accessible way. I might use Rust, or TypeScript, or Haskell to do this, and the read path is much more varied than the write path, so something here is really important.
I think the biggest problem here is less the shape of the query and the response (steps 1 and 3), and more that there isn't a clear API for interacting with the server in any way (step 2).
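For what it's worth, step 1 at least has a concrete shape today. With the hypothetical coverage schema from earlier, an Angle query for "which lines of this file ran?" would look roughly like this via the CLI (assuming `glean query` takes the same `--db` flag as the other subcommands):

```
glean query --db mydb/0 'coverage.LineExecuted { file = "src/Main.hs", executed = true }'
```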
Analysis
Altogether, I think these three problems somewhat inhibit anyone from meaningfully querying things without writing Haskell code and integrating with the build system.
Primarily, I think the biggest issue here is that there are no end-user tools that can provide machine-readable information about the shape of an Angle schema, notably for production and consumption of facts. If I could at least produce the proper facts from my build system, I could start by writing a bunch of tests using only the `glean` CLI, which is good enough to get going. But right now, it's hard to even do that.
I think the most straightforward path here might be to follow in the footsteps of something like Cap'n Proto, AKA `capnp`. Cap'n Proto is an IDL/RPC language like Thrift, but the "compiler" that generates code from a schema does not actually generate code itself. When you run something like `capnp compile -o mylang foo.capnp`, the `capnp` command only parses the schema, then hands the encoded parse tree off to the `capnp-mylang` command on standard input. The encoded parse tree has a known format and semantics, so you can implement a standalone program in whatever language you want to do this part.
In Glean's case, there would be some kind of `glean-angle` command that parsed a schema and handed it off to other tools. The existing `gen-schema` would then be wholly specific to Meta (renamed `gen-schema-fb`, perhaps?) and could be implemented in terms of that tool, and then nobody would even need it on the outside.
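Concretely, the split could look something like this (every command name and flag here is hypothetical):

```sh
# glean-angle parses/typechecks the schema and emits an encoded
# description of it on stdout; each backend reads that on stdin.
glean-angle compile coverage.angle | glean-gen-typescript -o src/gen/

# Meta's existing generator becomes just another backend:
glean-angle compile coverage.angle | gen-schema-fb ...
```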
Conclusion
There isn't so much a conclusion as a bunch of stuff I wanted to get out there so we're all on the same page.
Other notes
Some assorted notes I've been thinking about on the topic of 3rd party use, fact writing, etc.
REST query interface
I don't really know what provisions there are for doing REST queries against the server, for querying facts. That's probably important. Does the current `hsthrift` integration support something like that? My understanding is that the use of `hsthrift` now is just to keep the code paths between OSS/internal Glean consistent, and that the OSS build of `hsthrift` used here only actually supports JSON interop.
Would this be possible to do, even as a hack, so more languages could be used?
JSON is bad
JSON is terrible and inefficient beyond belief. It would be nice if, in lieu of using `glean write`, there were some other format I could use to submit e.g. binary blobs directly to a write server endpoint. I believe this was mentioned on Discord, but I figured I'd bring it up here. You could always use CBOR, which has a great library available for it ;) I hear the authors are pretty nice...
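As a sketch of how little client-side code that would take in Haskell, using the `serialise` package, and assuming a hypothetical binary write endpoint existed (it doesn't today); the record mirrors the made-up coverage predicate from earlier:

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}

import Codec.Serialise (Serialise, serialise)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)

-- Mirrors the hypothetical coverage.LineExecuted predicate.
data LineExecuted = LineExecuted
  { file     :: FilePath
  , line     :: Int
  , executed :: Bool
  } deriving (Show, Generic, Serialise)

main :: IO ()
main =
  -- Encode a batch of facts as CBOR; the resulting blob is a fraction
  -- of the size of the equivalent JSON and cheaper to decode.
  BL.writeFile "facts.cbor" $
    serialise
      [ LineExecuted "src/Main.hs" 42 True
      , LineExecuted "src/Main.hs" 43 False
      ]
```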
What about Glass?
My understanding is that Glass consumes the code generated from the existing upstream schemas, and its internal query layer etc. is built on that format, so it probably is not usable with third-party schemas. I'm not sure if that understanding is correct.
Haskell client
As you know, I would of course be happy to just write Haskell programs for a lot of this, including the read path, but the client library really needs to be easier to consume for that to happen.
It feels like the Angle library (parser/typechecker/etc.) and the client library (reads/writes) could be "cabal-ified" and turned into reusable packages, but I'm not sure how practical or far off that is right now.