Glean schemas are dynamically handled at runtime, which is great, but as of today I'm not sure it's practical to write schemas that live outside of the existing source code repository without some further work. Or maybe it can work, but based on my cursory analysis it's not immediately obvious what I'd need.
As some background, I'd like to write my own custom schemas, mostly for fun, but also so I can experiment with some ideas like adding code coverage information into Glean ("is this line executed or not?") as its own set of facts. So, what I need to do is:
- Write the schema (in my repo; see the sketch after this list)
- Create facts (somehow) using a program
- Write the facts to the database
- Read the facts and display them in some way.
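To make step 1 concrete, here's a rough sketch of what such a schema might look like, going off the Angle syntax in the Glean docs. The `coverage` namespace and the `LineExecuted` predicate are my invention, and a real schema would probably want to reference `src.File` rather than a raw string:

```
schema coverage.1 {

# Hypothetical predicate: was this line of this file executed?
predicate LineExecuted :
  {
    file : string,
    line : nat,
    executed : bool,
  }

}
```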
But the real problems are steps 2, 3, and 4:
- How do I create valid facts for my schema?
- How do I write the facts to a write server?
- How do I query them after the fact?
I think these are all solvable in a way that is still maintainable for y'all while being usable for outsiders, but it needs a bit of thought.
Problem: Generating usable code to create facts
Facts are (very simplified) just JSON blobs, but the structure of these facts is determined a priori by the schema. Therefore, I need some way to take my schema and then create facts that are compatible with it.
Today this is handled with code generation via `gen-schema`, from what I can tell. The way it works is like this:
- Given the schema, it will generate source code in various languages, which you then import into your program. The API from the generated module is created from the structure of the schema, so your program must generate correct facts "by design"
- You then write a program that does... something... and imports this generated code and uses it to write facts based on something it did.
The problem is that all of the existing code generators are designed to generate code that works inside Meta, to some degree. For example, the Python code generator seems to depend on `pythrift3`, while the OCaml generator depends on "supercaml"(?), and the Haskell code generator relies on the Glean codebase itself (`import Glean.Types`). This in effect makes them unusable for anyone else.
In short, the problem is that I need some way to get structural information about the shape of the Angle schema, so I can then shovel in facts of the appropriate shape. I think code generation is the right tool for this, but the existing setup isn't usable for anyone else.
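For instance (and this is purely hypothetical, not an existing feature), even a flat JSON dump of a schema's structure would be enough for tooling in any language to build on:

```json
{
  "schema": "coverage.1",
  "predicates": [
    {
      "name": "coverage.LineExecuted",
      "version": 1,
      "key": {
        "record": {
          "file": "string",
          "line": "nat",
          "executed": "bool"
        }
      }
    }
  ]
}
```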
(Minor) Problem: Writing facts
This problem is basically a lot simpler to describe: how do I write facts to the database? The basic options I see are:
- Use the `glean-client` Haskell package (somehow, given that it isn't distributed on Hackage) and write a Haskell program to do your bidding.
- Use the `glean write` CLI command to write facts by writing JSON to the filesystem and shelling out to a subprocess to do it.
- That's all.
The problem is that, in general, I'd like to write facts in some language-agnostic way. JSON completely sucks, I admit, but it is widely available. However, at the very minimum, I think shelling out to the CLI and using `glean write` is an acceptable workaround; it handles translating the JSON crap on the client side, so the server doesn't need to do that itself. So, I'd say this is a relatively small problem for now.
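For reference, the JSON shape that `glean write` accepts groups facts by (versioned) predicate name, so with the hypothetical coverage schema sketched earlier, a batch would look something like the following, which you'd then feed to the CLI with something like `glean write --db mydb/0 facts.json`:

```json
[
  {
    "predicate": "coverage.LineExecuted.1",
    "facts": [
      { "key": { "file": "src/Main.hs", "line": 42, "executed": true } },
      { "key": { "file": "src/Main.hs", "line": 43, "executed": false } }
    ]
  }
]
```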
Problem: Reading facts
This is a similar problem, really a combination of the prior two issues: how do I read facts from a read-only server? Normally you need to:
- Construct an angle query
- Submit it to the server, somehow
- Read the resulting facts
Today, to do this, you can:
- Use `glean-client` and write a Haskell program to do it.
- That's it?
I realize the docs address this to some extent (only the Haskell client exists), but my interest is in reading facts in a relatively general, accessible way. I might use Rust, or TypeScript, or Haskell to do this, and the read path is much more varied than the write path, so something here is really important.
I think the biggest problem here is less the shape of the query and the response (steps 1 and 3), and more that there isn't a clear API for interacting with the server in any way (step 2).
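For what it's worth, step 1 at least has a concrete shape today. With the hypothetical coverage schema from earlier, an Angle query for "which lines of this file ran?" would look roughly like this via the CLI (assuming `glean query` takes the same `--db` flag as the other subcommands):

```
glean query --db mydb/0 'coverage.LineExecuted { file = "src/Main.hs", executed = true }'
```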
Analysis
Altogether, I think these three problems somewhat inhibit anyone from meaningfully querying things without writing Haskell code and integrating with the build system.
Primarily, I think the biggest issue here is that there are no end-user tools that can provide machine-readable information about the shape of an Angle schema, notably for production and consumption of facts. If I could at least produce the proper facts from my build system, I could start by writing a bunch of tests using only the `glean` CLI, which is good enough to get going. But right now, it's hard to even do that.
I think the most straightforward path here might be to follow in the footsteps of something like Cap'n Proto, AKA `capnp`. Cap'n Proto is an IDL/RPC language like Thrift, but the "compiler" that generates code from a schema does not actually generate code itself. When you run something like `capnp compile -o mylang foo.capnp`, the `capnp` command only parses the schema, then hands the encoded parse tree off to the `capnp-mylang` command on standard input. The encoded parse tree has a known format and semantics, so you can implement a standalone program in whatever language you want to do this part.
In Glean's case, there would be some kind of `glean-angle` command that parsed a schema and handed it off to other tools. The existing `gen-schema` would then be wholly specific to Meta (renamed `gen-schema-fb`, perhaps?) and could be implemented in terms of that tool, and then nobody would even need it on the outside.
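Concretely, the split could look something like this (every command name and flag here is hypothetical):

```sh
# glean-angle parses/typechecks the schema and emits an encoded
# description of it on stdout; each backend reads that on stdin.
glean-angle compile coverage.angle | glean-gen-typescript -o src/gen/

# Meta's existing generator becomes just another backend:
glean-angle compile coverage.angle | gen-schema-fb ...
```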
Conclusion
There isn't so much a conclusion as a bunch of stuff I wanted to get out there so we're all on the same page.
Other notes
Some assorted notes I've been thinking about on the topic of 3rd party use, fact writing, etc.
REST query interface
I don't really know what provisions there are for doing REST queries against the server, for querying facts. That's probably important. Does the current `hsthrift` integration support something like that? My understanding is that the use of `hsthrift` now is just to keep the code paths between OSS/internal Glean consistent, and that the OSS build of `hsthrift` used here only actually supports JSON interop.
Would this be possible to do, even as a hack, so more languages could be used?
JSON is bad
JSON is terrible and inefficient beyond belief. It would be nice if, in lieu of using `glean write`, there were some other format I could use to submit e.g. binary blobs directly to a write server endpoint. I believe this was mentioned on Discord, but I figured I'd bring it up here. You could always use CBOR, which has a great library available for it ;) I hear the authors are pretty nice...
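As a sketch of how little client-side code that would take in Haskell, using the `serialise` package, and assuming a hypothetical binary write endpoint existed (it doesn't today); the record mirrors the made-up coverage predicate from earlier:

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}

import Codec.Serialise (Serialise, serialise)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)

-- Mirrors the hypothetical coverage.LineExecuted predicate.
data LineExecuted = LineExecuted
  { file     :: FilePath
  , line     :: Int
  , executed :: Bool
  } deriving (Show, Generic, Serialise)

main :: IO ()
main =
  -- Encode a batch of facts as CBOR; the resulting blob is a fraction
  -- of the size of the equivalent JSON and cheaper to decode.
  BL.writeFile "facts.cbor" $
    serialise
      [ LineExecuted "src/Main.hs" 42 True
      , LineExecuted "src/Main.hs" 43 False
      ]
```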
What about Glass?
My understanding is that Glass consumes the code generated from the existing upstream schemas, and its internal query layer etc. is built on that format, so it probably is not usable with third-party schemas. I'm not sure if that understanding is correct.
Haskell client
As you know, I would of course be happy to just write Haskell programs for a lot of this, including the read path, but the client library really needs to be easier to consume for that to happen.
It feels like the Angle library (parser/typechecker/etc.) and the client library (reads/writes) could be "cabal-ified" and turned into reusable packages, but I'm not sure how practical or far off that is right now.