A generic "generate" goal for version-controlled generated files #18235
-
Native ability to define simple targets that write back to the non-sandboxed filesystem (with built-in up-to-date-ness checking) would be great! Some use cases I've encountered:
 Some small/bikeshed-y thoughts: 
-
Just recording the workaround we use for this for now.
E.g., for a committed file `foo.txt`:
```python
# path/to/BUILD
shell_command(
    name="generate-foo",
    command="echo abc > foo-generated.txt",
    tools=["echo"],
    output_files=["foo-generated.txt"],
)

files(name="foo", sources=["foo.txt"])

run_shell_command(
    name="write-foo",
    # The generated output uses a different file name than the committed
    # foo.txt so that both can be compared in the test below.
    command="cp {chroot}/path/to/foo-generated.txt foo.txt",
    execution_dependencies=[":generate-foo"],
)

experimental_test_shell_command(
    # If this fails, run `pants run path/to:write-foo`.
    name="check-foo-is-up-to-date",
    command="diff foo.txt foo-generated.txt",
    # Sandboxed commands only see the binaries listed in `tools`.
    tools=["diff"],
    execution_dependencies=[":generate-foo", ":foo"],
)
```
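For comparison, here is a plain-Python sketch of the same up-to-date check that `check-foo-is-up-to-date` performs with `diff`, with the regeneration hint folded into the error message (the "built-in up-to-date-ness checking" asked for above). `check_up_to_date` and its `regenerate_hint` parameter are hypothetical names, not Pants APIs.

```python
import difflib
from pathlib import Path


def check_up_to_date(committed: Path, generated: str, regenerate_hint: str) -> None:
    """Raise RuntimeError if `committed` does not match freshly generated content.

    Hypothetical helper mirroring `diff foo.txt foo-generated.txt`, plus an
    actionable message telling the user how to regenerate.
    """
    # A missing committed file is treated as empty, so it counts as stale
    # unless the generator also produces nothing.
    current = committed.read_text() if committed.exists() else ""
    if current == generated:
        return
    diff = "".join(
        difflib.unified_diff(
            current.splitlines(keepends=True),
            generated.splitlines(keepends=True),
            fromfile=str(committed),
            tofile=f"{committed} (regenerated)",
        )
    )
    raise RuntimeError(f"{committed} is out of date; run `{regenerate_hint}`.\n{diff}")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        foo = Path(tmp) / "foo.txt"
        foo.write_text("abc\n")  # what `echo abc` would have produced
        check_up_to_date(foo, "abc\n", "pants run path/to:write-foo")  # passes
        try:
            check_up_to_date(foo, "xyz\n", "pants run path/to:write-foo")
        except RuntimeError as err:
            print(err)
```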
-
I would also add that it would be useful for test outputs to be written out as well. For example, some tests run long computations that I can't rerun all the time just to get their outputs. Normally I write these as one "test" and then have other acceptance tests that assume the data was already generated locally. (Additionally, I version-control the outputs with a data version control system, DVC.)
-
For a variety of reasons (many of which could probably be summarized as "legacy"), some generated files need to be committed to version control.
Since these files need to be in version control, it would be great if pants could make it easier to manage them. Here are the goals I have for whatever solution we use:
A survey of current pants goals
goal: export-codegen
At first blush, codegen seems ideal for making generated files. But codegen in pants is not designed for version-controlled files.
When pants generates code, it is considered an internal by-product, as described in the PR that introduced the `export-codegen` goal:
> The `export-codegen` goal makes these generated files available in `dist/` for those who want to:

So, we need a different form of codegen in pants for files that should be version controlled.
goals: lint, fmt, fix
Version-controlled files are the bread and butter of `lint`, `fmt`, and `fix`, which is the opposite of `export-codegen`, which is for files that are not version controlled. Fixers and formatters also run during the `lint` goal so that lint fails if they need to fix or format a file.
Code generation of version-controlled files is remarkably similar to `fmt` and `fix`. Files that need to be changed are advertised via the `lint` goal. Logically, fmt and fix are very different from codegen, but the process, the UX, and the underlying rules seem quite similar to me.
In the StackStorm project, I've shoved my (version-controlled) file generation under the `fmt` umbrella even though it's not really a formatter. Logically, this generation is not a fixer either. But one benefit of doing this is that lint will fail if the generator needs to regenerate the file(s), and it will provide an error message that explains how to run `./pants fmt ...` to resolve the lint error.
I've heard of others who create a "test" that fails if the generated file(s) were not properly regenerated. I really like using `lint` for this, to take advantage of the fine-grained caching and to reuse the cache for both lint and fmt/fix.
goal: generate-lockfiles
Another goal that is very similar to the file generation feature I'm looking for is the `generate-lockfiles` goal. The warnings/errors saying to regenerate can be triggered in many places where rules need to use the lockfiles, so there is no need to integrate that warning into a lint backend/rule. The goal also provides a fairly ergonomic interface for regenerating them.
`generate-lockfiles` looks up the resolves, translates them into lockfile paths, and then generates the lockfile at each path.
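A minimal sketch of that forward direction (resolve names to lockfile paths to generated files), assuming the resolves come from a name-to-path mapping like `[python.resolves]` in pants.toml. `generate_lockfiles` and its `generate` callback are hypothetical stand-ins, not the Pants rule that implements the goal.

```python
from typing import Callable, Mapping, Sequence


def generate_lockfiles(
    resolves: Mapping[str, str],
    requested: Sequence[str],
    generate: Callable[[str, str], None],
) -> list[str]:
    """Resolve names -> lockfile paths -> write each lockfile.

    `resolves` maps resolve name to lockfile path (like `[python.resolves]`);
    `generate(resolve, path)` is a hypothetical callback that does the real work.
    """
    unknown = [name for name in requested if name not in resolves]
    if unknown:
        raise ValueError(f"Unknown resolve(s): {', '.join(unknown)}")
    # Assumption: an empty request means "every resolve".
    names = list(requested) if requested else list(resolves)
    written = []
    for name in names:
        path = resolves[name]
        generate(name, path)
        written.append(path)
    return written
```

The user names resolves, never paths; the paths are derived from config.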
Proposal for a new "generate" goal
If there were a generic "generate" goal, it could go in the opposite direction of `generate-lockfiles`: given the path of a lockfile, look up the resolve associated with that lockfile, and then generate the lockfile at that path.
Similarly, that "generate" goal could take a path to another generated file, look up the owning target for it, and, if that target is explicitly for generated files, run the plugin/subsystem/tool that encompasses the codegen logic for that target.
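A sketch of the path-to-owning-target lookup for non-lockfile files, under stated assumptions: `FileGeneratorTarget` is a hypothetical stand-in for a target that declares the generated files it owns via globs, and `generate_path` enforces the exactly-one-owner rule proposed for the goal (unowned files are skipped, multiple owners are an error). Python's `fnmatch` is used for brevity; unlike Pants globs, its `*` also matches `/`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class FileGeneratorTarget:
    """Hypothetical target that declares which generated files it owns."""

    address: str
    output_globs: tuple[str, ...]
    generate: Callable[[str], None]  # writes `path` into the workspace

    def owns(self, path: str) -> bool:
        # Note: fnmatch's `*` also matches `/`, unlike Pants globs.
        return any(fnmatchcase(path, glob) for glob in self.output_globs)


def generate_path(
    path: str, targets: Sequence[FileGeneratorTarget]
) -> Optional[FileGeneratorTarget]:
    """Run the generator of the single file-generating target that owns `path`."""
    owners = [t for t in targets if t.owns(path)]
    if not owners:
        return None  # unowned: nothing is materialized in the workspace
    if len(owners) > 1:
        addresses = ", ".join(t.address for t in owners)
        raise LookupError(f"{path} is owned by several file-generating targets: {addresses}")
    owners[0].generate(path)
    return owners[0]
```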
This goal could be only for non-lockfile generated files. If we did include lockfile generation in this goal, then the owning target would probably be a synthetic target synthesized from the config in pants.toml, so it would still be explicitly defined (even though not in a BUILD file).
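If lockfiles were included, the synthetic targets could be as simple as one record per resolve entry in pants.toml, which turns looking up a lockfile path into the reverse of what `generate-lockfiles` does. `SyntheticLockfileTarget` and both functions are hypothetical; `PurePosixPath` is used so that `./a.lock` and `a.lock` compare equal.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class SyntheticLockfileTarget:
    """Hypothetical target synthesized from one resolve entry in pants.toml."""

    resolve: str
    lockfile: PurePosixPath


def synthesize_lockfile_targets(resolves: Mapping[str, str]) -> list[SyntheticLockfileTarget]:
    # One synthetic target per resolve, e.g. from `[python.resolves]`.
    return [SyntheticLockfileTarget(name, PurePosixPath(path)) for name, path in resolves.items()]


def owning_lockfile_target(
    path: str, targets: Sequence[SyntheticLockfileTarget]
) -> Optional[SyntheticLockfileTarget]:
    """Lockfile path -> the synthetic target (and so the resolve) that owns it."""
    wanted = PurePosixPath(path)  # "./3rdparty/a.lock" == "3rdparty/a.lock"
    owners = [t for t in targets if t.lockfile == wanted]
    if len(owners) > 1:
        raise LookupError(f"{path} is the lockfile of several resolves: {[t.resolve for t in owners]}")
    return owners[0] if owners else None
```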
This goal should be modeled on fix/fmt (not on export-codegen). It would extend the lint request/result classes (like fix/fmt do) so that file generation happens in a sandbox during lint, failing lint if something should be regenerated. When lint fails here, the error message would prompt the user to run `./pants generate path/to/generated/file/or/dir`.
For a plugin to provide this generate rule, it would have to either add a custom target or add a field to an existing target that flags it as owning generated files. This would be a codified assumption built into the `generate` goal: every generated file must be owned by exactly one file-generating target. If unowned, the file will not be materialized in the workspace; if owned by more than one file-generating target, raise an error. A generated file that matches one of that target's file globs counts as owned by it.
Then, if other linters can work with the generated files, the standard `skip_tool` fields provide an escape hatch when the generated code does not need to meet the same standards as other files. Or, for StackStorm, I can add an extra linter specific to the generated file that checks more than just whether it needs to be regenerated.
goal: run
A lot of work has gone into runnable targets like `adhoc_tool` and `shell_command`. Combined with the generate goal, these could let repos use lightweight run targets and file-generator targets instead of writing medium- or heavyweight pants plugins.
So, a runnable target could be tied to the file-generating target such that the `adhoc_tool` or `pex_binary` is used to actually do the generation when a user runs `./pants generate path/to/generated/file`.
For the `run` goal, the UX focuses on the generator script/target. For this "generate" goal, the UX focuses on the files to generate. So they are complementary.
goal: go-generate
#16909
This seems very similar and could perhaps be rolled into the new "generate" goal. go-generate references the go modules that define the generation. I'm looking at something that targets the generated files. But, often go:generate commands work within the same module where they are defined, so maybe it is effectively the same thing.
But this is something that would not run via lint. So, if the goals were combined, there would probably need to be a go backend option to skip running it during lint.
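As a rough sketch of how a combined goal might discover the work `go-generate` covers: `//go:generate` directives live in `.go` source files, and each must start at the beginning of a line with no space after `//`. `find_go_generate_directives` is a hypothetical helper, not part of the Pants go backend, and it ignores build constraints and nested modules.

```python
import re
from pathlib import Path

# A directive starts at column 0 with no space between "//" and "go".
GO_GENERATE = re.compile(r"^//go:generate[ \t]+(\S.*)$", re.MULTILINE)


def find_go_generate_directives(root: Path) -> dict[Path, list[str]]:
    """Map each .go file under `root` to the go:generate commands it declares."""
    found: dict[Path, list[str]] = {}
    for source in sorted(root.rglob("*.go")):
        commands = GO_GENERATE.findall(source.read_text())
        if commands:
            # strip() also drops a trailing "\r" from CRLF line endings.
            found[source] = [command.strip() for command in commands]
    return found
```

Because each directive is tied to the file (and so the package) that declares it, the owning module is known up front, which is why go-generate is keyed on modules rather than on generated files.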
discussion
Actually naming the new "generate" goal is a different discussion. When we discuss that, we'll need to make sure it can be distinguished from the codegen that already exists in pants.
So, would a generic generate goal be helpful for anyone else? Does a generate goal that follows a process similar to fix/fmt make sense for other projects?