-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[mlgo-utils] Hoist entrypoint scripts to mlgo-utils directory #146981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-mlgo Author: Vincent Lee (thevinster) ChangesThese scripts belong in the
Full diff: https://github.com/llvm/llvm-project/pull/146981.diff 4 Files Affected:
diff --git a/llvm/utils/mlgo-utils/mlgo/corpus/combine_training_corpus.py b/llvm/utils/mlgo-utils/combine_training_corpus.py
similarity index 100%
rename from llvm/utils/mlgo-utils/mlgo/corpus/combine_training_corpus.py
rename to llvm/utils/mlgo-utils/combine_training_corpus.py
diff --git a/llvm/utils/mlgo-utils/mlgo/corpus/extract_ir.py b/llvm/utils/mlgo-utils/extract_ir.py
similarity index 100%
rename from llvm/utils/mlgo-utils/mlgo/corpus/extract_ir.py
rename to llvm/utils/mlgo-utils/extract_ir.py
diff --git a/llvm/utils/mlgo-utils/mlgo/corpus/make_corpus.py b/llvm/utils/mlgo-utils/make_corpus.py
similarity index 100%
rename from llvm/utils/mlgo-utils/mlgo/corpus/make_corpus.py
rename to llvm/utils/mlgo-utils/make_corpus.py
diff --git a/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel b/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
index b618c74c19da1..db8a92fd25de6 100644
--- a/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+++ b/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
@@ -5210,8 +5210,8 @@ py_binary(
py_binary(
name = "extract_ir",
srcs = [
+ "utils/mlgo-utils/extract_ir.py",
"utils/mlgo-utils/mlgo/__init__.py",
- "utils/mlgo-utils/mlgo/corpus/extract_ir.py",
"utils/mlgo-utils/mlgo/corpus/extract_ir_lib.py",
"utils/mlgo-utils/mlgo/corpus/flags.py",
],
@@ -5221,8 +5221,8 @@ py_binary(
py_binary(
name = "combine_training_corpus",
srcs = [
+ "utils/mlgo-utils/combine_training_corpus.py",
"utils/mlgo-utils/mlgo/__init__.py",
- "utils/mlgo-utils/mlgo/corpus/combine_training_corpus.py",
"utils/mlgo-utils/mlgo/corpus/combine_training_corpus_lib.py",
"utils/mlgo-utils/mlgo/corpus/flags.py",
],
@@ -5232,8 +5232,8 @@ py_binary(
py_binary(
name = "make_corpus",
srcs = [
+ "utils/mlgo-utils/make_corpus.py",
"utils/mlgo-utils/mlgo/__init__.py",
- "utils/mlgo-utils/mlgo/corpus/make_corpus.py",
"utils/mlgo-utils/mlgo/corpus/make_corpus_lib.py",
],
imports = ["utils/mlgo-utils"],
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested that the binaries still work like this when installed through the pip package? That's the primary way I've been using them.
Why exactly does this need to be changed though? I don't know if the external definitions have ever been tested, and the internal (to Google) rules for these binaries are a bit different because you can't import a python project with a hyphen in it (like llvm-project
), but it works just fine after symlinking all the files to a different directory in the same structure.
I didn't realize these could also be installed through pip. From the looks of it, there's a top level wrapper that I can see that essentially just imports the main file and then executes it. This is the model that lit has well, with the exception that the source tree doesn't contain that wrapper in llvm that pip has created.
The purpose of moving is not related to the script itself but more related to the structure of those files. The context here is that there's another build system (buck) which has very similar internal mechanics to bazel (both operate on Starlark). We're trying to experiment by converting bazel rules to buck rules so we can also be able to build llvm and friends with buck. It would be great if we can rely on the bazel definitions as is to build the same target. Unfortunately, with buck, we'd have to execute the binary as if it was invoked by the interpreter directly (which you can see in the summary). This PR would allow us to converge on one implementation that bazel and buck could use. If this is breaking some other mechanism, we can always do other internal hacks to get it working. I figured I'd try to see if we can have harmony first between the build systems first before resorting to other solutions. |
Can you build the package locally, install it, and test that the binaries work as expected?
I don't think this should break anything on the bazel/blaze side. I'm fine with this patch going in if it doesn't break any functionality in the pip package. |
✅ With the latest revision this PR passed the Python code formatter. |
@boomanaiden154 To avoid breaking any of the pip packaging functionality, I added the same top level wrapper referencing those same files that pip install also does. With this, the bazel changes will reference these top level wrappers instead which fixes our side without needing to worry if it breaks pip packaging. |
ignore this, missing context here for a quick answer :) (it's not a problem of sub-packages given the directory name has a slash inside which cannot be a python package name...) |
I would prefer to not have the wrappers in tree if possible.
I'm not really understanding how this constraint leads to these changes. Are there constraints on the CWD of the interpreter or something? |
An alternative is to have a symlink of the script to the mlgo-utils directory. This avoids the need to have to change two places if the script were to change. Otherwise, I'm open to other ideas on how we can fix it.
It's not the cwd of the interpreter. It's just placed incorrectly if invoked directly from python3. The file imports from |
I think a symlink would be fine for now.
Buck doesn't do anything to
|
With the symlink solution, I needed to move the scripts to
Due to constraints out of my reach, I don't have the ability to modify the environment. Even if I could, it's generally discouraged as build systems are meant to handle paths and dependencies internally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please leave the files in their original locations and create the symlinks in the root directory.
These scripts belong in the
mlgo-utils
directory when directly used with python3. But since they are also used to package with pip, hoist the entrypoint scripts to mlgo-utils directory. Adjust the bazel paths to account for this as well. This follows the same structure as lit.Verified that I was also able to build the package successfully and use the script.