[IR2Vec] Add llvm-ir2vec tool for generating triplet embeddings #147842

Open: svkeerthy wants to merge 1 commit into base: users/svkeerthy/07-09-_nfc_ir2vec_exposing_helpers_in_ir2vec_vocabulary

svkeerthy (Contributor) commented Jul 9, 2025

Add a new LLVM tool, llvm-ir2vec. This tool is primarily intended to generate triplets for training the vocabulary (#141834) and, potentially, to generate embeddings in a standalone manner.

This PR introduces the tool with triplet generation functionality. In upcoming PRs, I'll add scripts under utils/mlgo to complete the vocabulary tooling. #147844 adds embedding generation logic to the tool.

(Tracking issue - #141817)

@svkeerthy svkeerthy changed the title from "IR2Vec Tool" to "[IR2Vec] Add llvm-ir2vec tool for generating triplet embeddings" Jul 9, 2025
@svkeerthy svkeerthy marked this pull request as ready for review July 9, 2025 22:54
boomanaiden154 (Contributor) left a comment

The CI failures look relevant.

It would also be good to add a documentation file to the command guide. I would be fine with that happening in a separate patch though to keep the review focused.

Also, what's the point of generating these triplets? Nothing immediately springs to mind on how they would be useful.

svkeerthy (Contributor, Author) commented Jul 10, 2025

> The CI failures look relevant.

Thanks! Looking into it.

> It would also be good to add a documentation file to the command guide. I would be fine with that happening in a separate patch though to keep the review focused.

Yes, will add it in the next patch.

> Also, what's the point of generating these triplets? Nothing immediately springs to mind on how they would be useful.

The triplets collected from various .ll files act as the corpus for training the vocabulary. I will add the helper scripts for training in subsequent patches.
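
To illustrate how such triplets could serve as a training corpus, here is a minimal Python sketch. It assumes a hypothetical plain-text format with one whitespace-separated `head relation tail` triplet per line. The actual output format of llvm-ir2vec is not shown in this thread, and the function name and the entity and relation strings below are made up for the example.

```python
def build_corpus(triplet_lines):
    """Index hypothetical 'head relation tail' lines as integer triplets.

    The line format is an assumption for illustration only; it is not
    taken from llvm-ir2vec.
    """
    entity_ids = {}    # entity string -> integer id
    relation_ids = {}  # relation string -> integer id
    indexed = []       # list of (head_id, relation_id, tail_id)
    for line in triplet_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        head, relation, tail = parts
        # setdefault assigns the next free id the first time a name is seen
        h = entity_ids.setdefault(head, len(entity_ids))
        r = relation_ids.setdefault(relation, len(relation_ids))
        t = entity_ids.setdefault(tail, len(entity_ids))
        indexed.append((h, r, t))
    return entity_ids, relation_ids, indexed


# Made-up sample lines; real triplets would come from many .ll files.
sample = ["add Type i32", "add Next ret", "", "ret Type void"]
entities, relations, corpus = build_corpus(sample)
```

Integer-indexed triplets of this kind are the usual input to knowledge-graph-style embedding trainers, which is one plausible way a vocabulary could be learned from the collected corpus.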

2 participants