feat: support Ollama with synthetic-data-kit #4
Conversation
Hi @rachittshah! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged. If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Thank you SO MUCH for the instant PR, Rachit! I will test and review later today/tomorrow.
Sounds good Sanyam! Let me know if there are any changes needed, and we can pick up SGLang next :D
@init27 did you get a chance to review this PR? |
@init27 |
Thanks for the follow-up @rachittshah and @dhuebner! I was planning to add CI/CD tests to main, but it's taking a bit, so let me review the PR now. Thanks again for the reminder!
This is looking great, I did a quick review and test:
- We need to fix `cli.py` to work with Ollama/vLLM to make the `curate` functions work
- After this we need to re-test with the `curate` function
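To make the first review point concrete: one common shape for this kind of fix is a `--backend` option on the `curate` command that selects which inference server the client talks to. The sketch below is purely illustrative and uses `argparse` — the flag names, subcommand layout, and defaults are assumptions, not the project's actual CLI.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI shape: `curate` accepts a backend choice.

    Illustrates the idea of backend selection only; synthetic-data-kit's
    real cli.py may be structured differently.
    """
    parser = argparse.ArgumentParser(prog="synthetic-data-kit")
    sub = parser.add_subparsers(dest="command", required=True)

    curate = sub.add_parser("curate", help="curate generated data")
    curate.add_argument("input", help="path to the generated data file")
    curate.add_argument(
        "--backend",
        choices=["vllm", "ollama"],
        default="vllm",  # default keeps existing vLLM users unaffected
        help="inference backend to use",
    )
    return parser

args = build_parser().parse_args(["curate", "data.json", "--backend", "ollama"])
```

The parsed `args.backend` value would then be forwarded to whatever client object the `curate` function constructs.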
Out of curiosity, why focus on individual backends when it could support OpenAI-compatible APIs instead, which would unlock a lot more backends, including Ollama?
@arty-hlr great question and a valid point. I think the main reason is that we want to focus on Llama model support. IIRC, some of the model endpoints/providers are not OAI-compatible, and ideally we'd like to maximize support for them, hence this approach. But open to your suggestions, as always.
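For context on the point being discussed: Ollama does expose an OpenAI-compatible `/v1/chat/completions` route, so for OAI-compatible backends the only backend-specific detail is the base URL. A minimal stdlib sketch, assuming the default local ports for vLLM (8000) and Ollama (11434) — the model name and endpoints here are assumptions, not taken from this PR:

```python
import json
import urllib.request

# Assumed default local endpoints; both speak the OpenAI
# chat-completions protocol, so only the base URL differs.
OPENAI_COMPAT_BASES = {
    "vllm": "http://localhost:8000/v1",
    "ollama": "http://localhost:11434/v1",
}

def build_chat_request(backend: str, model: str, prompt: str) -> urllib.request.Request:
    """Build one OpenAI-style chat-completions request for either backend."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{OPENAI_COMPAT_BASES[backend]}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Same payload, different server — nothing else changes per backend.
req = build_chat_request("ollama", "llama3.2", "hello")
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) requires a running server, so it is omitted here.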
@init27 I just tried synthetic-data-kit with Ollama over the API endpoint and it worked well. Apart from some problems when running in a notebook, I was able to create and curate some data. Is the new version with
@dhuebner thanks for the feedback and for trying it out! Can you please open an issue or share the details of the notebook problems? I can take a look. I've put out an alpha version: https://pypi.org/project/synthetic-data-kit/0.0.4a0/
@init27 what do you need from me to merge this PR?
Ollama Backend Integration

Resolves #3

Changes
- `BaseLLMBackend` class
- `OllamaBackend` with OpenAI-compatible API support
- `LLMClient` support for backend selection (vLLM/Ollama)
- `system-check` command to display available Ollama models

Testing
Documentation
Migration
No breaking changes. Existing vLLM users can continue using the default backend.
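The class names above (`BaseLLMBackend`, `OllamaBackend`, `LLMClient`) come from this PR, but their internals are not shown here. The following is only a hedged sketch of how that shape could fit together — network calls are stubbed out and the default URLs are assumptions:

```python
from abc import ABC, abstractmethod

class BaseLLMBackend(ABC):
    """Shared interface each inference backend implements (sketch)."""

    @abstractmethod
    def chat_completion(self, messages: list) -> str:
        ...

class VLLMBackend(BaseLLMBackend):
    def __init__(self, base_url: str = "http://localhost:8000/v1"):
        self.base_url = base_url

    def chat_completion(self, messages: list) -> str:
        # A real implementation would POST to the vLLM server;
        # stubbed so the sketch stays self-contained.
        return f"vllm:{messages[-1]['content']}"

class OllamaBackend(BaseLLMBackend):
    def __init__(self, base_url: str = "http://localhost:11434/v1"):
        self.base_url = base_url

    def chat_completion(self, messages: list) -> str:
        return f"ollama:{messages[-1]['content']}"

class LLMClient:
    """Selects a backend by name; defaults to vLLM so existing users see no change."""

    _BACKENDS = {"vllm": VLLMBackend, "ollama": OllamaBackend}

    def __init__(self, backend: str = "vllm", **backend_kwargs):
        if backend not in self._BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        self.backend = self._BACKENDS[backend](**backend_kwargs)

    def chat_completion(self, messages: list) -> str:
        return self.backend.chat_completion(messages)
```

Keeping `"vllm"` as the default constructor argument is what makes the "no breaking changes" claim hold: existing callers that never pass a backend name get the same behavior as before.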