Replace datasets hasher with python hashlib #8550

Open: 1 commit to be merged into base: main

Conversation

chenmoneygithub (Collaborator)

datasets is no longer a required dep for DSPy, so we should remove its usage from the critical path.

chenmoneygithub requested a review from okhat on July 19, 2025 at 22:01
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR replaces the usage of datasets.fingerprint.Hasher with Python's built-in hashlib to remove the dependency on the datasets library, which is no longer required for DSPy.

  • Replaces datasets.fingerprint.Hasher with hashlib.sha256 for hash generation
  • Adds ujson import and usage for JSON serialization before hashing
  • Updates two files that were using the datasets hasher in their critical path
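
The approach summarized in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: the helper name `stable_hash` is hypothetical, and the standard-library `json` module stands in for `ujson` so the snippet runs without extra dependencies.

```python
import hashlib
import json


def stable_hash(obj) -> str:
    """Return a hex SHA-256 digest of a JSON-serializable object.

    Hypothetical helper for illustration; not a function from DSPy.
    """
    # sort_keys=True makes the serialized form independent of dict
    # insertion order, so equal dicts always produce the same digest.
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first = stable_hash({"question": "2+2?", "answer": "4"})
second = stable_hash({"answer": "4", "question": "2+2?"})
```

Serializing with sorted keys before hashing is one way to keep the digest reproducible across runs and processes. Python's built-in `hash()` would not give that guarantee for strings, because it is salted per process by default.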

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| dspy/teleprompt/bootstrap.py | Replaces datasets hasher with hashlib for random seed generation in bootstrap sampling |
| dspy/clients/utils_finetune.py | Replaces datasets hasher with hashlib for generating unique file names based on data hash |
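
To make the two uses described above concrete, here is a hedged sketch of how a SHA-256 digest could drive a random seed and a data-dependent file name. The function names, the `finetune_` prefix, the `.jsonl` extension, and the 16-character truncation are all assumptions chosen for illustration; the changed files themselves may do this differently.

```python
import hashlib
import json
import random


def _digest(obj) -> str:
    # Hypothetical helper: hex SHA-256 of the sorted-key JSON form.
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def seed_from_example(example: dict, modulus: int = 2**32) -> int:
    """Derive a reproducible integer seed from an example (assumed shape)."""
    # Interpret the hex digest as an integer and fold it into a bounded range.
    return int(_digest(example), 16) % modulus


def data_file_name(records: list, prefix: str = "finetune_") -> str:
    """Build a file name that changes whenever the data changes."""
    # Truncating the digest keeps the name short; 16 hex chars is arbitrary.
    return f"{prefix}{_digest(records)[:16]}.jsonl"


example = {"question": "What is DSPy?", "answer": "A framework."}
seed = seed_from_example(example)
rng_a = random.Random(seed)
rng_b = random.Random(seed_from_example(example))
name = data_file_name([example])
```

Because the seed comes from the content of the example rather than from process state, two runs over the same data would sample identically, and identical training data would map to the same file name.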
