feat: File hashing and duplicate prevention during import #7765

Open: wants to merge 1 commit into develop

dannyball710

Feature

This feature records the SHA-256 hash of each uploaded file and prevents tasks from being created from a file whose hash duplicates that of another file in the same project.
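
For illustration, a minimal sketch of computing a file's SHA-256 content hash with Python's standard hashlib, reading in chunks so large uploads do not have to fit in memory. The function name sha256_of_stream and the 64 KiB chunk size are assumptions of this sketch, not code taken from the PR.

```python
import hashlib
import io
from typing import BinaryIO

# Chunk size is an arbitrary choice for this sketch; reading in chunks keeps
# memory use flat even for very large uploads.
CHUNK_SIZE = 64 * 1024


def sha256_of_stream(stream: BinaryIO) -> str:
    """Return the hex-encoded SHA-256 digest of the remaining bytes in `stream`."""
    digest = hashlib.sha256()
    # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Identical content always yields an identical digest, which is what makes
    # the hash usable as a duplicate key.
    a = sha256_of_stream(io.BytesIO(b"same bytes"))
    b = sha256_of_stream(io.BytesIO(b"same bytes"))
    print(a == b)  # True
```

Because the digest depends only on the bytes, two uploads with different file names but identical content produce the same hash.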

This behavior is controlled by the DATA_UPLOAD_IGNORE_DUPLICATES environment variable. When it is set to True, uploaded files are hashed and any file whose content hash matches one already recorded in the same project is skipped during import. When it is False or unset, import behaves as before.
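
A sketch of how the flag and the per-project duplicate check could fit together. Upload, select_uploads, ignore_duplicates_enabled, the known_hashes mapping, and the set of accepted "true" spellings are all hypothetical names and choices for illustration; the PR's real import path works with FileUpload models and the database.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional, Set

# Spellings accepted as "on"; an assumption of this sketch, not taken from the PR.
TRUTHY = {"1", "true", "yes", "on"}


def ignore_duplicates_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read DATA_UPLOAD_IGNORE_DUPLICATES; an unset variable means False."""
    env = os.environ if environ is None else environ
    return env.get("DATA_UPLOAD_IGNORE_DUPLICATES", "").strip().lower() in TRUTHY


@dataclass
class Upload:
    """Hypothetical stand-in for an uploaded file awaiting task creation."""
    project_id: int
    name: str
    content: bytes


def select_uploads(
    uploads: List[Upload],
    known_hashes: Dict[int, Set[str]],
    ignore_duplicates: bool,
) -> List[Upload]:
    """Return the uploads that should produce tasks.

    `known_hashes` maps project_id -> hashes already stored for that project and
    is updated in place, so duplicates inside a single batch are caught too.
    """
    if not ignore_duplicates:
        return list(uploads)  # flag off: previous behavior, import everything
    selected = []
    for upload in uploads:
        file_hash = hashlib.sha256(upload.content).hexdigest()
        seen = known_hashes.setdefault(upload.project_id, set())
        if file_hash in seen:
            continue  # same content already in this project: skip it
        seen.add(file_hash)
        selected.append(upload)
    return selected
```

Hashes are scoped per project in this sketch, so the same file can still be imported into a different project.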

Duplicate File Handling:

Database Changes:

Utility Functions:

  • label_studio/data_import/uploader.py: Added the full_files_hash function to calculate hashes for all FileUpload instances missing a hash, ensuring backward compatibility with existing records.
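
The backfill idea behind full_files_hash can be sketched as follows. FileRecord, backfill_missing_hashes, and the read_bytes callback are hypothetical stand-ins for FileUpload rows and file storage; the actual function presumably persists hashes to the database, which this sketch omits.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class FileRecord:
    """Hypothetical stand-in for a FileUpload row; legacy rows have no hash."""
    id: int
    file_hash: Optional[str] = None


def backfill_missing_hashes(
    records: Iterable[FileRecord],
    read_bytes: Callable[[FileRecord], bytes],
) -> int:
    """Hash every record that lacks a hash and return how many were updated.

    Records that already carry a hash are skipped, so running this more than
    once is harmless: the second run updates nothing.
    """
    updated = 0
    for record in records:
        if record.file_hash:
            continue
        record.file_hash = hashlib.sha256(read_bytes(record)).hexdigest()
        updated += 1
    return updated


if __name__ == "__main__":
    blobs = {1: b"old upload", 2: b"new upload"}
    rows = [FileRecord(1), FileRecord(2, hashlib.sha256(b"new upload").hexdigest())]
    print(backfill_missing_hashes(rows, lambda r: blobs[r.id]))  # 1
    print(backfill_missing_hashes(rows, lambda r: blobs[r.id]))  # 0
```

Skipping already-hashed records keeps the backfill idempotent, so files uploaded before this change gain a hash without re-hashing newer ones.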

dannyball710 requested a review from a team as a code owner on June 15, 2025 13:43

netlify bot commented Jun 15, 2025

👷 Deploy request for label-studio-docs-new-theme pending review. Visit the deploys page to approve it.

🔨 Latest commit: bb111f1

netlify bot commented Jun 15, 2025

👷 Deploy request for heartex-docs pending review. Visit the deploys page to approve it.

🔨 Latest commit: bb111f1

netlify bot commented Jun 15, 2025

Deploy Preview for label-studio-playground canceled.

🔨 Latest commit: bb111f1
🔍 Latest deploy log: https://app.netlify.com/projects/label-studio-playground/deploys/684ecdfd62b3fa0008b6a14b

netlify bot commented Jun 15, 2025

Deploy Preview for label-studio-storybook canceled.

🔨 Latest commit: bb111f1
🔍 Latest deploy log: https://app.netlify.com/projects/label-studio-storybook/deploys/684ecdfd78eec90008991bb8

github-actions bot added the feat label on Jun 15, 2025