Pull requests: modelscope/data-juicer

Pull requests list

[WIP] Python 3.11/3.12 compatibility update
#749 opened Jul 25, 2025 by cmgzn
[NewOp] Add group_diversity_filter op
#745 opened Jul 22, 2025 by lingzhq
Update video_split_by_scene_mapper.py
#744 opened Jul 21, 2025 by liuyuhanalex
Filter Optimization (labels: dj:op, enhancement)
#741 opened Jul 18, 2025 by HYLcool
Support shard_size and extra args for write methods in export_extra_args for RayExporter (labels: dj:core, dj:dist, enhancement)
#739 opened Jul 17, 2025 by HYLcool
Add lidar object segmentation op
#736 opened Jul 14, 2025 by Qirui-jiao
[WIP] add lidar object detection op
#721 opened Jun 26, 2025 by Cathy0908
[WIP] Optimization framework (labels: dj:core, dj:efficiency)
#702 opened Jun 13, 2025 by cyruszhang
[WIP] fix calculate_np
#679 opened May 22, 2025 by Cathy0908
[WIP] deduping benchmark suite
#607 opened Mar 4, 2025 by cyruszhang
Optimize dedup to avoid oom (labels: dj:dist, dj:efficiency, dj:tools, enhancement, good first issue)
#568 opened Feb 7, 2025 by coolderli
Add humanvbench operators (labels: dj:multimodal, dj:op, good first issue)
#553 opened Jan 17, 2025 by SYSUzhouting
Add minhash deduplicator based on RAY and Redis (labels: dj:dist, dj:efficiency, dj:op)
#489 opened Nov 15, 2024 by pan-x-c
Automatically split input dataset in ray mode
#415 opened Sep 4, 2024 by pan-x-c
[WIP] Add text tagging by prompt mapper op (labels: dj:op)
#408 opened Aug 30, 2024 by garyzhang99
1 task
Add GPT-4V as evaluator (labels: dj:multimodal, enhancement, stale-pr)
#276 opened Mar 22, 2024 by drcege Draft DJ-SORA