Commit 156ed20

update link (#228)
1 parent 2720113 commit 156ed20

4 files changed, +38 -38 lines changed

README.md

Lines changed: 17 additions & 17 deletions
@@ -1,4 +1,4 @@
-[[中文主页]](README_ZH.md) | [[Docs]](README.md#documentation-index--文档索引-a-namedocumentationindex) | [[API]](https://alibaba.github.io/data-juicer) | [[*DJ-SORA*]](docs/DJ_SORA.md)
+[[中文主页]](README_ZH.md) | [[Docs]](#documents) | [[API]](https://alibaba.github.io/data-juicer) | [[*DJ-SORA*]](docs/DJ_SORA.md)

 # Data-Juicer: A One-Stop Data Processing System for Large Language Models

@@ -16,8 +16,8 @@



-[![Document_List](https://img.shields.io/badge/Docs-English-blue?logo=Markdown)](README.md#documentation-index--文档索引-a-namedocumentationindex)
-[![文档列表](https://img.shields.io/badge/文档-中文-blue?logo=Markdown)](README_ZH.md#documentation-index--文档索引-a-namedocumentationindex)
+[![Document_List](https://img.shields.io/badge/Docs-English-blue?logo=Markdown)](#documents)
+[![文档列表](https://img.shields.io/badge/文档-中文-blue?logo=Markdown)](README_ZH.md#documents)
 [![API Reference](https://img.shields.io/badge/Docs-API_Reference-blue?logo=Markdown)](https://alibaba.github.io/data-juicer/)
 [![Paper](http://img.shields.io/badge/cs.LG-arXiv%3A2309.02033-B31B1B?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2309.02033)

@@ -45,7 +45,7 @@ In this new version, we support more features for **multimodal data (including v
 - ![new](https://img.alicdn.com/imgextra/i4/O1CN01kUiDtl1HVxN6G56vN_!!6000000000764-2-tps-43-19.png) [2024-02-05] Our paper has been accepted by SIGMOD'24 industrial track!
 - [2024-01-10] Discover new horizons in "Data Mixture"—Our second data-centric LLM competition has kicked off! Please visit the competition's [official website](https://tianchi.aliyun.com/competition/entrance/532174) for more information.
 - [2024-01-05] We release **Data-Juicer v0.1.3** now!
-In this new version, we support **more Python versions** (3.7-3.10), and support **multimodal** dataset [converting](tools/multimodal/README.md)/[processing](docs/Operators.md) (Including texts, images, and audios. More modalities will be supported in the future).
+In this new version, we support **more Python versions** (3.8-3.10), and support **multimodal** dataset [converting](tools/multimodal/README.md)/[processing](docs/Operators.md) (Including texts, images, and audios. More modalities will be supported in the future).
 Besides, our paper is also updated to [v3](https://arxiv.org/abs/2309.02033).

 - [2023-10-13] Our first data-centric LLM competition begins! Please
@@ -59,7 +59,7 @@ Table of Contents
 * [Data-Juicer: A One-Stop Data Processing System for Large Language Models](#data-juicer-a-one-stop-data-processing-system-for-large-language-models)
 * [Table of Contents](#table-of-contents)
 * [Features](#features)
-* [Documentation Index | 文档索引](#documentation-index--文档索引-a-namedocumentationindex)
+* [Documentation Index](#documents)
 * [Demos](#demos)
 * [Prerequisites](#prerequisites)
 * [Installation](#installation)
@@ -111,19 +111,19 @@ Table of Contents



-## Documentation Index | 文档索引 <a name="documentationindex"/>
+## Documentation Index <a name="documents"/>

-- [Overview](README.md) | [概览](README_ZH.md)
-- [Operator Zoo](docs/Operators.md) | [算子库](docs/Operators_ZH.md)
-- [Configs](configs/README.md) | [配置系统](configs/README_ZH.md)
-- [Developer Guide](docs/DeveloperGuide.md) | [开发者指南](docs/DeveloperGuide_ZH.md)
-- ["Bad" Data Exhibition](docs/BadDataExhibition.md) | [“坏”数据展览](docs/BadDataExhibition_ZH.md)
-- Dedicated Toolkits | 专用工具箱
-- [Quality Classifier](tools/quality_classifier/README.md) | [质量分类器](tools/quality_classifier/README_ZH.md)
-- [Auto Evaluation](tools/evaluator/README.md) | [自动评测](tools/evaluator/README_ZH.md)
-- [Preprocess](tools/preprocess/README.md) | [前处理](tools/preprocess/README_ZH.md)
-- [Postprocess](tools/postprocess/README.md) | [后处理](tools/postprocess/README_ZH.md)
-- [Third-parties (LLM Ecosystems)](thirdparty/README.md) | [第三方库(大语言模型生态)](thirdparty/README_ZH.md)
+- [Overview](README.md)
+- [Operator Zoo](docs/Operators.md)
+- [Configs](configs/README.md)
+- [Developer Guide](docs/DeveloperGuide.md)
+- ["Bad" Data Exhibition](docs/BadDataExhibition.md)
+- Dedicated Toolkits
+- [Quality Classifier](tools/quality_classifier/README.md)
+- [Auto Evaluation](tools/evaluator/README.md)
+- [Preprocess](tools/preprocess/README.md)
+- [Postprocess](tools/postprocess/README.md)
+- [Third-parties (LLM Ecosystems)](thirdparty/README.md)
 - [API references](https://alibaba.github.io/data-juicer/)
 - [Awesome LLM-Data](docs/awesome_llm_data.md)
 - [DJ-SORA](docs/DJ_SORA.md)

README_ZH.md

Lines changed: 19 additions & 19 deletions
@@ -1,4 +1,4 @@
-[[English Page]](README.md) | [[文档]](README_ZH.md#documentation-index--文档索引-a-namedocumentationindex) | [[API]](https://alibaba.github.io/data-juicer) | [[*DJ-SORA*]](docs/DJ_SORA_ZH.md)
+[[English Page]](README.md) | [[文档]](#documents) | [[API]](https://alibaba.github.io/data-juicer) | [[*DJ-SORA*]](docs/DJ_SORA_ZH.md)

 # Data-Juicer: 为大语言模型提供更高质量、更丰富、更易“消化”的数据

@@ -14,8 +14,8 @@
[![ModelScope- Demos](https://img.shields.io/badge/ModelScope-Demos-4e29ff.svg?logo=data:image/svg+xml;base64,PHN2ZyB2aWV3Qm94PSIwIDAgMjI0IDEyMS4zMyIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KCTxwYXRoIGQ9Im0wIDQ3Ljg0aDI1LjY1djI1LjY1aC0yNS42NXoiIGZpbGw9IiM2MjRhZmYiIC8+Cgk8cGF0aCBkPSJtOTkuMTQgNzMuNDloMjUuNjV2MjUuNjVoLTI1LjY1eiIgZmlsbD0iIzYyNGFmZiIgLz4KCTxwYXRoIGQ9Im0xNzYuMDkgOTkuMTRoLTI1LjY1djIyLjE5aDQ3Ljg0di00Ny44NGgtMjIuMTl6IiBmaWxsPSIjNjI0YWZmIiAvPgoJPHBhdGggZD0ibTEyNC43OSA0Ny44NGgyNS42NXYyNS42NWgtMjUuNjV6IiBmaWxsPSIjMzZjZmQxIiAvPgoJPHBhdGggZD0ibTAgMjIuMTloMjUuNjV2MjUuNjVoLTI1LjY1eiIgZmlsbD0iIzM2Y2ZkMSIgLz4KCTxwYXRoIGQ9Im0xOTguMjggNDcuODRoMjUuNjV2MjUuNjVoLTI1LjY1eiIgZmlsbD0iIzYyNGFmZiIgLz4KCTxwYXRoIGQ9Im0xOTguMjggMjIuMTloMjUuNjV2MjUuNjVoLTI1LjY1eiIgZmlsbD0iIzM2Y2ZkMSIgLz4KCTxwYXRoIGQ9Im0xNTAuNDQgMHYyMi4xOWgyNS42NXYyNS42NWgyMi4xOXYtNDcuODR6IiBmaWxsPSIjNjI0YWZmIiAvPgoJPHBhdGggZD0ibTczLjQ5IDQ3Ljg0aDI1LjY1djI1LjY1aC0yNS42NXoiIGZpbGw9IiMzNmNmZDEiIC8+Cgk8cGF0aCBkPSJtNDcuODQgMjIuMTloMjUuNjV2LTIyLjE5aC00Ny44NHY0Ny44NGgyMi4xOXoiIGZpbGw9IiM2MjRhZmYiIC8+Cgk8cGF0aCBkPSJtNDcuODQgNzMuNDloLTIyLjE5djQ3Ljg0aDQ3Ljg0di0yMi4xOWgtMjUuNjV6IiBmaWxsPSIjNjI0YWZmIiAvPgo8L3N2Zz4K)](https://modelscope.cn/studios?name=Data-Jiucer&page=1&sort=latest&type=1)
 [![HuggingFace- Demos](https://img.shields.io/badge/🤗HuggingFace-Demos-4e29ff.svg)](https://huggingface.co/spaces?&search=datajuicer)

-[![Document_List](https://img.shields.io/badge/Docs-English-blue?logo=Markdown)](README.md#documentation-index--文档索引-a-namedocumentationindex)
-[![文档列表](https://img.shields.io/badge/文档-中文-blue?logo=Markdown)](README_ZH.md#documentation-index--文档索引-a-namedocumentationindex)
+[![Document_List](https://img.shields.io/badge/Docs-English-blue?logo=Markdown)](README.md#documents)
+[![文档列表](https://img.shields.io/badge/文档-中文-blue?logo=Markdown)](#documents)
 [![API Reference](https://img.shields.io/badge/Docs-API_Reference-blue?logo=Markdown)](https://alibaba.github.io/data-juicer/)
 [![Paper](http://img.shields.io/badge/cs.LG-arXiv%3A2309.02033-B31B1B?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2309.02033)

@@ -40,7 +40,7 @@ Data-Juicer(包含[DJ-SORA](docs/DJ_SORA_ZH.md))正在积极更新和维护
 - [2024-01-10] 开启“数据混合”新视界——第二届Data-Juicer大模型数据挑战赛已经正式启动!立即访问[竞赛官网](https://tianchi.aliyun.com/competition/entrance/532174),了解赛事详情。

 -[2024-01-05] 现在,我们发布了 **Data-Juicer v0.1.3** 版本!
-在这个新版本中,我们支持了**更多Python版本**(3.7-3.10),同时支持了**多模态**数据集的[转换](tools/multimodal/README_ZH.md)[处理](docs/Operators_ZH.md)(包括文本、图像和音频。更多模态也将会在之后支持)。
+在这个新版本中,我们支持了**更多Python版本**(3.8-3.10),同时支持了**多模态**数据集的[转换](tools/multimodal/README_ZH.md)[处理](docs/Operators_ZH.md)(包括文本、图像和音频。更多模态也将会在之后支持)。
 此外,我们的论文也更新到了[第三版](https://arxiv.org/abs/2309.02033)

 - [2023-10-13] 我们的第一届以数据为中心的 LLM 竞赛开始了!
@@ -53,7 +53,7 @@ Data-Juicer(包含[DJ-SORA](docs/DJ_SORA_ZH.md))正在积极更新和维护
 * [Data-Juicer: 为大语言模型提供更高质量、更丰富、更易“消化”的数据](#data-juicer-为大语言模型提供更高质量更丰富更易消化的数据)
 * [目录](#目录)
 * [特点](#特点)
-* [Documentation Index | 文档索引](#documentation-index--文档索引-a-namedocumentationindex)
+* [文档索引](#documents)
 * [演示样例](#演示样例)
 * [前置条件](#前置条件)
 * [安装](#安装)
@@ -93,20 +93,20 @@ Data-Juicer(包含[DJ-SORA](docs/DJ_SORA_ZH.md))正在积极更新和维护
 * **灵活 & 易扩展**:支持大多数数据格式(如jsonl、parquet、csv等),并允许灵活组合算子。支持[自定义算子](docs/DeveloperGuide_ZH.md#构建自己的算子),以执行定制化的数据处理。


-## Documentation Index | 文档索引 <a name="documentationindex"/>
-
-* [Overview](README.md) | [概览](README_ZH.md)
-* [Operator Zoo](docs/Operators.md) | [算子库](docs/Operators_ZH.md)
-* [Configs](configs/README.md) | [配置系统](configs/README_ZH.md)
-* [Developer Guide](docs/DeveloperGuide.md) | [开发者指南](docs/DeveloperGuide_ZH.md)
-* ["Bad" Data Exhibition](docs/BadDataExhibition.md) | [“坏”数据展览](docs/BadDataExhibition_ZH.md)
-* Dedicated Toolkits | 专用工具箱
-* [Quality Classifier](tools/quality_classifier/README.md) | [质量分类器](tools/quality_classifier/README_ZH.md)
-* [Auto Evaluation](tools/evaluator/README.md) | [自动评测](tools/evaluator/README_ZH.md)
-* [Preprocess](tools/preprocess/README.md) | [前处理](tools/preprocess/README_ZH.md)
-* [Postprocess](tools/postprocess/README.md) | [后处理](tools/postprocess/README_ZH.md)
-* [Third-parties (LLM Ecosystems)](thirdparty/README.md) | [第三方库(大语言模型生态)](thirdparty/README_ZH.md)
-* [API references](https://alibaba.github.io/data-juicer/)
+## 文档索引 <a name="documents"/>
+
+* [概览](README_ZH.md)
+* [算子库](docs/Operators_ZH.md)
+* [配置系统](configs/README_ZH.md)
+* [开发者指南](docs/DeveloperGuide_ZH.md)
+* [“坏”数据展览](docs/BadDataExhibition_ZH.md)
+* 专用工具箱
+* [质量分类器](tools/quality_classifier/README_ZH.md)
+* [自动评测](tools/evaluator/README_ZH.md)
+* [前处理](tools/preprocess/README_ZH.md)
+* [后处理](tools/postprocess/README_ZH.md)
+* [第三方库(大语言模型生态)](thirdparty/README_ZH.md)
+* [API 参考](https://alibaba.github.io/data-juicer/)
 * [Awesome LLM-Data](docs/awesome_llm_data.md)
 * [DJ-SORA](docs/DJ_SORA_ZH.md)


data_juicer/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-__version__ = '0.1.3'
+__version__ = '0.2.0'

 import os
 import subprocess

setup.py

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ def get_install_requirements(require_f_paths, env_dir='environments'):
 name='py-data-juicer',
 version=version,
 url='https://github.com/alibaba/data-juicer',
-author='SysML team of Alibaba DAMO Academy',
+author='SysML Team of Alibaba Tongyi Lab',
 description='A One-Stop Data Processing System for Large Language '
 'Models.',
 long_description=readme_md,
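
Note on the version bump: `setup.py` passes `version=version`, but the hunk above does not show where that value comes from. One common pattern, sketched here purely as an assumption about how the two files could stay in sync, is to read `__version__` from the text of `data_juicer/__init__.py` with a regular expression instead of importing the package. `VERSION_RE` and `parse_version` are hypothetical names and are not taken from the repository.

```python
import re

# Hypothetical pattern: matches a line such as  __version__ = '0.2.0'
VERSION_RE = re.compile(
    r"""^__version__\s*=\s*['"]([^'"]+)['"]""",
    re.MULTILINE,
)


def parse_version(init_source: str) -> str:
    """Return the version string found in the given module source text."""
    match = VERSION_RE.search(init_source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# Sample text mirroring the first lines of the changed __init__.py.
sample = "__version__ = '0.2.0'\n\nimport os\nimport subprocess\n"
print(parse_version(sample))  # prints 0.2.0
```

Reading the file as text avoids importing the package (and its dependencies) at install time, which is why this approach is often preferred over `from data_juicer import __version__` inside a setup script.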
