* change simhash-py to simhash-pybind
+ update docs for new version
* install pip for unit-test machine explicitly
* install pip for unit-test machine explicitly
* update wechat QR code
* update dynamic QR code for WeChat group
* update unittest
* add missing dependency
* update news list
* update version number
* update release date
* bold key content in README_ZH.md like the English version
* minor changes on ZH docs
* move infos about discussion groups to the front
README.md: 12 additions & 4 deletions
@@ -33,12 +33,20 @@ This project is being actively updated and maintained, and we will periodically
 If you find Data-Juicer useful for your research or development, please kindly
 cite our [work](#references).
 
+Welcome to join our [Slack channel](https://join.slack.com/t/data-juicer/shared_invite/zt-23zxltg9d-Z4d3EJuhZbCLGwtnLWWUDg?spm=a2c22.12281976.0.0.7a8253f30mgpjw), [DingDing group](https://qr.dingtalk.com/action/joingroup?spm=a2c22.12281976.0.0.7a8253f30mgpjw&code=v1,k1,C0DI7CwRFrg7gJP5aMC95FUmsNuwuKJboT62BqP5DAk=&_dt_no_comment=1&origin=11), or WeChat group (scan the QR code below with WeChat) for discussion.
+
+<img src="https://img.alicdn.com/imgextra/i3/O1CN01QbwHJa1EV5uZwmU9c_!!6000000000356-2-tps-400-400.png" width = "100" height = "100" alt="QR Code for WeChat group" align=center />
+
 
 ----
 
 ## News
-- [2023-10-13] Our first data-centric LLM competition begins! Please
-visit the competition's official websites, **FT-Data Ranker** ([1B Track](https://tianchi.aliyun.com/competition/entrance/532157), [7B Track](https://tianchi.aliyun.com/competition/entrance/532158)), for more information.
+- [2024-01-05] We release **Data-Juicer v0.1.3** now!
+In this new version, we support **more Python versions** (3.7-3.10), and support **multimodal** dataset [converting](tools/multimodal/README.md)/[processing](docs/Operators.md) (Including texts, images, and audios. More modalities will be supported in the future).
+Besides, our paper is also updated to [v3](https://arxiv.org/abs/2309.02033).
+
+- [2023-10-13] Our first data-centric LLM competition begins! Please
+visit the competition's official websites, FT-Data Ranker ([1B Track](https://tianchi.aliyun.com/competition/entrance/532157), [7B Track](https://tianchi.aliyun.com/competition/entrance/532158)), for more information.
 
 - [2023-10-8] We update our paper to the 2nd version and release the corresponding version 0.1.2 of Data-Juicer!
 
@@ -98,7 +106,7 @@ Table of Contents
 
 ## Prerequisites
 
-- Recommend Python==3.8
+- Recommend Python>=3.7,<=3.10
 - gcc >= 5 (at least C++14 support)
 
 ## Installation
@@ -330,7 +338,7 @@ We are in a rapidly developing field and greatly welcome contributions of new
 features, bug fixes and better documentations. Please refer to
 [How-to Guide for Developers](docs/DeveloperGuide.md).
 
-Welcome to join our [Slack channel](https://join.slack.com/t/data-juicer/shared_invite/zt-23zxltg9d-Z4d3EJuhZbCLGwtnLWWUDg?spm=a2c22.12281976.0.0.7a8253f30mgpjw), or [DingDing group](https://qr.dingtalk.com/action/joingroup?spm=a2c22.12281976.0.0.7a8253f30mgpjw&code=v1,k1,C0DI7CwRFrg7gJP5aMC95FUmsNuwuKJboT62BqP5DAk=&_dt_no_comment=1&origin=11) for discussion.
+If you have any questions, please join our [discussion groups](README.md).
 
 ## Acknowledgement
 Data-Juicer is used across various LLM products and research initiatives,