Streaming load fails because extra arguments are passed: TypeError: IterableDataset.map() got an unexpected keyword argument 'num_proc' #57

### Reminder

- [x] I have read the README and searched the existing issues.

### System Info

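With streaming enabled, loading the dataset fails with `TypeError: IterableDataset.map() got an unexpected keyword argument 'num_proc'`.
The problem is located at: https://github.com/Qihoo360/360-LLaMA-Factory/blob/3bc07289eefcf8c8ea05f553e4ef0b82008419e4/src/llamafactory/data/loader.py#L224.
On inspection, the `map` method of `IterableDataset` in the `datasets` library does not accept the three arguments passed in `kwargs`: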
```python
kwargs = dict(
    num_proc=data_args.preprocessing_num_workers,
    load_from_cache_file=(not data_args.overwrite_cache) or (training_args.local_process_index != 0),
    desc="Running sequence parallel split on dataset",
)
```
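To check which of these keyword arguments a given `map` method would reject, one option is to inspect its signature. The sketch below uses `inspect.signature` from the standard library; `split_supported_kwargs` is a hypothetical helper, and `map_style_map` / `iterable_map` are simplified, hypothetical stand-ins for the two `map` methods, not the real `datasets` signatures.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_supported_kwargs(
    map_fn: Callable[..., Any], kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (accepted, rejected) according to map_fn's signature."""
    params = inspect.signature(map_fn).parameters
    # A function that takes **kwargs accepts any keyword argument.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    # Positional-only parameters cannot be passed by keyword.
    accepted = {
        k: v for k, v in kwargs.items()
        if k in params and params[k].kind is not inspect.Parameter.POSITIONAL_ONLY
    }
    rejected = {k: v for k, v in kwargs.items() if k not in accepted}
    return accepted, rejected


# Hypothetical stand-ins with simplified signatures (NOT the real datasets methods).
def map_style_map(function=None, batched=False, num_proc=None,
                  load_from_cache_file=None, desc=None):
    pass


def iterable_map(function=None, batched=False, batch_size=1000):
    pass


kwargs = dict(
    num_proc=4,
    load_from_cache_file=True,
    desc="Running sequence parallel split on dataset",
)

accepted, rejected = split_supported_kwargs(iterable_map, kwargs)
print(sorted(rejected))  # ['desc', 'load_from_cache_file', 'num_proc']
```

With `datasets` installed, the same helper can be pointed at `datasets.IterableDataset.map` to see which arguments the installed version rejects; the exact result depends on that version.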

### Reproduction

Enable streaming loading, i.e. pass `--streaming True`.

### Expected behavior

The regular `Dataset.map` accepts these arguments:
![Image](https://github.com/user-attachments/assets/c08916d9-a047-4023-b9ac-dcb62eb91e47)
`IterableDataset.map`, which is used for streaming, does not:
![Image](https://github.com/user-attachments/assets/f662e17d-22c4-4023-92ef-9c60c099272d)

Fix: it is enough to add an extra check in `_get_sequence_parallel_dataset`; with that change it currently runs fine locally.
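One possible shape of that check, sketched under assumptions: only build the extra `map` arguments when streaming is off. `DataArgs`, `TrainingArgs` and `build_map_kwargs` are hypothetical stand-ins; only the field names from the snippet above, plus `streaming` (from the `--streaming` flag), are modeled. This is not necessarily the exact change the reporter made.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


# Hypothetical stand-ins for the real argument classes.
@dataclass
class DataArgs:
    streaming: bool = False
    preprocessing_num_workers: Optional[int] = None
    overwrite_cache: bool = False


@dataclass
class TrainingArgs:
    local_process_index: int = 0


def build_map_kwargs(data_args: DataArgs, training_args: TrainingArgs) -> Dict[str, Any]:
    """Return the extra keyword arguments to pass to dataset.map()."""
    if data_args.streaming:
        # IterableDataset.map() rejects num_proc / load_from_cache_file / desc,
        # so nothing extra is passed in streaming mode.
        return {}
    return dict(
        num_proc=data_args.preprocessing_num_workers,
        load_from_cache_file=(not data_args.overwrite_cache)
        or (training_args.local_process_index != 0),
        desc="Running sequence parallel split on dataset",
    )


print(build_map_kwargs(DataArgs(streaming=True), TrainingArgs()))  # {}
print(build_map_kwargs(DataArgs(preprocessing_num_workers=4), TrainingArgs()))
```

The existing `map(...)` call would then unpack the result with `**kwargs` as before, so the non-streaming path keeps its current behavior.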

### Others

_No response_
