I'm curious about the pure-text data format used in DeepSeek-VL2 pre-training, but I haven't found specific details.
In LLM pre-training, the loss is typically computed over all tokens. In traditional MLLM pre-training, however, the data often takes a multi-turn QA format, and only the answer tokens are used for loss computation; the system prompt and question are usually masked.
So, what approach does DeepSeek-VL2 use?
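
For reference, here is a minimal sketch of the two conventions I'm contrasting above (my own illustration, not DeepSeek-VL2's actual code). It assumes a PyTorch / Hugging Face-style setup where label positions set to -100 are ignored by the cross-entropy loss; the function names are hypothetical.

```python
import torch

IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def build_labels_qa(prompt_ids: list[int], answer_ids: list[int]) -> dict:
    """QA-style masking: concatenate prompt and answer, mask the prompt in the labels."""
    input_ids = torch.tensor(prompt_ids + answer_ids)
    labels = input_ids.clone()
    labels[: len(prompt_ids)] = IGNORE_INDEX  # loss computed only on answer tokens
    return {"input_ids": input_ids, "labels": labels}

def build_labels_lm(token_ids: list[int]) -> dict:
    """LM-style pre-training: every token contributes to the loss."""
    input_ids = torch.tensor(token_ids)
    return {"input_ids": input_ids, "labels": input_ids.clone()}
```

The question is essentially which of these two labeling schemes (or some mix) DeepSeek-VL2 applies to its pure-text pre-training data.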