
Commit 2d7f2ac

add README

1 parent 9a66467

4 files changed (+45, -7 lines)

.github/CODEOWNERS

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# This is a CODEOWNERS file.
+# Each line is a file pattern followed by one or more owners.
+
+# These owners will be the default owners for everything in
+# the repo. Unless a later match takes precedence,
+# they will be requested for review when someone opens a pull request.
+* @tianyu-l @fegin @wwwjn @wconstab
+
+# Exclude the experiments directory by adding a pattern without owners
+/torchtitan/experiments/
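
A CODEOWNERS syntax error can silently disable matching rules, so it is worth validating the file after a change like this. A minimal sketch, assuming the `gh` CLI is installed and authenticated; GitHub's REST API exposes a "list CODEOWNERS errors" endpoint:

```bash
# Sketch: list CODEOWNERS syntax errors via the GitHub REST API.
# Assumes the `gh` CLI is authenticated and the target repo is
# pytorch/torchtitan (repo name is an assumption, not in this commit).
gh api repos/pytorch/torchtitan/codeowners/errors
```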
torchtitan/models/deepseek_v3/README.md

Lines changed: 30 additions & 3 deletions
@@ -1,8 +1,35 @@
-# DeepSeek-V3 in torchtitan
+# DeepSeek-V3 in TorchTitan
 
-Download tokenizer:
+DeepSeek-V3 is a Mixture-of-Experts (MoE) transformer model with Multi-head Latent Attention (MLA) architecture.
 
-```
+## Setup
+
+### Download Tokenizer
+
+```bash
 # DeepSeek tokenizer (automatically downloads tokenizer.json and tokenizer_config.json)
 python scripts/download_tokenizer.py --repo_id deepseek-ai/DeepSeek-V3
 ```
+
+## Training
+
+### Debug Training
+
+```bash
+# Quick debug run with small model
+CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" ./run_train.sh
+```
+
+### Full Model Training
+
+```bash
+# 16B parameter model
+CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml" ./run_train.sh
+```
+
+
+## Supported Features
+- FSDP, HSDP
+- Activation checkpointing
+- Tensor Parallel (TP)
+- Expert Parallel (EP)
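
The training commands in the new README use the config files as-is. torchtitan configs can generally also be adjusted from the command line; a minimal sketch, assuming run_train.sh forwards extra arguments as `--<section>.<option>` overrides matching the TOML sections (the override convention is an assumption, not shown in this commit):

```bash
# Sketch: shorten the debug run and use a smaller sequence length,
# assuming run_train.sh forwards --<section>.<option> overrides.
CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" \
  ./run_train.sh --training.steps 20 --training.seq_len 2048
```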

torchtitan/models/deepseek_v3/train_configs/debug_model.toml

Lines changed: 2 additions & 1 deletion
@@ -40,7 +40,7 @@ lr_min = 0.0
 
 [training]
 local_batch_size = 8
-seq_len = 2048
+seq_len = 4096
 max_norm = 1.0 # grad norm clipping
 steps = 10
 compile = false
@@ -52,6 +52,7 @@ data_parallel_shard_degree = -1
 fsdp_reshard_after_forward = "default" # default / never / always
 tensor_parallel_degree = 1
 enable_async_tensor_parallel = false
+expert_parallel_degree = 1
 
 [checkpoint]
 enable_checkpoint = false
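
The newly added expert_parallel_degree knob defaults to 1 here, i.e. expert parallelism disabled for the debug config. A sketch of turning it on for a multi-GPU debug run, assuming run_train.sh honors an NGPU environment variable and the flag name mirrors the [parallelism] section (both launcher conventions are assumptions, not shown in this diff):

```bash
# Sketch: 4-GPU debug run with 2-way expert parallelism.
# NGPU and --parallelism.expert_parallel_degree are assumed launcher
# conventions; adjust to your torchtitan version.
NGPU=4 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" \
  ./run_train.sh --parallelism.expert_parallel_degree 2
```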

torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml

Lines changed: 3 additions & 3 deletions
@@ -38,8 +38,8 @@ decay_type = "linear"
 lr_min = 0.0
 
 [training]
-local_batch_size = 16
-seq_len = 2048
+local_batch_size = 8
+seq_len = 4096
 max_norm = 1.0 # grad norm clipping
 steps = 100
 compile = false
@@ -51,7 +51,7 @@ data_parallel_shard_degree = -1
 fsdp_reshard_after_forward = "default" # default / never / always
 tensor_parallel_degree = 1
 enable_async_tensor_parallel = false
-expert_parallel_degree = 2
+expert_parallel_degree = 1
 
 [checkpoint]
 enable_checkpoint = false
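
Both configs ship with enable_checkpoint = false. A sketch of enabling checkpointing for the 16B run without editing the TOML, under the same assumed `--<section>.<option>` override convention (the boolean flag form is an assumption about the CLI, not confirmed by this commit):

```bash
# Sketch: enable checkpointing for the 16B run via a CLI override.
# The boolean --checkpoint.enable_checkpoint flag form is assumed.
CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml" \
  ./run_train.sh --checkpoint.enable_checkpoint
```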
