
Commit effc955

trvachov and lvojtku authored
Docs fix (#826)
### Description

### For the Geneformer documentation:

1. **Capitalization standardization**:
   - Fixed capitalization of "BioNeMo", "Geneformer", "HuggingFace", "ReLU", "BERT MLM"
   - Corrected spelling of "Crohn's disease" (previously "Chron's disease")
   - Fixed "children" (previously "chidlren")
2. **Formatting improvements**:
   - Properly formatted model version bullet points with nesting
   - Added proper headings for property categories
   - Fixed displayed values (e.g., ".5M" → "0.5M")
   - Standardized formatting of data collection/labeling methods sections
3. **Image captions**:
   - Replaced low-quality image captions with descriptive, properly formatted titles
   - Made chart descriptions more professional and consistent
4. **Grammatical improvements**:
   - Fixed article usage and punctuation
   - Improved sentence structure and clarity
   - Fixed section headings capitalization and consistency
5. **Fixed broken notes**:
   - Corrected `!! note` to `!!! note` for proper rendering

### For the ESM-2 pretraining documentation:

1. **Grammar and clarity improvements**:
   - Fixed article usage ("a ESM-2" → "an ESM-2")
   - Fixed formatting of numeric values (e.g., "1." → "1.0")
   - Fixed typos ("depreciation" → "deprecation")
   - Fixed "trainiing" → "training"
2. **Consistency in terminology**:
   - Standardized "BioNeMo" capitalization
   - Ensured consistent treatment of "ESM-2" references
3. **Structure and formatting**:
   - Improved spacing and paragraph breaks
   - Fixed section formatting and readability

### For the training-models documentation:

1. **Capitalization and consistency**:
   - Standardized capitalization of model sizes (8M, 650M, 3B)
   - Fixed capitalization of "ESM2", "Geneformer", "Python", "YAML"
   - Changed "WandB" to "Weights and Biases" consistently
2. **Formatting improvements**:
   - Changed code blocks consistently to include language tags
   - Added proper spacing and improved paragraph formatting
   - Fixed punctuation in lists and note sections
3. **Grammar and clarity**:
   - Added missing commas after introductory phrases
   - Fixed formatting of lists for better readability
   - Made bulleted explanations more consistent

### Type of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [x] Documentation update
- [ ] Other (please describe):

### CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

- [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest
- [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing

> [!NOTE]
> By default, the notebooks validation tests are skipped unless explicitly enabled.

#### Authorizing CI Runs

We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources.

* If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
* If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit.

### Usage

```python
TODO: Add code snippet
```

### Pre-submit Checklist

- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully

---------

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>
Co-authored-by: lvojtku <lvojtku@nvidia.com>
1 parent c6cb24a commit effc955

File tree

11 files changed

+137 -145 lines changed


docs/docs/datasets/CELLxGENE.md

Lines changed: 6 additions & 6 deletions
@@ -6,9 +6,9 @@
 
 ## Dataset attributes of version 2023-12-15
 
-Data was downloaded using the [CELLxGENE Discover Census version `2023-12-15`](https://chanzuckerberg.github.io/cellxgene-census/cellxgene_census_docsite_data_release_info.html#lts-2023-12-15). We first downloaded cellxgene census version 2023-12-15 using the `cellxgene_census` python API. We limited cell data to `organism=”Homo sapiens”`, with a non “na” `suspension_type`, `is_primary_data=True`, and `disease=”normal”` to limit to non-diseased tissues that are also the primary data source per cell to make sure that cells are only included once in the download. We tracked metadata including “assay”, “sex”, “development_stage”, “tissue_general”, “dataset_id” and “self_reported_ethnicity”. The metadata “assay”, “tissue_general”, and “dataset_id” were used to construct dataset splits into train, validation, and test sets. The training set represented 99% of the downloaded cells. We partitioned the data by dataset_id into a train set (99%) and a hold-out set (1%), to make sure that the hold-out datasets were independently collected single cell experiments, which helps evaluate generalizability to new future datasets. In this training split, we made sure that all “assay” and “tissue_general” labels were present in the training set so that our model would have maximal visibility into different tissues and assay biases. Finally the 1% hold-out set was split further into a validation and test set. This final split was mostly done randomly by cell, however we set aside a full dataset into the test split so that we could evaluate performance after training on a completely unseen dataset, including when monitoring the validation loss during training.
+Data was downloaded using the [CELLxGENE Discover Census version `2023-12-15`](https://chanzuckerberg.github.io/cellxgene-census/cellxgene_census_docsite_data_release_info.html#lts-2023-12-15). We first downloaded CELLxGENE census version 2023-12-15 using the `cellxgene_census` python API. We limited cell data to `organism="Homo sapiens"`, with a non "na" `suspension_type`, `is_primary_data=True`, and `disease="normal"` to limit to non-diseased tissues that are also the primary data source per cell to make sure that cells are only included once in the download. We tracked metadata including "assay", "sex", "development_stage", "tissue_general", "dataset_id" and "self_reported_ethnicity". The metadata "assay", "tissue_general", and "dataset_id" were used to construct dataset splits into train, validation, and test sets. The training set represented 99% of the downloaded cells. We partitioned the data by dataset_id into a train set (99%) and a hold-out set (1%), to make sure that the hold-out datasets were independently collected single cell experiments, which helps evaluate generalizability to new future datasets. In this training split, we made sure that all "assay" and "tissue_general" labels were present in the training set so that our model would have maximal visibility into different tissues and assay biases. Finally the 1% hold-out set was split further into a validation and test set. This final split was mostly done randomly by cell, however we set aside a full dataset into the test split so that we could evaluate performance after training on a completely unseen dataset, including when monitoring the validation loss during training.
 
-These parameters resulted in 23.87 Million single cells collected from a variety of public datasets, all hosted by CZI cell x gene census. After the splitting procedure we had:
+These parameters resulted in 23.87 Million single cells collected from a variety of public datasets, all hosted by CZI CELLxGENE census. After the splitting procedure we had:
 
 - 23.64 Million cells in the training split
 - 0.13 Million cells in the validation split
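
The download-and-split procedure described in the hunk above translates naturally into code. Below is a minimal sketch against the `cellxgene_census` Python API, not the pipeline actually used for this dataset: the census version, filter expression, and metadata columns come from the text, while the in-memory AnnData usage, the column-selection argument (whose name varies across `cellxgene_census` releases), and the split seed are illustrative assumptions. At 23.87 million cells, a real run would stream the query rather than materialize it all at once.

```python
import cellxgene_census
import numpy as np

# Open the pinned census release used for this dataset (2023-12-15).
with cellxgene_census.open_soma(census_version="2023-12-15") as census:
    adata = cellxgene_census.get_anndata(
        census,
        organism="Homo sapiens",
        # Primary, non-diseased cells with a known suspension type, so each
        # cell enters the download exactly once.
        obs_value_filter=(
            'is_primary_data == True and disease == "normal" '
            'and suspension_type != "na"'
        ),
        column_names={
            "obs": [
                "assay", "sex", "development_stage", "tissue_general",
                "dataset_id", "self_reported_ethnicity",
            ]
        },
    )

# Partition by dataset_id so the roughly 1% hold-out consists of whole,
# independently collected experiments rather than randomly scattered cells.
rng = np.random.default_rng(0)  # seed is illustrative
ids = np.asarray(adata.obs["dataset_id"].unique())
rng.shuffle(ids)
cells_per_dataset = adata.obs["dataset_id"].value_counts()
cumulative_cells = cells_per_dataset.loc[ids].cumsum().to_numpy()
holdout_ids = set(ids[cumulative_cells <= 0.01 * adata.n_obs])
train_mask = ~adata.obs["dataset_id"].isin(holdout_ids)  # ~99% of cells
```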
@@ -53,11 +53,11 @@ Different assays have different ranges of reported gene measurements. On the low
 
 #### Dataset distribution
 
-Dataset (eg a publication that produces data and uploads to cellxgene) leads to known batch effects due to different handling proceedures, collection procedures, etc. We stratify our training vs hold-out split by this covariate for this reason. Exploring the breakdown of datasets we see that the top 10 datsets represent approximately 10 million cells of the full cellxgene datset. The largest dataset alone has 4 million cells.
+Dataset (for example, a publication that produces data and uploads to CELLxGENE) leads to known batch effects due to different handling procedures, collection procedures, and more. Hence, we stratify our training rather than hold out split by this covariate. Exploring the breakdown of datasets, we see that the top 10 datasets represent approximately 10 million cells of the full CELLxGENE dataset. The largest dataset alone has 4 million cells.
 
 ![Top datasets make up a large fraction of cells](../assets/old_images/cellxgene/num_cells_by_dataset.png)
 
-Looking at the makeup of these top datasets, we see that most represent single tissue categories predominately. Most of these tend to be nervous system datsets with the exception of one which is balanced between many cell types.
+Looking at the makeup of these top datasets, we see that they represent single tissue categories predominately. Most of these tend to be nervous system datasets, with the exception of one that is balanced between many cell types.
 ![Top 9 datasets are largely biased toward single cell types](../assets/old_images/cellxgene/top9_datasets_tissue_distribution.png)
 
 ## References
@@ -87,7 +87,7 @@ Our training, validation and test data, including subsets made available for tes
 * Publication Reference: Cheng et al. (2018) Cell Reports; Publication: https://doi.org/10.1016/j.celrep.2018.09.006 Dataset Version: https://datasets.cellxgene.cziscience.com/912d943b-9060-4fd3-a12c-ad641a89f0e4.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/43d4bb39-21af-4d05-b973-4c1fed7b916c
 * Publication Reference: Cowan et al. (2020) Cell; Publication: https://doi.org/10.1016/j.cell.2020.08.013 Dataset Version: https://datasets.cellxgene.cziscience.com/b1989183-5808-46ab-87f5-978febb2d26e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/2f4c738f-e2f3-4553-9db2-0582a38ea4dc
 * Publication Reference: Cowan et al. (2020) Cell; Publication: https://doi.org/10.1016/j.cell.2020.08.013 Dataset Version: https://datasets.cellxgene.cziscience.com/c0d3867e-1a7b-4e57-af62-c563f1934226.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/2f4c738f-e2f3-4553-9db2-0582a38ea4dc
-* Publication Reference: Dom\u00ednguez Conde et al. (2022) Science; Publication: https://doi.org/10.1126/science.abl5197 Dataset Version: https://datasets.cellxgene.cziscience.com/08f58b32-a01b-4300-8ebc-2b93c18f26f7.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/62ef75e4-cbea-454e-a0ce-998ec40223d3
+* Publication Reference: Domínguez Conde et al. (2022) Science; Publication: https://doi.org/10.1126/science.abl5197 Dataset Version: https://datasets.cellxgene.cziscience.com/08f58b32-a01b-4300-8ebc-2b93c18f26f7.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/62ef75e4-cbea-454e-a0ce-998ec40223d3
 * Publication Reference: Easter et al. (2024) Nat Commun; Publication: https://doi.org/10.1038/s41467-024-49037-y Dataset Version: https://datasets.cellxgene.cziscience.com/221dff56-a47d-4563-90ed-51b60e2f16d5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/71f4bccf-53d4-4c12-9e80-e73bfb89e398
 * Publication Reference: Egozi et al. (2021) Nat Med; Publication: https://doi.org/10.1038/s41591-021-01586-1 Dataset Version: https://datasets.cellxgene.cziscience.com/e3a84fef-b6df-49b2-b0ca-ecaf444773ec.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/7651ac1a-f947-463a-9223-a9e408a41989
 * Publication Reference: Elmentaite et al. (2020) Developmental Cell; Publication: https://doi.org/10.1016/j.devcel.2020.11.010 Dataset Version: https://datasets.cellxgene.cziscience.com/3aedefc0-401a-4ee8-a1b5-a0ffc20e1ff2.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/17481d16-ee44-49e5-bcf0-28c0780d8c4a

@@ -282,7 +282,7 @@ Our training, validation and test data, including subsets made available for tes
 * Publication Reference: Smillie et al. (2019) Cell; Publication: https://doi.org/10.1016/j.cell.2019.06.029 Dataset Version: https://datasets.cellxgene.cziscience.com/6c483976-30de-4835-97f0-2b9bc93614e7.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/33d19f34-87f5-455b-8ca5-9023a2e5453d
 * Publication Reference: Smith et al. (2021) Proc. Natl. Acad. Sci. U.S.A.; Publication: https://doi.org/10.1073/pnas.2023333118 Dataset Version: https://datasets.cellxgene.cziscience.com/bf50dbfb-9ca0-4f0d-8deb-a1a810a0e313.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e02201d7-f49f-401f-baf0-1eb1406546c0
 * Publication Reference: Smith et al. (2021) Proc. Natl. Acad. Sci. U.S.A.; Publication: https://doi.org/10.1073/pnas.2023333118 Dataset Version: https://datasets.cellxgene.cziscience.com/ff7778bf-7a65-4d23-a9f4-b26c47926c28.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e02201d7-f49f-401f-baf0-1eb1406546c0
-* Publication Reference: Sol\u00e9-Boldo et al. (2020) Commun Biol; Publication: https://doi.org/10.1038/s42003-020-0922-4 Dataset Version: https://datasets.cellxgene.cziscience.com/bc8d7152-3b69-4153-9314-7342ae58fbde.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/c353707f-09a4-4f12-92a0-cb741e57e5f0
+* Publication Reference: Solé-Boldo et al. (2020) Commun Biol; Publication: https://doi.org/10.1038/s42003-020-0922-4 Dataset Version: https://datasets.cellxgene.cziscience.com/bc8d7152-3b69-4153-9314-7342ae58fbde.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/c353707f-09a4-4f12-92a0-cb741e57e5f0
 * Publication Reference: Stephenson et al. (2021) Nat Med; Publication: https://doi.org/10.1038/s41591-021-01329-2 Dataset Version: https://datasets.cellxgene.cziscience.com/46586a98-b75d-4557-9cc4-839fc28e67d5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/ddfad306-714d-4cc0-9985-d9072820c530
 * Publication Reference: Stewart et al. (2019) Science; Publication: https://doi.org/10.1126/science.aat5031 Dataset Version: https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/120e86b4-1195-48c5-845b-b98054105eec
 * Publication Reference: Stewart et al. (2019) Science; Publication: https://doi.org/10.1126/science.aat5031 Dataset Version: https://datasets.cellxgene.cziscience.com/fe7e4408-7390-4f93-95aa-ffe472843421.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/120e86b4-1195-48c5-845b-b98054105eec

docs/docs/datasets/index.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ The BioNeMo Framework provides access to a variety of high-quality datasets for
 
 | **Dataset** | **Modality** | **Uses** |
 | -------------------------------------------------------- | -------------- | ------------------------------------------------ |
-| [CELLxGENE](./CELLxGENE.md) | Single Cell | Single-Cell Gene Expression
+| [CELLxGENE](./CELLxGENE.md) | Single Cell | Single-Cell Gene Expression |
 | [UniProt](./uniprot.md) | Protein | Protein Sequence and Function Analysis |
 
 For more information about the datasets included in the BioNeMo Framework, refer to the Dataset Cards linked in the table above or the original sources referenced in the respective dataset descriptions.

docs/docs/datasets/uniprot.md

Lines changed: 4 additions & 4 deletions
@@ -21,9 +21,9 @@ randomly chosen UniRef90 sequence from each.
 
 ## Data Availability
 
-Two versions of the dataset are distributed, a full training dataset (~80Gb) and a 10,000 UniRef50 cluster random slice
-(~150Mb). To load and use the sanity dataset, the [bionemo.core.data.load][bionemo.core.data.load.load] function
-can be used to materialize the sanity dataset in the BioNeMo2 cache directory:
+Two versions of the dataset are distributed, a full training dataset (~80GB) and a 10,000 UniRef50 cluster random slice
+(~150MB). To load and use the sanity dataset, use the [bionemo.core.data.load][bionemo.core.data.load.load] function
+to materialize the sanity dataset in the BioNeMo2 cache directory:
 
 ```python
 from bionemo.core.data.load import load

@@ -36,7 +36,7 @@ sanity_data_dir = load("esm2/testdata_esm2_pretrain:2.0")
 * [Sanity Dataset](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/resources/esm2_pretrain_nemo2_testdata/files)
 * [Full Dataset]
 
-## Reference
+## References
 
 1. UniProt Consortium. (2023). UniProt: The universal protein knowledgebase in 2023. Nucleic Acids Research, 51(D1),
 D523–D531. doi:10.1093/nar/gkac1052
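
For orientation, the snippet edited in this hunk resolves a named resource to a local path in the BioNeMo2 cache, downloading it on first use. A small usage sketch follows; the resource tag is the one shown in the diff, while treating the return value as a `pathlib.Path` that unpacks to a directory is an assumption:

```python
from bionemo.core.data.load import load

# Downloads on first use, then serves the cached copy on subsequent calls.
sanity_data_dir = load("esm2/testdata_esm2_pretrain:2.0")

# The return value is the materialized location inside the local BioNeMo2
# cache (exact location is configuration-dependent).
print(sanity_data_dir)
for item in sorted(sanity_data_dir.iterdir()):
    print(item.name)
```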

docs/docs/models/ESM-2/index.md

Lines changed: 20 additions & 21 deletions
@@ -14,9 +14,9 @@ These models are ready for commercial use.
 
 ### Third-Party Community Consideration
 
-This model is not owned or developed by NVIDIA. This model has been developed and built to a third-partys requirements
+This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements
 for this application and use case [1]; see link to [Non-NVIDIA Model Card for ESM-2 3B model](
-https://huggingface.co/facebook/esm2_t36_3B_UR50D) and [non-NVIDIA Model Card for ESM-2 650M model](
+https://huggingface.co/facebook/esm2_t36_3B_UR50D) and [Non-NVIDIA Model Card for ESM-2 650M model](
 https://huggingface.co/facebook/esm2_t33_650M_UR50D)
 
 ### References
@@ -27,7 +27,7 @@ Santos Costa, A., 2023. Evolutionary-scale prediction of atomic-level protein st
 
 [2] "UniProt: the universal protein knowledgebase in 2021." Nucleic acids research 49, no. D1 (2021): D480-D489.
 
-[3] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. Bert: Pre-training of deep bidirectional transformers for
+[3] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for
 language understanding. arXiv preprint arXiv:1810.04805.
 
 ### Model Architecture
@@ -47,7 +47,7 @@ length 1022. Longer sequences are automatically truncated to this length.
 
 ### Output
 
-**Output Type(s):** Embeddings (Amino-acid and sequence-level)
+**Output Type(s):** Embeddings (Amino acid and sequence-level)
 
 **Output Parameters:** 1D
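
Since this hunk documents the model's outputs (amino acid and sequence-level embeddings) and its hunk header mentions the 1022-residue truncation, here is a brief sketch of extracting both kinds of embedding from the HuggingFace checkpoints linked earlier on this page. The mean-pooling used for the sequence-level embedding is a common convention, not something this model card specifies, and the BioNeMo inference path differs:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example protein
# Truncation caps inputs at 1024 tokens: 1022 residues plus CLS and EOS,
# matching the automatic truncation described for the model.
inputs = tokenizer(sequence, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    outputs = model(**inputs)

per_residue = outputs.last_hidden_state[0]      # (tokens, hidden): amino acid level
sequence_level = per_residue[1:-1].mean(dim=0)  # drop CLS/EOS, then mean-pool
```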

@@ -63,15 +63,15 @@ acid.
 
 **Supported Hardware Microarchitecture Compatibility**
 
-* [Ampere]
-* [Hopper]
-* [Volta]
+* NVIDIA Ampere
+* NVIDIA Hopper
+* NVIDIA Volta
 
 **[Preferred/Supported] Operating System(s)**
 
-* [Linux]
+* Linux
 
-### Model Version(s)
+### Model Versions
 
 * [esm2/650m:2.0](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/esm2nv650m)
 * [esm2/3b:2.0](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/esm2nv3b)
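
The version tags listed in this hunk follow the same `name:version` scheme as the resource tags used with `bionemo.core.data.load` in the UniProt page touched by this same PR. Assuming checkpoints resolve through that same loader, which this page does not state explicitly, a hypothetical sketch:

```python
from bionemo.core.data.load import load

# Hypothetical: if checkpoint tags resolve like data-resource tags, this
# would materialize the 650M ESM-2 checkpoint in the local cache.
esm2_650m_checkpoint = load("esm2/650m:2.0")
print(esm2_650m_checkpoint)
```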
@@ -81,30 +81,30 @@ acid.
 ### Training Dataset
 
 Original ESM-2 checkpoints from HuggingFace were trained with the UniProt 2021_04 sequence database. For more details on
-the training dataset, see Lin *et al.* 2023. The train / test splits used by the original authors were not distributed.
+the training dataset, see Lin *et al.* 2023. The train/test splits used by the original authors were not distributed.
 A pre-training database compiled by NVIDIA following a similar approach is described in [UniProt
-Dataset](../datasets/uniprot.md).
+Dataset](../../datasets/uniprot.md).
 
 ### Inference
 
 **Engine:** BioNeMo, NeMo
 
 **Test Hardware**
 
-* [Ampere]
-* [Hopper]
-* [Volta]
+* NVIDIA Ampere
+* NVIDIA Hopper
+* NVIDIA Volta
 
 ## License
 
-ESM-2 is as provided under the Apache 2.0 license.
+ESM-2 is provided under the Apache 2.0 license.
 
 ## Competitive Benchmarking
 
 ### Accuracy
 
 A validation set of 328,360 UniRef50 representative sequences were randomly selected from UniRef 2024_03 (see [UniProt
-Dataset](../datasets/uniprot.md)). This validation set was used to ensure that the output of BioNeMo-converted
+Dataset](../../datasets/uniprot.md)). This validation set was used to ensure that the output of BioNeMo-converted
 checkpoints is consistent with their outputs when evaluated with the HuggingFace Transformers library.
 
 | Checkpoint | HuggingFace | BioNeMo2 | Lin *et al.* 2023 |
@@ -123,24 +123,23 @@ checkpoints is consistent with their outputs when evaluated with the HuggingFace
 
 ![ESM-2 Single-Device Training Performance](../../assets/images/esm2/esm2_single_node_training_perf.png)
 
-The pure-pytorch baseline (compiled with `torch.compile()`) raised an out-of-memory error for batch sizes larger than 16
-at the ESM2-650M model size. The `bionemo2` model could handle batch sizes of 46, reaching a model flops utilization of
+The pure-PyTorch baseline (compiled with `torch.compile()`) raised an out-of-memory error for batch sizes larger than 16
+at the ESM2-650M model size. The `bionemo2` model could handle batch sizes of 46, reaching a model FLOPs utilization of
 59.2% on an NVIDIA A100.
 
 #### Model Scaling
 
 ![ESM-2 Model Scaling](../../assets/images/esm2/esm2_model_scaling.png)
 
 Training ESM-2 at the 650M, 3B, and 15B model variants show improved performance with the BioNeMo2 framework over the
-pure-pytorch baseline. These experiments were conducted on 16x NVIDIA A100 or 16x NVIDIA H100 GPUs split across two
+pure-PyTorch baseline. These experiments were conducted on 16x NVIDIA A100 or 16x NVIDIA H100 GPUs split across two
 nodes. <sup>*</sup>*Note:* 15B model variants were trained on 64 GPUs with the BioNeMo2 framework.
 
 #### Device Scaling
 
 ![ESM-2 Device Scaling](../../assets/images/esm2/esm2_device_scaling.png)
 
-Training ESM-3B on 256 NVIDIA A100s on 32 nodes achieved 96.85% of the theoretical linear throughput expected from
-extrapolating single-node (8 GPU) performance, representing a model flops utilization of 60.6% at 256 devices.
+Training ESM-3B on 256 NVIDIA A100s on 32 nodes achieved 96.85% of the theoretical linear throughput expected from extrapolating single-node (8 GPU) performance, representing a model flops utilization of 60.6% at 256 devices.
 
 ### LoRA Fine-tuning Performace
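
The model FLOPs utilization (MFU) figures and the scaling claim in the hunk above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the A100's 312 TFLOP/s dense BF16 peak, which is an assumption about the benchmark precision, not something stated in the diff:

```python
# Implied per-GPU throughput from the quoted MFU numbers.
A100_PEAK_TFLOPS = 312.0  # dense BF16 peak; assumed precision/hardware mode

single_device_tflops = 0.592 * A100_PEAK_TFLOPS  # ~184.7 TFLOP/s (650M, 1 GPU)
per_gpu_at_scale = 0.606 * A100_PEAK_TFLOPS      # ~189.1 TFLOP/s (3B, 256 GPUs)

# "96.85% of theoretical linear throughput" compares the 256-GPU run against
# 32x the measured single-node (8-GPU) throughput.
ideal_speedup = 256 / 8                           # 32 single-node equivalents
achieved_speedup = 0.9685 * ideal_speedup         # ~31.0 single-node equivalents
print(single_device_tflops, per_gpu_at_scale, achieved_speedup)
```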
146145

0 commit comments
