This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Commit 21a6e09

Merge pull request #572 from microsoft/staging: Staging to Master

2 parents 150909f + 2bc3203


48 files changed: +8781 −1773 lines

.flake8

Lines changed: 3 additions & 1 deletion

@@ -11,6 +11,8 @@
 # E402 module level import not at top of file
 # E731 do not assign a lambda expression, use a def
 # F821 undefined name 'get_ipython' --> from generated python files using nbconvert
+# E722: do not use bare except
+# E231: missing white space after "," --> black generates autoformat [,] which fails flake8
+ignore = E203, E266, W503, F403, F405, E402, E731, F821, E722, E231
 
-ignore = E203, E266, W503, F403, F405, E402, E731, F821
 max-line-length = 88
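
For reference, the two newly ignored checks flag patterns like the following (an illustrative sketch, not code from this repo):

    # E722: bare "except:", which also catches SystemExit and KeyboardInterrupt
    try:
        value = int("not a number")
    except:
        value = 0

    # E231: missing whitespace after ","
    coords = (1,2)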

README.md

Lines changed: 1 addition & 1 deletion

@@ -50,7 +50,7 @@ The following is a summary of the commonly used NLP scenarios covered in the rep
 |-------------------------| ------------------- |-------|---|
 |Text Classification |BERT, XLNet, RoBERTa| Text classification is a supervised learning method of learning and predicting the category or the class of a document given its text content. |English, Hindi, Arabic|
 |Named Entity Recognition |BERT| Named entity recognition (NER) is the task of classifying words or key phrases of a text into predefined entities of interest. |English|
-|Text Summarization|BERTSum|Text summarization is a language generation task of summarizing the input text into a shorter paragraph of text.|English
+|Text Summarization|BERTSumExt <br> BERTSumAbs <br> UniLM (s2s-ft)|Text summarization is a language generation task of summarizing the input text into a shorter paragraph of text.|English
 |Entailment |BERT, XLNet, RoBERTa| Textual entailment is the task of classifying the binary relation between two natural-language texts, *text* and *hypothesis*, to determine if the *text* agrees with the *hypothesis* or not. |English|
 |Question Answering |BiDAF, BERT, XLNet| Question answering (QA) is the task of retrieving or generating a valid answer for a given query in natural language, provided with a passage related to the query. |English|
 |Sentence Similarity |BERT, GenSen| Sentence similarity is the process of computing a similarity score given a pair of text documents. |English|

SETUP.md

Lines changed: 69 additions & 16 deletions
@@ -58,31 +58,84 @@ You can specify the environment name as well with the flag `-n`.
 Click on the following menus to see how to install the Python GPU environment:
 
 <details>
-<summary><strong><em>Python GPU environment on Linux, MacOS</em></strong></summary>
+<summary><strong><em>Python GPU environment</em></strong></summary>
 
-Assuming that you have a GPU machine, to install the Python GPU environment, which by default installs the CPU environment:
+Assuming that you have a GPU machine, to install the Python GPU environment:
+1. Check the CUDA **driver** version on your machine by running
 
-    cd nlp-recipes
-    python tools/generate_conda_file.py --gpu
-    conda env create -n nlp_gpu -f nlp_gpu.yaml
-
-</details>
+        nvidia-smi
+    The top of the output shows the CUDA **driver** version, which is 10.0 in the example below.
+        +-----------------------------------------------------------------------------+
+        | NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0      |
+        |-------------------------------+----------------------+----------------------+
+2. Decide which CUDA **runtime** version you should install.
+    The CUDA **runtime** version is the version of the cudatoolkit that will be installed in the conda environment in the next step; it should be <= the CUDA **driver** version found in step 1.
+    Currently, this repo uses PyTorch 1.4.0, which is compatible with CUDA 9.2 and CUDA 10.1. The conda environment file generated in step 3 installs cudatoolkit 10.1 by default. If your CUDA **driver** version is < 10.1, you should add the argument "--cuda_version 9.2" when calling generate_conda_file.py.
 
-<details>
-<summary><strong><em>Python GPU environment on Windows</em></strong></summary>
+3. Install the GPU environment:
+    If CUDA **driver** version >= 10.1
 
-Assuming that you have an Azure GPU DSVM machine, here are the steps to setup the Python GPU environment:
-1. Make sure you have CUDA Toolkit version 9.0 above installed on your Windows machine. You can run the command below in your terminal to check.
-
-    nvcc --version
-If you don't have CUDA Toolkit or don't have the right version, please download it from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit)
+        cd nlp-recipes
+        python tools/generate_conda_file.py --gpu
+        conda env create -n nlp_gpu -f nlp_gpu.yaml
 
-2. Install the GPU environment.
+    If CUDA **driver** version < 10.1
 
         cd nlp-recipes
-        python tools/generate_conda_file.py --gpu
+        python tools/generate_conda_file.py --gpu --cuda_version 9.2
         conda env create -n nlp_gpu -f nlp_gpu.yaml
 
+4. Enable mixed precision training (optional)
+    Mixed precision training is particularly useful if your model takes a long time to train. It usually reduces the training time by about 50% and produces a model of the same quality. To enable mixed precision training, run the following commands
+
+        conda activate nlp_gpu
+        git clone https://github.com/NVIDIA/apex.git
+        cd apex
+        pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
+
+    **Troubleshooting**:
+    If you run into the error message "RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.", you need to make sure your NVIDIA CUDA compiler driver (nvcc) version and your CUDA **runtime** version are exactly the same. To check the nvcc version, run
+
+        nvcc -V
+
+    If the nvcc version is 10.0, it's recommended to upgrade to 10.1 and re-create your conda environment with cudatoolkit=10.1.
+
+    **Steps to upgrade the CUDA driver version and nvcc version**
+    We have tested the following steps. Alternatively, you can follow the official instructions [here](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).
+    a. Update apt-get and reboot your machine
+
+        sudo apt-get update
+        sudo apt-get upgrade --fix-missing
+        sudo reboot
+    b. Download the CUDA toolkit .run file from https://developer.nvidia.com/cuda-10.1-download-archive-base based on your target platform. For example, on a Linux machine with Ubuntu 16.04, run
+
+        wget https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.105_418.39_linux.run
+
+    c. Upgrade the CUDA driver by running
+
+        sudo sh cuda_10.1.105_418.39_linux.run
+    First, accept the user agreement.
+    ![](https://nlpbp.blob.core.windows.net/images/upgrade_cuda_driver/1agree_to_user_agreement.PNG)
+    Next, choose the components to install.
+    It's possible that you already have NVIDIA driver 418.39 and CUDA 10.1, but nvcc 10.0. In this case, you can uncheck the "DRIVER" box and upgrade nvcc by re-installing the CUDA toolkit only.
+    ![](https://nlpbp.blob.core.windows.net/images/upgrade_cuda_driver/2install_cuda_only.PNG)
+
+    If you choose to install all components, follow the instructions on the screen to uninstall the existing NVIDIA driver and CUDA toolkit first.
+    ![](https://nlpbp.blob.core.windows.net/images/upgrade_cuda_driver/3install_all.PNG)
+    Then re-run
+
+        sudo sh cuda_10.1.105_418.39_linux.run
+    Select "Yes" to update the CUDA symlink.
+    ![](https://nlpbp.blob.core.windows.net/images/upgrade_cuda_driver/4Upgrade_symlink.PNG)
+
+    d. Run the following commands again to make sure you have NVIDIA driver 418.39, CUDA driver 10.1, and nvcc 10.1
+
+        nvidia-smi
+        nvcc -V
+
+    e. Repeat steps 3 & 4 to recreate your conda environment with cudatoolkit **runtime** 10.1 and apex installed for mixed precision training.
+
+
 </details>
 
 ### Register Conda Environment in DSVM JupyterHub
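
A quick way to confirm that the new steps above leave the driver, cudatoolkit runtime, and nvcc in agreement is a short check inside the activated `nlp_gpu` environment (a minimal sketch, assuming PyTorch is installed and `nvcc` is on the PATH; not part of the repo):

    # Run inside the activated nlp_gpu environment.
    import re
    import subprocess

    import torch

    print("PyTorch:", torch.__version__)               # expected 1.4.0
    print("cudatoolkit runtime:", torch.version.cuda)  # e.g. "10.1" or "9.2"
    print("GPU visible to PyTorch:", torch.cuda.is_available())

    # The apex build in step 4 requires nvcc and the runtime to match exactly.
    nvcc_output = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
    nvcc_release = re.search(r"release (\d+\.\d+)", nvcc_output).group(1)
    print("nvcc release:", nvcc_release)
    if nvcc_release != torch.version.cuda:
        print("Mismatch: upgrade nvcc or recreate the environment (see troubleshooting above).")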

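Once apex is installed per step 4, mixed precision training is typically driven through apex's `amp` API. Below is a minimal sketch of that pattern with a toy model (it assumes a CUDA device and the apex install above; it is not the repo's training code):

    import torch
    from apex import amp

    # Toy model and optimizer standing in for a real training setup.
    model = torch.nn.Linear(10, 2).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # "O1" patches whitelisted ops to run in FP16 while keeping FP32 master weights.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    inputs = torch.randn(4, 10).cuda()
    targets = torch.randint(0, 2, (4,)).cuda()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)

    # Scale the loss so FP16 gradients do not underflow, then step as usual.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
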
cgmanifest.json

Lines changed: 12 additions & 1 deletion

@@ -18,7 +18,18 @@
                 }
             },
             "license": "Apache-2.0"
+        },
+        {
+            "component": {
+                "type": "git",
+                "git": {
+                    "repositoryUrl": "https://github.com/nlpyang/PreSumm",
+                    "commitHash": "2df3312582a3a014aacbc1be810841705c67d06e"
+                }
+            },
+            "license": "MIT License"
         }
+
     ],
     "Version": 1
-}
+}
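
The added entry registers the PreSumm repository, pinned to a specific commit, for component governance. A small sketch of reading the manifest back (assuming the standard cgmanifest layout with a top-level "Registrations" array, as in this file):

    import json

    with open("cgmanifest.json") as f:
        manifest = json.load(f)

    # List the git components registered in this file, with pinned commit and license.
    for registration in manifest["Registrations"]:
        component = registration["component"]
        if component["type"] == "git":
            git_info = component["git"]
            print(git_info["repositoryUrl"], git_info["commitHash"][:7],
                  registration.get("license", "n/a"))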

examples/README.md

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ This folder contains examples and best practices, written in Jupyter notebooks,
 |---| ------------------------ | ------------------- |---|
 |[Text Classification](text_classification)|Topic Classification|BERT, XLNet, RoBERTa, DistilBERT|en, hi, ar|
 |[Named Entity Recognition](named_entity_recognition) |Wikipedia NER|BERT|en|
-|[Text Summarization](text_summarization)|News Summarization, Headline Generation|Extractive: BERTSumExt <br> Abstractive: WIP, ETA: Mar. 2020|en
+|[Text Summarization](text_summarization)|News Summarization, Headline Generation|Extractive: BERTSumExt <br> Abstractive: UniLM (s2s-ft)|en
 |[Entailment](entailment)|MultiNLI Natural Language Inference|BERT|en|
 |[Question Answering](question_answering) |SQuAD|BiDAF, BERT, XLNet, DistilBERT|en|
 |[Sentence Similarity](sentence_similarity)|STS Benchmark|BERT, GenSen|en|

examples/model_explainability/interpret_dnn_layers.ipynb

Lines changed: 4 additions & 2 deletions

@@ -46,6 +46,7 @@
 "outputs": [],
 "source": [
 "import sys\n",
+"from tempfile import TemporaryDirectory\n",
 "\n",
 "sys.path.append(\"../../\")\n",
 "import json\n",
@@ -322,11 +323,12 @@
 "text = \"rare bird has more than enough charm to make it memorable.\"\n",
 "\n",
 "# get the tokenized words.\n",
-"tokenizer = BertTokenizer.from_pretrained(\"bert-base-uncased\")\n",
+"cache_dir = TemporaryDirectory().name\n",
+"tokenizer = BertTokenizer.from_pretrained(\"bert-base-uncased\", cache_dir=cache_dir)\n",
 "words = [\"[CLS]\"] + tokenizer.tokenize(text) + [\"[SEP]\"]\n",
 "\n",
 "# load BERT base model\n",
-"model = BertModel.from_pretrained(\"bert-base-uncased\").to(device)\n",
+"model = BertModel.from_pretrained(\"bert-base-uncased\", cache_dir=cache_dir).to(device)\n",
 "for param in model.parameters():\n",
 "    param.requires_grad = False\n",
 "model.eval()\n",
