TransformerLensOrg
diff --git a/demos/BERT.ipynb b/demos/BERT.ipynb
Lines changed: 120 additions & 27 deletions
@@ -15,7 +15,7 @@
    "metadata": {},
    "source": [
     "# BERT in TransformerLens\n",
-    "This demo shows how to use BERT in TransformerLens for the Masked Language Modelling task."
+    "This demo shows how to use BERT in TransformerLens for the Masked Language Modelling and Next Sentence Prediction tasks."
    ]
   },
   {
@@ -29,16 +29,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Running as a Jupyter notebook - intended for development only!\n",
-      "The autoreload extension is already loaded. To reload it, use:\n",
-      "  %reload_ext autoreload\n"
+      "Running as a Jupyter notebook - intended for development only!\n"
      ]
     },
     {
@@ -92,7 +90,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [
     {
@@ -116,7 +114,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [
     {
@@ -136,7 +134,7 @@
        "<circuitsvis.utils.render.RenderedHTML at 0x13a9760d0>"
       ]
      },
-     "execution_count": 4,
+     "execution_count": 3,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -150,7 +148,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -159,12 +157,12 @@
     "\n",
     "from transformers import AutoTokenizer\n",
     "\n",
-    "from transformer_lens import HookedEncoder"
+    "from transformer_lens import HookedEncoder, BertNextSentencePrediction"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 5,
    "metadata": {},
    "outputs": [
     {
@@ -173,7 +171,7 @@
        "<torch.autograd.grad_mode.set_grad_enabled at 0x2a285a790>"
       ]
      },
-     "execution_count": 6,
+     "execution_count": 5,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -189,12 +187,12 @@
    "source": [
     "# BERT\n",
     "\n",
-    "In this section, we will load a pretrained BERT model and use it for the Masked Language Modelling task"
+    "In this section, we will load a pretrained BERT model and use it for the Masked Language Modelling and Next Sentence Prediction tasks."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 6,
    "metadata": {},
    "outputs": [
     {
@@ -225,37 +223,132 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Use the \"[MASK]\" token to mask any tokens which you would like the model to predict."
+    "## Masked Language Modelling\n",
+    "Use the \"[MASK]\" token to mask any tokens which you would like the model to predict.  \n",
+    "With return_type=\"predictions\", the model's prediction is returned; by default, the function returns logits.  \n",
+    "You can also pass None as the return type, in which case nothing is returned."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 7,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Prompt: The [MASK] is bright today.\n",
+      "Prediction: \"sun\"\n"
+     ]
+    }
+   ],
+   "source": [
+    "prompt = \"The [MASK] is bright today.\"\n",
+    "\n",
+    "prediction = bert(prompt, return_type=\"predictions\")\n",
+    "\n",
+    "print(f\"Prompt: {prompt}\")\n",
+    "print(f'Prediction: \"{prediction}\"')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can also input a list of prompts:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Prompt: ['The [MASK] is bright today.', 'She [MASK] to the store.', 'The dog [MASK] the ball.']\n",
+      "Prediction: \"['Prediction 0: sun', 'Prediction 1: went', 'Prediction 2: caught']\"\n"
+     ]
+    }
+   ],
+   "source": [
+    "prompts = [\"The [MASK] is bright today.\", \"She [MASK] to the store.\", \"The dog [MASK] the ball.\"]\n",
+    "\n",
+    "predictions = bert(prompts, return_type=\"predictions\")\n",
+    "\n",
+    "print(f\"Prompt: {prompts}\")\n",
+    "print(f'Prediction: \"{predictions}\"')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Next Sentence Prediction\n",
+    "To carry out Next Sentence Prediction, use the class BertNextSentencePrediction and pass a HookedEncoder to its constructor.  \n",
+    "Then, create a list containing the two sentences you want to perform NSP on and pass it to the forward function.  \n",
+    "The model then predicts the probability that the sentence at position 1 follows (i.e. is the next sentence after) the sentence at position 0."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Sentence A: A man walked into a grocery store.\n",
+      "Sentence B: He bought an apple.\n",
+      "Prediction: \"The sentences are sequential\"\n"
+     ]
+    }
+   ],
    "source": [
-    "prompt = \"BERT: Pre-training of Deep Bidirectional [MASK] for Language Understanding\"\n",
+    "nsp = BertNextSentencePrediction(bert)\n",
+    "sentence_a = \"A man walked into a grocery store.\"\n",
+    "sentence_b = \"He bought an apple.\"\n",
     "\n",
-    "input_ids = tokenizer(prompt, return_tensors=\"pt\")[\"input_ids\"]\n",
-    "mask_index = (input_ids.squeeze() == tokenizer.mask_token_id).nonzero().item()"
+    "sentences = [sentence_a, sentence_b]\n",
+    "\n",
+    "predictions = nsp(sentences, return_type=\"predictions\")\n",
+    "\n",
+    "print(f\"Sentence A: {sentence_a}\")\n",
+    "print(f\"Sentence B: {sentence_b}\")\n",
+    "print(f'Prediction: \"{predictions}\"')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Inputting tokens directly\n",
+    "You can also pass tokens to the model instead of a string or a list of strings, for example:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 10,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Prompt: BERT: Pre-training of Deep Bidirectional [MASK] for Language Understanding\n",
-      "Prediction: \"Systems\"\n"
+      "Prompt: The [MASK] is bright today.\n",
+      "Prediction: \"sun\"\n"
      ]
     }
    ],
    "source": [
-    "logprobs = bert(input_ids)[input_ids == tokenizer.mask_token_id].log_softmax(dim=-1)\n",
+    "prompt = \"The [MASK] is bright today.\"\n",
+    "\n",
+    "tokens = tokenizer(prompt, return_tensors=\"pt\")[\"input_ids\"]\n",
+    "logits = bert(tokens) # Since we are not specifying return_type, we get the logits\n",
+    "logprobs = logits[tokens == tokenizer.mask_token_id].log_softmax(dim=-1)\n",
     "prediction = tokenizer.decode(logprobs.argmax(dim=-1).item())\n",
     "\n",
     "print(f\"Prompt: {prompt}\")\n",
@@ -267,13 +360,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Better luck next time, BERT."
+    "Well done, BERT!"
    ]
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": ".venv",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -287,7 +380,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.8"
+   "version": "3.10.15"
   },
   "orig_nbformat": 4
  },
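For readers who want to check the logic of the "Inputting tokens directly" cell without downloading model weights, the following is a minimal, dependency-free sketch of what selecting the logits at the `[MASK]` position, applying `log_softmax`, and taking the `argmax` amounts to, along with the analogous two-class decision for Next Sentence Prediction. The toy vocabulary, token ids, logit values, and the helper names `log_softmax`, `predict_masked_token`, and `nsp_probability` are all hypothetical. Treating index 0 as the "is next sentence" class follows the Hugging Face BERT convention and is an assumption here, since this diff does not show how `BertNextSentencePrediction` orders its outputs.

```python
import math

# Toy vocabulary and ids: hypothetical values, not BERT's real tokenizer.
VOCAB = ["[CLS]", "[SEP]", "[MASK]", "the", "sun", "moon", "is", "bright", "today", "."]
MASK_ID = VOCAB.index("[MASK]")


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def predict_masked_token(token_ids, logits_per_position):
    """Return the most likely vocabulary entry at each [MASK] position."""
    predictions = []
    for token_id, logits in zip(token_ids, logits_per_position):
        if token_id != MASK_ID:
            continue  # only masked positions are predicted
        logprobs = log_softmax(logits)
        best = max(range(len(logprobs)), key=logprobs.__getitem__)
        predictions.append(VOCAB[best])
    return predictions


def nsp_probability(nsp_logits):
    """Probability that sentence B follows sentence A.

    Assumes index 0 is the "is next sentence" class (an assumption,
    borrowed from the Hugging Face BERT convention).
    """
    return math.exp(log_softmax(nsp_logits)[0])


# "[CLS] the [MASK] is bright today . [SEP]" with made-up logits.
token_ids = [0, 3, 2, 6, 7, 8, 9, 1]
uniform = [0.0] * len(VOCAB)
mask_logits = [0.0] * len(VOCAB)
mask_logits[VOCAB.index("sun")] = 5.0
mask_logits[VOCAB.index("moon")] = 3.0
logits_per_position = [mask_logits if t == MASK_ID else uniform for t in token_ids]

print(predict_masked_token(token_ids, logits_per_position))
print(nsp_probability([2.0, -1.0]) > 0.5)
```

With these made-up logits the sketch picks "sun" for the masked position and judges the sentence pair to be sequential. The real notebook cell does the same selection on tensors, using a boolean mask over `tokens` rather than an explicit loop.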