
Commit 2e4d04a

Fix typos: casual -> causal (#102)
1 parent a710e18 commit 2e4d04a

File tree

1 file changed: +2 -2 lines changed

examples/flex_attn.ipynb

Lines changed: 2 additions & 2 deletions
@@ -325,7 +325,7 @@
     "The implementation using as a mask_mod:\n",
     "```Python\n",
     "The implementation using a mask_mod:\n",
-    "def casual_mask(b,h,q_idx, kv_idx):\n",
+    "def causal_mask(b, h, q_idx, kv_idx):\n",
     " return q_idx >= kv_idx\n",
     "```\n",
     "As you can see they look very similar, both return scalar tensors. The key differences\n",
@@ -449,7 +449,7 @@
     "### Sliding Window Attention\n",
     "The [Mistral paper](https://arxiv.org/abs/2310.06825) has a very nice visual of this bias and describes it. In essence you define a fixed size \"SLIDING_WINDOW\" and for autogressive decoding you only allow `torch.abs(q_tokens - kv_tokens) < SLIDING_WINDOW` to attend to each other. Typically this is also combined with causal attention. We are going to do this through a a nice pattern, mask composition. Typically masking can can conceptually be done in pieces and then composed together.\n",
     "\n",
-    "We are going to write two mask_functions 1 for doing `casual-masking`, and one for doing `windowed-attention` and compose them together to produce the final mask_fn. As we know from earlier, mask_fns return boolean values where a value of `True` indicates that the element should take part in attention.\n"
+    "We are going to write two mask_functions 1 for doing `causal-masking`, and one for doing `windowed-attention` and compose them together to produce the final mask_fn. As we know from earlier, mask_fns return boolean values where a value of `True` indicates that the element should take part in attention.\n"
     ]
   },
   {
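To make the corrected snippet concrete, here is a minimal, self-contained sketch of how the renamed `causal_mask` mask_mod and a sliding-window mask_mod could be composed, assuming PyTorch's `torch.nn.attention.flex_attention` module (`flex_attention`, `create_block_mask`, `and_masks`) and a CUDA device; the `SLIDING_WINDOW` value and tensor shapes are illustrative, not taken from the notebook.

```Python
# Hedged sketch: composes a causal mask_mod with a sliding-window mask_mod,
# assuming the torch.nn.attention.flex_attention API (PyTorch >= 2.5).
import torch
from torch.nn.attention.flex_attention import (
    flex_attention,
    create_block_mask,
    and_masks,
)

SLIDING_WINDOW = 1024  # illustrative window size, not from the notebook

def causal_mask(b, h, q_idx, kv_idx):
    # True means the (q_idx, kv_idx) pair is allowed to attend.
    return q_idx >= kv_idx

def sliding_window_mask(b, h, q_idx, kv_idx):
    # Only allow query/key pairs within the fixed window of each other.
    return torch.abs(q_idx - kv_idx) < SLIDING_WINDOW

# Compose the two pieces into the final mask_fn.
sliding_window_causal = and_masks(causal_mask, sliding_window_mask)

# Illustrative shapes only.
B, H, S, D = 1, 8, 4096, 64
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)

# B=None / H=None broadcast the mask over batch and heads.
block_mask = create_block_mask(sliding_window_causal, B=None, H=None, Q_LEN=S, KV_LEN=S)
out = flex_attention(q, k, v, block_mask=block_mask)
```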

0 commit comments
