
Commit 2e4d04a

Fix typos: casual -> causal (#102)
1 parent a710e18 commit 2e4d04a

File tree

1 file changed: +2 -2 lines changed

examples/flex_attn.ipynb

Lines changed: 2 additions & 2 deletions
@@ -325,7 +325,7 @@
     "The implementation using as a mask_mod:\n",
     "```Python\n",
     "The implementation using a mask_mod:\n",
-    "def casual_mask(b,h,q_idx, kv_idx):\n",
+    "def causal_mask(b, h, q_idx, kv_idx):\n",
     " return q_idx >= kv_idx\n",
     "```\n",
     "As you can see they look very similar, both return scalar tensors. The key differences\n",
@@ -449,7 +449,7 @@
     "### Sliding Window Attention\n",
     "The [Mistral paper](https://arxiv.org/abs/2310.06825) has a very nice visual of this bias and describes it. In essence you define a fixed size \"SLIDING_WINDOW\" and for autogressive decoding you only allow `torch.abs(q_tokens - kv_tokens) < SLIDING_WINDOW` to attend to each other. Typically this is also combined with causal attention. We are going to do this through a a nice pattern, mask composition. Typically masking can can conceptually be done in pieces and then composed together.\n",
     "\n",
-    "We are going to write two mask_functions 1 for doing `casual-masking`, and one for doing `windowed-attention` and compose them together to produce the final mask_fn. As we know from earlier, mask_fns return boolean values where a value of `True` indicates that the element should take part in attention.\n"
+    "We are going to write two mask_functions 1 for doing `causal-masking`, and one for doing `windowed-attention` and compose them together to produce the final mask_fn. As we know from earlier, mask_fns return boolean values where a value of `True` indicates that the element should take part in attention.\n"
     ]
   },
   {
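To make the corrected snippet concrete, here is a minimal, self-contained sketch of how the renamed `causal_mask` mask_mod and a sliding-window mask_mod could be composed, assuming PyTorch's `torch.nn.attention.flex_attention` module (`flex_attention`, `create_block_mask`, `and_masks`) and a CUDA device; the `SLIDING_WINDOW` value and tensor shapes are illustrative, not taken from the notebook.

```Python
# Hedged sketch: composes a causal mask_mod with a sliding-window mask_mod,
# assuming the torch.nn.attention.flex_attention API (PyTorch >= 2.5).
import torch
from torch.nn.attention.flex_attention import (
    flex_attention,
    create_block_mask,
    and_masks,
)

SLIDING_WINDOW = 1024  # illustrative window size, not from the notebook

def causal_mask(b, h, q_idx, kv_idx):
    # True means the (q_idx, kv_idx) pair is allowed to attend.
    return q_idx >= kv_idx

def sliding_window_mask(b, h, q_idx, kv_idx):
    # Only allow query/key pairs within the fixed window of each other.
    return torch.abs(q_idx - kv_idx) < SLIDING_WINDOW

# Compose the two pieces into the final mask_fn.
sliding_window_causal = and_masks(causal_mask, sliding_window_mask)

# Illustrative shapes only.
B, H, S, D = 1, 8, 4096, 64
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)

# B=None / H=None broadcast the mask over batch and heads.
block_mask = create_block_mask(sliding_window_causal, B=None, H=None, Q_LEN=S, KV_LEN=S)
out = flex_attention(q, k, v, block_mask=block_mask)
```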

0 commit comments
