afiaka87 edited this page Apr 12, 2021 · 5 revisions

### Sparse Attention (Microsoft DeepSpeed)

```python
dalle = DALLE(
    # ...
    attn_types = ('sparse',)  # a one-element tuple: every layer uses sparse attention
)
```


### Other attention layers:

By default `DALLE` will use full attention for all layers, but you can specify the attention type per layer as follows.

- `full`: full attention
- `axial_row`: axial attention, along the rows of the image feature map
- `axial_col`: axial attention, along the columns of the image feature map
- `conv_like`: convolution-like attention, for the image feature map
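
When the model has more layers than there are entries in `attn_types`, the tuple is cycled. The following is a minimal sketch of what that cycling looks like, assuming the types are assigned round-robin in layer order. `layer_attention_types` is a hypothetical helper written for illustration and is not part of the `DALLE` API.

```python
from itertools import cycle, islice


def layer_attention_types(attn_types, depth):
    """Hypothetical helper: repeat attn_types in order until depth layers are assigned."""
    # Accept a bare string such as 'sparse' as well as a tuple of names.
    if isinstance(attn_types, str):
        attn_types = (attn_types,)
    return list(islice(cycle(attn_types), depth))


# Six layers, four attention types: the sequence wraps around after 'conv_like'.
layout = layer_attention_types(('full', 'axial_row', 'axial_col', 'conv_like'), depth=6)
for index, attn_type in enumerate(layout):
    print(f"layer {index}: {attn_type}")
```

Under this assumption, layers 4 and 5 reuse `full` and `axial_row`, so a depth that is a multiple of the tuple length gives each type an equal share of the layers.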

```python
dalle = DALLE(
    # ...
    attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')  # cycles between these four types of attention
)
```