Hi @unsky

Could you please check whether I read your Focal Loss formulas correctly?

For CE, delta is:
* `if (i == j)` then `delta = 1-p`
* `if (i != j)` then `delta = -p`

----

For Focal Loss (when **gamma=2**), delta is:
* `if (i == j)` then `delta = (1-p)*` `alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)`
* `if (i != j)` then `delta = (-p)*` `alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)`

Where:
* `pt = softmax(i)` is the predicted probability of the correct class, with `i = label` being the ground-truth class id.
* `p = softmax(j)` is the predicted probability of class `j`.
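
One way to sanity-check these expressions is to compare them against a finite-difference gradient. The sketch below is only an illustration, not code from the repository: it assumes the losses are `CE = -log(pt)` and `FL = -alpha * (1 - pt)^gamma * log(pt)` with `gamma = 2`, that derivatives are taken with respect to the softmax inputs (logits), and that `alpha = 0.25`. All function names (`numeric_grad`, `post_ce_delta`, `post_focal_delta`) and the sample logits are made up for the check.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(z, label):
    # Assumed definition: CE = -log(pt)
    return -math.log(softmax(z)[label])

def focal_loss(z, label, alpha=0.25, gamma=2.0):
    # Assumed definition: FL = -alpha * (1 - pt)^gamma * log(pt)
    pt = softmax(z)[label]
    return -alpha * (1.0 - pt) ** gamma * math.log(pt)

def numeric_grad(loss, z, label, eps=1e-6):
    # Central finite difference of the loss with respect to each logit.
    grad = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        grad.append((loss(zp, label) - loss(zm, label)) / (2 * eps))
    return grad

def post_ce_delta(z, label):
    # CE delta as written above: 1 - p for j == label, -p otherwise.
    p = softmax(z)
    return [(1.0 if j == label else 0.0) - p[j] for j in range(len(z))]

def post_focal_delta(z, label, alpha=0.25):
    # Focal delta as written above (gamma = 2): the CE delta times a common factor.
    p = softmax(z)
    pt = p[label]
    factor = alpha * (1 - pt) * (2 * pt * math.log(pt) + pt - 1)
    return [((1.0 if j == label else 0.0) - p[j]) * factor for j in range(len(z))]

# Hypothetical logits and label, chosen only for the check.
z = [1.5, -0.3, 0.8, 0.1]
label = 2

ce_grad = numeric_grad(cross_entropy, z, label)
fl_grad = numeric_grad(focal_loss, z, label)

print("CE    delta:", post_ce_delta(z, label))
print("CE    dL/dz:", ce_grad)
print("Focal delta:", post_focal_delta(z, label))
print("Focal dL/dz:", fl_grad)
```

Under these assumptions, the CE delta comes out as the negative of `dCE/dz`, while the focal expression comes out equal to `dFL/dz` itself, without the sign flip. So if delta is meant to be the negative gradient in both cases, the focal expression would need a leading minus sign; that depends on the convention the implementation uses.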