Also would be a cool problem to have imo. It's pretty simple: you'd compute the forward pass in fp16 but keep the grads in fp32.
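Rough sketch of what that could look like, in plain Python with a toy one-weight model. It uses `struct` to emulate fp16/fp32 rounding instead of a real framework. All the helper names (`to_fp16`, `to_fp32`, `forward_fp16`, `grad_fp32`, `train`) are made up for illustration, and keeping an fp32 master copy of the weight is an extra assumption on top of "grads in fp32":

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def forward_fp16(w: float, x: float) -> float:
    # Cast both operands to fp16 and round the product back to fp16.
    return to_fp16(to_fp16(w) * to_fp16(x))


def grad_fp32(y: float, target: float, x: float) -> float:
    # Gradient of (w * x - target)**2 with respect to w.
    # Computed from the fp16 output, but stored in fp32.
    return to_fp32(2.0 * (y - target) * x)


def train(steps: int = 200, lr: float = 1e-3) -> float:
    w = to_fp32(1.0)        # assumed fp32 master copy of the weight
    x, target = 1.0, 2.0    # toy data: we want w * x to reach 2.0
    for _ in range(steps):
        y = forward_fp16(w, x)       # forward pass in fp16
        g = grad_fp32(y, target, x)  # gradient kept in fp32
        w = to_fp32(w - lr * g)      # update applied in fp32
    return w
```

The reason to bother: a tiny gradient like `1e-8` rounds to exactly `0.0` in fp16 (`to_fp16(1e-8) == 0.0`) but survives in fp32, and a small step like `1.0 + 1e-4` is lost in fp16 because neighbouring fp16 values near 1.0 are about `1e-3` apart.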