I tried using BERT-Large instead of the BERT model in the original code, and modified three parameters in the BERT config (hidden size = 1024, hidden layers = 24, attention heads = 16).
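For reference, the change amounts to roughly the following. This is only a minimal sketch, not the repository's code: the key names assume the standard `bert_config.json` layout, and `base_config` and `apply_overrides` are hypothetical stand-ins for illustration.

```python
import json

# The three values changed in the post; key names assume the
# standard bert_config.json layout (an assumption, not confirmed here).
BERT_LARGE_OVERRIDES = {
    "hidden_size": 1024,
    "num_hidden_layers": 24,
    "num_attention_heads": 16,
}


def apply_overrides(config, overrides):
    """Return a copy of config with overrides applied, after a basic sanity check."""
    updated = dict(config)
    updated.update(overrides)
    # Multi-head attention splits the hidden dimension evenly across heads.
    if updated["hidden_size"] % updated["num_attention_heads"] != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return updated


# Hypothetical starting config with BERT-Base-sized values.
base_config = {
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "intermediate_size": 3072,
}

large_config = apply_overrides(base_config, BERT_LARGE_OVERRIDES)
print(json.dumps(large_config, indent=2))
```

Note that the released BERT-Large config also sets `intermediate_size` to 4096, which is not one of the three values changed above; in this sketch it stays at the BERT-Base value of 3072.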
Here's the error log:
https://gist.github.com/AeroXi/d4d273da9f443c0f2cf9f6d6872eeffe
My machine has four GTX 1080 Ti GPUs.
Maybe I can skip domain adaptation and just extract features? However, the generated filename starts with "bert" instead of "bert_da", and I can't use the file directly even after changing it to the correct filename when training r2c. Should I make any other modifications?