I have a .vtp 3D mesh model and 2D images of it rendered from different views. Is it possible to train RealFusion on such data? Thank you :-)