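Hello, many methods normalize box coordinates to the 0-1 range, but the input image is resized and padded before it reaches the model (LLaVA, for example). This doesn't seem to make sense: if the coordinates are normalized against the original image, they no longer line up with the padded image the model actually sees. What is the right way to handle this in an MLLM for OVD or DOD tasks?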
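
To make the mismatch concrete, here is a minimal sketch of the two normalizations. It assumes the preprocessing pads the image to a square with centered padding and then resizes uniformly, which is my understanding of LLaVA's pad mode, not something I have confirmed for every model. The function names are hypothetical and only for illustration.

```python
def box_to_original_norm(box, img_w, img_h):
    """Normalize a pixel box (x1, y1, x2, y2) against the original image size."""
    x1, y1, x2, y2 = box
    return (x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h)


def box_to_padded_norm(box, img_w, img_h):
    """Normalize a pixel box against the padded square image.

    Assumption: the image is centered on a square canvas of side
    max(img_w, img_h). A later uniform resize of that square does not
    change normalized coordinates, so it is ignored here.
    """
    side = max(img_w, img_h)
    pad_x = (side - img_w) / 2  # padding added on the left
    pad_y = (side - img_h) / 2  # padding added on the top
    x1, y1, x2, y2 = box
    return (
        (x1 + pad_x) / side,
        (y1 + pad_y) / side,
        (x2 + pad_x) / side,
        (y2 + pad_y) / side,
    )


def padded_norm_to_box(norm_box, img_w, img_h):
    """Inverse of box_to_padded_norm: map back to original pixel coordinates."""
    side = max(img_w, img_h)
    pad_x = (side - img_w) / 2
    pad_y = (side - img_h) / 2
    nx1, ny1, nx2, ny2 = norm_box
    return (
        nx1 * side - pad_x,
        ny1 * side - pad_y,
        nx2 * side - pad_x,
        ny2 * side - pad_y,
    )


# A 640x480 image with a box covering the central half in each dimension.
box = (160, 120, 480, 360)
print(box_to_original_norm(box, 640, 480))  # (0.25, 0.25, 0.75, 0.75)
print(box_to_padded_norm(box, 640, 480))    # (0.25, 0.3125, 0.75, 0.6875)
```

For this example the x coordinates agree but the y coordinates differ, which is the inconsistency I am asking about: the same box gets different 0-1 values depending on whether normalization happens before or after padding.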