Hi authors, thank you for your excellent work on RAFT.
I’d like to point out a small potential misstatement in the paper (Section 3.1 Feature Extraction). The paper states:
"we estimate a dense displacement field (f¹, f²) which maps each pixel (u, v) in I₂ to its corresponding coordinates (u′, v′) = (u + f¹(u), v + f²(v)) in I₂"
This sentence is a bit confusing because optical flow is typically defined as a mapping from I₁ to I₂, i.e., each pixel (u, v) in I₁ is mapped to a corresponding location in I₂ via:
(u′, v′) = (u + f¹(u, v), v + f²(u, v))
That is, the flow field is defined on pixels in I₁, not in I₂.
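To make the convention concrete, here is a minimal sketch of what I mean. It is not taken from the RAFT code; the function name `map_to_second_frame` and the nested-list layout `flow[v][u] = (f¹(u, v), f²(u, v))` are my own assumptions for illustration. The point is only that the flow is indexed by pixels of I₁ and produces coordinates in I₂.

```python
def map_to_second_frame(flow):
    """Return (u', v') in I2 for every pixel (u, v) of I1.

    flow[v][u] holds the pair (f1(u, v), f2(u, v)), indexed on I1's pixel
    grid (rows are v, columns are u). This layout is an assumption of the
    sketch, not something stated in the paper.
    """
    return [
        [(u + f1, v + f2) for u, (f1, f2) in enumerate(row)]
        for v, row in enumerate(flow)
    ]


# Toy 3x4 flow: every pixel of I1 moves 2 px right and 1 px down.
flow = [[(2.0, 1.0) for _ in range(4)] for _ in range(3)]
coords = map_to_second_frame(flow)
print(coords[2][3])  # pixel (u=3, v=2) in I1 -> (5.0, 3.0) in I2
```

Note that the lookup `flow[v][u]` happens at the I₁ pixel, and only the result (u′, v′) lives in I₂.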
Therefore, it seems the sentence should be corrected to something like:
"we estimate a dense displacement field (f¹, f²) which maps each pixel (u, v) in I₁ to its corresponding coordinates (u′, v′) = (u + f¹(u), v + f²(v)) in I₂."
Please let me know if this interpretation is correct or if there's a specific reason why the reference frame was I₂ in this sentence.
Thanks again for the great work and for sharing the code!