
add bilinear upsampling #262


Merged
merged 12 commits into from Jan 8, 2021

Conversation

CarloLucibello
Member

@CarloLucibello CarloLucibello commented Dec 30, 2020

Co-authored-by: @ltjkoomen

Follow-up to FluxML/Flux.jl#1180.
I tried to preserve authorship with the commit message, but it looks like I wasn't successful.
If someone knows how to do it, please tell me; it would be good to give @ltjkoomen
the credit he deserves for such good work (I only made a few minor changes).

Only 2D bilinear upsampling is implemented, with the (hopefully future-proof) API:

bilinear_upsampling(x, k::NTuple)
∇bilinear_upsampling(Δ, k::NTuple) 
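
A hypothetical usage sketch of that API (the array layout and return shapes below are my assumptions from the discussion, not guaranteed by the PR):

x = rand(Float32, 4, 4, 3, 1)          # assumed W×H×C×N input
y = bilinear_upsampling(x, (2, 2))     # upsample both spatial dims 2x -> 8×8×3×1
Δ = ones(Float32, size(y))             # incoming gradient, same shape as y
dx = ∇bilinear_upsampling(Δ, (2, 2))   # pullback w.r.t. x -> 4×4×3×1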

TODO:

  • use NNlib.conv instead of Flux.Conv
  • add tests
  • works on CPU
  • works on GPU
  • working AD and chainrules integration
  • more tests and cleanup

@CarloLucibello CarloLucibello marked this pull request as draft December 30, 2020 14:32
@CarloLucibello CarloLucibello marked this pull request as ready for review December 30, 2020 18:50
@CarloLucibello
Member Author

@maxfreu this also works on GPU! Can you benchmark it against your implementation?
It would be nice to have just one generic implementation (provided it is fast).

@maxfreu
Contributor

maxfreu commented Dec 30, 2020

Hey, please give me another few days until I'm back home mid next week, then I'll check. Have a good start into the new year! :)

@CarloLucibello
Member Author

happy holidays!

@DhairyaLGandhi
Member

There are a number of review comments in the Flux PR which would also be good to go over before merging this.

@CarloLucibello
Member Author

I think I have already addressed them

@maxfreu
Contributor

maxfreu commented Jan 2, 2021

I think the function should be renamed to upsample_bilinear_2d, so that later upsample_nearest and upsample_xyz_Nd can be added and nicely group together. I also don't like the lowercase 2d, but lowercase is more consistent, so better to keep it this way.

When I checked the performance, the hand-written kernel was 3x faster in the forward pass; I didn't test the backward. But that was measured with only one size; I will report again when I've had the time to benchmark properly. Furthermore, I don't think the PyTorch kernel corrects the edges of the gradient. But maybe what's good enough for them also works for us.

@CarloLucibello
Member Author

I think the function should be renamed to upsample_bilinear_2d, so that later upsample_nearest and upsample_xyz_Nd can be added and nicely group together. I also don't like the lowercase 2d, but lowercase is more consistent, so better to keep it this way.

I can rename it to upsample_bilinear; no need to add 2d, since we can use dispatch to cover the different cases.
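
A sketch of what that dispatch could look like (hypothetical stubs, not the merged code):

# The array rank and tuple length select the case, so no _2d/_3d name suffix is needed.
upsample_bilinear(x::AbstractArray{T,4}, k::NTuple{2,Int}) where {T} = error("2D bilinear case, W×H×C×N")
upsample_bilinear(x::AbstractArray{T,5}, k::NTuple{3,Int}) where {T} = error("3D trilinear case, W×H×D×C×N")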

Furthermore, I don't think the PyTorch kernel corrects the edges of the gradient. But maybe what's good enough for them also works for us.

I can remove the edge correction, but that will cause the gradient test to fail. I would like to see the discussion that led them to that decision before going down the same path.

@mcabbott
Member

mcabbott commented Jan 3, 2021

I think the function should be renamed to upsample_bilinear_2d, so that later upsample_nearest

Doesn't "bilinear" imply 2D? Then it could be just upsample_bilinear, and (as you say) upsample<tab> may one day list other options.

@maxfreu
Contributor

maxfreu commented Jan 4, 2021

I just benchmarked on GPU using two different sizes which I took from the original U-Net, so I suspect they are representative:

196×196×128×1, time in ms:

|          | this PR | PyTorch translation |
|----------|---------|---------------------|
| forward  | 8.1     | 0.6                 |
| backward | 3       | 3.3                 |

32×32×1024×1, time in ms:

|          | this PR | PyTorch translation |
|----------|---------|---------------------|
| forward  | 2.3     | 0.5                 |
| backward | 16.2    | 5.6                 |

GPU: Nvidia 1080 Ti, CUDA.jl 2.3, CUDA 11.0

Note that the upsampling and gradient results are not the same; they differ at sharp edges in the image. The PyTorch version indeed doesn't perform corrections at the image border, but I couldn't trace any discussion about it in the PyTorch GitHub.

Thinking further about the API, I suggest the following:

upsample_bilinear(x, scale::NTuple)
∇upsample_bilinear(Δ, x)

That way we can easily recover the original image size, even if we later allow real scaling factors; we don't have to hope that the rounding to an integer image size goes right.
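
With integer scales, for instance, that recovery is a one-liner (hypothetical helper, assuming W×H×C×N layout):

# Each spatial dim of Δ is an exact integer multiple of the corresponding input dim.
input_size(Δ, k::NTuple{2,Int}) = (size(Δ, 1) ÷ k[1], size(Δ, 2) ÷ k[2], size(Δ, 3), size(Δ, 4))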

@mcabbott
Member

mcabbott commented Jan 4, 2021

Maybe ∇upsample_bilinear(Δ, size::Tuple) would be even better, in case x can be freed?

I guess using upsample_bilinear(x, size::Tuple) for the forward function would imply that arbitrary scales are accepted. I haven't worked out whether that would be hard or easy here.

@maxfreu
Contributor

maxfreu commented Jan 5, 2021

Maybe ∇upsample_bilinear(Δ, size::Tuple) would be even better, in case x can be freed?

Yes, that's even better.

I guess using upsample_bilinear(x, size::Tuple) for the forward function would imply that arbitrary scales are accepted. I haven't worked out whether that would be hard or easy here.

Yes, that would imply that arbitrary, non-integer scales are accepted, which is not the case yet. However, in DL you most often want to upsample by a factor of 2 to compensate for pooling. So I would keep the scale tuple in and restrict it to Int for a start. Later we can maybe add a size keyword to allow specifying an exact output size.

To summarize:

upsample_bilinear(x, scale::NTuple{2, Int})
# later maybe something like:
upsample_bilinear(x, scale=(1, 1); sz::Union{Nothing, NTuple}=nothing)

∇upsample_bilinear(Δ, sz::NTuple{4, Int}) # sz = size(x)

After these changes, I'd say it's almost ready to merge. Maybe the tests should be extended to c, n != 1 (see the sketch below), but then it's good to go. Performance improvements and specialized GPU stuff can come later.
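
A minimal sketch of such a test (hypothetical sizes, assuming the upsample_bilinear name settled on above):

using Test

x = rand(Float32, 8, 8, 3, 2)        # C = 3, N = 2, i.e. c, n != 1
y = upsample_bilinear(x, (2, 2))
@test size(y) == (16, 16, 3, 2)
@test eltype(y) == Float32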

By the way: would it be acceptable for the CPU and GPU implementations to produce slightly different results?

@DhairyaLGandhi
Member

Let's keep the edge correction; it seems like a more complete implementation.

@DhairyaLGandhi
Member

@maxfreu could you suggest the performance improvements here before we move forward?

weight = similar(Δ, eltype(Δ), (size(kern)..., n_chan, n_chan))
weight .= 0

for i in 1:n_chan
Member

Use cat and repeat here?

Contributor

Does repeat support GPU Arrays?

Member Author

I'm not sure that weight can be created anew by using cat and repeat

Contributor

Maybe also use zero(weight)? Or is .= 0 type-stable?

Member

Something like this?

julia> t = zeros(3,3,3,3);

julia> w = rand(3,3)
3×3 Array{Float64,2}:
 0.966086  0.0129205  0.315534
 0.823242  0.245357   0.179242
 0.123778  0.372735   0.869323

julia> for i in 1:3
         t[:,:,i,i] = w
       end

julia> cat(Iterators.repeated(w, 3)..., dims = (3,4)) == t
true

@maxfreu
Contributor

maxfreu commented Jan 5, 2021

@maxfreu could you suggest the performance improvements here before we move forward

I don't have much to say about the current implementation, as I haven't looked into it deeply; @ltjkoomen would be of more help here. Note, however, that the forward pass is single-threaded on CPU, whereas the backward is multi-threaded through conv + NNPACK. So multi-threading would help.
Regarding the GPU, I can only offer to PR my PyTorch port to CUDA. But my skills are probably not sufficient to make the results bit-wise equal to this code.
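
A minimal sketch of threading the forward pass over the channel and batch dimensions (illustrative only; upsample_plane! is a hypothetical kernel that fills one W×H output slice):

using Base.Threads

function upsample_forward!(y, x, upsample_plane!)
    C, N = size(x, 3), size(x, 4)
    @threads for j in 1:(C * N)                  # one task per (channel, batch) pair
        c, n = (j - 1) % C + 1, (j - 1) ÷ C + 1
        upsample_plane!(view(y, :, :, c, n), view(x, :, :, c, n))
    end
    return y
end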

@CarloLucibello
Member Author

@maxfreu your benchmarks show that your implementation is much faster on GPU, both in the forward pass (where there is no edge correction) and in the backward. Having discrepancies between GPU and CPU versions is not ideal though, unless they are very small. Any chance your port could be made to run on CPU as well?

@CarloLucibello
Member Author

I prefer to stick to ∇upsample_bilinear(Δ, k) instead of ∇upsample_bilinear(Δ, size(x)), because internally it would use k in any case, and passing a factor in the forward pass but the input size in the backward is a bit confusing.
I don't see any problem in passing the scale k also in the non-integer case:
the equation y = floor(x * k) has a unique integer solution x for k >= 1 and any y.
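
A quick check of that claim (illustrative; x is pinned to the interval [y/k, (y+1)/k), which has length 1/k <= 1 and therefore contains exactly one integer):

recover(y, k) = ceil(Int, y / k)     # the unique integer in [y/k, (y+1)/k)

# exact with Rational k, so no floating-point edge cases
@assert all(recover(floor(Int, x * k), k) == x
            for x in 1:100, k in (1//1, 3//2, 2//1, 37//10))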

@maxfreu
Contributor

maxfreu commented Jan 5, 2021

Any chance your port could be made to run also on cpu?

Well, there is always this which we can slam in...

the equation y = floor(x * k) has a unique integer solution x for k >= 1 and any y.

Convinced. Keep k.

Edit: The CPU kernel implementations for upsampling can be found here - almost 600 lines of code, covering different methods and dimension orderings. I wonder what this would look like in Julia, but I think that's overkill for a start.

@FluxML FluxML deleted a comment from CarloLucibello Jan 5, 2021
@CarloLucibello
Member Author

Edit: The CPU kernel implementations for upsampling can be found here - almost 600 lines of code, covering different methods and dimension orderings. I wonder what this would look like in Julia, but I think that's overkill for a start.

Yeah, I think for now the important thing is to introduce the ton of functionality we're missing; decent performance is good enough, and we can optimize further later. Moreover, I haven't ported C++ code in the last couple of years; I'm very happy about that and don't want to ruin it 😄

@DhairyaLGandhi
Member

I would like the implementation to be correct with respect to its academic paper, with a good performance benchmark or known issues. Optimising for performance later becomes much easier that way; otherwise, in my experience, it's a much slower process.

@CarloLucibello
Member Author

This is good to go; we can merge when the tests complete.

@DhairyaLGandhi
Member

This is doing a bunch of allocation, and since NNlib is supposed to be where performance is tuned, we should try to understand where and how we can reduce that.

@CarloLucibello
Member Author

This is doing a bunch of allocation, and since NNlib is supposed to be where performance is tuned, we should try to understand where and how we can reduce that.

I don't have the time (and honestly also the will) to do this. My intention here is to add some functionality and not let the work in FluxML/Flux.jl#1180 go to waste. Anyone is very welcome to improve on it later, but it's not going to be me.
We can file an issue as a reminder to check performance, although some benchmarking has already been done by @maxfreu above. Again, as I said, we are missing a lot of functionality in Flux, and that is a showstopper for many, much more so than performance issues.

@maxfreu
Contributor

maxfreu commented Jan 7, 2021

My intention here is to add some functionality and don't let the work in FluxML/Flux.jl#1180 go wasted.

I would second that.

Remember to change the name to upsample_bilinear.

@maxfreu
Contributor

maxfreu commented Jan 7, 2021

Sorry to annoy you, but I see a 7x performance decrease (15ms -> 110ms for 32x32x1024x1) in ∇bilinear_upsample when cats are added 🚫 🐱

@maxfreu
Contributor

maxfreu commented Jan 8, 2021

Hey, I wanted to thank everybody for the effort, especially @CarloLucibello! I learned a lot :-) When I find the time, I can try to port the C++ stuff and bring the CPU and GPU implementations in line.

@CarloLucibello
Member Author

Thank you for the review! It would be nice to leverage the Julia ecosystem for this, e.g. ImageTransformations.jl or Interpolations.jl, but I fear it won't happen anytime soon, as we need GPU-compatible and differentiable operators.

@maxfreu maxfreu mentioned this pull request Jan 25, 2021
@CarloLucibello CarloLucibello deleted the cl/upsample branch June 15, 2023 17:06