-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Updated Tensorflow.Net to 0.70.2 with Tensorflow 2.7.0. #7472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
NumSharp replaced with Tensorflow.NumPy. TensorShape replaced with Shape, Shape object has dimensions as 64 bit long, check added for casting to 32 bit int alsoTensor constructor using SafeTensorHandle/DangerousGetHandle and TF_DataType not required when casting. Added StringTensorFactory to wrap addition tensorflow.dll methods required to create Tensors from string based input.
@dotnet-policy-service agree |
CI failing is interesting, I wonder if the dotnet9.0 runtime needs to be included in the initial build steps? I installed the runtime manually to the |
Continuing CI issues, without access to logs we're limited in how far we can proceed with fixes. |
/azp run MachineLearning-CI |
Azure Pipelines successfully started running 1 pipeline(s). |
Sorry for the delay here, I'll have a look and see what failures you're hitting and figure out how we can get this working. Thank you for your contribution - this update is something we've wanted to get in. |
CI failures are all due to packages not mirrored to our build feeds. https://dev.azure.com/dnceng-public/public/_build/results?buildId=1066692&view=logs&j=80b813b5-9a08-5859-11a8-dc0e5b556e52&t=99848337-6ccc-53eb-9c14-1b676ae001b9
Let me help get those mirrored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully we get better build results now. We'll need to decide on final version.
eng/Versions.props
Outdated
@@ -67,9 +67,9 @@ | |||
<ParquetDotNetVersion>2.1.3</ParquetDotNetVersion> | |||
<PlotlyNETCSharpVersion>0.11.1</PlotlyNETCSharpVersion> | |||
<SharpZipLibVersion>1.4.2</SharpZipLibVersion> | |||
<TensorflowDotNETVersion>0.20.1</TensorflowDotNETVersion> | |||
<TensorflowDotNETVersion>0.70.2</TensorflowDotNETVersion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this version and not latest (0.150.01
)? Was there an issue with the latest package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a time critical update and we needed to get to a version that could be deployed without requiring WSL support to enable full use of the GPU. I believe that 0.100.x releases still works without WSL but it seemed quite a big jump to go from 0.70.x and to jump Tensorflow from 2.7 to 2.10. We could investigate though to see if there are any breaking changes involved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we should be on the latest supported so that we ensure we can take any future updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into issues going all the way to the latest TensorFlow.NET. I was able to absorb the breaking API changes, but they removal of strong name signing as well as taking on a dependency without a strong name will break consumption on .NETFramework. I filed SciSharp/TensorFlow.NET#1296 to track those.
I rolled back to a version without those breaks, hit one other break around disposal of status, and rolled back further to 0.100.2.
I think this is the latest version that will still work and it's just one release ahead of what you were using 🙃
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7472 +/- ##
=======================================
Coverage 68.98% 68.99%
=======================================
Files 1482 1482
Lines 273880 273901 +21
Branches 28254 28256 +2
=======================================
+ Hits 188941 188977 +36
+ Misses 77553 77534 -19
- Partials 7386 7390 +4
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj
Outdated
Show resolved
Hide resolved
…issue Tensorflow.NET stopped being strong name signed in 0.110.2. See SciSharp/TensorFlow.NET#1296
After this version a dependency was introduced on OneOf which is not strong name signed and breaks use on .NETFramework.
We can't use newer than 0.100.4 due to strong naming issues. There's a bug in the Session.Dispose introduced by SciSharp/TensorFlow.NET@a7c9a75 (in 0.100.4) which was later fixed in SciSharp/TensorFlow.NET@58de537 but that's after the strong naming regression.
… update-tensorflow-issue-7471
Ok, I think this addresses all the feedback. @Crichen Can you have a look at what I did and see if this still would work for you? If so we can bring in another reviewer to get this in. |
@ericstj I assume the failing test https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-machinelearning-refs-pull-7472-merge-57e26a202b58496b9e/Microsoft.ML.Tests/1/console.e9146afc.log?helixlogtype=result is unrelated to tensor flow, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
..NET Framework tests are not failing, but crashing on exit. Finalizer seems to
|
Sigh, that's this ThreadLocal https://github.com/SciSharp/TensorFlow.NET/blob/0ee50d319e5539f15b13f8909fd246c18819d840/src/TensorFlowNET.Core/tensorflow.cs#L47 Which is accessed in the finalizer -- https://github.com/SciSharp/TensorFlow.NET/blob/0ee50d319e5539f15b13f8909fd246c18819d840/src/TensorFlowNET.Core/Sessions/BaseSession.cs#L301 This bug was introduced in SciSharp/TensorFlow.NET@ec340ee#diff-cb5a758e3cc3589393346616092a8e8cb3ab5f0bf833897526f765dff28486e2L294. It had actually been previously fixed in SciSharp/TensorFlow.NET@43625ab, but was later regressed. It's partially fixed with SciSharp/TensorFlow.NET@a7c9a75, but that change introduced a problem because Let me see if we can somehow workaround this. |
I wonder if we just pick up 0.100.4 and then patch it to set the |
would be great to have this merged...Is very annoying being limited to use old NVIDIA cards |
@ericstj many thanks for looking at this for us. I'm on holiday next week, but back in on the 21st and I'll check out the branch and have a look over. |
The TensorFlow.NET session finalizer has a bug where it will crash if run after the finalizer for the `tensorflow` type. Avoid that by ensuring we dispose all Session objects.
…chen/machinelearning into update-tensorflow-issue-7471
I think I have worked around all the crashes. What's happening is buggy finalizer in TF for Session will sometimes crash - depending on the order that the GC decides to finalize objects. So long as we Dispose all Session objects we'll avoid this. The problems is that ML.NET has rather loose rules for disposal. Here's the best I can summarize.
I found a case of both 1 and 3 in our tests where we weren't disposing, which was causing the objects to hit the finalizer and crash. |
I did a scrub of the TF.NET Codebase for other instances of crashing finalizers. I found one in |
Fixes #7471
NumSharp replaced with Tensorflow.NumPy.
TensorShape replaced with Shape, Shape object has dimensions as 64 bit long, check added for casting to 32 bit int alsoTensor constructor using SafeTensorHandle/DangerousGetHandle and TF_DataType not required when casting.
Added StringTensorFactory to wrap addition tensorflow.dll methods required to create Tensors from string based input.
We are excited to review your PR.
So we can do the best job, please check:
Fixes #nnnn
in your description to cause GitHub to automatically close the issue(s) when your PR is merged.