Updated Tensorflow.Net to 0.70.2 with Tensorflow 2.7.0. #7472

Open. Wants to merge 17 commits into base: main

Crichen commented May 23, 2025

Fixes #7471

NumSharp replaced with Tensorflow.NumPy.
TensorShape replaced with Shape. The Shape object stores its dimensions as 64-bit long values, so a check was added when casting to 32-bit int. Also, the Tensor constructor uses SafeTensorHandle/DangerousGetHandle, and TF_DataType is not required when casting.

Added StringTensorFactory to wrap the additional tensorflow.dll methods required to create Tensors from string-based input.
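
As an illustration of the 64-bit to 32-bit check described above, here is a minimal sketch. It is not the code from this PR: `ShapeConversion` and `ToInt32Dims` are hypothetical names, and the sketch assumes the dimensions are available as a plain `long[]`.

```csharp
using System;
using System.Linq;

static class ShapeConversion
{
    // Hypothetical helper: narrows 64-bit dimensions to int and fails loudly
    // instead of silently truncating a value that does not fit.
    public static int[] ToInt32Dims(long[] dims)
    {
        // checked(...) makes the narrowing cast throw OverflowException on overflow.
        return dims.Select(d => checked((int)d)).ToArray();
    }

    static void Main()
    {
        int[] ok = ToInt32Dims(new long[] { 1, 224, 224, 3 });
        Console.WriteLine(string.Join(",", ok));

        try
        {
            ToInt32Dims(new long[] { (long)int.MaxValue + 1 });
        }
        catch (OverflowException)
        {
            Console.WriteLine("dimension does not fit in Int32");
        }
    }
}
```

Using `checked` keeps the failure at the conversion site rather than letting a wrapped value flow into later shape arithmetic.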

Crichen (Author) commented May 23, 2025

@dotnet-policy-service agree

Crichen (Author) commented May 23, 2025

The CI failure is interesting. I wonder whether the .NET 9.0 runtime needs to be included in the initial build steps.
Following the standard build process, 6.0.36, 8.0.16 and 10.0.0-preview.3.25171.5 are installed to the machinelearning\.dotnet folder. Running .\build.cmd -test -integrationTest then resulted in errors referencing 9.0.

I installed the runtime manually to the machinelearning\.dotnet location, which resolved this. I'm unsure where in the tooling pipeline this would need to be updated.

Crichen (Author) commented May 26, 2025

CI issues are continuing. Without access to the logs, we're limited in how far we can proceed with fixes.

ericstj (Member) commented Jun 12, 2025

/azp run MachineLearning-CI

ericstj self-assigned this Jun 12, 2025

Azure Pipelines successfully started running 1 pipeline(s).

ericstj self-requested a review June 12, 2025 20:12

ericstj (Member) commented Jun 12, 2025

Sorry for the delay here, I'll have a look and see what failures you're hitting and figure out how we can get this working. Thank you for your contribution - this update is something we've wanted to get in.

ericstj (Member) commented Jun 17, 2025

CI failures are all due to packages not mirrored to our build feeds. https://dev.azure.com/dnceng-public/public/_build/results?buildId=1066692&view=logs&j=80b813b5-9a08-5859-11a8-dc0e5b556e52&t=99848337-6ccc-53eb-9c14-1b676ae001b9

/__w/1/s/src/Microsoft.ML.Console/Microsoft.ML.Console.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.AutoML/Microsoft.ML.AutoML.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.CodeGenerator/Microsoft.ML.CodeGenerator.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.AutoML.Interactive/Microsoft.ML.AutoML.Interactive.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1603: Warning As Error: Microsoft.ML.Samples.GPU depends on SciSharp.TensorFlow.Redist-Linux-GPU (>= 2.7.0) but SciSharp.TensorFlow.Redist-Linux-GPU 2.7.0 was not found. SciSharp.TensorFlow.Redist-Linux-GPU 2.11.1 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package SciSharp.TensorFlow.Redist-Linux-GPU-primary. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package SciSharp.TensorFlow.Redist-Linux-GPU-fragment1. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package SciSharp.TensorFlow.Redist-Linux-GPU-fragment2. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package SciSharp.TensorFlow.Redist-Linux-GPU-fragment3. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples.GPU/Microsoft.ML.Samples.GPU.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Fairlearn.Tests/Microsoft.ML.Fairlearn.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples/Microsoft.ML.Samples.csproj : error NU1603: Warning As Error: Microsoft.ML.Samples depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.Samples/Microsoft.ML.Samples.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.PerformanceTests/Microsoft.ML.PerformanceTests.csproj : error NU1603: Warning As Error: Microsoft.ML.PerformanceTests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.AutoML.Samples/Microsoft.ML.AutoML.Samples.csproj : error NU1603: Warning As Error: Microsoft.ML.AutoML.Samples depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/docs/samples/Microsoft.ML.AutoML.Samples/Microsoft.ML.AutoML.Samples.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.PerformanceTests/Microsoft.ML.PerformanceTests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.TensorFlow.Tests/Microsoft.ML.TensorFlow.Tests.csproj : error NU1603: Warning As Error: Microsoft.ML.TensorFlow.Tests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.TensorFlow.Tests/Microsoft.ML.TensorFlow.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Core.Tests/Microsoft.ML.Core.Tests.csproj : error NU1603: Warning As Error: Microsoft.ML.Core.Tests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Core.Tests/Microsoft.ML.Core.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.CodeGenerator.Tests/Microsoft.ML.CodeGenerator.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Benchmarks.Tests/Microsoft.ML.Benchmarks.Tests.csproj : error NU1603: Warning As Error: Microsoft.ML.PerformanceTests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.Benchmarks.Tests/Microsoft.ML.Benchmarks.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.AutoML.Tests/Microsoft.ML.AutoML.Tests.csproj : error NU1603: Warning As Error: Microsoft.ML.AutoML.Tests depends on SciSharp.TensorFlow.Redist (>= 2.7.0) but SciSharp.TensorFlow.Redist 2.7.0 was not found. SciSharp.TensorFlow.Redist 2.16.0 was resolved instead. [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/test/Microsoft.ML.AutoML.Tests/Microsoft.ML.AutoML.Tests.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.Vision/Microsoft.ML.Vision.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.TensorFlow/Microsoft.ML.TensorFlow.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.DnnAnalyzer/Microsoft.ML.DnnAnalyzer/Microsoft.ML.DnnAnalyzer.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]
/__w/1/s/src/Microsoft.ML.Fairlearn/Microsoft.ML.Fairlearn.csproj : error NU1101: Unable to find package MethodBoundaryAspect.Fody. No packages exist with this id in source(s): darc-pub-dotnet-maintenance-packages-ab95a1f1, dotnet-eng, dotnet-libraries, dotnet-libraries-transport, dotnet-public, dotnet-tools, dotnet5-roslyn, dotnet8, dotnet9, mlnet-assets, mlnet-daily, vs-buildservices [/__w/1/s/Microsoft.ML.sln]

Let me help get those mirrored.

ericstj (Member) left a comment

Hopefully we get better build results now. We'll need to decide on final version.

@@ -67,9 +67,9 @@
 <ParquetDotNetVersion>2.1.3</ParquetDotNetVersion>
 <PlotlyNETCSharpVersion>0.11.1</PlotlyNETCSharpVersion>
 <SharpZipLibVersion>1.4.2</SharpZipLibVersion>
-<TensorflowDotNETVersion>0.20.1</TensorflowDotNETVersion>
+<TensorflowDotNETVersion>0.70.2</TensorflowDotNETVersion>
Member

Why this version and not latest (0.150.01)? Was there an issue with the latest package?

Author

It was a time-critical update, and we needed to get to a version that could be deployed without requiring WSL support to enable full use of the GPU. I believe the 0.100.x releases still work without WSL, but it seemed quite a big jump to move on from 0.70.x and to take TensorFlow from 2.7 to 2.10. We could investigate, though, to see whether any breaking changes are involved.

Member

Ideally we should be on the latest supported so that we ensure we can take any future updates.

Member

I ran into issues going all the way to the latest TensorFlow.NET. I was able to absorb the breaking API changes, but the removal of strong name signing, as well as taking on a dependency without a strong name, will break consumption on .NET Framework. I filed SciSharp/TensorFlow.NET#1296 to track those.

I rolled back to a version without those breaks, hit one other break around disposal of status, and rolled back further to 0.100.2.

I think this is the latest version that will still work and it's just one release ahead of what you were using 🙃

codecov bot commented Jun 17, 2025

Codecov Report

Attention: Patch coverage is 76.33588% with 31 lines in your changes missing coverage. Please review.

Project coverage is 68.99%. Comparing base (71e1280) to head (e99dcc1).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/Microsoft.ML.TensorFlow/TensorflowUtils.cs | 55.88% | 14 Missing and 1 partial ⚠️ |
| src/Microsoft.ML.Vision/DnnRetrainTransform.cs | 61.53% | 8 Missing and 2 partials ⚠️ |
| src/Microsoft.ML.TensorFlow/TensorflowTransform.cs | 85.00% | 3 Missing ⚠️ |
| ...rc/Microsoft.ML.TensorFlow/TensorTypeExtensions.cs | 0.00% | 0 Missing and 2 partials ⚠️ |
| .../Microsoft.ML.Vision/ImageClassificationTrainer.cs | 97.56% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7472   +/-   ##
=======================================
  Coverage   68.98%   68.99%           
=======================================
  Files        1482     1482           
  Lines      273880   273901   +21     
  Branches    28254    28256    +2     
=======================================
+ Hits       188941   188977   +36     
+ Misses      77553    77534   -19     
- Partials     7386     7390    +4     
| Flag | Coverage Δ |
| --- | --- |
| Debug | 68.99% <76.33%> (+<0.01%) ⬆️ |
| production | 63.28% <74.79%> (+<0.01%) ⬆️ |
| test | 89.45% <100.00%> (+<0.01%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...ft.ML.TensorFlow.Tests/TensorFlowEstimatorTests.cs | 98.52% <100.00%> (+0.01%) ⬆️ |
| ...t/Microsoft.ML.TensorFlow.Tests/TensorflowTests.cs | 91.64% <100.00%> (ø) |
| .../Microsoft.ML.Vision/ImageClassificationTrainer.cs | 92.17% <97.56%> (+0.03%) ⬆️ |
| ...rc/Microsoft.ML.TensorFlow/TensorTypeExtensions.cs | 65.38% <0.00%> (+30.76%) ⬆️ |
| src/Microsoft.ML.TensorFlow/TensorflowTransform.cs | 85.57% <85.00%> (ø) |
| src/Microsoft.ML.Vision/DnnRetrainTransform.cs | 61.01% <61.53%> (ø) |
| src/Microsoft.ML.TensorFlow/TensorflowUtils.cs | 72.80% <55.88%> (-0.13%) ⬇️ |

... and 5 files with indirect coverage changes


ericstj added 7 commits July 9, 2025 13:19
…issue

Tensorflow.NET stopped being strong name signed in 0.110.2.  See SciSharp/TensorFlow.NET#1296
After this version a dependency was introduced on OneOf which is not strong name signed and breaks use on .NETFramework.
We can't use newer than 0.100.4 due to strong naming issues.

There's a bug in the Session.Dispose introduced by SciSharp/TensorFlow.NET@a7c9a75 (in 0.100.4) which was later fixed in SciSharp/TensorFlow.NET@58de537 but that's after the strong naming regression.

ericstj (Member) commented Jul 9, 2025

Ok, I think this addresses all the feedback. @Crichen Can you have a look at what I did and see if this still would work for you? If so we can bring in another reviewer to get this in.

ericstj requested a review from tarekgh July 10, 2025 18:34

tarekgh (Member) commented Jul 10, 2025

LGTM!

ericstj (Member) commented Jul 10, 2025

.NET Framework tests are not failing, but they are crashing on exit. The finalizer seems to be double-disposing, accessing another object outside its graph which has already been finalized/disposed.

Unhandled exception: System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'The ThreadLocal object has been disposed.'.
   at System.Threading.ThreadLocal`1.GetValueSlow()
   at Tensorflow.BaseSession.DisposeUnmanagedResources(IntPtr handle)
   at Tensorflow.DisposableObject.Dispose(Boolean disposing)
   at Tensorflow.DisposableObject.Finalize()
System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'The ThreadLocal object has been disposed.'.
   at System.Threading.ThreadLocal`1.GetValueSlow()
   at Tensorflow.BaseSession.DisposeUnmanagedResources(IntPtr handle)
   at Tensorflow.DisposableObject.Dispose(Boolean disposing)
   at Tensorflow.DisposableObject.Finalize()

ericstj (Member) commented Jul 10, 2025

Sigh, that's this ThreadLocal https://github.com/SciSharp/TensorFlow.NET/blob/0ee50d319e5539f15b13f8909fd246c18819d840/src/TensorFlowNET.Core/tensorflow.cs#L47

Which is accessed in the finalizer -- https://github.com/SciSharp/TensorFlow.NET/blob/0ee50d319e5539f15b13f8909fd246c18819d840/src/TensorFlowNET.Core/Sessions/BaseSession.cs#L301
That'll fail if the TF object and its ThreadLocals are finalized first.
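
To make the failure mode concrete, here is a minimal sketch of the standard dispose pattern, in which the finalizer path avoids touching other finalizable managed objects. `SessionLike` is a hypothetical stand-in written for illustration; it is not TensorFlow.NET's `BaseSession`, and the static `ThreadLocal` only plays the role of the one in the stack trace.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a native session wrapper (not TensorFlow.NET code).
sealed class SessionLike : IDisposable
{
    // Another finalizable managed object, analogous to the ThreadLocal above.
    private static readonly ThreadLocal<int> s_status = new ThreadLocal<int>(() => 0);

    private IntPtr _handle = new IntPtr(1); // pretend native handle
    private bool _disposed;

    public void Dispose()
    {
        Dispose(disposing: true);
        GC.SuppressFinalize(this); // the finalizer no longer needs to run
    }

    ~SessionLike() => Dispose(disposing: false);

    private void Dispose(bool disposing)
    {
        if (_disposed) return;
        _disposed = true;

        if (disposing)
        {
            // Safe only here: we were called from user code, so other
            // managed objects have not been finalized yet.
            _ = s_status.Value;
        }

        // On the finalizer path (disposing == false) release only the raw
        // handle; other finalizable objects may already be gone.
        if (_handle != IntPtr.Zero)
        {
            _handle = IntPtr.Zero;
        }
    }
}

static class Program
{
    static void Main()
    {
        using (var session = new SessionLike())
        {
            // use the session here
        }
        Console.WriteLine("disposed deterministically");
    }
}
```

Because the finalization order of unreachable objects is not guaranteed, the only reliable options are to keep managed-object access off the finalizer path, as sketched, or to dispose deterministically so the finalizer never runs.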

This bug was introduced in SciSharp/TensorFlow.NET@ec340ee#diff-cb5a758e3cc3589393346616092a8e8cb3ab5f0bf833897526f765dff28486e2L294.

It had actually been previously fixed in SciSharp/TensorFlow.NET@43625ab, but was later regressed.

It's partially fixed with SciSharp/TensorFlow.NET@a7c9a75, but that change introduced a problem because _status was not set in all cases. That was fixed in SciSharp/TensorFlow.NET@58de537 but we can't take that due to the strong name bugs.

Let me see if we can somehow work around this.

ericstj (Member) commented Jul 10, 2025

I wonder if we just pick up 0.100.4 and then patch it to set the _status value of the session.

rblanca commented Jul 11, 2025

It would be great to have this merged. It is very annoying being limited to old NVIDIA cards.

Crichen (Author) commented Jul 11, 2025

@ericstj many thanks for looking at this for us. I'm on holiday next week, but back in on the 21st and I'll check out the branch and have a look over.

ericstj added 3 commits July 11, 2025 12:10
The TensorFlow.NET session finalizer has a bug where it will crash if run after the finalizer for the `tensorflow` type.

Avoid that by ensuring we dispose all Session objects.

ericstj (Member) commented Jul 11, 2025

I think I have worked around all the crashes. What's happening is that the buggy finalizer in TF.NET for Session will sometimes crash, depending on the order in which the GC decides to finalize objects.

So long as we Dispose all Session objects we'll avoid this.

The problem is that ML.NET has rather loose rules for disposal. Here's the best I can summarize.

  1. A TensorFlowModel contains a Session and should be disposed if this is your final object.
  2. A TensorFlowEstimator contains the TensorFlowModel, but itself is not disposable. If this is your final object, then you should maintain the lifetime via the TensorFlowModel. The EstimatorChain doesn't have plumbing for IDisposable, presumably because it's an intermediate object that should be fit to a transformer.
  3. A TensorFlowTransformer contains a copy of the session from the TensorFlow model. This object is Disposable, as is a TransformerChain. If this (or a chain) is your final object then you can use it to manage lifetime.

I found a case of both 1 and 3 in our tests where we weren't disposing, which was causing the objects to hit the finalizer and crash.
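
A rough sketch of how the three rules above might look in calling code follows. It is not taken from this PR or its tests. `ModelInput`, the `"Features"` and `"Output"` node names, and the `modelPath` argument are placeholders that would have to match a real TensorFlow graph, and the sketch assumes a reference to the Microsoft.ML.TensorFlow package.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input row; "Features" must match an input node name in your graph.
public class ModelInput
{
    [VectorType(4)]
    public float[] Features { get; set; }
}

public static class DisposalSketch
{
    // Rule 1: the TensorFlowModel is the final object, so dispose it directly.
    public static void InspectModel(MLContext mlContext, string modelPath)
    {
        using var model = mlContext.Model.LoadTensorFlowModel(modelPath);
        DataViewSchema schema = model.GetModelSchema();
        Console.WriteLine($"Graph exposes {schema.Count} nodes.");
    }

    // Rules 2 and 3: the estimator is not disposable; the fitted transformer is,
    // and per rule 3 it manages the lifetime of the shared session here.
    public static void Score(MLContext mlContext, string modelPath)
    {
        var model = mlContext.Model.LoadTensorFlowModel(modelPath);
        IDataView data = mlContext.Data.LoadFromEnumerable(
            new[] { new ModelInput { Features = new float[4] } });

        // "Output" is a placeholder for an output node name in your graph.
        var estimator = model.ScoreTensorFlowModel("Output", "Features");

        using var transformer = estimator.Fit(data);
        IDataView scored = transformer.Transform(data);
        Console.WriteLine($"Scored view has {scored.Schema.Count} columns.");
    }
}
```

Whether it is also safe to dispose the model after the transformer in the second method is not established in this thread, so the sketch follows rule 3 literally and disposes only the final object.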

ericstj (Member) commented Jul 11, 2025

I did a scrub of the TF.NET Codebase for other instances of crashing finalizers. I found one in EagerResourceDeleter but we don't use that. Just so happens that was already reported, so I dropped a note about root cause in SciSharp/TensorFlow.NET#984.
