Azure Web App - SqlMainDom Lock and the ListeningLoop #12449

marcemarc · 2022-05-23T08:16:02Z

I have been asked about this issue three times now by different people, most recently in the forum (https://our.umbraco.com/forum/using-umbraco-and-getting-started/109069-umbraco-site-keeps-crashing-due-to-nucachecontentdb).

They have Load Balancing on Azure Web App configured correctly (usually that's what I've been initially asked to verify).

But every now and then their front-end Umbraco Web App starts to fail and doesn't recover. Have a look at the forum post for the details.

Anyway, the common factor in these three instances was that the SQL Server's DTUs were maxed out (SQL at 100%).

Therefore the server couldn't obtain the SqlMainDom lock via its SQL request, and so would keep trying every two seconds...

... the problem being that the SQL request executed every two seconds kept the SQL Server 'maxed out', so the attempt to get the lock was perpetuating the very conditions that were preventing the MainDom lock from being acquired :-P

(https://github.com/umbraco/Umbraco-CMS/pull/7984/files#diff-957698acc7a6f623e4da4b23707bdffc4e0cde27a6eb9956fdfbb9e4d691eebcR134)

There has been quite a bit of work around the SqlMainDom lock for V9/V10, and the latest version of the listening loop is here:

Umbraco-CMS/src/Umbraco.Infrastructure/Runtime/SqlMainDomLock.cs

Line 152 in c0c9c50

private void ListeningLoop()

Workarounds involve scaling up the SQL instance, or moving the SQL MainDom lock to another SQL instance (#11075), which minimises the chance of the issue occurring...

... but the reason I've opened a discussion rather than an issue is that v8 is feature complete! And of course I've not seen this yet on a V9/V10 site, so we might need to wait for somebody to be hosting V10 on an Azure Web site and hit these issues before we can do anything to resolve them...

... but I was wondering (and this is the feature/idea): if we were to 'count' the number of times the ListeningLoop executed, could we perhaps increase the timeout in between attempts? E.g. if we are on our fifth attempt, maybe wait 10 seconds before trying again; after 10 attempts, 30 seconds, etc.; or after 1000 attempts, maybe stop looping and stop the application? Would that give the SQL Server a chance to recover?

Or have I completely misunderstood what's going on, and this all sort of won't happen with V10!
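To make the idea concrete, here is a minimal, language-neutral sketch in Python of a retry loop that counts attempts, lengthens the wait between them, and gives up after a maximum. It is not Umbraco code: `backoff_delay`, `poll_with_backoff`, `try_acquire` and all the default values are hypothetical names and numbers chosen for illustration, and the doubling-with-a-cap schedule is just one possible choice alongside the stepped delays suggested above.

```python
import time
from typing import Callable, Optional


def backoff_delay(attempt: int, base: float = 2.0, factor: float = 2.0, cap: float = 30.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based), capped at `cap`."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    exponent = min(attempt - 1, 64)
    return min(cap, base * factor ** exponent)


def poll_with_backoff(
    try_acquire: Callable[[], bool],
    max_attempts: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call `try_acquire` until it succeeds.

    Returns the attempt number that succeeded, or None once `max_attempts`
    is exhausted (the point where the caller could stop the application).
    """
    for attempt in range(1, max_attempts + 1):
        if try_acquire():
            return attempt
        if attempt < max_attempts:
            # Wait longer after each failure to ease pressure on the database.
            sleep(backoff_delay(attempt))
    return None
```

With these assumed defaults the waits grow from 2 to 4, 8 and 16 seconds and then stay at the 30 second cap, so a struggling SQL Server sees far fewer lock requests than with a fixed two-second interval.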

p-m-j · 2022-06-02T07:59:31Z

9.4.0 introduced the global setting MainDomReleaseSignalPollingInterval (default 2000ms) which allows a little more control than was there previously.

Exponential backoff for SqlMainDomLock sounds like a good suggestion. However, I am hoping that SqlMainDomLock is no longer required, as 9.4.0 also introduced FileSystemMainDomLock, which uses lock files on disk instead of the database to control MainDom status. This works cross-platform, both on the metal and on Azure App Services, unlike the default implementation in v9 (MainDomSemaphoreLock).

In Umbraco 10, FileSystemMainDomLock is the default implementation everywhere unless configured otherwise.

1 reply

marcemarc Jun 7, 2022
Author

Thanks @p-m-j ... #h5yr - be great to get this resolved for large sites on Azure, fingers crossed for the FileSystemMainDomLock!
