Azure Web App - SqlMainDom Lock and the ListeningLoop #12449
9.4.0 introduced the global setting Exponential backoff for In Umbraco 10 |
I've been asked about this issue three times now by different people, most recently in the forum (https://our.umbraco.com/forum/using-umbraco-and-getting-started/109069-umbraco-site-keeps-crashing-due-to-nucachecontentdb)
They have load balancing on Azure Web App configured correctly (usually that's what I'm initially asked to verify).
But every now and then their front-end Umbraco Web App starts to fail and doesn't recover; have a look at the forum post for the details.
Anyway, the commonality in these three instances was that the SQL Server's DTUs were maxed out / SQL at 100%.
Therefore the server couldn't obtain the SqlMainDom lock by issuing its SQL request, and so would keep trying every two seconds...
... the problem being that the SQL request executed every two seconds kept the SQL Server 'maxed out', so the attempt to get the lock was perpetuating the very conditions that were preventing the MainDom lock from being obtained :-P
(https://github.com/umbraco/Umbraco-CMS/pull/7984/files#diff-957698acc7a6f623e4da4b23707bdffc4e0cde27a6eb9956fdfbb9e4d691eebcR134)
There has been quite a bit of work around the SqlMainDom lock for V9/V10, and the latest version of the listening loop is here:
Umbraco-CMS/src/Umbraco.Infrastructure/Runtime/SqlMainDomLock.cs (line 152 in c0c9c50)
Workarounds involve scaling up the SQL instance, or moving the SQL MainDom lock to another SQL instance (#11075), which minimises the chance of the issue occurring...
... but the reason I've opened a discussion rather than an issue is that v8 is feature complete! And of course I've not seen this yet on a V9/V10 site, so we might need to wait for somebody to host V10 on an Azure Web site and hit these issues before we can do anything to resolve them...
... but I was wondering (and this is the feature/idea): if we were to count the number of times the ListeningLoop executed, could we perhaps increase the wait between attempts? E.g. on our fifth attempt, maybe wait 10 seconds before trying again; after 10 attempts, 30 seconds, etc.; or after 1000 attempts, maybe stop looping and stop the application? Would that give the SQL Server a chance to recover?
Or have I completely misunderstood what's going on, and this all sort of won't happen with V10?
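To make the idea a bit more concrete, here is a rough sketch of the stepped wait schedule described above. It is not Umbraco's code (the real loop is C# in SqlMainDomLock.cs); it is written in Python purely for illustration, and the names `delay_for_attempt`, `acquire_with_backoff`, `try_acquire` and `sleep` are all hypothetical. The thresholds (5 attempts, 10 attempts, 1000 attempts) are just the example numbers from this post, not tested values.

```python
from typing import Callable, Optional

BASE_DELAY_SECONDS = 2.0  # the fixed two-second interval described in the post
MAX_ATTEMPTS = 1000       # assumption: the "maybe stop looping" cut-off from the post


def delay_for_attempt(attempt: int) -> Optional[float]:
    """Seconds to wait after a failed attempt (1-based), or None to give up."""
    if attempt >= MAX_ATTEMPTS:
        return None  # stop retrying so the SQL Server gets a chance to recover
    if attempt >= 10:
        return 30.0
    if attempt >= 5:
        return 10.0
    return BASE_DELAY_SECONDS


def acquire_with_backoff(try_acquire: Callable[[], bool],
                         sleep: Callable[[float], None]) -> bool:
    """Call try_acquire until it succeeds, waiting longer as failures pile up.

    try_acquire stands in for the SQL request that tries to take the lock;
    sleep is injected so the schedule can be exercised without real waiting.
    Returns True if the lock was obtained, False if we gave up.
    """
    attempt = 0
    while True:
        attempt += 1
        if try_acquire():
            return True
        delay = delay_for_attempt(attempt)
        if delay is None:
            return False
        sleep(delay)
```

In real use `sleep` would be something like `time.sleep`, and a `False` result is the point where the application could decide to shut down rather than keep hammering the database.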