Slow /.well-known/openid-configuration endpoints #17
-
Which version of Duende IdentityServer are you using?

The /.well-known/openid-configuration endpoint has a 99th percentile latency of ~1 sec, with peaks of up to 20 secs at around 5:45 AM GMT+1. During the peaks of ~20 secs, other endpoints slow down considerably as well.

To Reproduce

Expected behavior

Log output/exception with stacktrace
Additional context

We found DuendeArchive/Support#1361, but no improvement occurred after updating Azure.Core to 1.44.

NuGet versions with ~20 ms 99th percentile (pre 8th January):

NuGet versions with ~1 sec 99th percentile (post 8th January):
We have added aggressive caching with a custom discovery response generator (the snippet below overrides CreateDiscoveryDocumentAsync). During the time between 14:30 and 14:40, I ran a dummy app that polled the discovery endpoint at an interval - latency was perfectly fine there. After stopping the dummy app, the regular traffic calling the endpoint started seeing latencies of ~1 sec again.

The relevant caching code:

```csharp
var key = $"{_memoryKeyPrefix}{baseUrl}-{issuerUri}";
if (_memoryCache.TryGetValue(key, out Dictionary<string, object>? discoveryDocument) && discoveryDocument != null)
{
    return discoveryDocument;
}

discoveryDocument = await base.CreateDiscoveryDocumentAsync(baseUrl, issuerUri);

var expiration = _timeProvider.GetUtcNow().AddSeconds(Options.Discovery.ResponseCacheInterval ?? _defaultExpirationInSeconds);
_memoryCache.Set(key, discoveryDocument, expiration);
return discoveryDocument;
```
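For illustration, here is a self-contained sketch of the same caching idea written as a decorator over IDiscoveryResponseGenerator instead of a subclass, so the base generator's constructor does not need to be reproduced; the class and member names are placeholders, not the code actually running in the affected service.

```csharp
// Illustrative sketch only: a decorator over IDiscoveryResponseGenerator that adds the
// same memory caching as the snippet above. CachingDiscoveryResponseGenerator, the key
// prefix and the default expiration are placeholder names, not from this thread.
using System.Collections.Generic;
using System.Threading.Tasks;
using Duende.IdentityServer.Configuration;
using Duende.IdentityServer.Models;
using Duende.IdentityServer.ResponseHandling;
using Microsoft.Extensions.Caching.Memory;

public class CachingDiscoveryResponseGenerator(
    IDiscoveryResponseGenerator inner,
    IMemoryCache memoryCache,
    TimeProvider timeProvider,
    IdentityServerOptions options) : IDiscoveryResponseGenerator
{
    private const string MemoryKeyPrefix = "disco-";
    private const int DefaultExpirationInSeconds = 60;

    public async Task<Dictionary<string, object>> CreateDiscoveryDocumentAsync(string baseUrl, string issuerUri)
    {
        var key = $"{MemoryKeyPrefix}{baseUrl}-{issuerUri}";
        if (memoryCache.TryGetValue(key, out Dictionary<string, object>? cached) && cached != null)
        {
            return cached;
        }

        var document = await inner.CreateDiscoveryDocumentAsync(baseUrl, issuerUri);

        // Cache until the configured discovery response cache interval elapses.
        var expiration = timeProvider.GetUtcNow()
            .AddSeconds(options.Discovery.ResponseCacheInterval ?? DefaultExpirationInSeconds);
        memoryCache.Set(key, document, expiration);
        return document;
    }

    // The JWK document is not cached here, just to keep the sketch small.
    public Task<IEnumerable<JsonWebKey>> CreateJwkDocumentAsync()
        => inner.CreateJwkDocumentAsync();
}
```

A decorator keeps the caching concern separate from IdentityServer's own generator, at the cost of having to wire the inner generator up manually in DI.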
-
It indeed looks like something happened in that upgrade, yes. Looking at the example trace for the GET request indicates that the problem is outside of the IdentityServer pipeline.*

*) To be 100% correct, there are a few infrastructure-level things that are done outside of that block. The first is that the activity only fires if the path matches an IdentityServer endpoint, so the endpoint resolution happens outside of the block. It's a simple for loop with only in-memory dependencies and I cannot imagine how that would take close to a second. Also, if you are using the dynamic providers feature, handling of those is outside of that block as well.

My overall feeling here, based on the diagnostics shared, is that it's something happening before or after the actual IdentityServer code is invoked. Did you do any code changes as part of the upgrade? Any changes to infrastructure?
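One low-effort way to check that split, sketched here as an illustration rather than something prescribed in this thread, is to log elapsed time from two points in the pipeline and compare them for the discovery requests:

```csharp
// Diagnostic sketch with placeholder registrations: compare "total pipeline" time with
// "UseIdentityServer onward" time to see where the extra second is spent.
using System.Diagnostics;
using Duende.IdentityServer.Models;

var builder = WebApplication.CreateBuilder(args);

// Placeholder IdentityServer setup; substitute the real configuration.
builder.Services.AddIdentityServer()
    .AddDeveloperSigningCredential()
    .AddInMemoryClients(Array.Empty<Client>())
    .AddInMemoryApiScopes(Array.Empty<ApiScope>())
    .AddInMemoryIdentityResources(Array.Empty<IdentityResource>());

var app = builder.Build();

// Timer 1: everything in the pipeline (forwarded headers, CORS, routing, IdentityServer, ...).
app.Use(async (context, next) =>
{
    var sw = Stopwatch.StartNew();
    await next();
    app.Logger.LogInformation("Total pipeline {Path}: {Ms} ms",
        context.Request.Path, sw.ElapsedMilliseconds);
});

// ... the rest of the existing middleware (CORS, proxy handling, etc.) would normally sit here ...

// Timer 2: only IdentityServer and anything registered after it.
app.Use(async (context, next) =>
{
    var sw = Stopwatch.StartNew();
    await next();
    app.Logger.LogInformation("UseIdentityServer onward {Path}: {Ms} ms",
        context.Request.Path, sw.ElapsedMilliseconds);
});

app.UseIdentityServer();

app.Run();
```

If the two numbers are close, the time is spent inside or after UseIdentityServer; if only the first one is large, it is spent in middleware that runs before it.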
-
Thanks for your response. There is no pre/post processing there, but I'll try to get more diagnostics from the .NET Core pipeline. The strange thing is that it's only affecting one region in Azure - North Europe has about 20% of West Europe's traffic but no issues at all. I'll try to reproduce locally, as we see the problem in 3 different environments. I'll also further investigate transitive dependencies.

Also note, all other protocol endpoints (/introspect, /token etc.) behave normally. It's just the well-known one.
-
Checking App Insights today, we've seen the following: the /.well-known endpoints have very high latency, while other Duende protocol endpoints (introspect, token) increase as well, just not as much. I'll investigate further within the ASP.NET Core pipeline, but @AndersAbel, could you check on your end as well to see if there is a difference between the endpoints? We're using YARP as a proxy in front of IdentityServer; I'll check if there is a problem there.
-
Thanks for sharing those stats. The /.well-known endpoints are actually the ones I consider simplest in their implementation. There is less code to run and less storage/database access. The token endpoint in comparison is more complex, but it also (as far as I remember) utilizes all the storage/config that the /.well-known endpoints do.

I do not doubt that this is a problem, but to properly troubleshoot we would need full activity traces that show timing all the way from the client's request to how it is handled on the server side. The only tangible data point we have so far is the one I referenced above, and that one shows that the execution of the discovery endpoint class only takes up a fraction of the total time.

Are you using the dynamic providers feature?
-
I've opened another issue for YARP, and after collecting metrics and checking with the team there, it seems that YARP does not cause the latency. Metrics suggest that the request is immediately forwarded to the network stack. We do not use dynamic providers.

As I've mentioned above, we started seeing the increased latency on the /.well-known endpoint; in comparison, the other endpoints are far less affected. I'll enable ASP.NET Core telemetry to see where the additional time comes from.
-
I think that is the right next step. Right now we don't know where the time is spent, and for any performance issue, metrics are the only way to solve it. There are things in the discovery endpoint as well as in the IdentityServer endpoint selection/routing that could potentially cause timing issues (never say never in these cases until it is proven). The only thing I can say is that the numbers shared so far indicate that the issue is outside of the IdentityServer middleware. That doesn't mean IdentityServer is not to blame - we won't know until we have metrics that show where the issue is.
-
I'll set up metrics logging; that will require some code changes to use the new OpenTelemetry packages. We noticed that those long-running requests to /.well-known/openid-configuration come in pairs within a range of 10 ms from the same client: one of them finishes within the expected latency, the other takes ~1 sec, as if there is a lock / resource contention.
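For reference, a minimal sketch of that OpenTelemetry wiring, assuming a generic OTLP exporter and the standard instrumentation packages; adapt the exporter to whatever backend (e.g. Application Insights) is actually in use:

```csharp
// Sketch only: trace ASP.NET Core requests plus IdentityServer's own activity sources so
// the time before/inside/after the IdentityServer middleware becomes visible in traces.
// Assumes the OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore,
// OpenTelemetry.Instrumentation.Http and OpenTelemetry.Exporter.OpenTelemetryProtocol packages.
using Duende.IdentityServer;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("identity-server"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()                        // incoming request spans
        .AddHttpClientInstrumentation()                        // outgoing HTTP (stores, key vault, ...)
        .AddSource(IdentityServerConstants.Tracing.Basic)      // IdentityServer endpoint activities
        .AddSource(IdentityServerConstants.Tracing.Cache)
        .AddSource(IdentityServerConstants.Tracing.Stores)
        .AddSource(IdentityServerConstants.Tracing.Services)
        .AddSource(IdentityServerConstants.Tracing.Validation)
        .AddOtlpExporter());                                   // or an Azure Monitor exporter

var app = builder.Build();
app.Run();
```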
-
(note: we're moving this issue to our new community discussions)
-
IdentityServer 7.2 was just released with a preview feature that can cache the output of the discovery endpoint. Can you please try that and see if it makes a difference?
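For anyone finding this later, enabling that preview feature should look roughly like the sketch below; the option names are written from memory of the 7.2 preview documentation and should be double-checked against the release notes.

```csharp
// Assumed option names for the 7.2 discovery-document cache preview; verify against
// the IdentityServer 7.2 release notes / docs before relying on them.
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddIdentityServer(options =>
{
    options.Preview.EnableDiscoveryDocumentCache = true;
    options.Preview.DiscoveryDocumentCacheDuration = TimeSpan.FromSeconds(60);
})
.AddDeveloperSigningCredential(); // placeholder; use the real key material setup

var app = builder.Build();
app.UseIdentityServer();
app.Run();
```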
-
@AndersAbel @maartenba After enabling more log categories, I was able to pinpoint this ~1 sec latency increase: it comes from the ASP.NET Core CORS middleware.

"Cache hit for https://xxx.ch", from category Duende.IdentityServer.Services.DefaultCache
"No CORS policy found for the specified request.", from category Microsoft.AspNetCore.Cors.Infrastructure.CorsMiddleware

I'll have a look at that code and see what it does.

Update: We deployed a fix (re-ordering the ASP.NET Core CORS middleware in the request pipeline and providing an AllowAll = true service for DefaultCorsPolicyService). That fixed the issue: the 99th percentile has not yet recovered, but latency has pretty much flatlined.

As we hadn't deployed anything CORS related since February 2024, it wasn't obvious. We had enabled a preflight cache to speed up CORS preflight requests, and that might have had a negative impact we didn't start seeing until late 2024. It is also documented as a footnote:
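For readers hitting the same symptom, the AllowAll part of that fix can be sketched roughly as below; the exact middleware re-ordering used is not shown in this thread, and AllowAll = true is only appropriate if allowing every origin really is acceptable.

```csharp
// Sketch of registering IdentityServer's DefaultCorsPolicyService with AllowAll = true,
// as described above. Only do this if allowing every origin is acceptable for your setup.
// The middleware re-ordering part of the fix is not reproduced here.
using Duende.IdentityServer.Services;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddIdentityServer()
    .AddDeveloperSigningCredential(); // placeholder; use the real configuration

// Registered after AddIdentityServer so this is the ICorsPolicyService that
// IdentityServer resolves for its protocol endpoints.
builder.Services.AddSingleton<ICorsPolicyService>(sp =>
    new DefaultCorsPolicyService(sp.GetRequiredService<ILogger<DefaultCorsPolicyService>>())
    {
        AllowAll = true
    });

var app = builder.Build();
app.UseIdentityServer();
app.Run();
```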
-
@mathitharmalingam-aqi looks like this may also be something to look into on your end: https://github.com/orgs/DuendeSoftware/discussions/17#discussioncomment-12750421
-
@martinescalantehughes I see you commented above; do you have a similar issue? As we are still working on investigating this, it would be great if you could share some information about your setup and what issues you see.