Skip to content

Define ServiceScopeConfig in ServiceSettings #3464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

jaellio
Copy link
Contributor

@jaellio jaellio commented Mar 11, 2025

@istio-testing
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 11, 2025
@jaellio
Copy link
Contributor Author

jaellio commented Mar 11, 2025

/test all

@jaellio jaellio marked this pull request as ready for review March 11, 2025 18:25
@jaellio jaellio requested a review from a team as a code owner March 11, 2025 18:25
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Mar 11, 2025
@jaellio jaellio requested a review from keithmattix March 11, 2025 18:26
@jaellio jaellio requested a review from keithmattix March 12, 2025 21:51
Copy link
Contributor

@keithmattix keithmattix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like what we got general consensus on in https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0 across multiple TOC and WG meetings so I'm approving

@@ -444,6 +444,47 @@ message MeshConfig {
//
// For example: foo.bar.svc.cluster.local, *.baz.svc.cluster.local
repeated string hosts = 2;

// Scope configuration to be applied to matching services.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the relation between this new setting and the existing hosts/settings?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that users set one or the other. A oneOf might be better to express that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point ServiceScopes will only apply to ambient multicluster. In traditional multicluster today the default availability is global. For ambient multicluster, the default availability is local. For alpha, I think it is unnecessary to support ServiceScopes for both configurations operating with different default availabilities

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible there could be conflict where the host is foo.bar.svc.cluster.local and the namespace or service has istio.io/global label enabled?

Also, do we need to support both namespace and service level? what if they have conflicts?

Copy link
Contributor Author

@jaellio jaellio May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if ServiceScopeConfig is defined, even with a 0 value, the host should be ignored in ambient mode. @keithmattix was that your understanding?

Service selectors overwrite namespace selectors. This behavior should be consistent with the ambient enablement API

//
// ```yaml
// serviceSettings:
// serviceScopes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// serviceScopes:
// - serviceScopes:

serviceSettings is a list of MeshConfig_ServiceSettings. Not entirely sure we need the double nested list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keithmattix do you have a preference? I am in favor of it not being double nested

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm looking at the proto, I almost feel like ServiceScopes should just be a sibling to ServiceSettings. The other fields in service settings don't make sense for service scope, and eventually, I think the latter will supplant the former

Copy link
Contributor Author

@jaellio jaellio Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be harder to enforce oneof though...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough...in that case yeah we don't want the extra nesting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another thought: I'm not sure if we can do oneOf because new versions of istiod reading old meshconfig would no longer be able to parse it correctly...

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Apr 1, 2025
jaellio added 8 commits April 16, 2025 22:43
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
@jaellio jaellio force-pushed the jaellio/srvcapandexportapi branch from ad386cf to c9111e4 Compare April 16, 2025 23:39
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Apr 16, 2025
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Apr 16, 2025
@jaellio jaellio removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Apr 16, 2025
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Apr 16, 2025
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
@jaellio
Copy link
Contributor Author

jaellio commented Apr 17, 2025

@louiscryan Could you please take a look?

@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Apr 17, 2025
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
@jaellio jaellio requested review from linsun and costinm April 24, 2025 20:01
@linsun
Copy link
Member

linsun commented Apr 25, 2025

cc @stevenctl to review as well.

@louiscryan is on vacation hopefully back soon!

Copy link
Member

@linsun linsun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a comment few days ago, would like it to be answered.

@linsun
Copy link
Member

linsun commented Apr 30, 2025

Some update from today's WG discussion: We felt the namespace label should not be the default. The default should be service label, e.g. only when the global label is used the service is going global. Advanced Istio mesh operator/admin can enable the namespace label as needed, which should also enable overwrite at the service level.

@jaellio
Copy link
Contributor Author

jaellio commented May 1, 2025

left a comment few days ago, would like it to be answered.

Thanks for your comment! Was thinking this API change made sense to discuss in the working group meeting today with a broader audience.

@jaellio
Copy link
Contributor Author

jaellio commented May 1, 2025

Summarizing the WG conversion and feedback:

  1. We need to define the default behavior when no selectors are defined in the ServiceScopeConfig (i.e. the 0 value ServiceScopeConfig)
    • There was some consensus that a 0 value ServiceScopeConfig was equivalent to an opt in service selector for the istio.io/global label. A workload/service is only considered global when the istio.io/global is set on the corresponding service otherwise it is local.
  2. Establish enablement consistency with the Ambient enablement and mutating webhook API
    • The Ambient enablement API defines selectors in an array of EnablementSelector tuples (either a namespace selector or a service selector)
    • This allows the selectors to be "OR"ed
  3. Define selector precedence and common potential conflict scenarios
    • Any defined service selectors overwrite any namespace selections
    • The service selectors are "OR"ed and the namespace selectors "OR"ed (TBD)
    • Allow opt out service and namespace selectors (TBD)
  4. Define most common usage patterns and their respective user roles for global ns/service enablement
    1. Explicit service opt-in - Pattern: Every service with this label should be global, User: App dev(?)
    2. Services global by default, opt out by namespace - Pattern: Every namespace except namespaces with this label are global, User: Admin
    3. Explicit namespace opt-in - Pattern: Every namespace with this label should global, User: Admin (?)
    4. Services global by default, opt out by namespace and opt out by service - Pattern: Every namespace except namespaces with this label are global. Every service in a global namespace except services with this label are considered global. Every service in a non global namespace except services with the label are considered not global, User: App dev
    5. Explicit namespace opt-in, opt out by service - Pattern: Every namespace with this label should global. Every service in a global namespace except services with this label are global, User: App dev
  5. Add more examples in the API documentation

@linsun, @louiscryan, @costinm, @keithmattix

Copy link
Contributor

@costinm costinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have (with this and few other Istio APIs) is the use of a term like 'scope' or 'policy' in a very narrow, specific API. Almost nobody outside a small group with knowledge of Istio history will understand 'scope' to mean that requests for a service will be load balanced across multiple clusters. It's about the same with calling it 'policy'. Sure - it is a load balancing policy and you can think of it as a 'scope', but the term has far different semantics in normal use.

The concept of having a config that defines namespace and service selectors and attaches configs to it is great - but not sure how consistent it is - webhooks I believe use
namespaceSelector:
matchLabels:
namespace-label: namespace
but I have not seen service selectors.

I don't know what better name we can use instead of scope - but if the plan is to use the same pattern for other things, maybe a very narrow "MultiClusterLoadBalancing" would be easier to understand.

@costinm
Copy link
Contributor

costinm commented May 2, 2025

Thinking about it - the same service may be available in all clusters and have the same semantics/behavior. The only thing that is different for multi-cluster is that requests are load balanced to remote clusters - i.e. local load balancing versus global load balancing (multi-cluster is not entirely correct since we still support WorkloadEntry and endpoints not in a cluster).

On the same topic - we may want to consider 'regional' semantics at some point, since many databases are regional - so a service that would break with global load balancing would still work great with regional LB ( still multi-cluster and not cluster-local ).

@keithmattix
Copy link
Contributor

As proposed, The API is designed in such a way to allow other scopes like regional in the future 👍🏾

@costinm
Copy link
Contributor

costinm commented May 2, 2025 via email

@keithmattix
Copy link
Contributor

In my mind, the existing locality features in DR control how the data plane is going to distribute traffic as a client. This API allows users to configure:

  1. which remote services control planes will tell their proxies about
  2. What services are available through the east west gateway (if there is one)

So one is client oriented and the other is service oriented. There's a proposal for another API to handle more client side features, and on that I definitely agree with you that we want to make sure there is harmony with existing load balancing features

@costinm
Copy link
Contributor

costinm commented May 2, 2025 via email

@keithmattix
Copy link
Contributor

Let me phrase it this way: service scope is effectively a more general ServiceExport API. Meaning, that it allows a mesh admin to configure a labeling scheme for app developers to opt into their service being routed to by services in other clusters. That's the fundamental role here; the rest (e.g what the control plane filters, what gets exposed in the e/w gateway) are all implementation details

@costinm
Copy link
Contributor

costinm commented May 2, 2025 via email

@keithmattix
Copy link
Contributor

@costinm Ah I see what you mean now. Yes, this API is a producer setting and a cluster 1's meshconfig will not have an affect on anything cluster2 does. That's a good callout on routing; I got my wires crossed on the control filtering point. We should not use this setting to do anything on the client cluster's side, you're correct

@costinm
Copy link
Contributor

costinm commented May 2, 2025

Ok, so how does a client decide to route to local vs remote endpoints ?

And as producer API, what does it do ? Prevent the service from getting called by remote clients? I doubt it can. Configure the east/west gateway to forward to the service(s) ?

How about renaming this to some 'LabelConfig' and the field to 'ewGateway' or something clearly associated with what it does, and documenting it clearly on what it does, not vague 'global scope'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants