-
Notifications
You must be signed in to change notification settings - Fork 571
Define ServiceScopeConfig in ServiceSettings #3464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Skipping CI for Draft Pull Request. |
/test all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like what we got general consensus on in https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0 across multiple TOC and WG meetings so I'm approving
@@ -444,6 +444,47 @@ message MeshConfig { | |||
// | |||
// For example: foo.bar.svc.cluster.local, *.baz.svc.cluster.local | |||
repeated string hosts = 2; | |||
|
|||
// Scope configuration to be applied to matching services. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the relation between this new setting and the existing hosts/settings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was that users set one or the other. A oneOf might be better to express that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point ServiceScopes
will only apply to ambient multicluster. In traditional multicluster today the default availability is global. For ambient multicluster, the default availability is local. For alpha, I think it is unnecessary to support ServiceScopes
for both configurations operating with different default availabilities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible there could be conflict where the host is foo.bar.svc.cluster.local and the namespace or service has istio.io/global label enabled?
Also, do we need to support both namespace and service level? what if they have conflicts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if ServiceScopeConfig
is defined, even with a 0 value, the host should be ignored in ambient mode. @keithmattix was that your understanding?
Service selectors overwrite namespace selectors. This behavior should be consistent with the ambient enablement API
mesh/v1alpha1/config.proto
Outdated
// | ||
// ```yaml | ||
// serviceSettings: | ||
// serviceScopes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// serviceScopes: | |
// - serviceScopes: |
serviceSettings is a list of MeshConfig_ServiceSettings. Not entirely sure we need the double nested list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keithmattix do you have a preference? I am in favor of it not being double nested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm looking at the proto, I almost feel like ServiceScopes should just be a sibling to ServiceSettings. The other fields in service settings don't make sense for service scope, and eventually, I think the latter will supplant the former
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be harder to enforce oneof though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough...in that case yeah we don't want the extra nesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had another thought: I'm not sure if we can do oneOf because new versions of istiod reading old meshconfig would no longer be able to parse it correctly...
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
ad386cf
to
c9111e4
Compare
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
@louiscryan Could you please take a look? |
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
cc @stevenctl to review as well. @louiscryan is on vacation hopefully back soon! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a comment few days ago, would like it to be answered.
Some update from today's WG discussion: We felt the namespace label should not be the default. The default should be service label, e.g. only when the global label is used the service is going global. Advanced Istio mesh operator/admin can enable the namespace label as needed, which should also enable overwrite at the service level. |
Thanks for your comment! Was thinking this API change made sense to discuss in the working group meeting today with a broader audience. |
Summarizing the WG conversion and feedback:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One concern I have (with this and few other Istio APIs) is the use of a term like 'scope' or 'policy' in a very narrow, specific API. Almost nobody outside a small group with knowledge of Istio history will understand 'scope' to mean that requests for a service will be load balanced across multiple clusters. It's about the same with calling it 'policy'. Sure - it is a load balancing policy and you can think of it as a 'scope', but the term has far different semantics in normal use.
The concept of having a config that defines namespace and service selectors and attaches configs to it is great - but not sure how consistent it is - webhooks I believe use
namespaceSelector:
matchLabels:
namespace-label: namespace
but I have not seen service selectors.
I don't know what better name we can use instead of scope - but if the plan is to use the same pattern for other things, maybe a very narrow "MultiClusterLoadBalancing" would be easier to understand.
Thinking about it - the same service may be available in all clusters and have the same semantics/behavior. The only thing that is different for multi-cluster is that requests are load balanced to remote clusters - i.e. local load balancing versus global load balancing (multi-cluster is not entirely correct since we still support WorkloadEntry and endpoints not in a cluster). On the same topic - we may want to consider 'regional' semantics at some point, since many databases are regional - so a service that would break with global load balancing would still work great with regional LB ( still multi-cluster and not cluster-local ). |
As proposed, The API is designed in such a way to allow other scopes like regional in the future 👍🏾 |
The problem is - we have another API for locality and load balancing
policies.
You may call it 'scope' - but that's really what it does, balances traffic
for a service either to
same-cluster or other cluster endpoints. And it looks like it could support
regional and maybe
same zone (in multi-zone clusters) - that's overlapping with the locality
LB feature.
My main point is that the naming - and the description of the feature -
obscures the main function
of the API, which is to control load balancing. Yes, it solves our problem
of configuring 'global'
LB for some services and namespaces. Marking a service as 'local' does not
mean it can't
be called remotely (by a client who can get the IP).
Try asking your favourite LLM to describe 'service scope', and compare the
result with what
this feature does. Next - ask what 'global' means. The service is certainly
not 'global' - will
be load balanced to all the endpoints in the mesh, which in turn can be
just 2 clusters in
the same zone.
Naming and documentation are very important - the feature itself (allow
users to control
a certain aspect of load balancing at namespace/service level) is very
useful, introducing a new
concept of 'scope' and a new meaning for 'global' is not, and the evolution
of the API into
'regional', 'zonal' would conflict with other APIs. To be honest it may be
better than our current
LB API, so not against merging this PR - just concerned with naming and
docs.
…On Thu, May 1, 2025 at 8:17 PM Keith Mattix II ***@***.***> wrote:
*keithmattix* left a comment (istio/api#3464)
<#3464 (comment)>
As proposed, The API is designed in such a way to allow other scopes like
regional in the future 👍🏾
—
Reply to this email directly, view it on GitHub
<#3464 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAUR2WS63GIIL7SD6RIACT24LPU3AVCNFSM6AAAAABYXQCKRCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQNBWGIZDGNZQHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
In my mind, the existing locality features in DR control how the data plane is going to distribute traffic as a client. This API allows users to configure:
So one is client oriented and the other is service oriented. There's a proposal for another API to handle more client side features, and on that I definitely agree with you that we want to make sure there is harmony with existing load balancing features |
On Fri, May 2, 2025 at 8:28 AM Keith Mattix II ***@***.***> wrote:
*keithmattix* left a comment (istio/api#3464)
<#3464 (comment)>
In my mind, the existing locality features in DR control how the data
plane is going to distribute traffic as a client. This API allows users to
configure:
1. which remote services control planes will tell their proxies about
I don't really know what it means - and the docs don't explain this either.
What is a 'remote service' ? If I have service A in both cluster1 and
cluster2 - is A.namespace.svc.cluster.local a 'remote service' ?
The control plane in both cluster1 and cluster2 will obviously tell the
proxies about service A, regardless of the scope setting.
Only difference is that load balancing will send traffic for service A only
on the local cluster.
Even if service A is only in cluster1 - would be very dangerous to claim
that this setting will make it 'locally scoped' - since it can still
be accessed from cluster2. Users don't really care how we implement it or
what the control plane does - if we tell them the
service is locally scoped they will not understand it's just an internal
detail - but service can still be accessed from any other cluster.
1. What services are available through the east west gateway (if there
is one)
The E/W gateway ( if there is one) is also a topology/implementation
detail - the service owner or the admin in cluster1 doesn't need
to know if cluster2 has an E/W gateway or not. If cluster1 defines A as
'global' - it doesn't impact the E/W gateway in cluster2, only
the load balancing decisions in cluster A.
If the intent is to configure the E/W gateway in cluster2 - the setting
should be in cluster2 and apply to the gateway.
Again - using broad terms with a meaning that is very tied to the
implementation and a very specific interpretation is a big problem -
if 'scope' is for allowing access via E/W gateway ( but ignoring all other
access paths ) - people will be confused. Much better to use
a separate API and be very clear where it applies ( i.e. attached to /
specific to a particular gateway ).
I didn't even realize this API is for the gateway - most of the focus was
on the semantics of cluster.local.
So one is client oriented and the other is service oriented. There's a
proposal for another API to handle more client side features, and on that I
definitely agree with you that we want to make sure there is harmony with
existing load balancing features
I also don't know what 'service oriented' means and how is it different
from 'client oriented'. If I set 'scope' as 'global' in cluster1 - does it
impact the workloads in cluster 1 sending
traffic to cluster2 ? If yes, it's client (consumer) setting. A setting in
cluster1 can't impact the E/W gateway in cluster2.
Sending traffic for serviceA to both clusters is a client decision, and
service oriented (it applies to the clients in the cluster where the
setting is defined and impact the consumer side).
If this was intended as a producer side config - but narrowly scoped to the
E/W gateway behavior, without impacting the load balancing - it's even more
confusing.
… Message ID: ***@***.***>
|
Let me phrase it this way: service scope is effectively a more general ServiceExport API. Meaning, that it allows a mesh admin to configure a labeling scheme for app developers to opt into their service being routed to by services in other clusters. That's the fundamental role here; the rest (e.g what the control plane filters, what gets exposed in the e/w gateway) are all implementation details |
On Fri, May 2, 2025 at 11:05 AM Keith Mattix II ***@***.***> wrote:
*keithmattix* left a comment (istio/api#3464)
<#3464 (comment)>
Let me phrase it this way: service scope is effectively a more general
ServiceExport API. Meaning, that it allows a mesh admin to configure a
labeling scheme for app developers to opt into their service being routed
to by services in other clusters. That's the fundamental role here; the
rest (e.g what the control plane filters, what gets exposed in the e/w
gateway) are all implementation details
If it is a general ServiceExport - call it 'serviceExport instead of taking
a word that means many things (scope). ServiceExport is a producer setting
- and like all producer settings
it impacts the configs on the producer cluster, i.e. how the E/W gateway or
authz is done - while routing in Istio is usually a consumer setting. So I
still don't understand...
Are you saying that the labeling scheme in cluster1 would override or be
merged with the labeling in cluster 2 ? Anything that crosses the cluster
boundary has huge risks (security, reliability, etc) -
and needs to be extremely clearly documented, which doesn't seem the case
for this API.
My assumption was that this API will not cross the boundary - so a
ServiceScopeConfig in cluster1 would apply to cluster1, and impact how
workloads in cluster1 behave. If this is not the
case, please update the doc - and it's a much bigger problem than naming
and doc clarity.
We don't have any 'multicluster mesh admin' - Istio can have 1 or more
control planes per cluster, each with its own config impacting the control
plane ( revision ) - and not
impacting other control planes and certainly not other clusters. A naming
scheme defined in one control plane usually impacts that Istiod and cluster
- not other clusters.
The exceptions (WorkloadEntry) are very narrow and we spent a lot of time
and docs to get them to work - they impact security behavior and much more,
I didn't realize this
API was in this category.
Message ID: ***@***.***>
… |
@costinm Ah I see what you mean now. Yes, this API is a producer setting and a cluster 1's meshconfig will not have an affect on anything cluster2 does. That's a good callout on routing; I got my wires crossed on the control filtering point. We should not use this setting to do anything on the client cluster's side, you're correct |
Ok, so how does a client decide to route to local vs remote endpoints ? And as producer API, what does it do ? Prevent the service from getting called by remote clients? I doubt it can. Configure the east/west gateway to forward to the service(s) ? How about renaming this to some 'LabelConfig' and the field to 'ewGateway' or something clearly associated with what it does, and documenting it clearly on what it does, not vague 'global scope' |
Part of adding support for Ambient mulitcluster.
Based on https://docs.google.com/document/d/1Wg6sx9ZUJL4AsHj5wM1kMx3E436s5wg2qoMqoI-bqbQ/edit?tab=t.0#heading=h.nehszyorutrv
#3463