Skip to content

Network policies #2294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Network policies #2294

wants to merge 2 commits into from

Conversation

ormergi
Copy link
Contributor

@ormergi ormergi commented May 19, 2025

What this PR does / why we need it:
In a scenario where a cluster networking has been hardened and futures a default deny-all network policy that affects CNAO operator pods, it will not be able to operate properly.

This PR introduce network-policies for CNAO operator allowing it to operate, complying with restrictions in form of deny all network-policy.

Special notes for your reviewer:
In order to have coverage for the introduced network-policies, and additional deny-all network-policies is installed as part of the cluster-sync flow.
Having a deny-all network-policy installed, allow the introduced network-policies to be tested in a development and test environments, locally and CI.

Since not all CNAO component provide network-policy, the deny-all network policy is configured to affect CNAO operator pods only.
Once all relevant components provide network-policies, we can change the deny-all network-policy to affect all pods in CNAO namespace.

The following work will be done in follow-up PRs:

  • Network-policies for CNAO components will be done in follow-up PRs, including kubemacpool, kubesecondarydns and kubvirt-ipam-controller.
  • Adjust CNAO (operator) network-policies to support OCP.

Release note:

Introduce network-policies allowing CNAO to operate in restricted env

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels May 19, 2025
@kubevirt-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign oshoval for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot requested review from oshoval and phoracek May 19, 2025 07:54
@ormergi ormergi force-pushed the network-policies branch 5 times, most recently from 3299e75 to f42238c Compare May 21, 2025 11:58
@ormergi
Copy link
Contributor Author

ormergi commented May 21, 2025

Changes: rebase

@@ -254,6 +269,9 @@ func main() {
kubevirtIpamControllerImage := flag.String("kubevirt-ipam-controller-image", components.KubevirtIpamControllerImageDefault, "The kubevirtipamcontroller-image managed by CNA")
dumpOperatorCRD := flag.Bool("dump-crds", false, "Append operator CRD to bottom of template. Used for csv-generator")
inputFile := flag.String("input-file", "", "Not used for csv-generator")
clusterDNSNamespace := flag.String("cluster-dns-namespace", "kube-system", "The namespace name where the cluster DNS endpoint reside")
Copy link
Collaborator

@oshoval oshoval May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

i am against it, as discussed offline

we have IsOpenShift all over for binary values (values that can be either k8s set of values or OpenShift)
please see render functions

this kind of change involves more repositories and makes the code less self contained, and then we have 2 ways of doing stuff we are doing for long time in one way

EDIT - fwiw i might have more comments, but imho this issue should first reach agreement this way or other please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this part for now, so that this PR only adds network-policies for CNAO.
Supporting other platformes (OCP) will be done in a follow up.
PTAL

Copy link
Collaborator

@oshoval oshoval May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks,

btw about the previous comment, cnao must stay dual as standalone
having that other method require a layer that will customize it, which also means this is undesired
for example either cluster-sync or even worse, deploy via manifests would stop being dual

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 21, 2025
@ormergi ormergi force-pushed the network-policies branch from f42238c to b01b4bd Compare May 21, 2025 12:41
@oshoval
Copy link
Collaborator

oshoval commented May 21, 2025

/hold cancel

will add it to my Q, i do have comments please

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 21, 2025
@oshoval
Copy link
Collaborator

oshoval commented May 21, 2025

some of the comments here kubevirt/kubesecondarydns#99 affect this PR as well, would you kindly consider them ?

if required i will amend them here, but once i reach it

i would even say they affect the other PRs potentially, as most of those have stuff in common

"k8s.io/utils/ptr"
)

func GetNetworkPolices(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets amend please templates/cluster-network-addons/VERSION/operator.yaml.in
with this manifest
no need to go gen it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oshoval why do you want to use template.in instead of the components.go? Mostly curious.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manifests are easier to maintain and read,
Maybe it should exists on other manifest yaml file if thats what you mean, the interesting thing here is that it should be in a manifest imo

fwiw i didnt finish to review yet
need to check for example if we need to make it dual (k8s / OpenShift), and then to think deeper, unless it is already dual

@oshoval
Copy link
Collaborator

oshoval commented May 21, 2025

Since not all CNAO component provide network-policy, the deny-all network policy is configured to affect CNAO operator pods only.
Once all relevant components provide network-policies, we can change the deny-all network-policy to affect all pods in CNAO namespace.

maybe just add it at the end at once ?
stuff might change in between in case we find adjustments
meanwhile we will be protected on each repo by its own
and at the end wrap it on CNAO itself

EDIT - you can keep it as now as well, but indeed best to add the rest at once as you wrote and not step by step

"k8s.io/utils/ptr"
)

func GetNetworkPolices(
Copy link
Collaborator

@oshoval oshoval May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that now we install it always, we need a condition please for OpenShift as there it is not relevant with those values, else it means we always deploy something that doesnt fit

the right way is to not just have if OpenShift then not install, but to just install the right one please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this code following this comment #2294 (comment)

@ormergi ormergi force-pushed the network-policies branch from b01b4bd to 1ad782d Compare May 21, 2025 14:23
@ormergi
Copy link
Contributor Author

ormergi commented May 21, 2025

some of the comments here kubevirt/kubesecondarydns#99 affect this PR as well, would you kindly consider them ?

if required i will amend them here, but once i reach it

Please post your comment here so discussion will take place in this PR changes context.

i would even say they affect the other PRs potentially, as most of those have stuff in common

If you meant for adding network-policy in kubemacpool, and ipam-ext they are already merged.

@ormergi
Copy link
Contributor Author

ormergi commented May 21, 2025

Since not all CNAO component provide network-policy, the deny-all network policy is configured to affect CNAO operator pods only.
Once all relevant components provide network-policies, we can change the deny-all network-policy to affect all pods in CNAO namespace.

maybe just add it at the end at once ? stuff might change in between in case we find adjustments meanwhile we will be protected on each repo by its own and at the end wrap it on CNAO itself

EDIT - you can keep it as now as well, but indeed best to add the rest at once as you wrote and not step by step

I think we better have the deny-all np (affecting cnao pods) installed, it prevents the next dev to remove or change the network-policies in a way it wont work as expected.
Once we have all components consumed in this repo, the deny-all selector can be removed add affect all the pods in CNAO namespace.

@ormergi
Copy link
Contributor Author

ormergi commented May 21, 2025

Changes: resolve #2294 (review) #2294 (review)
@oshoval PTAL

@ormergi ormergi force-pushed the network-policies branch from 1ad782d to 8c6ced7 Compare May 21, 2025 14:49
@ormergi
Copy link
Contributor Author

ormergi commented May 21, 2025

Changes: rebase

@ormergi
Copy link
Contributor Author

ormergi commented May 21, 2025

/tes pull-e2e-cluster-network-addons-operator-monitoring-k8s

@ormergi
Copy link
Contributor Author

ormergi commented May 21, 2025

Changes: simplify install deny-all network-policy script

- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
Copy link
Collaborator

@oshoval oshoval May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is problematic, it wont be dual right ?
same on L59 and whatever parameters that need to be dual

(this is partial review, but it seems the PR is not ready yet as it is now if so)

as long as there is no chicken and egg, best that CNAO will be able to be dualic and wont need any other repo for that,
but worth to understand if CNAO can create this NP, or someone else that is already whitelisted should, because CNAO cant do anything before it is created and then we can think deeper
In such case i would say best to do as done with KSD, but didnt think deeper yet (EDIT - not needed here, because it is not API, but rather static automated detection)
another option in case of deadlock is pretty like you wanted, but with one parameter, as we done with allow- multus param (but then need to check deeper on easy dual deployment via manifests as good as possible please)

Copy link
Contributor Author

@ormergi ormergi May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is problematic, it wont be dual right ?
same on L59 and whatever parameters that need to be dual

Yep, those manifests must have placeholders to allow the templator to set the right values, to make these manifest work in other platforms. Similar to previous iteration but more simple.

ormergi added 2 commits May 22, 2025 14:58
In a scenario where a cluster networking has been hardened and futures
a default deny-all network policy that affects KSD namespace,
CNAO will not be able to operate properly.

Provide network-policies in order to comply with such restrictions,
it should allow CNAO operate when a deny-all network-policy is in place.

This commit add the following network-polices:
1. Allow egress to cluster DNS
   Enable interaction with cluster DNS, w/o this policy the CNAO operator
   cannot establish connection with the cluster API.
   Affect pods with name=cluster-network-addons-operator label.
2. Allow egress to kube-apiserver
   Enable interaction with the cluster API, w/o this policy CNAO
   operator cannot r/w VMs/Pods etc...
   Affect pods with name=cluster-network-addons-operator label.
3. Allow ingress to metrics endpoint
   Enable interacting with CNAO metric endpoint, w/o this policy its
   impossible to collect CNAO metrics.
   Affect pods with name=cluster-network-addons-operator label.
   The metric number is taken from the selected pod "metrics" port
   number (8443).

This commit includes generated files following 'make gen-manifests'.

Testing the network-policies will be done in follow-up commit.

Signed-off-by: Or Mergi <ormergi@redhat.com>
The project network-policies (NP) are installed as part of cluster-sync flow.
In order to have coverage for these network-policies, the tested env should
simulate network restrictions, such as deny-all network-policy.

Install deny-all network-policy as part of the cluster-sync flow, blocking
egress/ingress traffic affecting CNAO operator pods.

Having the deny-all NP installed as part of the sync flow enable
coverage both in development env and tests, locally and CI.

Once all components provider network-policies the installed NP can be
set to affect all pods in CNAO namespace.

Signed-off-by: Or Mergi <ormergi@redhat.com>
@ormergi
Copy link
Contributor Author

ormergi commented May 22, 2025

Changes: resolve #2294 (comment)

  • Set placeholders in network-policies, and have the templator fill them.

@oshoval @RamLavi PTAL

Copy link

Comment on lines +274 to +278
ClusterDNSPlacement: components.ClusterDNSPlacement{
Namespace: *clusterDNSNamespace,
LabelKey: *clusterDNSLabelKey,
LabelValue: *clusterDNSLabelValue,
},
Copy link
Collaborator

@oshoval oshoval May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before we continue with this approach

Do we know there is chicken and egg issue for sure ?
is there any way to fix it nicely if so (Render style, without external params as this ones), not sure but maybe you can think of

If chicken and egg issue is real:

  1. We don't need 3 params, one is enough, isOpenShift, and internally the one that consume those vars can set the needed fields
  2. Need to think please about additional manifest to release on releases page,
    maybe one for OpenShift, because the one we release now will have the default values.
  3. Need to make sure cluster-sync can get a parameter that allow easily to deploy it on KCLI and such please

(partial review)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another option is that on OpenShift, HCO is the one that need to deploy the NP, not CNAO
Please check out about this option
it will reduce lot of hassle, and important to consider

cc @RamLavi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know there is chicken and egg issue for sure ? is there any way to fix it nicely if so (Render style, without external params as this ones), not sure but maybe you can think of

If chicken and egg issue is real:

1. We don't need 3 params, one is enough, isOpenShift, and internally the one that consume those vars can set the needed fields

The templator tool run locally and not as part of the controller in runtime, it doesnt interact with the cluster-api, its merely generate yamls.

2. Need to think please about additional manifest to release on releases page,
   maybe one for OpenShift, because the one we release now will have the default values.

It seem inevitable to introduce additional manifest for OCP.

3. Need to make sure cluster-sync can get a parameter that allow easily to deploy it on KCLI and such please

In case we have an additional manifest for OCP, we can introduce a parameter to cluster-sync to make it work and deploy the OCP manifest.

The concerns you raise in (2) and (3) are true for any project that need to provider network-policies and support OCP (due to the fact that the manifest containers env specific details).

I hope we can address (2) and (3) in a follow up PR.

Copy link
Collaborator

@oshoval oshoval May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not rush it
HCO can be a valid option, and not sure we can postpone stuff to follow-up (it is a regression and i am against it), but step by step please
if we see it is not valid, we can back to discuss this option deeper please (if have quite few things to tell about them if we reach it)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @nunnatsa we might need to discuss this deeper please

Copy link
Collaborator

@oshoval oshoval May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold
we need to make sure about all the options and their impact
EDIT - design doc might help here

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 22, 2025
@oshoval
Copy link
Collaborator

oshoval commented May 26, 2025

Since the NP is required for OpenShift best practices
We can omit it from the manifests on the upstream release page.

This solves one of the major problems, allowing CNAO to stay independent repository and easy deployable in both modes.
We can just document how to generate those in case needed (or to have a yaml example in one of the docs).

We still need to take care about cluster-sync, and so on of course.

@ormergi ormergi marked this pull request as draft June 12, 2025 11:29
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants