You can write your own custom CNI plugin for Kubernetes -
Every CNI usually has a binary and a daemon: the binary creates the Pod NIC and acts as IPAM, whereas the daemon adds routing / iptables rules on the host to manage pod-to-pod communication
Create a VM snapshot (using CentOS 7 minimal) which has the binaries and configs below.
Install jq and bridge-utils on CentOS
yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install jq
yum -y install bridge-utils wget
systemctl disable firewalld && systemctl stop firewalld
Set SELinux to permissive in /etc/selinux/config
SELINUX=permissive
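The config file change only takes effect after a reboot; to make the same change from the shell and switch the running system right away (assuming the stock config file), you can use:
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
setenforce 0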
Configure repo
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://yum.kubernetes.io/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
yum install -y docker kubelet kubeadm kubectl kubernetes-cni
systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet
sysctl -w net.bridge.bridge-nf-call-iptables=1
echo "net.bridge.bridge-nf-call-iptables=1" > /etc/sysctl.d/k8s.conf
swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab
- Copy custom-cni to /opt/cni/bin/custom-cni
Typically any CNI plugin implements ADD / DEL verbs (passed via the CNI_COMMAND environment variable)
The script below creates a veth pair on the host for every pod, assigns a Pod IP based on the Pod CIDR range of each node, and links it to a virtual bridge
The script contains -
a) Create a bridge on the node
b) Generate an IP from the Pod CIDR range provided in kubeadm
c) Create a veth pair on the host (node) whose peer will be mapped to the pod's eth0
d) Assign that veth NIC to the network namespace created by kubelet (using the Docker runtime)
e) Assign an IP address inside the network namespace (pod) - this is the IPAM part
You can see the CNI logs as configured in the script at /var/log/cni.log
NOTE - There is a custom script to generate the Pod IP (acting as IPAM); it stores a counter file in /tmp/ and increments it by 1 every time a pod is created
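The custom-cni script itself isn't reproduced here, so the following is only a minimal sketch of what /opt/cni/bin/custom-cni could look like; the podcidr config field, the /tmp/last_allocated_ip counter file and the veth naming are assumptions, while the cni0 bridge name and the log path match what is used elsewhere in this write-up.
#!/bin/bash
# Minimal sketch (assumed layout; the real script may differ).
# kubelet invokes this with CNI_COMMAND / CNI_NETNS / CNI_IFNAME / CNI_CONTAINERID set
# and the JSON from /etc/cni/net.d/10-custom-cni.conf on stdin.
log=/var/log/cni.log
config=$(cat /dev/stdin)
echo "$(date) CNI_COMMAND=$CNI_COMMAND CNI_NETNS=$CNI_NETNS $config" >> $log
podcidr=$(echo "$config" | jq -r ".podcidr")     # per-node range, e.g. 10.240.0.0/24
gw=$(echo "$podcidr" | sed "s:0/24:1:g")         # bridge IP, e.g. 10.240.0.1
case $CNI_COMMAND in
ADD)
  # a) bridge on the node (idempotent)
  brctl addbr cni0 2>/dev/null
  ip link set cni0 up
  ip addr add "$gw/24" dev cni0 2>/dev/null
  # b) naive IPAM: counter file in /tmp, incremented for every new pod
  [ -f /tmp/last_allocated_ip ] || echo 1 > /tmp/last_allocated_ip
  n=$(( $(cat /tmp/last_allocated_ip) + 1 ))
  echo $n > /tmp/last_allocated_ip
  pod_ip=$(echo "$podcidr" | sed "s:0/24:$n:g")
  # c) veth pair: host side attaches to the bridge, peer moves into the pod netns
  ip link add veth$n type veth peer name tmp$n
  ip link set veth$n up
  brctl addif cni0 veth$n
  # d) hand the peer to the netns that kubelet created for the pod
  mkdir -p /var/run/netns/
  ln -sf "$CNI_NETNS" /var/run/netns/$CNI_CONTAINERID
  ip link set tmp$n netns $CNI_CONTAINERID
  ip netns exec $CNI_CONTAINERID ip link set tmp$n name $CNI_IFNAME
  # e) IPAM: assign the IP and a default route inside the pod
  ip netns exec $CNI_CONTAINERID ip link set $CNI_IFNAME up
  ip netns exec $CNI_CONTAINERID ip addr add "$pod_ip/24" dev $CNI_IFNAME
  ip netns exec $CNI_CONTAINERID ip route add default via $gw
  # kubelet expects a CNI result JSON on stdout
  echo "{ \"cniVersion\": \"0.3.1\", \"ips\": [ { \"version\": \"4\", \"address\": \"$pod_ip/24\", \"gateway\": \"$gw\" } ] }"
  ;;
DEL)
  rm -f /var/run/netns/$CNI_CONTAINERID
  echo "$(date) DEL $CNI_CONTAINERID" >> $log
  ;;
esac
Remember chmod +x /opt/cni/bin/custom-cni so kubelet can execute it for every pod sandbox.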
- Copy 10-custom-cni.conf to /etc/cni/net.d/10-custom-cni.conf
Change the pod CIDR range on every node (Eg Node1 = 10.240.1.0/24 , Node2 = 10.240.0.0/24)
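The conf file contents aren't shown above; as an assumption it looks roughly like a standard CNI network config with an extra podcidr field that the script reads via jq (values shown here for Node2):
cat <<EOF > /etc/cni/net.d/10-custom-cni.conf
{
    "cniVersion": "0.3.1",
    "name": "custom-cni",
    "type": "custom-cni",
    "podcidr": "10.240.0.0/24"
}
EOF
On Node1 set "podcidr": "10.240.1.0/24" instead.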
- Initiate k8s cluster setup on master using kubeadm
kubeadm init --pod-network-cidr=10.240.0.0/16 (Note this Pod CIDR range is for the entire cluster and should cover the node-specific /24 Pod CIDR ranges configured in each node's 10-custom-cni.conf file)
So below are pod CIDR ranges configured -
Node1 (10.240.1.0/24)
Node2 (10.240.0.0/24)
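After kubeadm init completes it prints a join command; run it on each worker so both nodes register with the master (token and hash below are placeholders - copy them from your own kubeadm init output):
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>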
This is to show the packet flow (actual data) from Pod to Pod (on the same host and across different hosts)
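pods.yaml itself isn't included here; a minimal version could pin each pod with nodeName (node1 / node2 are placeholder hostnames, adjust to your own) so they land on the nodes described below:
cat <<EOF > pods.yaml
apiVersion: v1
kind: Pod
metadata:
  name: alpine1
spec:
  nodeName: node2
  containers:
  - name: alpine
    image: alpine
    command: ["sleep", "3600"]
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx1
spec:
  nodeName: node2
  containers:
  - name: nginx
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx2
spec:
  nodeName: node1
  containers:
  - name: nginx
    image: nginx
EOF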
kubectl apply -f pods.yaml
two pods will land on Node2 (alpine and nginx1)
one pod will land on Node1 (nginx2)
- ADD static iptable rules to enable Pod to Pod communication (on same host)
Eg - Add On Node2
eth0 (in Pod A’s netns) → vethA → br0 → vethB → eth0 (in Pod B’s netns)
iptables -A FORWARD -s 10.240.0.0/24 -j ACCEPT (/24 entire node pod cidr)
iptables -A FORWARD -d 10.240.0.0/24 -j ACCEPT
The FORWARD chain is responsible for allowing such bridged / forwarded traffic
kubectl exec alpine1 -- wget -qO- 10.240.0.8 (Pod to Pod on the same node, assuming alpine1 and nginx1 are running on Node2)
Similarly add on Node1
iptables -A FORWARD -s 10.240.1.0/24 -j ACCEPT (/24 entire node pod cidr)
iptables -A FORWARD -d 10.240.1.0/24 -j ACCEPT
- ADD route to Allow communication across hosts
The ideal flow from a CNI plugin uses an overlay such as VXLAN (or IPIP encapsulation); since I don't have that here, I have used static host routes plus source NAT (MASQUERADE), which achieves the same result
eth0 (in Pod A’s netns) → vethA → br0 → vxlan0 → physical network [underlay] → vxlan0 → br0 → vethB → eth0 (in Pod C’s netns)
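For reference only (not used in this setup), a VXLAN overlay between the two nodes could be wired up roughly like this, assuming Node1 = 10.0.2.14 and Node2 = 10.0.2.15 as in the routes below, an arbitrary VNI of 42, and the cni0 bridge name used elsewhere here:
# on Node2 (IP 10.0.2.15): a point-to-point VXLAN tunnel towards Node1
ip link add vxlan0 type vxlan id 42 dev enp0s3 local 10.0.2.15 remote 10.0.2.14 dstport 4789
ip link set vxlan0 up
brctl addif cni0 vxlan0
# on Node1 (IP 10.0.2.14): the mirror image towards Node2
ip link add vxlan0 type vxlan id 42 dev enp0s3 local 10.0.2.14 remote 10.0.2.15 dstport 4789
ip link set vxlan0 up
brctl addif cni0 vxlan0
# pod traffic then crosses the underlay encapsulated in UDP/4789; with the per-node /24
# subnets used here you would still route between them via the bridge IPs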
ip route add 10.240.1.0/24 via 10.0.2.14 dev enp0s3 (add in Node2)
The above command is good for baremetal / local VMs (VirtualBox), but for cloud providers like GCP you need to set up custom routes at the VPC network level
gcloud compute routes create k8s-node2 --destination-range 10.240.1.0/24 --network k8s --next-hop-address 10.0.2.14
Similarly Azure has User Defined Routes and AWS has Route Tables
Route any packet for Node1's pod CIDR (10.240.1.0/24) to Node1's IP via device enp0s3
ip route add 10.240.0.0/24 via 10.0.2.15 dev enp0s3 (add in Node1)
On GCP
gcloud compute routes create k8s-node1 --destination-range 10.240.0.0/24 --network k8s --next-hop-address 10.0.2.15
Route any packet for Node2's pod CIDR (10.240.0.0/24) to Node2's IP via device enp0s3
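With the routes in place, cross-host Pod-to-Pod traffic can be verified from alpine1 (on Node2) to nginx2 (on Node1); the Pod IP is looked up rather than assumed since it is allocated at creation time:
NGINX2_IP=$(kubectl get pod nginx2 -o jsonpath='{.status.podIP}')
kubectl exec alpine1 -- wget -qO- $NGINX2_IP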
- Allow outgoing internet from Pods by adding NAT rule in iptables
On Node2
Below is the flow for Pod A to reach the Internet, so I have added a POSTROUTING NAT rule
eth0 (in Pod A’s netns) → vethA → br0 → (NAT) → eth0 (physical device) → Internet
iptables -t nat -A POSTROUTING -s 10.240.0.0/24 ! -o cni0 -j MASQUERADE
Add the rule to the POSTROUTING chain (the last chain to evaluate an outgoing packet) of the Linux nat table, making sure it only matches packets that are not going out via the cni0 bridge
MASQUERADE means apply source NAT (replace the Pod's source IP with the host IP)
kubectl exec alpine1 -- ping 8.8.8.8 (assuming the alpine pod is running on Node2)
Kubernetes Services are not part of the CNI plugin; they are managed natively by kube-proxy. So once Pod-to-Pod communication is sorted out, Service-to-Pod communication works straight away, and the Service IP created below will be reachable from both nodes (hosts / pods)
kubectl label pods nginx2 app=nginx2
kubectl expose pod nginx2 --name=nginx2 --port=8000 --target-port=80
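To check that the Service is reachable from a Pod, the ClusterIP can be looked up rather than hard-coded, since it is assigned at creation time:
SVC_IP=$(kubectl get svc nginx2 -o jsonpath='{.spec.clusterIP}')
kubectl exec alpine1 -- wget -qO- ${SVC_IP}:8000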
iptables -S FORWARD (to see FORWARD chain)
iptables -t nat -L KUBE-SERVICES (to see the kube-proxy -s and -d NAT rules)
Use tshark to see the source and destination of each packet along with the protocol
tshark -i cni0 -T fields -e ip.src -e ip.dst -e frame.protocols -E header=y
For the ptp binary with host-local IPAM
ip netns add demo (Create a network namespace on your host)
When you install the kubernetes-cni YUM package it adds many default CNI binaries under /opt/cni/bin/. For this example I copied ptp and host-local to /root/cni/ and created a config file for my CNI plugin
cat <<EOF > /root/cni/conf
{
    "cniVersion": "0.4.0",
    "name": "democni",
    "type": "ptp",
    "ipam": {
        "type": "host-local",
        "subnet": "10.20.0.0/24"
    }
}
EOF
Run the commands below and watch the routes in the demo netns with (ip netns exec demo route -n)
CNI_COMMAND=VERSION ./ptp < conf (To find CNI versions supported by binary)
CNI_COMMAND=ADD CNI_NETNS=/var/run/netns/demo CNI_IFNAME=demoeth0 CNI_PATH=/root/cni CNI_CONTAINERID=12345678 ./ptp < conf
CNI_COMMAND=DEL CNI_NETNS=/var/run/netns/demo CNI_IFNAME=demoeth0 CNI_PATH=/root/cni CNI_CONTAINERID=12345678 ./ptp < conf
I did similar work here, confined to container networking only - https://github.com/ronak-agarwal/rocker
Setup k8s on Centos7 - https://medium.com/@genekuo/setting-up-a-multi-node-kubernetes-cluster-on-a-laptop-69ae3e3d0f7c
https://events19.linuxfoundation.org/wp-content/uploads/2018/07/Packet_Walks_In_Kubernetes-v4.pdf
https://www.stackrox.com/post/2020/01/kubernetes-networking-demystified/