Description
When restarting the Flannel pod (via the rke2-canal pod in an RKE2 cluster), all rules in the FLANNEL-POSTRTG chain of the iptables NAT table are deleted. This results in a temporary loss of external connectivity for pods on the same node until the Flannel pod is fully restarted and the rules are reconciled.
This behavior disrupts workloads that rely on external connectivity and appears to be related to how Flannel handles rule reconciliation during startup.
Steps to Reproduce
- Start monitoring the FLANNEL-POSTRTG chain in the iptables NAT table:
watch -n0.1 "iptables -t nat -L FLANNEL-POSTRTG -n -v"
- Run a continuous curl loop from a pod on the same node to an external endpoint (e.g., https://www.google.com) to monitor connectivity:
while true; do curl -o /dev/null -s -w 'Establish Connection: %{time_connect}s TTFB: %{time_starttransfer}s Total: %{time_total}s\n\n' https://www.google.com; done
- Delete the Flannel pod (or rke2-canal pod in RKE2):
kubectl delete pod -n kube-system <flannel-pod-name>
- Observe that all rules in the FLANNEL-POSTRTG chain are deleted until the Flannel pod is fully restarted.
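To capture how long the chain stays empty instead of eyeballing the watch output, the monitoring step can be scripted. The following is a minimal sketch, not part of the original reproduction: the helper names count_rules and watch_chain and the --watch flag are made up for illustration, and it assumes GNU date and sleep (as on Rocky 9) and root access on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count rule lines in `iptables -L <chain> -n -v` output
# read from stdin. The first two lines are the chain header and column header.
count_rules() {
  awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Hypothetical helper: poll the chain and print a timestamped line whenever
# the number of rules changes. A missing chain is reported as rules=0.
watch_chain() {
  local prev="" cur
  while true; do
    cur=$(iptables -t nat -L FLANNEL-POSTRTG -n -v 2>/dev/null | count_rules)
    if [ "$cur" != "$prev" ]; then
      printf '%s rules=%s\n' "$(date '+%H:%M:%S.%N')" "$cur"
      prev=$cur
    fi
    sleep 0.1
  done
}

# Only start polling when explicitly asked, so the helpers can be sourced.
if [ "${1:-}" = "--watch" ]; then
  watch_chain
fi
```

Run it with --watch on the affected node before deleting the pod. The gap between the line reporting rules=0 and the next line with a non-zero count approximates the window in which masquerading is missing.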
Observed Behavior
- During the restart of the Flannel pod, the FLANNEL-POSTRTG chain in the iptables NAT table is emptied.
- External connectivity for pods on the same node is lost until the Flannel pod completes its startup and reconciles the rules.
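To put a number on the connectivity loss, the curl loop can log one result per probe and the log can be summarized afterwards. This is a rough sketch under assumptions: the function names probe and summarize_outage and the --probe flag are hypothetical, the 2-second timeout and 0.2-second interval are arbitrary, and the timeout is there because connections without masquerading will likely hang rather than fail immediately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a probe log read from stdin, where each line
# is "<epoch_seconds> ok|fail". Prints the number of failed probes and the
# time between the first and the last failure.
summarize_outage() {
  awk '$2 == "fail" { n++; if (first == "") first = $1; last = $1 }
       END {
         if (n) printf "failed=%d span=%.1fs\n", n, last - first
         else print "failed=0"
       }'
}

# Hypothetical helper: probe an external endpoint from inside a pod and log
# one line per attempt. --max-time bounds each attempt to 2 seconds.
probe() {
  local status
  while true; do
    if curl -s -o /dev/null --max-time 2 https://www.google.com; then
      status=ok
    else
      status=fail
    fi
    printf '%s %s\n' "$(date +%s.%N)" "$status"
    sleep 0.2
  done
}

# Only start probing when explicitly asked, so the helpers can be sourced.
if [ "${1:-}" = "--probe" ]; then
  probe
fi
```

Redirect the --probe output to a file while restarting the Flannel pod, then source the script and pipe the file into summarize_outage. The reported span is a lower bound on the outage, since it ignores the timeout of the last failed probe.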
Example of FLANNEL-POSTRTG Chain Before Restart:
Chain FLANNEL-POSTRTG (1 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 mark match 0x4000/0x4000 /* flanneld masq */
40 3000 RETURN all -- * * 10.42.0.0/24 10.42.0.0/16 /* flanneld masq */
0 0 RETURN all -- * * 10.42.0.0/16 10.42.0.0/24 /* flanneld masq */
13 780 RETURN all -- * * !10.42.0.0/16 10.42.0.0/24 /* flanneld masq */
8 480 MASQUERADE all -- * * 10.42.0.0/16 !224.0.0.0/4 /* flanneld masq */ random-fully
0 0 MASQUERADE all -- * * !10.42.0.0/16 10.42.0.0/16 /* flanneld masq */ random-fully
Example of FLANNEL-POSTRTG Chain After Restart:
Chain FLANNEL-POSTRTG (1 references)
pkts bytes target prot opt in out source destination
Expected Behavior
The FLANNEL-POSTRTG chain should not be emptied during the restart of the Flannel pod. Existing rules should remain intact to avoid disruption of external connectivity for pods.
Environment
Flannel Version: v0.26.5
RKE2 Version: v1.31.7~rke2r1
iptables Version: v1.8.10
OS: Rocky 9 (5.14.0-503.40.1.el9_5.x86_64)
Additional Context
This issue does not occur with other CNI implementations, such as Cilium, which can run as a kube-proxy replacement and does not rely on iptables in the same way.
Reference issue: rancher/rke2#8151