External Connectivity Loss During Flannel Pod Restart #2237

Open
@devasmith

Description

When the Flannel pod is restarted (via the rke2-canal pod in an RKE2 cluster), all rules in the FLANNEL-POSTRTG chain of the iptables NAT table are deleted. Pods on the same node then lose external connectivity until the Flannel pod has fully restarted and reconciled the rules.

This behavior disrupts workloads that rely on external connectivity and appears to be related to how Flannel handles rule reconciliation during startup.

Steps to Reproduce

  1. Start monitoring the FLANNEL-POSTRTG chain in the iptables NAT table:
    watch -n0.1 "iptables -t nat -L FLANNEL-POSTRTG -n -v"
  2. Run a continuous curl loop from a pod on the same node to an external endpoint (e.g., https://www.google.com) to monitor connectivity:
    while true; do curl -o /dev/null -s -w 'Establish Connection: %{time_connect}s  TTFB: %{time_starttransfer}s  Total: %{time_total}s\n\n' https://www.google.com; done
  3. Delete the Flannel pod (or rke2-canal pod in RKE2):
    kubectl delete pod -n kube-system <flannel-pod-name>
  4. Observe that the FLANNEL-POSTRTG chain is emptied and stays empty until the Flannel pod has fully restarted.
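
To put a number on how long the chain stays empty, the `watch` command in step 1 could be swapped for a small logger that prints a timestamp whenever the rule count changes. This is only a sketch: the function names `count_rules` and `monitor_chain` are made up for illustration, and it assumes GNU `date` and `sleep` (for sub-second resolution) and root access on the node.

```shell
# Count the "-A ..." lines in `iptables -S` output read from stdin.
# `iptables -S <chain>` prints one "-N" line for the chain itself,
# followed by one "-A" line per rule.
count_rules() {
  awk '/^-A /{n++} END{print n+0}'
}

# Poll the chain and print a timestamped line each time the rule count changes.
# Arguments (both optional): chain name, polling interval in seconds.
monitor_chain() {
  local chain="${1:-FLANNEL-POSTRTG}"
  local interval="${2:-0.1}"
  local prev=""
  local n
  while true; do
    # A missing chain is reported as 0 rules.
    n=$(iptables -t nat -S "$chain" 2>/dev/null | count_rules)
    if [ "$n" != "$prev" ]; then
      printf '%s rules=%s\n' "$(date +%T.%N)" "$n"
      prev="$n"
    fi
    sleep "$interval"
  done
}
```

Running `monitor_chain` as root on the node before step 3 should print one line when the count drops to 0 and another when the rules come back; the difference between the two timestamps approximates the outage window.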

Observed Behavior

  • During the restart of the Flannel pod, the FLANNEL-POSTRTG chain in the iptables NAT table is emptied.
  • External connectivity for pods on the same node is lost until the Flannel pod completes its startup and reconciles the rules.
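
The connectivity loss can be made easier to spot by giving each request a timeout, so that a stalled connection shows up as an explicit failure instead of a hanging curl. The sketch below is one way to do that from inside a pod; the helper name `probe_once` and the 2-second timeout are arbitrary choices, not part of the original reproduction.

```shell
# Issue a single request and print a timestamped OK/FAIL line.
# --max-time bounds the whole request so a black-holed connection fails fast.
probe_once() {
  local url="$1"
  local ts
  local rc=0
  ts=$(date +%T)
  curl -s -o /dev/null --max-time 2 "$url" || rc=$?
  if [ "$rc" -eq 0 ]; then
    printf '%s OK\n' "$ts"
  else
    printf '%s FAIL (curl exit %s)\n' "$ts" "$rc"
  fi
}

# Example loop (run inside a pod on the affected node):
#   while true; do probe_once https://www.google.com; sleep 0.5; done
```

The run of consecutive FAIL lines then gives a rough duration for the outage as seen by the workload.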

Example of FLANNEL-POSTRTG Chain Before Restart:

Chain FLANNEL-POSTRTG (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match 0x4000/0x4000 /* flanneld masq */
   40  3000 RETURN     all  --  *      *       10.42.0.0/24         10.42.0.0/16         /* flanneld masq */
    0     0 RETURN     all  --  *      *       10.42.0.0/16         10.42.0.0/24         /* flanneld masq */
   13   780 RETURN     all  --  *      *      !10.42.0.0/16         10.42.0.0/24         /* flanneld masq */
    8   480 MASQUERADE  all  --  *      *       10.42.0.0/16        !224.0.0.0/4          /* flanneld masq */ random-fully
    0     0 MASQUERADE  all  --  *      *      !10.42.0.0/16         10.42.0.0/16         /* flanneld masq */ random-fully

Example of FLANNEL-POSTRTG Chain During Restart (after the pod is deleted, before the rules are reconciled):

Chain FLANNEL-POSTRTG (1 references)
 pkts bytes target     prot opt in     out     source               destination

Expected Behavior

The FLANNEL-POSTRTG chain should not be emptied during the restart of the Flannel pod. Existing rules should remain intact to avoid disruption of external connectivity for pods.
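
As an illustration of what "remain intact" could look like in practice, the rules of a chain can be replaced in a single atomic commit with `iptables-restore --noflush`: a chain declared in the input is flushed and refilled within the same transaction, so there is no point at which it is observably empty. This is a sketch of the general technique, not a description of how Flannel manages its rules; `build_restore_payload` is a hypothetical helper and the rule shown is copied from the "before restart" listing with the comment match omitted for brevity.

```shell
# Print an iptables-restore payload that replaces all rules of one nat chain.
# Arguments: chain name, then one rule specification per argument.
build_restore_payload() {
  local chain="$1"
  shift
  printf '*nat\n'
  # Declaring the chain makes iptables-restore flush it as part of the commit.
  printf ':%s - [0:0]\n' "$chain"
  local rule
  for rule in "$@"; do
    printf -- '-A %s %s\n' "$chain" "$rule"
  done
  printf 'COMMIT\n'
}

# Example (requires root; --noflush leaves all other nat chains untouched):
#   build_restore_payload FLANNEL-POSTRTG \
#     '-s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully' \
#     | iptables-restore --noflush
```

Calling the function without piping it anywhere prints the payload, which is a safe way to inspect what would be applied.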

Environment

Flannel Version: v0.26.5
RKE2 Version: v1.31.7~rke2r1
iptables Version: v1.8.10
OS: Rocky 9 (5.14.0-503.40.1.el9_5.x86_64)

Additional Context

This issue does not occur with other CNI implementations such as Cilium, which can run as a kube-proxy replacement and does not rely on iptables in the same way.

Reference issue: rancher/rke2#8151
