ExaBGP restart & reload race condition #1172

@koef

Hello ExaBGP Team,
Firstly, I'd like to express my appreciation for your exceptional product.

We utilize Ansible for installing and configuring ExaBGP in our setups. Below is our Ansible 'exabgp' role:

---
- name: Install packages
  ansible.builtin.apt:
    name:
      - exabgp
    state: present

- name: Configure ExaBGP
  ansible.builtin.template:
    src: exabgp.conf.j2
    dest: /etc/exabgp/exabgp.conf
    mode: "0644"
  notify: Reload ExaBGP

- name: Enable and start ExaBGP
  ansible.builtin.systemd:
    name: exabgp
    enabled: true
    state: started

And here's the handler 'Reload ExaBGP':

---
- name: Reload ExaBGP
  ansible.builtin.systemd:
    name: exabgp
    state: reloaded
    enabled: true

However, we've run into a problem: our monitoring system reported that 'exabgp.service' had been restarted unexpectedly: "Systemd's exabgp.service restarted 1 times on node1."

The issue can be reproduced by running 'systemctl restart exabgp && systemctl reload exabgp'. On my Ubuntu 22.04 machine, the result is as follows:

# systemctl restart exabgp && systemctl reload exabgp
Job for exabgp.service failed because a fatal signal was delivered to the control process.
See "systemctl status exabgp.service" and "journalctl -xeu exabgp.service" for details.

The journal log provides this information:

Aug 07 13:57:38 node1 systemd[1]: Starting ExaBGP...
Aug 07 13:57:38 node1 systemd[1]: Started ExaBGP.
Aug 07 13:57:38 node1 systemd[1]: Reloading ExaBGP...
Aug 07 13:57:38 node1 systemd[1]: Reloaded ExaBGP.
Aug 07 13:57:38 node1 systemd[1]: exabgp.service: Main process exited, code=killed, status=10/USR1
Aug 07 13:57:38 node1 systemd[1]: exabgp.service: Failed with result 'signal'.
Aug 07 13:57:38 node1 systemd[1]: exabgp.service: Scheduled restart job, restart counter is at 1.
Aug 07 13:57:38 node1 systemd[1]: Stopped ExaBGP.
Aug 07 13:57:39 node1 systemd[1]: Starting ExaBGP...
Aug 07 13:57:39 node1 systemd[1]: Started ExaBGP.
...

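From the logs, it appears that systemd sends 'USR1' (the reload signal) before exabgp has finished starting and installed its signal handler. The signal's default action then kills the main process (code=killed, status=10/USR1), and systemd schedules a restart.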
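
The failure mode can be illustrated in isolation. The following Python sketch does not involve ExaBGP or systemd; the child scripts and the function names are hypothetical stand-ins. It sends SIGUSR1 to a process that has no handler installed, then to one that installs a handler and announces readiness first.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a daemon that is still starting up:
# it never installs a SIGUSR1 handler.
NOT_READY = "import time; time.sleep(30)"

# Hypothetical stand-in for a daemon that has finished starting:
# it installs a handler, then announces readiness.
READY = """
import signal, time
got = []
signal.signal(signal.SIGUSR1, lambda signum, frame: got.append(signum))
print("ready", flush=True)
while not got:
    time.sleep(0.05)
print("reloaded", flush=True)
"""


def signal_before_handler():
    """Send SIGUSR1 to a child with no handler; return its exit status."""
    child = subprocess.Popen([sys.executable, "-c", NOT_READY])
    child.send_signal(signal.SIGUSR1)
    # A negative return code means the child was terminated by that signal.
    return child.wait(timeout=10)


def signal_after_handler():
    """Wait for the child to report readiness, then send SIGUSR1."""
    child = subprocess.Popen(
        [sys.executable, "-c", READY], stdout=subprocess.PIPE, text=True
    )
    assert child.stdout.readline().strip() == "ready"
    child.send_signal(signal.SIGUSR1)
    out, _ = child.communicate(timeout=10)
    return child.returncode, out.strip()


if __name__ == "__main__":
    print("before handler:", signal_before_handler())
    print("after handler:", signal_after_handler())
```

In the first case the child is terminated by the signal, which corresponds to the 'code=killed, status=10/USR1' line in the journal. In the second case the signal is handled and the child exits normally.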

As a workaround, we overrode the default exabgp unit with a drop-in file:

# cat /etc/systemd/system/exabgp.service.d/override.conf
[Service]
ExecStartPost=/bin/sleep 2

Please let us know if there is a better solution.

To Reproduce

Steps to reproduce the behavior:
systemctl restart exabgp && systemctl reload exabgp

Expected behavior

A reload command sent immediately after a restart should not cause the service to fail and be restarted a second time.

Environment:

  • OS: Ubuntu 22.04.3 LTS
  • ExaBGP version: 4.2.17
