nbd: avoid race with udev during setup #1741

Open. Wants to merge 2 commits into base: master

michaelolbrich
Contributor

@michaelolbrich michaelolbrich commented Jun 19, 2025

With the NBD_CFLAG_DISCONNECT_ON_CLOSE flag set, the nbd device will be
automatically removed when there are no more users after it was
opened the first time.

The idea here is that the nbd device is created and then immediately
mounted (possibly with dm-verity or dm-crypt in between). When the
bundle is later unmounted, the nbd device is automatically removed.

Unfortunately this is racy: udev will see the new device and open it
immediately. If it closes the device before it is mounted (or dm-verity
or dm-crypt is configured) then the nbd device is removed immediately
and mounting will fail.

So instead create the nbd device without NBD_CFLAG_DISCONNECT_ON_CLOSE,
open it to keep it busy and then reconfigure the device to set
NBD_CFLAG_DISCONNECT_ON_CLOSE.
The file descriptor is kept until the bundle is mounted.

Note that with this change, the nbd device will not be fully removed if
rauc dies between creating the nbd device and reconfiguring it, or if
reconfiguring fails.
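
To illustrate why the ordering matters, here is a minimal sketch that models the device lifetime in plain C. It is not the rauc or kernel code: `struct model_nbd` and all `model_*` and `*_order_*` names are hypothetical, and the flag is reduced to a boolean that removes the device when the last opener closes it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an nbd device; not a kernel or rauc structure. */
struct model_nbd {
    bool exists;              /* device is still configured */
    bool disconnect_on_close; /* stands in for NBD_CFLAG_DISCONNECT_ON_CLOSE */
    int openers;              /* number of open file descriptors */
};

static void model_create(struct model_nbd *d, bool disconnect_on_close)
{
    d->exists = true;
    d->disconnect_on_close = disconnect_on_close;
    d->openers = 0;
}

/* Opening fails once the device has been removed. */
static bool model_open(struct model_nbd *d)
{
    if (!d->exists)
        return false;
    d->openers++;
    return true;
}

/* The last close removes the device if the flag is set. */
static void model_close(struct model_nbd *d)
{
    assert(d->openers > 0);
    d->openers--;
    if (d->openers == 0 && d->disconnect_on_close)
        d->exists = false;
}

/* udev sees the new device, opens it and closes it again. */
static void model_udev_probe(struct model_nbd *d)
{
    if (model_open(d))
        model_close(d);
}

/* Old order: create with the flag set, udev probes, then mount. */
bool old_order_mount_succeeds(void)
{
    struct model_nbd d;

    model_create(&d, true);
    model_udev_probe(&d);  /* last opener closes: device is removed */
    return model_open(&d); /* the mount attempt finds no device */
}

/* New order: create without the flag, hold an fd, then set the flag. */
static bool new_order_setup_and_mount(struct model_nbd *d)
{
    model_create(d, false);
    model_udev_probe(d);            /* harmless: flag not set yet */
    bool held = model_open(d);      /* fd that keeps the device busy */
    d->disconnect_on_close = true;  /* models the reconfigure step */
    model_udev_probe(d);            /* harmless: our fd is still open */
    bool mounted = model_open(d);   /* the mount takes its own reference */
    model_close(d);                 /* drop the fd once mounted */
    return held && mounted && d->exists;
}

bool new_order_mount_succeeds(void)
{
    struct model_nbd d;

    return new_order_setup_and_mount(&d);
}

bool new_order_removed_after_unmount(void)
{
    struct model_nbd d;

    if (!new_order_setup_and_mount(&d))
        return false;
    model_close(&d); /* unmount releases the last reference */
    return !d.exists;
}
```

In this model the old order loses the device to the udev probe, while the new order survives probes at either point and still removes the device on unmount. If the reconfigure step is skipped, the flag stays unset and the modeled device is never removed, which matches the caveat in the note above.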

It's just a file descriptor to keep the device in use; there is nothing
loopback-specific about it.
It will be reused in the nbd case, so rename it to something more
generic.

Signed-off-by: Michael Olbrich <m.olbrich@pengutronix.de>

codecov bot commented Jun 19, 2025

Codecov Report

Attention: Patch coverage is 73.68421% with 5 lines in your changes missing coverage. Please review.

Project coverage is 84.50%. Comparing base (6eaf514) to head (b22fba9).
Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/nbd.c | 64.28% | 5 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (73.68%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1741      +/-   ##
==========================================
- Coverage   84.52%   84.50%   -0.02%     
==========================================
  Files          76       76              
  Lines       22350    22362      +12     
==========================================
+ Hits        18891    18898       +7     
- Misses       3459     3464       +5     
| Flag | Coverage Δ |
| --- | --- |
| service=false | 81.00% <73.68%> (-0.02%) ⬇️ |
| service=true | 84.46% <73.68%> (-0.02%) ⬇️ |

@ejoerns
Member

ejoerns commented Jun 19, 2025

Oh, an interesting finding! 👍 Haven't seen it myself yet in a system, but the problem description makes sense to me.

Maybe we should also add a small comment in the code explaining why the open and the nbd reconfiguration are required?

@jluebbe jluebbe self-requested a review June 23, 2025 09:53
@jluebbe jluebbe added this to the Release v1.15 milestone Jun 23, 2025
Member

@jluebbe jluebbe left a comment

Does this introduce dependencies on specific kernel versions?

@@ -203,6 +200,27 @@ gboolean r_nbd_setup_device(RaucNBDDevice *nbd_dev, GError **error)

nbd_dev->dev = g_strdup_printf("/dev/nbd%"G_GUINT32_FORMAT, nbd_dev->index);

*devicefd = open(nbd_dev->dev, O_RDWR|O_CLOEXEC);

Perhaps open read-only?

res = FALSE;
g_set_error(error, G_IO_ERROR, G_IO_ERROR_FAILED, "netlink send_sync failed");
goto out;
}

Don't we need to wait/process the response here?
