DAOS-17661 control: Maintain hugepage allocations with nvme-rebind #16493


Open: tanabarr wants to merge 1 commit into base branch tanabarr/control-hugemem-no-fragment

tanabarr (Contributor) commented Jun 9, 2025

The dmg storage nvme-rebind command can be used during non-VMD hotplug
when a "new" SSD is inserted into a slot that previously held a faulty
SSD. Errors when creating a new SPDK I/O channel during dmg storage
replace nvme have been attributed to nvme-rebind inadvertently shrinking
the kernel hugepage allocation used by SPDK. This change fixes the
problem by preserving the number of allocated hugepages across the
nvme-rebind call.
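A minimal Go sketch of one way a rebind step could avoid shrinking the allocation, assuming the control plane drives SPDK's scripts/setup.sh through environment variables: read the current HugePages_Total from /proc/meminfo before the rebind and pass that count back as NRHUGE, rather than letting the script fall back to its default HUGEMEM size. The helpers parseHugePagesTotal and rebindEnv are hypothetical and do not reproduce the actual change in this PR; NRHUGE and PCI_ALLOWED are SPDK setup.sh variables, but whether DAOS uses them this way is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseHugePagesTotal (hypothetical helper) extracts the HugePages_Total
// value from text in /proc/meminfo format, e.g. "HugePages_Total:    4096".
func parseHugePagesTotal(meminfo string) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(meminfo))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[0] == "HugePages_Total:" {
			return strconv.Atoi(fields[1])
		}
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("HugePages_Total not found in meminfo")
}

// rebindEnv (hypothetical helper) builds environment entries for an
// assumed setup.sh invocation that rebinds a single SSD while asking the
// script to keep exactly the hugepage count that is already allocated.
func rebindEnv(pciAddr string, currentHugepages int) []string {
	return []string{
		"PCI_ALLOWED=" + pciAddr,                    // limit the rebind to one device
		"NRHUGE=" + strconv.Itoa(currentHugepages), // reassert the existing count
	}
}
```

Passing an explicit count means the script reasserts the existing allocation instead of recomputing it from a default; a real implementation would likely also need to account for per-NUMA-node hugepage placement.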

Features: control

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

tanabarr requested review from a team as code owners June 9, 2025 20:35

github-actions bot commented Jun 9, 2025

Ticket title is 'Command to rebind NVMe SSD to userspace driver shrinks hugepage allocation'
Status is 'In Review'
Labels: 'SPDK,hotplug'
https://daosio.atlassian.net/browse/DAOS-17661

tanabarr requested review from mjmac, kjacque and knard38 June 9, 2025 20:58
tanabarr self-assigned this Jun 9, 2025
tanabarr added the labels control-plane (work on the management infrastructure of the DAOS Control Plane) and go (Pull requests that update Go code) Jun 9, 2025
tanabarr force-pushed the tanabarr/control-hugemem-no-fragment branch 2 times, most recently from 784e865 to a9b320d June 12, 2025 12:08
tanabarr force-pushed the tanabarr/control-hugemem-no-fragment branch from a9b320d to a1bda37 June 13, 2025 19:59
Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
tanabarr force-pushed the tanabarr/control-nvmerebind-hugepages-fix branch from 27734b3 to d687fab June 13, 2025 20:09
daosbuild3 (Collaborator) commented:

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/2/execution/node/1382/log

daosbuild3 (Collaborator) commented:

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/2/execution/node/1337/log

daosbuild3 (Collaborator) commented:

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16493/2/execution/node/1427/log

Labels: control-plane, go

4 participants