
DAOS-17628 bio: flush WAL header before unmap #16478


Open: wants to merge 2 commits into master


NiuYawei (Contributor) commented Jun 5, 2025

In bio_wal_checkpoint(), the checkpointed region must not be unmapped before the WAL header is flushed (the flush is what makes the last checkpointed ID persistent). Otherwise, if the engine is interrupted between the unmap and the flush, the next WAL replay at engine start will begin from the stale checkpointed ID, whose WAL transaction data has already been cleared by the unmap.

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).


Signed-off-by: Niu Yawei <yawei.niu@hpe.com>
NiuYawei requested review from a team as code owners on June 5, 2025 03:08

github-actions bot commented Jun 5, 2025

Ticket title is 'CP: N0001 engines up, but pool service lost'
Status is 'In Progress'
Labels: 'request_for_2.6.4'
https://daosio.atlassian.net/browse/DAOS-17628

liw (Contributor) left a comment


Looks reasonable to me (I don't know this code well).

daosbuild3 (Collaborator)

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16478/1/execution/node/1401/log

daosbuild3 (Collaborator)

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16478/1/execution/node/1345/log

Labels: None yet

4 participants