
DAOS-16170 control: Ignore EngineDied event for old incarnation #16511


Draft. Wants to merge 1 commit into base: master


@kjacque (Contributor) commented Jun 14, 2025

An EngineDied event can be forwarded late, after the engine has already re-joined. When that happens, the stale event incorrectly re-marks the rank as Errored.

  • Include incarnation in engine-related events.
  • Print incarnation in logs if provided.
  • Do not update the member if an EngineDied event is for an old incarnation.
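The three changes above can be sketched in Go as follows. This is a minimal illustration, not the code in this PR: the types `Member`, `MemberState`, and `EngineDiedEvent`, the function `HandleEngineDied`, and the convention that an incarnation of 0 means "not provided" are all hypothetical, chosen only to show the idea of carrying the incarnation in the event, logging it when present, and skipping the member update when the event refers to an older incarnation than the one currently recorded.

```go
package main

import "fmt"

// MemberState is a hypothetical subset of rank states.
type MemberState int

const (
	MemberStateJoined MemberState = iota
	MemberStateErrored
)

func (s MemberState) String() string {
	switch s {
	case MemberStateJoined:
		return "Joined"
	case MemberStateErrored:
		return "Errored"
	}
	return "Unknown"
}

// Member is a hypothetical stand-in for the control plane's record of a rank.
type Member struct {
	Rank        uint32
	Incarnation uint64
	State       MemberState
}

// EngineDiedEvent is a hypothetical engine-related event that carries the
// incarnation of the engine instance that died (assumption: 0 = not provided).
type EngineDiedEvent struct {
	Rank        uint32
	Incarnation uint64
}

// String includes the incarnation in log output only when it was provided.
func (e EngineDiedEvent) String() string {
	if e.Incarnation != 0 {
		return fmt.Sprintf("engine died: rank %d (incarnation %d)", e.Rank, e.Incarnation)
	}
	return fmt.Sprintf("engine died: rank %d", e.Rank)
}

// HandleEngineDied marks the member Errored unless the event refers to an
// older incarnation than the one recorded, meaning the engine has already
// restarted and re-joined. It reports whether the member was updated.
func HandleEngineDied(m *Member, evt EngineDiedEvent) bool {
	if evt.Rank != m.Rank {
		return false
	}
	// Events without an incarnation keep the old behavior and are applied.
	if evt.Incarnation != 0 && evt.Incarnation < m.Incarnation {
		fmt.Printf("ignoring stale event: %s; current incarnation %d\n", evt, m.Incarnation)
		return false
	}
	m.State = MemberStateErrored
	return true
}

func main() {
	// Rank 3 has restarted and re-joined with incarnation 7.
	m := &Member{Rank: 3, Incarnation: 7, State: MemberStateJoined}

	// A late event from the previous incarnation (6) is ignored.
	HandleEngineDied(m, EngineDiedEvent{Rank: 3, Incarnation: 6})
	fmt.Println(m.State) // Joined

	// An event for the current incarnation still marks the rank Errored.
	HandleEngineDied(m, EngineDiedEvent{Rank: 3, Incarnation: 7})
	fmt.Println(m.State) // Errored
}
```

Treating a missing incarnation as "apply the event" is one possible design choice for backward compatibility with senders that do not yet include it; a stricter handler could instead ignore such events.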

Features: control

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Kris Jacque <kris.jacque@hpe.com>
@kjacque self-assigned this Jun 14, 2025

Ticket title is 'recovery/cat_recov_core.py:CatRecovCoreTest.test_daos_cat_recov_core - server was not found in its expected state - 17 TEST(S) FAILED'
Status is 'In Progress'
Labels: 'ci_2.6_daily,ci_master_daily,daily_test,scrubbed_2.8'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-16170

@github-actions bot added the priority label (Ticket has high priority, automatically managed) Jun 14, 2025