-
Notifications
You must be signed in to change notification settings - Fork 318
DAOS-17534 dtx: race between DTX aggregation and container close #16504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Ticket title is 'EMRG src/dtx/dtx_common.c:720 dtx_batched_commit() Assertion '!dbca->dbca_deregister' failed' |
a912f53
to
49281b0
Compare
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16504/5/execution/node/1449/log |
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16504/5/execution/node/1468/log |
49281b0
to
3fc3bbf
Compare
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16504/6/execution/node/1413/log |
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16504/6/execution/node/1399/log |
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16504/6/testReport/ |
dtx_aggregation_pool() logic may yield because of sched_req_put(). Then someone may close related container during the yield. If DTX aggregation logic does not check the race with close before adding the container back to the DTX aggregation list (per pool), then it may trigger assertion of "D_ASSERT(!dbca->dbca_deregister)" during subsequent DTX batched commit or DTX aggregation process. On the other hand, DTX aggregation logic needs to hold reference on the dbca structure to avoid being freed during DTX aggregation. Signed-off-by: Fan Yong <fan.yong@hpe.com>
3fc3bbf
to
8227c86
Compare
dtx_aggregation_pool() logic may yield because of sched_req_put(). Then someone may close related container during the yield. If DTX aggregation logic does not check the race with close before adding the container back to the DTX aggregation list (per pool), then it may trigger assertion of "D_ASSERT(!dbca->dbca_deregister)" during subsequent DTX batched commit or DTX aggregation process.
On the other hand, DTX aggregation logic needs to hold reference on the dbca structure to avoid being freed during DTX aggregation.
Steps for the author:
After all prior steps are complete: