Make accelerator fragmentation option configurable #4478

Closed
@kyujin-cho

Description

Motivation  

  • We need this to guarantee that accelerators are allocated atomically from the hardware's perspective.

Required Features

  • Add a new field under the scaling_group.scheduler_opts JSON column indicating whether to allow fragmentation of accelerators that support fractional shares
    • This should be an opt-in feature
  • Check the new field in the agent-side accelerator scheduling process and cancel kernel creation if there are not enough resources to satisfy the request without fragmentation
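
A minimal sketch of the two requirements above, not the actual Backend.AI implementation. It assumes that "fragmentation" means splitting one fractional request across several physical devices, and that a missing option keeps the current behaviour (fragmentation allowed). The key name `allow_fractional_resource_fragmentation`, the `InsufficientResource` exception, and the `allocate_fraction` helper are all hypothetical names chosen for illustration.

```python
import json
from decimal import Decimal

# Hypothetical key name; the issue does not specify one.
OPT_KEY = "allow_fractional_resource_fragmentation"


def allows_fragmentation(scheduler_opts_json: str) -> bool:
    """Read the flag from a scaling group's scheduler_opts JSON text."""
    opts = json.loads(scheduler_opts_json or "{}")
    # Assumption: a missing key preserves the existing behaviour (allowed).
    return bool(opts.get(OPT_KEY, True))


class InsufficientResource(Exception):
    """Raised to cancel kernel creation (hypothetical exception name)."""


def allocate_fraction(
    free: dict[str, Decimal],
    requested: Decimal,
    allow_fragmentation: bool,
) -> dict[str, Decimal]:
    """Return a {device_id: share} allocation for one fractional request."""
    # Prefer a single device; pick the tightest fit to keep larger slots free.
    fitting = [(amount, dev) for dev, amount in free.items() if amount >= requested]
    if fitting:
        _, dev = min(fitting)
        return {dev: requested}
    if not allow_fragmentation:
        # No single device can hold the request, so refuse instead of splitting.
        raise InsufficientResource("request would be fragmented across devices")
    # Fragmentation allowed: spread the request, largest free share first.
    allocation: dict[str, Decimal] = {}
    remaining = requested
    for dev, amount in sorted(free.items(), key=lambda kv: kv[1], reverse=True):
        if remaining <= 0:
            break
        take = min(amount, remaining)
        if take > 0:
            allocation[dev] = take
            remaining -= take
    if remaining > 0:
        raise InsufficientResource("not enough free accelerator shares")
    return allocation
```

With two mock devices that each have 0.5 free, a request for 0.8 is split across both devices when fragmentation is allowed and raises `InsufficientResource` when it is not.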

Impact  

  • Administrators will be able to choose, per scaling group, whether to allow accelerator fragmentation

Testing Scenarios  

  • Create a Backend.AI cluster consisting of multiple mock accelerators
  • Try to create a session whose accelerator allocation is forced to be fragmented
  • Verify that kernel creation is blocked when the new option is enabled
