
feat(BA-1416): make resource fragmentation configurable #4533


Merged
merged 9 commits into main from feat/fragmentation-configurable
May 29, 2025


kyujin-cho
Member

@kyujin-cho kyujin-cho commented May 28, 2025

resolves #4478 (BA-1416).
This PR introduces a new parameter, allow_fractional_resource_fragmentation, in the resource_opts input of the session creation API. If the value is not specified in the creation request, it falls back to the target scaling group's scheduler_opts.allow_fractional_resource_fragmentation value, which in turn defaults to true.
Leaving this at the (ultimate) default of true keeps the existing behavior: AI resources defined as fractional (e.g. cuda.shares) may be fragmented. Setting it to false blocks fragmentation with a FractionalResourceFragmented error (see screenshot below).
[Screenshot 2025-05-28 at 5:22:49 PM]
Testing this PR requires a Backend.AI cluster with two or more accelerators with fractional scaling enabled. If you do not have access to such hardware, using the mock accelerator is also sufficient.
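
For reviewers, the fallback order described above can be sketched roughly as follows. This is only an illustration and not the code in this PR: the function name resolve_allow_fragmentation and the plain-dict shapes of resource_opts and scheduler_opts are assumptions made for the example. Only the option name and the ordering (request value, then scaling group value, then true) come from the description.

```python
from typing import Any, Mapping, Optional

# Option name taken from the PR description.
OPTION = "allow_fractional_resource_fragmentation"


def resolve_allow_fragmentation(
    resource_opts: Optional[Mapping[str, Any]],
    scheduler_opts: Optional[Mapping[str, Any]],
) -> bool:
    """Hypothetical helper: return the effective setting for one session."""
    # 1. An explicit value in the session creation request wins.
    if resource_opts is not None and resource_opts.get(OPTION) is not None:
        return bool(resource_opts[OPTION])
    # 2. Otherwise use the scaling group's scheduler_opts value, if set.
    if scheduler_opts is not None and scheduler_opts.get(OPTION) is not None:
        return bool(scheduler_opts[OPTION])
    # 3. Ultimate default: fragmentation stays allowed, as before this PR.
    return True
```

Under these assumptions, a request that omits the option inherits the scaling group's setting, and a request that sets it to false overrides a scaling group that allows fragmentation.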

Checklist: (if applicable)

  • Mention to the original issue

📚 Documentation preview 📚: https://sorna--4533.org.readthedocs.build/en/4533/


📚 Documentation preview 📚: https://sorna-ko--4533.org.readthedocs.build/ko/4533/

@github-actions github-actions bot added the size:L (100~500 LoC), comp:manager (Related to Manager component), comp:agent (Related to Agent component), comp:client (Related to Client component), and comp:common (Related to Common component) labels May 28, 2025
@github-actions github-actions bot added the area:docs (Documentations) label May 28, 2025
@kyujin-cho kyujin-cho marked this pull request as ready for review May 28, 2025 09:08
@kyujin-cho kyujin-cho requested a review from HyeockJinKim May 28, 2025 09:09
@HyeockJinKim HyeockJinKim added this pull request to the merge queue May 29, 2025
Merged via the queue into main with commit adb0be5 May 29, 2025
29 checks passed
@HyeockJinKim HyeockJinKim deleted the feat/fragmentation-configurable branch May 29, 2025 02:29
Development

Successfully merging this pull request may close these issues.

Make accelerator fragmentation option configurable
2 participants