Support for Nvidia GPU Time-Slicing #7003

Galadros · 2025-05-27T17:38:29Z


Nvidia has introduced a feature called time-slicing on GPUs (see here and here). However, unlike Multi-Instance GPU (MIG), time-slicing provides no memory or fault isolation between replicas. For some workloads this is still better than not being able to share a GPU at all.

As far as I can tell, ONNX doesn't currently support safely managing GPU memory under GPU time-slicing. I've previously encountered errors reading "CUDA failure 2: out of memory", followed by aberrant behavior. I believe this was caused by multiple instances of my service interfering with each other's GPU memory. Is safe GPU memory management under time-slicing something that folks have considered supporting in ONNX, or have I missed some existing support?

(See https://bruce-lee-ly.medium.com/nvidia-gpu-virtual-memory-management-7fdc4122226b for reference).
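To make the concern concrete, here is a minimal sketch of one partial mitigation, assuming the service runs on ONNX Runtime's CUDA execution provider: give each time-sliced replica its own arena budget through the `gpu_mem_limit` provider option. The helper name `cuda_provider_for_replica`, the headroom fraction, and the `model.onnx` path are hypothetical. Note that `gpu_mem_limit` only caps the execution provider's memory arena, so total device usage can be higher, and it does not give the isolation that MIG provides.

```python
def cuda_provider_for_replica(total_gpu_bytes, replicas,
                              headroom_fraction=0.1, device_id=0):
    """Build a (provider, options) pair that caps one replica's CUDA arena.

    Hypothetical helper: splits the GPU's memory evenly across replicas after
    reserving some headroom for allocations made outside the arena.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")

    usable = int(total_gpu_bytes * (1 - headroom_fraction))
    per_replica = usable // replicas

    return (
        "CUDAExecutionProvider",
        {
            "device_id": device_id,
            # Upper bound, in bytes, on this process's CUDA memory arena.
            "gpu_mem_limit": per_replica,
            # Grow the arena only by what is requested, not in power-of-two steps.
            "arena_extend_strategy": "kSameAsRequested",
        },
    )


# Usage (requires onnxruntime with CUDA support; "model.onnx" is a placeholder):
#   import onnxruntime as ort
#   provider = cuda_provider_for_replica(16 * 1024**3, replicas=4)
#   session = ort.InferenceSession(
#       "model.onnx", providers=[provider, "CPUExecutionProvider"])
```

Each replica still has to be configured cooperatively; nothing here stops a misbehaving process from exhausting the shared GPU.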

justinchuby · 2025-06-03T04:44:51Z · Maintainer

If you are referring to ONNX Runtime, you may post on https://github.com/microsoft/onnxruntime/issues (this project is for the ONNX standard itself).

1 reply

Galadros Jun 3, 2025
Author

Thanks for letting me know!
