If you are referring to ONNX Runtime, please post at https://github.com/microsoft/onnxruntime/issues instead. This project is for the ONNX standard itself.
NVIDIA has introduced a feature called time-slicing on GPUs (see here and here). However, this feature doesn't natively support memory isolation between replicas:
"Unlike Multi-Instance GPU (MIG), there is no memory or fault-isolation between replicas, but for some workloads this is better than not being able to share at all."
As far as I can tell, ONNX doesn't currently support safely managing GPU memory when GPU time-slicing is in use. I've previously encountered errors that read
CUDA failure 2: out of memory
which were then followed by aberrant behavior. I believe this was caused by multiple instances of my service interfering with each other's stored memory. Is safely managing GPU memory while using GPU time-slicing something that folks have considered supporting for ONNX, or have I missed some existing support? (See https://bruce-lee-ly.medium.com/nvidia-gpu-virtual-memory-management-7fdc4122226b for reference.)
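For anyone hitting the same error: one partial mitigation, assuming the service runs on ONNX Runtime's Python API with the CUDA execution provider, is to cap each replica's memory arena so that the replicas together stay under the physical device memory. The sketch below is not isolation. `gpu_mem_limit` only bounds ONNX Runtime's own arena, so total device usage can still be higher, and nothing stops another process from exceeding its share. The helper names, the even split, and the headroom fraction are hypothetical choices for illustration.

```python
from typing import Any, Dict, Tuple


def per_replica_budget_bytes(total_bytes: int, replicas: int, headroom: float = 0.1) -> int:
    """Split device memory evenly across replicas, reserving a headroom fraction.

    Hypothetical helper: the even split and the default headroom are assumptions.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable = int(total_bytes * (1.0 - headroom))
    return usable // replicas


def cuda_provider(total_bytes: int, replicas: int, device_id: int = 0,
                  headroom: float = 0.1) -> Tuple[str, Dict[str, Any]]:
    """Build a (name, options) provider entry with a capped memory arena."""
    return (
        "CUDAExecutionProvider",
        {
            "device_id": device_id,
            # Upper bound, in bytes, on ONNX Runtime's device memory arena.
            "gpu_mem_limit": per_replica_budget_bytes(total_bytes, replicas, headroom),
            # Grow the arena by the requested size rather than in powers of two.
            "arena_extend_strategy": "kSameAsRequested",
        },
    )


def make_session(model_path: str, total_bytes: int, replicas: int):
    """Create a session with the capped provider (requires onnxruntime with CUDA)."""
    import onnxruntime as ort  # imported lazily so the helpers above work without it

    return ort.InferenceSession(
        model_path,
        providers=[cuda_provider(total_bytes, replicas), "CPUExecutionProvider"],
    )
```

For example, with a 16 GiB device shared by 4 time-sliced replicas, `cuda_provider(16 * 1024**3, 4)` gives each replica an arena limit of a little under 3.6 GiB. Whether that is enough depends on the model and on allocations made outside the arena.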