I migrated my model from TF to ONNX. With

providers: list[tuple[str, dict]] = [("CPUExecutionProvider", {})]
if provider >= 0:
    providers.append(("CUDAExecutionProvider", {"device_id": provider}))

my test case basically uses only the CPU and takes around 50s. But if I force it to use only the CUDAExecutionProvider:

providers: list[tuple[str, dict]] = [("CPUExecutionProvider", {})]
if provider >= 0:
    providers = [("CUDAExecutionProvider", {"device_id": provider})]

I get these warnings:

2024-09-03 20:26:34.417215913 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-09-03 20:26:34.417281814 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

However, it is much faster on the GPU, about 5x, and takes only 10s. So, what can I do here? Are there better options to tell the runtime to prefer the GPU? Or how can I silence these warnings?
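One option worth trying (a sketch from me, not from the thread, assuming the usual onnxruntime semantics where the providers list is a priority order; "model.onnx" and the provider variable are placeholders): list CUDAExecutionProvider first and keep CPUExecutionProvider as the fallback, so the GPU is preferred without removing the CPU provider entirely.

import onnxruntime as ort

provider = 0  # assumed GPU device id; a negative value would mean CPU-only

# Providers are tried in priority order: CUDA first, CPU only for ops CUDA cannot take.
providers = ["CPUExecutionProvider"]
if provider >= 0:
    providers = [
        ("CUDAExecutionProvider", {"device_id": provider}),
        "CPUExecutionProvider",
    ]

session = ort.InferenceSession("model.onnx", providers=providers)  # "model.onnx" is a placeholder
print(session.get_providers())  # confirms which providers were actually registered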
-
There are logging entries that should tell you which kernels were assigned to CPU. Check your logging level. If those are shape related, then you are fine. Otherwise, file an issue with ONNXRuntime (not onnx).
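For anyone looking for the concrete knob, here is a minimal sketch of turning the session log level up so those per-node assignment entries become visible ("model.onnx" is a placeholder path; 0 means VERBOSE):

import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.log_severity_level = 0  # 0 = VERBOSE, 1 = INFO, 2 = WARNING (default), 3 = ERROR, 4 = FATAL
sess_options.log_verbosity_level = 1  # extra detail for VERBOSE messages

# With verbose logging, session creation reports which execution provider each node was assigned to.
session = ort.InferenceSession(
    "model.onnx",
    sess_options=sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

Conversely, if the CPU-assigned nodes turn out to be only shape-related ops, setting log_severity_level to 3 on the same SessionOptions should silence the warning quoted above.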
-
While logging assigned kernels is a step in the right direction, it doesn't fully address the underlying issues with the fallback mechanism in ONNX Runtime. Specifically, fallback to the CPU happens silently, the logging around that decision is hard to act on, and the behavior is thinly documented.

Suggested Solutions:
Current Behavior: if the requested execution provider cannot take every node, ONNX Runtime silently assigns the remaining nodes to the CPU.

Proposed Code Example:

import onnxruntime as ort

# Create session options
session_options = ort.SessionOptions()

# Disable CPU fallback (the opt-out switch proposed here)
session_options.disable_cpu_ep_fallback = True

# Initialize ONNX Runtime session with CUDAExecutionProvider only
try:
    session = ort.InferenceSession(
        'model.onnx',
        sess_options=session_options,
        providers=['CUDAExecutionProvider']
    )
    print("ONNX Runtime session initialized successfully with CUDAExecutionProvider.")
except Exception as e:
    # ONNX Runtime raises its errors from the pybind layer; a broad catch keeps the example simple.
    # Actionable logging can provide better insights here.
    print(f"Session initialization failed: {e}")

Proposed Default Behavior: disable CPU fallback by default, so that a session fails fast with an actionable error when the requested provider cannot take all nodes, and fallback becomes an explicit opt-in.
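For what it's worth, recent ONNX Runtime builds appear to expose something close to this opt-in already through a session config entry; a minimal sketch, assuming your build recognizes the session.disable_cpu_ep_fallback key and with "model.onnx" as a placeholder path:

import onnxruntime as ort

session_options = ort.SessionOptions()
# Ask ORT to fail session creation rather than silently placing nodes on the CPU
# (assumes the build supports the "session.disable_cpu_ep_fallback" config entry).
session_options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")

session = ort.InferenceSession(
    "model.onnx",
    sess_options=session_options,
    providers=["CUDAExecutionProvider"],
)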
Sample Log Output: when fallback is enabled, the session only emits the VerifyEachNodeIsAssignedToAnEp warnings quoted in the question above, without listing which nodes were reassigned or why.
Logging assigned kernels is insufficient for addressing fallback issues comprehensively. Disabling fallback by default, improving logging transparency, and enhancing documentation will empower users to better manage performance and resource allocation. Let's prioritize these changes to improve ONNX Runtime for production use cases.

Sincerely,