I migrated my model from TF to ONNX. With

providers: list[tuple[str, dict]] = [("CPUExecutionProvider", {})]
if provider >= 0:
    providers.append(("CUDAExecutionProvider", {"device_id": provider}))

my test case basically uses only the CPU and takes around 50s. But if I force it to use only the CUDAExecutionProvider:

providers: list[tuple[str, dict]] = [("CPUExecutionProvider", {})]
if provider >= 0:
    providers = [("CUDAExecutionProvider", {"device_id": provider})]

I get these warnings:

2024-09-03 20:26:34.417215913 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-09-03 20:26:34.417281814 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

However, it is much faster on the GPU, about 5x, and takes only 10s. So, what can I do here? Are there better options to tell the runtime to prefer the GPU? Or how can I silence these warnings?
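One option worth trying (a sketch from me, not from the thread, assuming the usual onnxruntime semantics where the providers list is a priority order; "model.onnx" and the provider variable are placeholders): list CUDAExecutionProvider first and keep CPUExecutionProvider as the fallback, so the GPU is preferred without removing the CPU provider entirely.

import onnxruntime as ort

provider = 0  # assumed GPU device id; a negative value would mean CPU-only

# Providers are tried in priority order: CUDA first, CPU only for ops CUDA cannot take.
providers = ["CPUExecutionProvider"]
if provider >= 0:
    providers = [
        ("CUDAExecutionProvider", {"device_id": provider}),
        "CPUExecutionProvider",
    ]

session = ort.InferenceSession("model.onnx", providers=providers)  # "model.onnx" is a placeholder
print(session.get_providers())  # confirms which providers were actually registered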
-
There are logging entries that should tell you which kernels were assigned to CPU. Check your logging level. If those are shape related, then you are fine. Otherwise, file an issue with ONNXRuntime (not onnx).
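For anyone looking for the concrete knob, here is a minimal sketch of turning the session log level up so those per-node assignment entries become visible ("model.onnx" is a placeholder path; 0 means VERBOSE):

import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.log_severity_level = 0  # 0 = VERBOSE, 1 = INFO, 2 = WARNING (default), 3 = ERROR, 4 = FATAL
sess_options.log_verbosity_level = 1  # extra detail for VERBOSE messages

# With verbose logging, session creation reports which execution provider each node was assigned to.
session = ort.InferenceSession(
    "model.onnx",
    sess_options=sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

Conversely, if the CPU-assigned nodes turn out to be only shape-related ops, setting log_severity_level to 3 on the same SessionOptions should silence the warning quoted above.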
-
While logging assigned kernels is a step in the right direction, it doesn't fully address the underlying issues with the fallback mechanism in ONNX Runtime. Specifically, fallback to the CPU happens silently, the logging around that decision is hard to act on, and the behavior is thinly documented.

Suggested Solutions:
Current Behavior: if the requested execution provider cannot take every node, ONNX Runtime silently assigns the remaining nodes to the CPU.

Proposed Code Example:

import onnxruntime as ort

# Create session options
session_options = ort.SessionOptions()

# Disable CPU fallback (the opt-out switch proposed here)
session_options.disable_cpu_ep_fallback = True

# Initialize ONNX Runtime session with CUDAExecutionProvider only
try:
    session = ort.InferenceSession(
        'model.onnx',
        sess_options=session_options,
        providers=['CUDAExecutionProvider']
    )
    print("ONNX Runtime session initialized successfully with CUDAExecutionProvider.")
except Exception as e:
    # ONNX Runtime raises its errors from the pybind layer; a broad catch keeps the example simple.
    # Actionable logging can provide better insights here.
    print(f"Session initialization failed: {e}")

Proposed Default Behavior: disable CPU fallback by default, so that a session fails fast with an actionable error when the requested provider cannot take all nodes, and fallback becomes an explicit opt-in.
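For what it's worth, recent ONNX Runtime builds appear to expose something close to this opt-in already through a session config entry; a minimal sketch, assuming your build recognizes the session.disable_cpu_ep_fallback key and with "model.onnx" as a placeholder path:

import onnxruntime as ort

session_options = ort.SessionOptions()
# Ask ORT to fail session creation rather than silently placing nodes on the CPU
# (assumes the build supports the "session.disable_cpu_ep_fallback" config entry).
session_options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")

session = ort.InferenceSession(
    "model.onnx",
    sess_options=session_options,
    providers=["CUDAExecutionProvider"],
)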
Sample Log Output: when fallback is enabled, the session only emits the VerifyEachNodeIsAssignedToAnEp warnings quoted in the question above, without listing which nodes were reassigned or why.
Logging assigned kernels is insufficient for addressing fallback issues comprehensively. Disabling fallback by default, improving logging transparency, and enhancing documentation will empower users to better manage performance and resource allocation. Let's prioritize these changes to improve ONNX Runtime for production use cases.

Sincerely,