feat: Asynchronous process_chat_payload in chat completion #13027

Open
@tth37

Description

Check Existing Issues

Related: #13007

Problem Description

The /api/chat/completions endpoint supports two primary modes of operation:

  1. Synchronous (stream=False): Typically invoked via direct HTTP requests, this mode processes the entire request and returns the complete response in a single HTTP transaction.
  2. Asynchronous (stream=True): Primarily used by the frontend UI via WebSocket, this mode is expected to return immediately with a task_id. The task_id allows the frontend to receive status updates, stream the response incrementally over the WebSocket connection, and, crucially, lets the user stop the generation process early.

While the asynchronous (stream=True) mode functions as expected for standard chat interactions (returning the task_id promptly), this expected behavior breaks when features requiring substantial pre-processing, such as Web Search or Tool Use, are enabled. Instead of returning immediately, it waits for the process_chat_payload phase (which includes potentially long-running operations like web searches or tool executions) to complete before returning the task_id.

This synchronous behavior during the payload processing phase leads to two significant issues, both reported in discussions:

  1. Delayed Early Stopping: The frontend does not receive the task_id until after web search/tool execution finishes. This prevents users from stopping the request during this initial, potentially lengthy (30-60s+), phase.
  2. Network Timeouts: The extended wait time for the endpoint to respond increases the risk of network errors, such as gateway timeouts or client-side request timeouts, degrading the user experience.

Cause Analysis

The chat completion process can be broadly divided into two phases:

  1. process_chat_payload: Handles request preprocessing, including web searches, tool calls, and injecting results into the context for the language model.
  2. process_chat_response: Handles the actual generation of the AI response by the LLM and streams the results back via WebSocket.

Currently, process_chat_response is correctly handled asynchronously using create_task, as seen:

# Handle as a background task
async def post_response_handler(response, events):

However, process_chat_payload still runs synchronously within the request: users have to wait until it finishes before they receive the background task_id. This gets worse when the web search feature is enabled, since it may take 30-60s. During that period the user cannot stop the request early and faces the risk of a connection timeout.
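To make the timing concrete, here is a toy simulation of the current flow. It is only a sketch: `fake_process_chat_payload`, `fake_process_chat_response`, and `PAYLOAD_DELAY` are hypothetical stand-ins, not the real Open WebUI functions, and the delay is shortened so the script finishes quickly.

```python
import asyncio
import time
import uuid

PAYLOAD_DELAY = 0.2  # hypothetical stand-in for a 30-60s web search


async def fake_process_chat_payload():
    # Simulates the slow preprocessing phase (web search, tool calls).
    await asyncio.sleep(PAYLOAD_DELAY)
    return "payload"


async def fake_process_chat_response(payload):
    # Simulates LLM generation, which already runs as a background task.
    await asyncio.sleep(0.05)
    return f"response for {payload}"


async def current_flow():
    """Payload phase is awaited inline; only the response phase is a task."""
    start = time.monotonic()
    payload = await fake_process_chat_payload()  # the caller waits here
    task = asyncio.create_task(fake_process_chat_response(payload))
    task_id = str(uuid.uuid4())  # only now could a task_id be returned
    elapsed = time.monotonic() - start
    await task
    return task_id, elapsed


def time_to_task_id_current():
    # Seconds that pass before a task_id becomes available.
    return asyncio.run(current_flow())[1]
```

In this simulation the time until a task_id exists is bounded below by the payload delay, which mirrors the behavior described above.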

Desired Solution you'd like

For asynchronous API calls, refactor the chat_completion handler in main.py so that the entire processing pipeline (both payload processing and response generation) is asynchronous from the start. This can be achieved by wrapping all time-consuming logic in a single background task that is created immediately upon receiving the request (test_async_chat_completion):

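# Run both phases inside one background task so the endpoint can return at once.
async def all_time_consuming_jobs(request, form_data, user, metadata, model, tasks):
    # Phase 1: preprocessing (web search, tool calls, context injection)
    form_data, metadata, events = await process_chat_payload(
        request, form_data, user, metadata, model
    )
    response = await chat_completion_handler(request, form_data, user)
    # Phase 2: generation and streaming
    await process_chat_response( # don't create_task inside `process_chat_response`
        request, response, form_data, user, metadata, model, events, tasks
    )

# `tasks` is assumed to be available in the enclosing handler
task_id, _ = create_task(
    all_time_consuming_jobs(request, form_data, user, metadata, model, tasks),
    id=metadata["chat_id"],
)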
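The effect of wrapping both phases in one task can be tried in isolation. The following is a minimal sketch using plain asyncio; `create_task_sketch`, `TASKS`, `slow_payload`, and `pipeline` are hypothetical names for illustration and do not reproduce the project's own create_task helper. It shows that once the whole pipeline is a task, a stop request can cancel it while the payload phase is still running.

```python
import asyncio
import uuid

TASKS = {}  # hypothetical in-memory registry: task_id -> asyncio.Task


def create_task_sketch(coro):
    # Schedule the coroutine and return its id immediately.
    task_id = str(uuid.uuid4())
    task = asyncio.create_task(coro)
    TASKS[task_id] = task
    # Drop the registry entry once the task finishes or is cancelled.
    task.add_done_callback(lambda _t: TASKS.pop(task_id, None))
    return task_id, task


async def slow_payload(log):
    log.append("payload:start")
    await asyncio.sleep(10)  # stand-in for a long web search
    log.append("payload:done")


async def pipeline(log):
    # Both phases live in the same background task.
    await slow_payload(log)
    log.append("response:done")


async def demo():
    log = []
    task_id, task = create_task_sketch(pipeline(log))
    await asyncio.sleep(0.01)  # the task_id is already known at this point
    TASKS[task_id].cancel()    # simulate the user pressing "stop"
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log


def run_demo():
    return asyncio.run(demo())
```

Here the cancellation arrives during the simulated web search, so neither the payload phase nor the response phase completes.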

Further Considerations

This simple patch technically works, but there may still be a lot of work to do:

  • Identifying Synchronous/Asynchronous Requests in main.py
  • Error handling: Correct and robust error handling during the two phases
  • Early Stopping Behavior: The frontend logic of early stopping when web search has not finished
  • etc.
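As one possible direction for the error handling and early stopping points, the background pipeline could report failures and cancellations through events, since the HTTP response has already been sent by then. This is only a sketch under assumptions: `run_pipeline`, the `emit` callback, and the event shapes are hypothetical and are not part of the existing code.

```python
import asyncio


async def run_pipeline(payload_phase, response_phase, emit):
    """Run both phases and report the outcome through `emit` (hypothetical)."""
    phase = "payload"
    try:
        payload = await payload_phase()
        phase = "response"
        await response_phase(payload)
        await emit({"type": "done"})
    except asyncio.CancelledError:
        # Early stop: tell the client which phase was interrupted,
        # then re-raise so the task is still marked as cancelled.
        await emit({"type": "cancelled", "phase": phase})
        raise
    except Exception as exc:
        # The error cannot travel in the HTTP response any more,
        # so it is surfaced as an event instead.
        await emit({"type": "error", "phase": phase, "detail": str(exc)})


def run_failing_example():
    events = []

    async def emit(event):
        events.append(event)

    async def failing_payload():
        raise RuntimeError("search failed")

    async def response(_payload):
        return None

    asyncio.run(run_pipeline(failing_payload, response, emit))
    return events
```

Tracking the current phase lets the frontend distinguish a stop during web search from a stop during generation.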
