
When calling models whose max_output_token is much smaller than max_prompt_token, the output may sometimes be truncated. If the agent's truncated output ends with a space, it can cause abnormal responses from the model side. #6790


What happened?

When an LLM's max_output_token is much smaller than its max_prompt_token, output from generation-heavy tasks such as coding or writing may be truncated. Truncation can land on a space or a line break, and in that case the truncated message is added to the model_context as-is. One way to handle truncation is to feed the partial output back into the context as a prompt so the LLM can continue generating, but if the truncated message ends with whitespace, the next request fails with an error like this:

openai.BadRequestError: Error code: 400 - {'success': False, 'message': 'Model provider error', 'data': None, 'code': 'MPE-001', 'detailMessage': '{"error": {"type": "llm_call_failed", "message": "{\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"messages: final assistant content cannot end with trailing whitespace\"}}\n", "treace_id": "213e066e17521531374748281e7f22"}}, traceId: 213e066e17521531374748281e7f22'}
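To make the failure mode concrete, here is a minimal sketch of the continue-on-truncation pattern, written against the OpenAI Python SDK rather than the autogen-agentchat internals. The model name, prompt, and max_tokens value are placeholders, and whether a given provider actually continues from a trailing assistant message is provider-dependent; the relevant detail is the rstrip() before resubmitting, which removes the trailing space or newline that otherwise triggers the 400 above.

```python
from openai import OpenAI

client = OpenAI()

base_messages = [
    {"role": "user", "content": "Write a long CSV-parsing function."}  # placeholder prompt
]
partial = ""  # assistant output accumulated across continuation calls

while True:
    messages = list(base_messages)
    if partial:
        # Resume from the partial answer. Strip trailing whitespace first: some
        # providers reject a request whose final assistant message ends with a
        # space or newline ("final assistant content cannot end with trailing
        # whitespace"), which is exactly the 400 shown above.
        messages.append({"role": "assistant", "content": partial.rstrip()})

    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=messages,
        max_tokens=256,        # deliberately small, so truncation is likely
    )
    choice = resp.choices[0]
    partial += choice.message.content or ""

    if choice.finish_reason != "length":
        break  # the model finished on its own; no further continuation needed

print(partial)
```

Stripping the whitespace does mean the continuation may start glued to the previous word, so an implementation might instead remember the stripped characters and re-insert them locally once the continuation arrives.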

Would you consider the following solution to address this issue?

[Screenshot: proposed solution]

Which packages were the bug in?

Python AgentChat (autogen-agentchat>=0.4.0)

AutoGen library version.

Python 0.6.4

Other library version.

No response

Model used

No response

Model provider

None

Other model provider

No response

Python version

None

.NET version

None

Operating system

None
