[Bug] Qwen3Moe(ForCausalLM) does not respect UNSLOTH_RETURN_HIDDEN_STATES when loss and labels are given. #3000

@Killusions

Qwen3Moe(ForCausalLM) does not respect UNSLOTH_RETURN_HIDDEN_STATES when loss and labels are given.

Environment: latest versions of all packages, Python 3.11, CUDA 12.6.

The reason can be found in the patched code:

...
elif self.loss_function.__name__.endswith("ForCausalLMLoss") and labels is not None:
    ...
    # ========= OLD non fused =========
    # logits = self.lm_head(hidden_states[:, slice_indices, :].to(lm_head_weight.device))
else:
    logits = self.lm_head(hidden_states[:, slice_indices, :])
...
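
To make the suspected control flow concrete, here is a minimal, self-contained sketch. It is not the actual Unsloth code: the function names, the returned branch labels, and the way the flag is read (an environment variable compared against "1") are assumptions made for illustration. The first function models the behaviour described above, where the loss branch wins whenever labels are given; the second shows one possible ordering in which the flag is checked first.

```python
import os

FLAG = "UNSLOTH_RETURN_HIDDEN_STATES"


def flag_enabled(env):
    # Assumption: the flag counts as enabled when the variable is set to "1".
    return env.get(FLAG, "0") == "1"


def branch_as_reported(loss_name, labels, env=os.environ):
    """Hypothetical model of the branch order described in this issue."""
    if loss_name.endswith("ForCausalLMLoss") and labels is not None:
        # The flag is never consulted on this path, so the caller gets a
        # loss instead of hidden states.
        return "fused_loss"
    if flag_enabled(env):
        return "hidden_states"
    return "logits"


def branch_flag_first(loss_name, labels, env=os.environ):
    """One possible ordering (an assumption, not the actual patch)."""
    if flag_enabled(env):
        # Checked before the loss branch, so labels cannot shadow the flag.
        return "hidden_states"
    if loss_name.endswith("ForCausalLMLoss") and labels is not None:
        return "fused_loss"
    return "logits"


if __name__ == "__main__":
    env = {FLAG: "1"}
    labels = [1, 2, 3]  # stand-in for a label tensor
    print(branch_as_reported("ForCausalLMLoss", labels, env))
    print(branch_flag_first("ForCausalLMLoss", labels, env))
```

With the flag set and labels passed, the first function takes the loss path and the second returns hidden states, which is the behaviour I would expect from the flag.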
