
[Help] [Metis] Voice Conversion Irreproducible #437

Open
@yileitu

Description


Problem Overview

The example code in models/tts/metis/metis_infer_vc.py is incorrect and cannot be run as-is. Specifically:

  • It loads ft.json via load_config, which is unrelated to voice conversion.
  • It attempts to load metis_vc.safetensors, which does not exist in the HuggingFace repo (see the file-listing sketch after this list). Only the following two files are available:
    • metis_vc_lora_16.safetensors
    • metis_vc_lora_16_adapter.safetensors
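
For reference, the repo contents can be confirmed directly with huggingface_hub. This is a minimal sketch (not part of the Amphion example), using the same amphion/metis repo id as the download code below:

    from huggingface_hub import list_repo_files

    # List every file in the checkpoint repository and keep only the VC-related ones.
    vc_files = [f for f in list_repo_files("amphion/metis", repo_type="model") if "metis_vc" in f]
    print(vc_files)
    # Per the observation above, only the LoRA and adapter files are listed;
    # there is no metis_vc.safetensors.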

Steps Taken

  1. Referred to the example usage for TTS here: https://github.com/open-mmlab/Amphion/tree/main/models/tts/metis#2-example-usaage

  2. Modified the code to a voice conversion (VC) version, as follows (a consolidated download variant is sketched after this list):

    # Imports follow the Metis TTS example; the load_config/Metis import paths are
    # assumed to match the Amphion repo layout.
    import os

    import soundfile as sf
    from huggingface_hub import snapshot_download

    from models.tts.metis.metis import Metis
    from utils.util import load_config

    device = "cuda:0"
    metis_cfg = load_config("./models/tts/metis/config/vc.json")

    # Download the base model and the two VC checkpoint files that do exist in the repo.
    base_ckpt_dir = snapshot_download(
        "amphion/metis",
        repo_type="model",
        local_dir="./models/tts/metis/ckpt",
        allow_patterns=["metis_base/model.safetensors"],
    )
    lora_ckpt_dir = snapshot_download(
        "amphion/metis",
        repo_type="model",
        local_dir="./models/tts/metis/ckpt",
        allow_patterns=["metis_vc/metis_vc_lora_16.safetensors"],
    )
    adapter_ckpt_dir = snapshot_download(
        "amphion/metis",
        repo_type="model",
        local_dir="./models/tts/metis/ckpt",
        allow_patterns=["metis_vc/metis_vc_lora_16_adapter.safetensors"],
    )

    base_ckpt_path = os.path.join(base_ckpt_dir, "metis_base/model.safetensors")
    lora_ckpt_path = os.path.join(lora_ckpt_dir, "metis_vc/metis_vc_lora_16.safetensors")
    adapter_ckpt_path = os.path.join(adapter_ckpt_dir, "metis_vc/metis_vc_lora_16_adapter.safetensors")

    # Build the VC model from the base weights plus the LoRA and adapter checkpoints.
    metis = Metis(
        base_ckpt_path=base_ckpt_path,
        lora_ckpt_path=lora_ckpt_path,
        adapter_ckpt_path=adapter_ckpt_path,
        cfg=metis_cfg,
        device=device,
        model_type="vc",
    )

    prompt_speech_path = "./models/tts/metis/wav/vc/prompt.wav"
    source_speech_path = "./models/tts/metis/wav/vc/source.wav"

    # Inference hyperparameters (my own guesses; the values used for the demo are unknown).
    n_timesteps = 20
    cfg = 1.0

    # Convert the source utterance to the voice of the prompt speaker.
    gen_speech = metis(
        prompt_speech_path=prompt_speech_path,
        source_speech_path=source_speech_path,
        cfg=cfg,
        n_timesteps=n_timesteps,
        model_type="vc",
    )

    sf.write("./models/tts/metis/wav/vc/gen.wav", gen_speech, 24000)
  3. Used the example WAV files in models/tts/metis/wav/vc/.
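
As an aside, the three snapshot_download calls above could likely be collapsed into one, since allow_patterns accepts several patterns and all three calls share the same local_dir. A minimal sketch, equivalent in intent to the step-by-step version (not the repository's official example):

    import os

    from huggingface_hub import snapshot_download

    # One call fetches the base model plus both VC checkpoints into the same directory.
    ckpt_dir = snapshot_download(
        "amphion/metis",
        repo_type="model",
        local_dir="./models/tts/metis/ckpt",
        allow_patterns=[
            "metis_base/model.safetensors",
            "metis_vc/metis_vc_lora_16.safetensors",
            "metis_vc/metis_vc_lora_16_adapter.safetensors",
        ],
    )

    base_ckpt_path = os.path.join(ckpt_dir, "metis_base/model.safetensors")
    lora_ckpt_path = os.path.join(ckpt_dir, "metis_vc/metis_vc_lora_16.safetensors")
    adapter_ckpt_path = os.path.join(ckpt_dir, "metis_vc/metis_vc_lora_16_adapter.safetensors")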

Expected Outcome

Expected to generate intelligible and high-quality converted speech, similar to the samples on the demo page.

Actual Outcome

The generated audio is of very low quality and contains no recognizable human voice; it is mostly noise. This makes the current VC pipeline irreproducible.
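
For reference, the judgment above is by ear; a rough way to attach numbers to it is to check the level statistics of the generated file. A minimal sketch using numpy and soundfile with the output path from the script above (it only reports basic signal statistics and is no substitute for listening):

    import numpy as np
    import soundfile as sf

    # Basic statistics of the generated audio: duration, sample rate, peak and RMS level.
    audio, sr = sf.read("./models/tts/metis/wav/vc/gen.wav")
    rms = float(np.sqrt(np.mean(np.square(audio))))
    peak = float(np.max(np.abs(audio)))
    print(f"duration={len(audio) / sr:.2f}s  sr={sr}  peak={peak:.3f}  rms={rms:.3f}")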

Environment Information

  • Operating System: Ubuntu 20.04.5 LTS
  • Python Version: 3.10.16
  • Driver & CUDA Version: Driver 470.103.01 & CUDA 11.4
  • Error Messages and Logs: No runtime errors, but the model output is unusable.

Additional Context

Please provide the correct inference code used to generate the demo samples at https://metis-demo.github.io/#metis-vc. It would be especially helpful if you could:

  • Fix the example script at metis_infer_vc.py
  • Clearly specify which checkpoint files are required
  • Share the hyperparameters (cfg, n_timesteps, etc.) and audio preprocessing steps used in your demos

Thanks for your work. I am more than excited to use Metis VC once this is resolved.
