How to convert a large model (e.g., with hundreds of billions of parameters) from Hugging Face to NeMo format? #14102
Unanswered
gaojingwei asked this question in Q&A
Replies: 2 comments
To work around the out-of-memory error described below, I tried adding 512 GB of swap, but I then received a new error.
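As a sanity check (a minimal sketch using the third-party psutil package, not part of NeMo), this is how the extra swap can be confirmed to be visible to the process before retrying the save:

```python
import psutil  # third-party: pip install psutil


def report_memory_headroom() -> None:
    """Print RAM and swap headroom before retrying the save step."""
    gib = 1024 ** 3
    vm = psutil.virtual_memory()
    sw = psutil.swap_memory()
    print(f"RAM:  {vm.available / gib:.0f} GiB available of {vm.total / gib:.0f} GiB")
    print(f"Swap: {(sw.total - sw.used) / gib:.0f} GiB free of {sw.total / gib:.0f} GiB")


if __name__ == "__main__":
    report_memory_headroom()
```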
While converting a model from Hugging Face to NeMo format, I ran into an issue: if the model is too large, the conversion throws a "Cannot allocate memory" error. This does not happen with smaller models. My model has 600 billion parameters (approximately 1.2 TB in size), and I'm running the conversion on an A100 server with 2 TB of RAM. convert_state() completes without errors, but it fails during the save_nemo() step. The error message is shown in the screenshot below.
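For context, here is a minimal sketch of the kind of import call involved (assuming the NeMo 2.0 llm.import_ckpt API; the Llama classes, Hugging Face repo, and output path are illustrative placeholders, not the actual 600B model):

```python
from pathlib import Path

from nemo.collections import llm

if __name__ == "__main__":
    # Import a Hugging Face checkpoint and write it out in NeMo format.
    # The model/config class must match the source architecture; Llama 3 70B
    # is only a placeholder here, not the actual 600B-parameter model.
    llm.import_ckpt(
        model=llm.LlamaModel(llm.Llama3Config70B()),
        source="hf://meta-llama/Meta-Llama-3-70B",
        output_path=Path("/workspace/checkpoints/llama3-70b-nemo"),
    )
```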
The error is likely caused by os.fork() when creating a subprocess, as it duplicates the parent process's address space, leading to an out-of-memory issue.
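As a standalone illustration of that suspicion (plain Python, not NeMo code): with Linux's default "fork" start method the child inherits the parent's entire address space, and the kernel's overcommit accounting can refuse the fork with ENOMEM even though the pages are copy-on-write (adding swap raises that limit). A "spawn" start method sidesteps this by launching a fresh interpreter:

```python
import multiprocessing as mp


def _save_worker(path: str) -> None:
    # The child only needs the output path, not the parent's ~1.2 TB of weights.
    print(f"saving to {path}")


if __name__ == "__main__":
    # "fork" would duplicate the parent's address space (copy-on-write);
    # "spawn" starts a fresh interpreter, so nothing large is inherited.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=_save_worker, args=("/tmp/model.nemo",))
    p.start()
    p.join()
```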
I tried setting asynchronous saving to False, but I still got the same error.
The NeMo Docker image version I'm using is 2504.