Need guidance for NMT model quantization and pruning #12798
Unanswered
syedhamza671
asked this question in
Q&A
Replies: 1 comment
Hi, thanks for the question. We provide general quantization and pruning support for LLMs (some resources: https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html, https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html, https://github.com/NVIDIA/NeMo/tree/main/tutorials/llm/llama/pruning-distillation). However, the focus of this work has been on encoder-only or decoder-only models.
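For readers new to the two terms, here is a minimal, framework-agnostic sketch of what magnitude pruning and symmetric int8 quantization do to a list of weights. It is only an illustration of the general ideas: it does not use any NeMo API, it is not the method described in the linked docs, and the function names (`magnitude_prune`, `quantize_int8`, `dequantize`) are hypothetical.

```python
# Illustrative only: plain-Python versions of the two ideas, not NeMo code.
# All function names here are hypothetical helpers for this sketch.

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to zero
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w ~= scale * q, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate floating-point weights."""
    return [scale * v for v in q]


weights = [0.02, -1.27, 0.5, -0.03, 0.9, 0.0]
pruned = magnitude_prune(weights, sparsity=0.5)   # half of the weights become 0.0
q, scale = quantize_int8(pruned)                  # integer codes plus one float scale
restored = dequantize(q, scale)                   # approximate reconstruction
```

In a real encoder-decoder model the same operations would be applied per layer or per channel, usually followed by calibration or fine-tuning to recover accuracy; that part is outside this sketch.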
I couldn't find any blog, tutorial, or documentation on quantization and pruning for NeMo NMT models. If anyone can help, it would mean a lot.