Hello,

and using a smaller model, because I noticed in the NeMo code (NeMo/nemo/lightning/_strategy_lib.py, Line 92 in dc08edd):

    init_mpi_proc_group=getattr(parallel_config, "tp_comm_overlap", False)

I am not very familiar with training or HPC applications, but why does this feature require MPI, and why can't it use NCCL? I read this blog post to try to understand what tensor parallel communication overlap is, but I can't figure out why I need MPI for it.

Thanks!
-
Hi, MPI is used by default to bootstrap the user buffers (see the TransformerEngine documentation here). However, NCCL bootstrap should also be supported now. You can try setting tp_comm_bootstrap_backend="nccl" here.
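
For anyone who wants to try that suggestion, here is a minimal sketch of the change, assuming the parallel/model config that NeMo inspects in _strategy_lib.py accepts the two attributes named in this thread (tp_comm_overlap and tp_comm_bootstrap_backend). The TPCommOverlapSettings dataclass below is a stand-in, not NeMo's real config class; substitute whatever config object your recipe actually builds.

    from dataclasses import dataclass

    @dataclass
    class TPCommOverlapSettings:
        """Stand-in for the real NeMo/Megatron parallel config (illustrative only)."""
        tp_comm_overlap: bool = False
        tp_comm_bootstrap_backend: str = "mpi"  # MPI is the default bootstrap, per the answer above

    # Enable TP communication overlap, but bootstrap the userbuffers over NCCL
    # instead of MPI.
    settings = TPCommOverlapSettings(
        tp_comm_overlap=True,
        tp_comm_bootstrap_backend="nccl",
    )

    # The quoted _strategy_lib.py line keys the MPI process-group setup on the
    # overlap flag alone, which is why enabling the overlap pulled in MPI by default:
    init_mpi_proc_group = getattr(settings, "tp_comm_overlap", False)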