Why does Tensor-parallel Communication Overlap require MPI? #11849

Answered by ashors1
mjkpolo asked this question in Q&A
… and using a smaller model, because I noticed this in the NeMo code:

NeMo/nemo/lightning/_strategy_lib.py, line 92 (commit dc08edd):

    init_mpi_proc_group=getattr(parallel_config, "tp_comm_overlap", False)

I am not very familiar with training or HPC applications, but why does this feature require MPI rather than NCCL? I read this blog post to try to understand what tensor-parallel communication overlap is, but I can't figure out why I need MPI for it.

Thanks!

","upvoteCount":1,"answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"

Hi, MPI is used by default to bootstrap the user buffers (see the TransformerEngine documentation here). However, NCCL bootstrap should also be supported now. You can try setting tp_comm_bootstrap_backend=\"nccl\" here.

","upvoteCount":1,"url":"https://github.com/NVIDIA/NeMo/discussions/11849#discussioncomment-13016899"}}}

Answer (ashors1):
Hi, MPI is used by default to bootstrap the user buffers (see the TransformerEngine documentation here). However, NCCL bootstrap should also be supported now. You can try setting tp_comm_bootstrap_backend="nccl" here.
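
For concreteness, a minimal sketch of how the two settings relate. Only the getattr line is quoted from _strategy_lib.py; the stand-in config object and the exact place tp_comm_bootstrap_backend lives in a given NeMo release are assumptions here, so check the version you are running:

    # Hypothetical sketch: only the getattr(...) line is quoted from
    # NeMo/nemo/lightning/_strategy_lib.py; the config object and the location
    # of tp_comm_bootstrap_backend in your NeMo release are assumptions.
    from types import SimpleNamespace

    # Stand-in for the parallel_config passed into _strategy_lib.py.
    parallel_config = SimpleNamespace(
        tp_comm_overlap=True,              # enable tensor-parallel comm overlap
        tp_comm_bootstrap_backend="nccl",  # bootstrap userbuffers over NCCL instead of MPI
    )

    # Line quoted in the question: an MPI process group is only requested
    # when tp_comm_overlap is enabled.
    init_mpi_proc_group = getattr(parallel_config, "tp_comm_overlap", False)

    # Per the answer, MPI is only the default bootstrap; NCCL should also be
    # supported, which is what would avoid the MPI dependency.
    bootstrap_backend = getattr(parallel_config, "tp_comm_bootstrap_backend", "mpi")

    print(f"init_mpi_proc_group={init_mpi_proc_group}, bootstrap_backend={bootstrap_backend}")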

Replies: 1 comment

Answer selected by ashors1
Category: Q&A
Labels: None yet
2 participants