Insights: NVIDIA/NeMo
Overview
48 Pull requests merged by 20 people
Update AVLM finetune example for vanilla fine-tuning
#14232 merged
Jul 15, 2025 -
cherry-pick fix eval beam search ctc script
#14242 merged
Jul 15, 2025 -
Fix for MCore dist ckpt loading #14229
#14239 merged
Jul 15, 2025 -
fix eval_beamsearch_ngram_ctc script
#14238 merged
Jul 15, 2025 -
Fix decoding with ngpu-lm when training (#13994)
#13995 merged
Jul 15, 2025 -
[Performance script] FSDP-UBR related recipe update (#14208)
#14233 merged
Jul 15, 2025 -
Fix for MCore dist ckpt loading
#14229 merged
Jul 15, 2025 -
Cherry pick [automodel] fix loss_mask pad token (14150) into r2.4.0
#14227 merged
Jul 15, 2025 -
Cherry pick Fix Llama Nemotron Nano Importer (14222) into r2.4.0
#14226 merged
Jul 15, 2025 -
Cherry pick Fix the forward when final_loss_mask is not present (14201) into r2.4.0
#14225 merged
Jul 15, 2025 -
Cherry pick Change RerankerSpecter Dataset question key (14200) into r2.4.0
#14224 merged
Jul 15, 2025 -
[Performance script] FSDP-UBR related recipe update
#14208 merged
Jul 14, 2025 -
Fixing file path suffix
#14179 merged
Jul 14, 2025 -
[automodel] fix loss_mask pad token
#14150 merged
Jul 14, 2025 -
Fix Llama Nemotron Nano Importer
#14222 merged
Jul 14, 2025 -
Fix the forward when final_loss_mask is not present
#14201 merged
Jul 14, 2025 -
Change RerankerSpecter Dataset question key
#14200 merged
Jul 14, 2025 -
Cherry pick perf-scripts: Change b200 config to EP8 (14207) into r2.4.0
#14223 merged
Jul 14, 2025 -
perf-scripts: Change b200 config to EP8
#14207 merged
Jul 14, 2025 -
Remove unused DynamicRetrievalServer and Bert dataset loader classes
#14209 merged
Jul 14, 2025 -
Cherry pick diffusion mock data null args (14173) into r2.4.0
#14217 merged
Jul 14, 2025 -
Cherry pick Remove g2p_en (14204) into r2.4.0
#14212 merged
Jul 14, 2025 -
Cherry pick Add option to disable gloo process groups (#14156) into r2.4.0
#14220 merged
Jul 14, 2025 -
Cherry pick Fix nemotronh flops calculator (14161) into r2.4.0
#14202 merged
Jul 14, 2025 -
Cherry pick 405b perf script updates (14176) into r2.4.0
#14195 merged
Jul 14, 2025 -
Cherry pick Fix dsv3 script (14007) into r2.4.0
#14182 merged
Jul 14, 2025 -
diffusion mock data null args
#14173 merged
Jul 13, 2025 -
GPU-accelerated Phrase-Boosting (GPU-PB) for AED decoding
#14108 merged
Jul 12, 2025 -
Remove g2p_en
#14204 merged
Jul 12, 2025 -
Fix nemotronh flops calculator
#14161 merged
Jul 11, 2025 -
Add fix for evo2 generate/inference
#14027 merged
Jul 11, 2025 -
Add option to disable gloo process groups
#14156 merged
Jul 11, 2025 -
Fix "Safely import optional python packages (#13936)"
#14198 merged
Jul 11, 2025 -
fix QA comments NVBug
#14196 merged
Jul 11, 2025 -
Fix importerror in transformer_lm_model after nlp module removals
#14199 merged
Jul 11, 2025 -
Revert "Safely import optional python packages (#13936)"
#14197 merged
Jul 11, 2025 -
405b perf script updates
#14176 merged
Jul 11, 2025 -
Set flux test as optional
#14190 merged
Jul 10, 2025 -
remove nmt collection
#14191 merged
Jul 10, 2025 -
Safely import optional python packages
#13936 merged
Jul 10, 2025 -
remove rag collection
#14157 merged
Jul 10, 2025 -
remove nlp modules
#14127 merged
Jul 10, 2025 -
Add option to suppress import checks in Dockerfile.speech
#14185 merged
Jul 10, 2025 -
Fix dsv3 script
#14007 merged
Jul 10, 2025 -
Cherry pick Add checkpoint info for NIM Embedding Expor Tutorial (14177) into r2.4.0
#14178 merged
Jul 9, 2025 -
Add checkpoint info for NIM Embedding Expor Tutorial
#14177 merged
Jul 9, 2025 -
add mmap_bin_files param
#14122 merged
Jul 9, 2025 -
Fix FLUX test with correct env var
#14149 merged
Jul 9, 2025
27 Pull requests opened by 15 people
[magpietts] added an argument 'binarize_atten_prior' to trigger whether apply prior binarization.
#14166 opened
Jul 9, 2025 -
Temporarily Remove Encoder PP Support
#14167 opened
Jul 9, 2025 -
[finetune] Add dataset_kwargs to prepare packed sequence data
#14169 opened
Jul 9, 2025 -
TS-parakeet models PR Part 1: models, data, utils and example scripts
#14174 opened
Jul 9, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `4560888...` (2025-07-10)
#14180 opened
Jul 10, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `e11d285...` (2025-07-10)
#14181 opened
Jul 10, 2025 -
Fix ASR decoding issues with CUDA graphs in training
#14184 opened
Jul 10, 2025 -
Update AVLM
#14188 opened
Jul 10, 2025 -
remove language_modeling
#14192 opened
Jul 10, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `0999058...` (2025-07-11)
#14193 opened
Jul 11, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `3a2a972...` (2025-07-11)
#14194 opened
Jul 11, 2025 -
Support dump perf recipe diff from base recipe
#14206 opened
Jul 11, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `0999058...` (2025-07-12)
#14210 opened
Jul 12, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `3a2a972...` (2025-07-12)
#14211 opened
Jul 12, 2025 -
Allow exception in hf ckpt load attempt before fallback to standard l…
#14214 opened
Jul 12, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `ee082bf...` (2025-07-13)
#14215 opened
Jul 13, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `0999058...` (2025-07-13)
#14216 opened
Jul 13, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `0999058...` (2025-07-14)
#14218 opened
Jul 14, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `44fa0ea...` (2025-07-14)
#14219 opened
Jul 14, 2025 -
Improving formatting in README
#14230 opened
Jul 14, 2025 -
Change Llama Embedding Tutorial to use SFT by default
#14231 opened
Jul 14, 2025 -
Improve NEST GPU Utilization 3/N
#14234 opened
Jul 14, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `304a0eb...` (2025-07-15)
#14235 opened
Jul 15, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `250b794...` (2025-07-15)
#14236 opened
Jul 15, 2025 -
update ffmpeg install
#14237 opened
Jul 15, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `c351a47...` (2025-07-16)
#14245 opened
Jul 16, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `e6c510f...` (2025-07-16)
#14246 opened
Jul 16, 2025
11 Issues closed by 4 people
parakeet-tdt-0.6b-v2 Multi-threaded transcription crashes
#13988 closed
Jul 15, 2025 -
Incompatible with numpy>2.0
#12378 closed
Jul 14, 2025 -
performance degradation after migrating from NeMo 1 to 2
#13851 closed
Jul 14, 2025 -
Does NeMo support dynamic YaRN?
#13737 closed
Jul 13, 2025 -
Size mismatch when finetune Canary-180m in Vietnamese
#13657 closed
Jul 12, 2025 -
Megatron DDP strategy based MegatronLossReduction
#13630 closed
Jul 11, 2025 -
How to pretrain Qwen 2.5 7B with 128K context length?
#13730 closed
Jul 11, 2025 -
transcribe failed
#14107 closed
Jul 10, 2025 -
pruning-distillation guidance or docs for Multimodal model
#12975 closed
Jul 10, 2025 -
Is this WER reasonable on LibriSpeech (train-clean-100) using fast-conformer_ctc_bpe (Large) ?
#13707 closed
Jul 10, 2025 -
[Bug]Is there any bug in the implementation of the WarmupAnnealHoldPolicy?
#14171 closed
Jul 9, 2025
10 Issues opened by 9 people
How to use the pretraining recipe on CPU only, for debugging purpose?
#14244 opened
Jul 15, 2025 -
AutoResume does not work with when using optimizer_cpu_offload=True in OptimizerConfig
#14243 opened
Jul 15, 2025 -
Turn on relative position embedding in T5 config
#14189 opened
Jul 10, 2025 -
Failed to build casual-conv1d - 00_NeMo_Primer
#14187 opened
Jul 10, 2025 -
Bug: `max_steps` is missing in learning rate annealing schedulers, causing incorrect decay
#14186 opened
Jul 10, 2025 -
Long-form diarization possibility for Sortformer model.
#14183 opened
Jul 10, 2025 -
wrong typehint in ctc-word-spotting graph builder?
#14175 opened
Jul 9, 2025 -
Low accuracy using examples/llm/finetune/automodel.py
#14170 opened
Jul 9, 2025 -
Add dataset_kwargs in FineTuningDataModule
#14168 opened
Jul 9, 2025
55 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
Add support of Llama4 VLM PTQ
#13567 commented on
Jul 15, 2025 • 16 new comments -
autoConfigurator performance calculation and config name update
#14071 commented on
Jul 15, 2025 • 13 new comments -
Canary2 with NFA
#14121 commented on
Jul 9, 2025 • 2 new comments -
Load master weights from checkpoint
#14072 commented on
Jul 14, 2025 • 2 new comments -
S2S data to audio-input/text-output multimodal conversation converter
#14124 commented on
Jul 15, 2025 • 1 new comment -
Enable packed sequence in SFT with Chat dataset
#14116 commented on
Jul 12, 2025 • 1 new comment -
Add calculate_per_token_loss support for Context Parallel
#14065 commented on
Jul 16, 2025 • 0 new comments -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `6b70889...` (2025-06-29)
#14062 commented on
Jul 13, 2025 • 0 new comments -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `03f7793...` (2025-06-29)
#14061 commented on
Jul 13, 2025 • 0 new comments -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `03f7793...` (2025-06-28)
#14054 commented on
Jul 12, 2025 • 0 new comments -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `6b70889...` (2025-06-28)
#14053 commented on
Jul 12, 2025 • 0 new comments -
Streaming Sortformer Tutorial, postprocessing yaml files and tests
#14052 commented on
Jul 16, 2025 • 0 new comments -
Add timestamps option to streaming inference script
#14030 commented on
Jul 11, 2025 • 0 new comments -
Aaftabv row wise der
#14024 commented on
Jul 10, 2025 • 0 new comments -
Gaod/dpskv3/perf analysis
#14015 commented on
Jul 10, 2025 • 0 new comments -
Toggle skip
#13958 commented on
Jul 15, 2025 • 0 new comments -
Remove unnecessary padding
#13957 commented on
Jul 16, 2025 • 0 new comments -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `03f7793...` (2025-06-30)
#14066 commented on
Jul 14, 2025 • 0 new comments -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `6b70889...` (2025-06-30)
#14067 commented on
Jul 14, 2025 • 0 new comments -
Change to enable full iteration CUDA graph for LLMs
#14077 commented on
Jul 12, 2025 • 0 new comments -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `1d4a1f7...` (2025-07-01)
#14078 commented on
Jul 15, 2025 • 0 new comments -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `8a416d0...` (2025-07-01)
#14079 commented on
Jul 15, 2025 • 0 new comments -
Small fix for clustering diarizer
#14093 commented on
Jul 15, 2025 • 0 new comments -
Remove nemo1 multimodal and vision
#14095 commented on
Jul 15, 2025 • 0 new comments -
Nfa lhotse tar
#14096 commented on
Jul 15, 2025 • 0 new comments -
Add interface to use high priority stream for communicator groups
#14151 commented on
Jul 9, 2025 • 0 new comments -
Fix model training/eval state after PTL validation loop
#14152 commented on
Jul 11, 2025 • 0 new comments -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `4560888...` (2025-07-09)
#14164 commented on
Jul 9, 2025 • 0 new comments -
[Request] Benchmark tok/sec for Deepseek V3 pretrain recipe
#14158 commented on
Jul 9, 2025 • 0 new comments -
Loading .nemo model throws unhandled exception
#13858 commented on
Jul 11, 2025 • 0 new comments -
Empty output with speech_to_text_buffered_infer_rnnt.py
#13797 commented on
Jul 11, 2025 • 0 new comments -
export_ckpt failed due to AssertionError: dtype mismatch between source and target state dicts
#13455 commented on
Jul 11, 2025 • 0 new comments -
Q: Forced alignment with streaming
#13863 commented on
Jul 12, 2025 • 0 new comments -
DuplexS2SModel
#13902 commented on
Jul 13, 2025 • 0 new comments -
Adjusting "global_batch_size" and "micro_batch_size" has no impact on how long each training step takes when using HFAutoModel.
#13887 commented on
Jul 13, 2025 • 0 new comments -
TDT Head Stagnation in parakeet-tdt_ctc-110m Fine-tuning on Persian Data
#14140 commented on
Jul 13, 2025 • 0 new comments -
hugging face saved model inference
#13891 commented on
Jul 14, 2025 • 0 new comments -
How to pretrain from scratch the Qwen3 4B model but with MoE?
#13845 commented on
Jul 14, 2025 • 0 new comments -
How to pretrain from scratch the Qwen 3 4B or 7B dense model (with my own data)?
#13844 commented on
Jul 14, 2025 • 0 new comments -
eval_beamsearch_ngram_ctc.py won't run
#13064 commented on
Jul 16, 2025 • 0 new comments -
SDE with GPU acceleration
#8657 commented on
Jul 14, 2025 • 0 new comments -
Add safetensor option when saving and restoring models
#11549 commented on
Jul 13, 2025 • 0 new comments -
Feature/wsd scheduler
#12611 commented on
Jul 9, 2025 • 0 new comments -
Transducer with Transformer-Decoder (GPT-like)
#13030 commented on
Jul 10, 2025 • 0 new comments -
add new tokenizer system support
#13090 commented on
Jul 11, 2025 • 0 new comments -
Add Parakeet Hybrid RNNT CTC BPE Model with target language support
#13360 commented on
Jul 11, 2025 • 0 new comments -
Add CallbackGroup & Metadata factory function
#13437 commented on
Jul 14, 2025 • 0 new comments -
IPL callback and two scripts
#13671 commented on
Jul 10, 2025 • 0 new comments -
Bump vllm from 0.8.5.post1 to 0.9.0 in /requirements
#13758 commented on
Jul 13, 2025 • 0 new comments -
Llama Nemotron VL
#13819 commented on
Jul 13, 2025 • 0 new comments -
Experimental Magpie Decoder Only Model
#13833 commented on
Jul 16, 2025 • 0 new comments -
Sketch dist-ckpt content versioning
#13839 commented on
Jul 15, 2025 • 0 new comments -
feat: print expert groups on megatron init
#13874 commented on
Jul 10, 2025 • 0 new comments -
ASR models knowledge distillation example
#13939 commented on
Jul 16, 2025 • 0 new comments -
[Draft] Downsampled main transformer
#13952 commented on
Jul 14, 2025 • 0 new comments