Handling a large amount of data where not all of it is present at any moment #13815
Unanswered · paniabhisek asked this question in Q&A · Replies: 0 comments
We want to train on a large amount of data, but all of it cannot be downloaded at once, so we plan to download it chunk by chunk.
However, when running megatron_gpt_pretraining.py, I need to list the data files in the config file. Does it read the data files sequentially and send them to training one by one? If I list file names near the end of the list that are not present yet, will it work?
Is there a better way to do this?