1 Answer
Hi Ayman,
Try increasing the amount of training data, or set the max_seq_len hyperparameter to a smaller value (for example, 128) to see if the error persists.
The way the computation works is that all text is processed, combined, and then split into samples, each of length equal to the max input length (max_seq_len). The samples are then batched according to the batch size. If you are training on a machine with 8 GPUs, you need at least 8 non-empty batches. That is, you either need enough data to produce 8 batches, or you need to decrease the batch size, or you need to reduce the max input length. The sketch below makes this arithmetic concrete.
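To see why reducing max_seq_len or the batch size helps, here is a minimal sketch of the batching arithmetic described above. The variable names (total_tokens, max_seq_len, batch_size, num_gpus) are illustrative, not actual SageMaker hyperparameter names:

```python
def count_batches(total_tokens: int, max_seq_len: int, batch_size: int) -> int:
    """All text is combined, split into samples of max_seq_len tokens,
    and the samples are grouped into batches of batch_size."""
    num_samples = total_tokens // max_seq_len
    return num_samples // batch_size

# Hypothetical corpus size after tokenization.
num_gpus = 8
total_tokens = 50_000
batch_size = 16

# With max_seq_len=512: 50_000 // 512 = 97 samples -> 97 // 16 = 6 batches,
# which is fewer than the 8 GPUs and would trigger the error.
print(count_batches(total_tokens, max_seq_len=512, batch_size=batch_size))  # 6

# With max_seq_len=128: 50_000 // 128 = 390 samples -> 390 // 16 = 24 batches,
# enough to feed all 8 GPUs.
print(count_batches(total_tokens, max_seq_len=128, batch_size=batch_size))  # 24
```

So with the same data, lowering max_seq_len (or batch_size) multiplies the number of batches, which is why either change can resolve the error when adding data is not an option.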
answered 4 months ago