Hi plamd, I have not figured out a solution yet, but I am going to try training the Hugging Face Llama-2 model in SageMaker directly. I suspect this is an issue with Llama-2 in JumpStart. See: https://docs.aws.amazon.com/sagemaker/latest/dg/hugging-face.html Good luck!
@reza - in my case, this seems to happen when a validation dataset is explicitly specified. When I omit the validation dataset and use only a training dataset, the training runs pass (a portion of the training set is then used for validation, controlled via the validation_split_ratio hyperparameter). This is quite limiting, and the error message is really misleading, but it's the only way I've been able to get this working.
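For reference, the workaround above can be sketched roughly as follows. This is a non-authoritative sketch: the model_id, the `validation_split_ratio` hyperparameter name, and the S3 paths are assumptions based on the JumpStart Llama-2 text-generation models, so verify them against your SageMaker SDK version before relying on this.

```python
# Sketch of the workaround: pass ONLY a training channel to fit(), and let
# JumpStart carve out a validation split via validation_split_ratio instead
# of supplying an explicit validation dataset (which triggers the failure).

# Hypothetical hyperparameters; "validation_split_ratio" is assumed from the
# JumpStart Llama-2 fine-tuning docs and may differ in your SDK version.
hyperparameters = {
    "epoch": "1",
    "validation_split_ratio": "0.1",  # use 10% of the training data for validation
}

def make_channels(train_s3_uri, validation_s3_uri=None):
    """Build the fit() inputs; omit the validation channel to avoid the error."""
    channels = {"training": train_s3_uri}
    if validation_s3_uri is not None:
        # Including this channel is what appears to break the training job.
        channels["validation"] = validation_s3_uri
    return channels

# Actual training would look roughly like this (requires AWS credentials and
# an accepted EULA; not run here):
#
# from sagemaker.jumpstart.estimator import JumpStartEstimator
# estimator = JumpStartEstimator(
#     model_id="meta-textgeneration-llama-2-7b",  # assumed model_id
#     environment={"accept_eula": "true"},
#     hyperparameters=hyperparameters,
# )
# estimator.fit(make_channels("s3://my-bucket/train/"))
```

The point is simply that `fit()` receives a single "training" channel; validation data is derived from the split ratio rather than passed explicitly.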
@plamd thanks for the info, really useful. I need to be able to select my test set manually. I have submitted a formal case, will let you know if I learn something new.
I am also getting this when I try to fine-tune a Llama-2 chat model via the SageMaker JumpStart Studio UI (tried with the 7b and 70b chat variants). Here is the stack trace I get:
For the 70b model, the training fails after ~38 minutes, and it seems we do get billed for that time.
Any ideas whether this is incorrect error reporting or a bug on the SageMaker side?